Data Standardization
Unified schemas for Twitter, Reddit, Bluesky, Telegram, and more.
SMDT (Social Media Data Toolkit) is a Python library for ingesting, standardizing, and analyzing social media data. It provides a unified interface to data from different platforms, so researchers can focus on analysis rather than data wrangling.
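To make the idea of a unified schema concrete, here is a minimal, standalone sketch in plain Python. It does not use SMDT's API; `UnifiedPost`, `from_twitter`, and `from_reddit` are hypothetical names invented for illustration, and the raw field names (`author_id`, `created_utc`, `selftext`, and so on) are assumptions loosely modeled on public Twitter and Reddit payloads.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UnifiedPost:
    """Hypothetical platform-independent record (not SMDT's actual schema)."""
    platform: str
    post_id: str
    author_id: str
    text: str
    created_at: datetime  # always timezone-aware UTC


def from_twitter(raw: dict) -> UnifiedPost:
    # Assumed input: ISO 8601 timestamp with a trailing "Z", e.g. "2024-01-01T00:00:00.000Z".
    created = datetime.fromisoformat(raw["created_at"].replace("Z", "+00:00"))
    return UnifiedPost("twitter", str(raw["id"]), str(raw["author_id"]), raw["text"], created)


def from_reddit(raw: dict) -> UnifiedPost:
    # Assumed input: Unix epoch seconds in "created_utc" and the body in "selftext".
    created = datetime.fromtimestamp(raw["created_utc"], tz=timezone.utc)
    return UnifiedPost("reddit", str(raw["id"]), str(raw["author"]), raw.get("selftext", ""), created)


tweet = from_twitter(
    {"id": 1, "author_id": 42, "text": "hello", "created_at": "2024-01-01T00:00:00.000Z"}
)
submission = from_reddit(
    {"id": "abc", "author": "someone", "selftext": "hi", "created_utc": 1704067200}
)
```

Once both records share one shape, downstream analysis code can sort, filter, or join them without platform-specific branches, which is the kind of wrangling a standardization layer is meant to remove.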
If you use SMDT in your research, please cite the following paper:
@article{smdt2026,
  title={Social Media Data Toolkit: Standardization and Anonymization of Social Network Datasets},
  author={Najafi, Ali and Iannucci, Letizia and Kivelä, Mikko and Varol, Onur},
  journal={arXiv preprint arXiv:2604.27710},
  year={2026}
}