What is TurkishBERTweet?
TurkishBERTweet is the first large-scale pre-trained language model specifically designed for Turkish social media. Built using over 894 million Turkish tweets, it shares the architecture of the RoBERTa-base model but is optimized for social media content. TurkishBERTweet excels in tasks like Sentiment Classification and Hate Speech Detection, offering better generalizability and lower inference times compared to existing models.
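Because the model follows the RoBERTa-base architecture, it can be loaded with the Hugging Face transformers library like any other encoder. The sketch below is a minimal example, assuming the checkpoint is published on the Hugging Face Hub under the ID "VRLLab/TurkishBERTweet"; check the model hub for the exact name and for task-specific fine-tuned variants.

```python
# Minimal sketch: extracting contextual embeddings with TurkishBERTweet.
# The model ID "VRLLab/TurkishBERTweet" is an assumption; verify it on the Hub.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("VRLLab/TurkishBERTweet")
model = AutoModel.from_pretrained("VRLLab/TurkishBERTweet")

text = "bugün hava çok güzel"  # "the weather is very nice today"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# outputs.last_hidden_state holds per-token embeddings
# (RoBERTa-base: hidden size 768), which can feed a downstream
# classifier for tasks such as sentiment or hate speech detection.
print(outputs.last_hidden_state.shape)
```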
Why TurkishBERTweet?
Despite being widely spoken, Turkish is still considered a low-resource language in NLP. With Turkey’s strategic role in global politics and the vast amount of Turkish content on platforms like Twitter and Instagram, there is a growing need for tools tailored to this language. TurkishBERTweet addresses this by offering a cost-effective, scalable solution for processing Turkish social media data, outperforming existing models as well as commercial offerings such as OpenAI’s.