Dataset - LDM

A Corpus of Turkish Offensive Language on Social Media

The dataset is a collection of Turkish tweets containing offensive language.
- Dataset
- JSON
Turkish Tweets Dataset

A collection of Turkish tweets about three different Turkish telecommunication brands, gathered over a period of one month.
- Dataset
- JSON
WIT corpus, SETimes corpus, newsdev2016, newstest2016, and newstest2017

The datasets used in the paper are the WIT corpus, the SETimes corpus, newsdev2016, newstest2016, and newstest2017.
- Dataset
- JSON
Turkish-English and Uyghur-Chinese machine translation tasks

The datasets used in the paper are those of the Turkish-English and Uyghur-Chinese machine translation tasks.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

4 datasets found