-
Multilingual CommonsenseQA
Multilingual CommonsenseQA (mCSQA) is a dataset for evaluating the commonsense reasoning capabilities of multilingual LMs. -
ConceptNet 5.5
The ConceptNet 5.5 dataset is an open multilingual graph of general knowledge. -
Historical texts for spelling normalization
A dataset of historical texts in eight languages, used for historical spelling normalization. -
XArgMining dataset
XArgMining, a multilingual stance detection dataset from the IBM Debater project, contains human-authored data points for stance detection in English, as well as such data points... -
BELEBELE Benchmark
A multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. -
BabelSememe
The BabelSememe dataset is a multilingual sememe knowledge base built on BabelNet, containing over 15 thousand BabelNet synsets manually annotated with sememes. -
Multilingual dataset
A high-quality dataset in English and 12 other languages, augmented with rhyme schema at the paragraph level. -
A dataset and baselines for multilingual reply suggestion
A dataset and baselines for multilingual reply suggestion. -
xDial-Eval
A multilingual open-domain dialogue evaluation benchmark featuring 14,930 annotated turns and 8,691 dialogues in 10 languages. -
Multilingual Offensive Language Identification Dataset (OLID)
A multilingual offensive language identification dataset for social media, containing posts in Arabic, Danish, English, Greek, and Turkish. -
Multilingual Eye-movement Corpus (MECO)
The Multilingual Eye-movement Corpus (MECO) is a collection of eye-tracking data from participants reading texts in 13 languages. -
DiS-ReX: A Multilingual Dataset for Distantly Supervised Relation Extraction
DiS-ReX: A multilingual dataset for distantly supervised relation extraction. -
MultiTACRED: A Multilingual Version of the TAC Relation Extraction Dataset
Relation extraction (RE) is a fundamental task in information extraction, whose extension to multilingual settings has been hindered by the lack of supervised resources... -
Xl-sum: Large-scale multilingual abstractive summarization
The Xl-sum dataset for multilingual abstractive summarization. -
Cross-Lingual Ability of Multilingual BERT
The Cross-Lingual Ability of Multilingual BERT dataset.