-
MTG: A Benchmark Suite for Multilingual Text Generation
MTG is a multilingual multiway text generation benchmark suite. It is the first multilingual multiway text generation dataset, with the largest human-annotated data... -
PMIndiaSum
The PMIndiaSum dataset contains multilingual and cross-lingual headline summarization data for languages of India. -
Wikipedia as multilingual source of comparable corpora
Wikipedia as multilingual source of comparable corpora. -
SemEval-2023 Task 1: Visual Word Sense Disambiguation
The SemEval-2023 Visual Word Sense Disambiguation (V-WSD) Task dataset consists of a silver dataset with 12,869 V-WSD instances. Each sample is a 4-tuple ⟨f, c, I, i∗ ∈ I⟩ where... -
PAN Profiling Fake News Spreader Task
The PAN Profiling Fake News Spreader Task contains a dataset in English, whose samples were collected from Twitter. -
PAN Profiling Hate Speech Spreader Task
The PAN Profiling Hate Speech Spreader Task contains a dataset in English and Spanish, whose samples were collected from Twitter. -
BABEL-Pashto
The BABEL-Pashto dataset is a multilingual speech recognition dataset containing Pashto speech recordings. -
A Multilingual African Embedding for FAQ Chatbots
A multilingual African embedding for FAQ chatbots -
OSCAR corpus
The dataset used in this study is the OSCAR corpus, a multilingual corpus obtained by filtering the Common Crawl corpus. -
SemEval-2024 task 1: Semantic textual relatedness for African and Asian langu...
A collection of semantic textual relatedness datasets for African and Asian languages. -
SemRel2024
A collection of semantic textual relatedness datasets for 14 languages. -
TransMuCoRes
Translated dataset for Multilingual Coreference Resolution (TransMuCoRes) in 31 South Asian languages. -
DBP15K ZH-EN, DBP15K JA-EN, and DBP15K FR-EN datasets
The DBP15K ZH-EN, DBP15K JA-EN, and DBP15K FR-EN datasets are used for cross-lingual entity alignment. The datasets contain entities, relations, and attributes, and are used to... -
Multilingual Text Classification Dataset
A multilingual text classification dataset covering 17 languages.