Multilingual Context-Based Pronunciation Learning for Text-to-Speech
Multilingual pronunciation learning for text-to-speech (TTS) systems. Phonetic information and linguistic knowledge are essential components of a TTS front-end.
Experiments with multilingual and language-specific pre-trained masked langua...
The datasets used in the experiments are annotated according to the UniMorph schema guidelines.
MuST-C v1.0
MuST-C v1.0 is a multilingual corpus for end-to-end speech translation, containing 8 language pairs.
Europarl-ST
Europarl-ST is a multilingual speech corpus that contains transcriptions of parliamentary debates in multiple languages.
MTG: A Benchmark Suite for Multilingual Text Generation
MTG is a multilingual multiway text generation benchmark suite. It is the first-proposed multilingual multiway text generation dataset with the largest human-annotated data...
PMIndiaSum
The PMIndiaSum dataset contains multilingual and cross-lingual headline summarization data for languages of India.
Wikipedia as multilingual source of comparable corpora
Wikipedia as multilingual source of comparable corpora.
SemEval-2023 Task 1: Visual Word Sense Disambiguation
The SemEval-2023 Visual Word Sense Disambiguation (V-WSD) Task dataset consists of a silver dataset with 12,869 V-WSD instances. Each sample is a 4-tuple ⟨f, c, I, i∗ ∈ I⟩ where...
PAN Profiling Fake News Spreader Task
The PAN Profiling Fake News Spreader Task provides an English dataset whose samples were collected from Twitter.
PAN Profiling Hate Speech Spreader Task
The PAN Profiling Hate Speech Spreader Task provides a dataset in English and Spanish whose samples were collected from Twitter.
BABEL-Pashto
The BABEL-Pashto dataset is a multilingual speech recognition dataset containing Pashto speech recordings.
A Multilingual African Embedding for FAQ Chatbots
A multilingual African embedding for FAQ chatbots.
OSCAR corpus
The dataset used in this study is the OSCAR corpus, a multilingual corpus obtained by filtering the Common Crawl corpus.
SemEval-2024 task 1: Semantic textual relatedness for African and Asian langu...
A collection of semantic textual relatedness datasets for African and Asian languages.
SemRel2024
A collection of semantic textual relatedness datasets for 14 languages.