Dataset - LDM

FLORES-101 Evaluation Benchmark

A benchmark for evaluating low-resource and multilingual machine translation.
- Dataset
- JSON

Europarl-ST

Europarl-ST is a multilingual speech corpus containing transcriptions of parliamentary debates.
- Dataset
- JSON

MASSIVE

The MASSIVE dataset is a comprehensive collection of approximately one million annotated utterances for various natural language understanding tasks such as slot-filling, intent...
- Dataset
- JSON

WikiANN

The WikiANN dataset is a multilingual dataset for named entity recognition.
- Dataset
- JSON

MuST-C: a Multilingual Speech Translation Corpus

MuST-C is a multilingual speech translation corpus.
- Dataset
- JSON

XED

XED is a multilingual dataset for sentiment analysis.
- Dataset
- JSON

MuST-C

MuST-C is a multilingual speech translation dataset, which contains at least 385 hours of audio recordings from TED Talks, with their manual transcriptions and translations at...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

7 datasets found