Dataset - LDM

Europarl-ST

Europarl-ST is a multilingual speech corpus that contains transcriptions of parliamentary debates.
- Dataset
- JSON

WikiANN

WikiANN is a multilingual dataset for named entity recognition.
- Dataset
- JSON

MuST-C

MuST-C is a multilingual speech translation dataset, which contains at least 385 hours of audio recordings from TED Talks, with their manual transcriptions and translations at...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

3 datasets found