Dataset - LDM

Speech-translation TED corpus

The dataset used in the paper is the Speech-translation TED corpus.
- Dataset
- JSON
TED

The dataset is used for document-level neural machine translation. It contains 0.23M training sentences, 0.31M development sentences, and 0.21M test sentences.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).
