-
MuST-C v1.0
MuST-C v1.0 is a multilingual corpus for end-to-end speech translation, containing 8 language pairs.
-
Europarl-ST
Europarl-ST is a multilingual speech corpus that contains transcriptions of parliamentary debates in multiple languages.
-
TED LIUM corpus
The dataset used in the paper is the TED LIUM corpus.
-
Speech-translation TED corpus
The dataset used in the paper is the Speech-translation TED corpus.
-
Fisher and Callhome Spanish-English Speech Translation Corpus
The dataset used in the paper is the Fisher and Callhome Spanish-English Speech Translation Corpus.
-
IWSLT2018 Speech Translation Task
The dataset used in the paper is the IWSLT2018 speech translation task, which consists of five parts: TED corpus, Speech-translation TED corpus, TED LIUM corpus, WMT18 data and...
-
TED2012 ASR and MT dataset
The dataset used in the paper is a collection of English ASR hypotheses from the eight submissions on the tst2012 test set in the IWSLT 2013 TED talk ASR track, along with...