- TED LIUM corpus
The dataset used in the paper is the TED LIUM corpus.
- Speech-translation TED corpus
The dataset used in the paper is the Speech-translation TED corpus.
- Fisher and Callhome Spanish-English Speech Translation Corpus
The dataset used in the paper is the Fisher and Callhome Spanish-English Speech Translation Corpus.
- IWSLT2018 Speech Translation Task
The dataset used in the paper is the IWSLT2018 speech translation task, which consists of five parts: TED corpus, Speech-translation TED corpus, TED LIUM corpus, WMT18 data and...
- OpenSubtitles dataset
Open-domain neural dialogue generation (Vinyals and Le, 2015; Sordoni et al., 2015; Li et al., 2016a; Mou et al., 2016; Serban et al., 2016a; Asghar et al., 2016; Mei et al.,...
- TED2012 ASR and MT dataset
The dataset used in the paper is a collection of English ASR hypotheses from the eight submissions on the tst2012 test set in the IWSLT 2013 TED talk ASR track, along with...
- MuST-C: a Multilingual Speech Translation Corpus
MuST-C is a multilingual speech translation corpus.
- Librispeech
The LibriSpeech dataset is a large-scale, multi-speaker corpus of approximately 1,000 hours of read English speech, sampled at 16 kHz and derived from LibriVox audiobook recordings.