Speech Translation - Groups

TED LIUM corpus

The dataset used in the paper is the TED LIUM corpus.

Speech-translation TED corpus

The dataset used in the paper is the Speech-translation TED corpus.

Fisher and Callhome Spanish-English Speech Translation Corpus

The dataset used in the paper is the Fisher and Callhome Spanish-English Speech Translation Corpus.

IWSLT2018 Speech Translation Task

The dataset used in the paper is the IWSLT2018 speech translation task, which consists of five parts: TED corpus, Speech-translation TED corpus, TED LIUM corpus, WMT18 data and...

OpenSubtitles dataset

Open-domain neural dialogue generation (Vinyals and Le, 2015; Sordoni et al., 2015; Li et al., 2016a; Mou et al., 2016; Serban et al., 2016a; Asghar et al., 2016; Mei et al.,...

TED2012 ASR and MT dataset

The dataset used in the paper is a collection of English ASR hypotheses from the eight submissions on the tst2012 test set in the IWSLT 2013 TED talk ASR track, along with...

MuST-C: a Multilingual Speech Translation Corpus

MuST-C is a multilingual speech translation corpus.

MuST-C

MuST-C is a multilingual speech translation dataset, which contains at least 385 hours of audio recordings from TED Talks, with their manual transcriptions and translations at...

Librispeech

The Librispeech dataset is a large-scale corpus of read English speech derived from LibriVox audiobooks, containing roughly 1,000 hours of audio.

9 datasets found