Speech Translation - Groups

IWSLT2018 Speech Translation Task

The dataset used in the paper is the IWSLT2018 speech translation task, which consists of five parts: TED corpus, Speech-translation TED corpus, TED LIUM corpus, WMT18 data and...

OpenSubtitles dataset

Open-domain neural dialogue generation (Vinyals and Le, 2015; Sordoni et al., 2015; Li et al., 2016a; Mou et al., 2016; Serban et al., 2016a; Asghar et al., 2016; Mei et al.,...

TED2012 ASR and MT dataset

The dataset used in the paper is a collection of English ASR hypotheses from the eight submissions on the tst2012 test set in the IWSLT 2013 TED talk ASR track, along with...

Librispeech

The Librispeech dataset is a large-scale corpus of approximately 1,000 hours of read English speech, derived from LibriVox audiobooks and sampled at 16 kHz.

4 datasets found
