MuST-C

MuST-C is a multilingual speech translation dataset that contains at least 385 hours of audio recordings from TED Talks, paired with manual transcriptions and translations aligned at the sentence level.

BibTeX: