MuST-C

doi:10.57702/nhiwpvh5

MuST-C is a multilingual speech translation dataset containing at least 385 hours of audio recordings from TED Talks, with manual sentence-level transcriptions and translations.

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Cite this as

Marco Gaido, Mauro Cettolo, Matteo Negri, Marco Turchi (2024). Dataset: MuST-C. https://doi.org/10.57702/nhiwpvh5

DOI retrieved: November 25, 2024

Additional Info

Field	Value
Created	November 25, 2024
Last update	December 2, 2024
Defined In	https://doi.org/10.48550/arXiv.2104.11710
Citation	https://doi.org/10.48550/arXiv.2102.01578; https://doi.org/10.48550/arXiv.2012.04964; https://doi.org/10.48550/arXiv.2109.07439; https://doi.org/10.48550/arXiv.2109.07368; https://doi.org/10.48550/arXiv.2012.04955
Author	Marco Gaido
More Authors	Mauro Cettolo, Matteo Negri, Marco Turchi
Homepage	https://ict.fbk.eu/