Multilingual Speech Recognition - Groups

Streaming end-to-end bilingual ASR systems with joint language identification

Multilingual ASR technology simplifies model training and deployment, but its accuracy is known to depend on the availability of language information at runtime.

Europarl-ST

Europarl-ST is a multilingual speech corpus that contains transcriptions of parliamentary debates in multiple languages.

Mozilla Commonvoice

Mozilla Commonvoice is a multilingual speech corpus that contains transcriptions of conversations in multiple languages.

MLS

MLS: A large-scale multilingual dataset for speech research.

CommonVoice

The sequence-to-sequence approach is now widely used in speech recognition (SR), and many research works are dedicated to showing its capabilities relying on a single...

Dictation dataset

A dictation dataset spanning 39 locales and several scripts, including Latin (Albanian, Icelandic, Slovak), Arabic (Levant, Maghrebi), Cyrillic (Macedonian, Kazakh), and Devanagari (Nepali), among others.

6 datasets found
