Dataset - LDM

SEAME

SEAME is a Mandarin-English code-switched speech corpus used for the code-switched speech recognition task.
- Dataset
- JSON

ArzEnSEG corpus

The ArzEnSEG corpus is a morphologically annotated dataset for code-switched Egyptian Arabic-English.
- Dataset
- JSON

ArzEn parallel corpus

The ArzEn parallel corpus consists of speech transcriptions gathered through informal interviews with bilingual Egyptian Arabic-English speakers, as well as their English...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).
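
As a rough illustration of programmatic access, the sketch below assumes the registry exposes a CKAN-style Action API, which this listing does not confirm. The base URL is a hypothetical placeholder, and the helper names (build_search_url, summarize, search) are invented for this example. Consult the API Docs for the actual endpoints before relying on any of it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical placeholder: the registry's real base URL is not given in this listing.
BASE_URL = "https://registry.example.org"


def build_search_url(query, rows=10, base_url=BASE_URL):
    """Build a package_search URL (assumption: the registry runs CKAN)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"


def summarize(payload):
    """Return (count, [titles]) from a CKAN-style package_search response."""
    if not payload.get("success"):
        raise ValueError("API call reported failure")
    result = payload["result"]
    # Fall back to the machine-readable name when a dataset has no title.
    titles = [pkg.get("title") or pkg.get("name") for pkg in result["results"]]
    return result["count"], titles


def search(query, rows=10, base_url=BASE_URL):
    """Fetch and summarize results; needs network access and a real base URL."""
    with urlopen(build_search_url(query, rows, base_url)) as resp:
        return summarize(json.load(resp))
```

If the assumption holds, calling search("code-switched", base_url=...) with the registry's actual address would return the match count and dataset titles, comparable to the "datasets found" summary on this page.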

3 datasets found