Corpora - Groups - LDM

SEAME

SEAME is a Mandarin-English code-switched speech corpus used for the code-switched speech recognition task.
- Dataset
- JSON

CommonCrawl

CommonCrawl is a non-profit organization that provides a large corpus of web pages for research and development purposes.
- Dataset
- JSON
