Dataset - LDM

Corpus of Spoken Dutch

The Corpus of Spoken Dutch (CGN) is a dataset of spoken Dutch recordings.
- Dataset
- JSON

Language Models of Spoken Dutch

This dataset consists of subtitles from television shows, provided by the Flemish public-service broadcaster VRT. It is used to train language models of spoken Dutch.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).
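
As a rough illustration of working with the JSON form of the entries, the sketch below parses a small registry listing and filters it by keyword. The payload is embedded inline, and its field names (`title`, `description`) and the helper `find_datasets` are assumptions made for this example. They are not taken from the API Docs, which remain the authority on the actual endpoints and schema.

```python
import json

# Hypothetical JSON export of the registry. The field names ("title",
# "description") are assumptions, not the registry's documented schema.
REGISTRY_JSON = """
[
  {"title": "Corpus of Spoken Dutch",
   "description": "The Corpus of Spoken Dutch (CGN) is a dataset of spoken Dutch recordings."},
  {"title": "Language Models of Spoken Dutch",
   "description": "Subtitles of television shows provided by the Flemish public-service broadcaster VRT."}
]
"""


def find_datasets(records, keyword):
    """Return titles of records whose title or description mentions keyword."""
    needle = keyword.lower()
    return [
        r["title"]
        for r in records
        if needle in r["title"].lower()
        or needle in r.get("description", "").lower()
    ]


records = json.loads(REGISTRY_JSON)
print(len(records), "datasets found")
print(find_datasets(records, "VRT"))
```

When working against the real registry, the inline string would be replaced by the response body returned by the API, and the key names adjusted to match whatever the API Docs specify.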

2 datasets found