Dataset - LDM

M4

The M4 dataset consists of human-written texts from several data sources; its English subset includes Wikipedia, Reddit, and arXiv. It pairs the human-written...
- Dataset
- JSON

LibriLight

The dataset used in this paper is drawn from a large-scale production ASR system and includes multi-domain (MD) data sets in English. The MD data sets include medium-form (MF) and...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).
