The M4 dataset consists of human-written texts from several data sources; its English subset includes Wikipedia, Reddit, and arXiv. It pairs the human-written...
The data used in this paper come from a large-scale production ASR system and include multi-domain (MD) data sets in English. The MD data sets include medium-form (MF) and...