UNSUPERVISED SPEECH RECOGNITION WITH N-SKIPGRAM AND POSITIONAL UNIGRAM MATCHING

Training unsupervised speech recognition systems is difficult because of GAN-related instability, misalignment between speech and text, and high memory demands. To address these challenges, we introduce ESPUM, a novel ASR system that uses lower-order N-skipgrams (up to N = 3) combined with positional unigram statistics gathered from a small batch of samples.
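
As a rough illustration of the two kinds of statistics named in the abstract, the sketch below counts skipgrams and positional unigrams over a small batch of token sequences. It is not the ESPUM implementation: the function names, the `offsets` parameterization of a skipgram (a tuple of relative positions starting at 0), and the toy batch are all assumptions made for this example, and the paper's exact definitions may differ.

```python
from collections import Counter


def skipgram_counts(batch, offsets):
    """Count token tuples found at the given relative offsets.

    `offsets` is a hypothetical parameterization: (0, 2) pairs each token
    with the token two positions later; (0, 1, 2) is a plain trigram.
    Offsets are assumed to be non-negative and to start at 0.
    """
    counts = Counter()
    span = max(offsets)
    for seq in batch:
        # Only start positions where every offset stays inside the sequence.
        for start in range(len(seq) - span):
            counts[tuple(seq[start + o] for o in offsets)] += 1
    return counts


def positional_unigram_counts(batch):
    """Count how often each token occurs at each absolute position."""
    counts = Counter()
    for seq in batch:
        for pos, tok in enumerate(seq):
            counts[(pos, tok)] += 1
    return counts


# Toy batch of two token sequences (illustrative data only).
batch = [list("abca"), list("abb")]
pair_counts = skipgram_counts(batch, offsets=(0, 2))       # N = 2, gap of 2
triple_counts = skipgram_counts(batch, offsets=(0, 1, 2))  # N = 3, contiguous
pos_counts = positional_unigram_counts(batch)
```

Both functions need only a single pass over the batch and store counts for observed tuples only, which is one plausible reading of why low-order statistics from a small batch keep memory use modest.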

Cite this as

Liming Wang, Mark Hasegawa-Johnson, Chang D. Yoo (2025). Dataset: UNSUPERVISED SPEECH RECOGNITION WITH N-SKIPGRAM AND POSITIONAL UNIGRAM MATCHING. https://doi.org/10.57702/5vd2dq1u

DOI retrieved: January 2, 2025

Additional Info

Field         Value
Created       January 2, 2025
Last update   January 2, 2025
Defined In    https://doi.org/10.48550/arXiv.2310.02382
Author        Liming Wang
More Authors  Mark Hasegawa-Johnson, Chang D. Yoo
Homepage      https://github.com/lwang114/GraphUnsupASR