A Practical Two-Stage Training Strategy for Multi-Stream End-to-End Speech Recognition

The multi-stream paradigm of audio processing, in which several sources are considered simultaneously, has been an active research area for information fusion. The proposed two-stage technique has the following highlights. In Stage-1, a single-stream model is trained on all data for better model generalization, and its encoder serves as a Universal Feature Extractor (UFE). In Stage-2, a set of high-level UFE features is used to train a multi-stream model without requiring highly parameterized parallel encoders.
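The two stages can be illustrated with a toy Python sketch. It is not the authors' implementation and rests on stated assumptions: `ufe_encode` is a hypothetical frozen linear projection standing in for the Stage-1 single-stream encoder, and the Stage-2 fusion is assumed to be a simple two-level dot-product attention (frame-level within each stream, then stream-level across streams) whose only trainable parameters would be the hypothetical vectors `frame_query` and `stream_query`. What it shows is that every stream passes through the same frozen extractor, so no per-stream parallel encoder is needed.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]

# Hypothetical stand-in for the Stage-1 encoder: a fixed 3x2 linear projection.
# In a real system this would be the encoder of a single-stream model trained
# on all data; here its weights are constants and are never updated.
UFE_WEIGHTS: List[Vector] = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: Sequence[float], vectors: Sequence[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def ufe_encode(frames: Sequence[Vector]) -> List[Vector]:
    """Stage-1 (frozen): map each input frame to a high-level UFE feature."""
    return [[dot(row, x) for row in UFE_WEIGHTS] for x in frames]


def multi_stream_forward(
    streams: Sequence[Sequence[Vector]],
    frame_query: Vector,
    stream_query: Vector,
) -> Tuple[Vector, List[float]]:
    """Stage-2: fuse several streams that all share the same frozen UFE.

    frame_query and stream_query are the only (hypothetical) trainable
    parameters; the training loop itself is omitted.
    """
    contexts = []
    for frames in streams:
        feats = ufe_encode(frames)  # same extractor for every stream
        alpha = softmax([dot(f, frame_query) for f in feats])  # frame-level attention
        contexts.append(weighted_sum(alpha, feats))
    beta = softmax([dot(c, stream_query) for c in contexts])  # stream-level attention
    return weighted_sum(beta, contexts), beta


if __name__ == "__main__":
    stream_a = [[1.0, 0.0], [0.0, 1.0]]
    stream_b = [[2.0, 2.0], [1.0, 1.0]]
    fused, beta = multi_stream_forward(
        [stream_a, stream_b], frame_query=[0.1] * 3, stream_query=[0.2] * 3
    )
    print("stream weights:", beta)
    print("fused context:", fused)
```

Because `ufe_encode` is shared and frozen, adding a stream adds no encoder parameters, only another pass through the same extractor; in a real Stage-2 run, gradients would update only the fusion parameters. The dot-product scoring is a simplification chosen for brevity, and the fusion module in the paper may be parameterized differently.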


Cite this as

Ruizhi Li, Gregory Sell, Xiaofei Wang, Shinji Watanabe, Hynek Hermansky (2025). Dataset: A Practical Two-Stage Training Strategy for Multi-Stream End-to-End Speech Recognition. https://doi.org/10.57702/ejhan9lf

DOI retrieved: January 3, 2025

Additional Info

Created: January 3, 2025
Last update: January 3, 2025
Defined in: https://doi.org/10.48550/arXiv.1910.10671
Author: Ruizhi Li
More authors: Gregory Sell, Xiaofei Wang, Shinji Watanabe, Hynek Hermansky