A Practical Two-Stage Training Strategy for Multi-Stream End-to-End Speech Recognition

doi:10.57702/ejhan9lf

The multi-stream paradigm of audio processing, in which several sources are considered simultaneously, has been an active research area for information fusion. The proposed two-stage training technique works as follows. In Stage-1, a single-stream model is trained on all data for better generalization. In Stage-2, high-level features from the Stage-1 model, which serves as a Universal Feature Extractor (UFE), are used to train a multi-stream model without requiring highly parameterized parallel encoders.
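
The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the encoder here is a fixed random projection standing in for a trained Stage-1 model, the fusion is a plain softmax-weighted sum, and the names `make_frozen_encoder`, `fuse_streams`, and `stream_scores` are hypothetical. In a real system, the stream scores would presumably come from a trained fusion module rather than being fixed by hand.

```python
import math
import random


def make_frozen_encoder(in_dim, out_dim, seed=0):
    """Stand-in for the Stage-1 single-stream encoder.

    Assumption: a fixed random projection followed by tanh. A real
    encoder would be trained on all data and then kept frozen.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(frames):
        # Map each input frame to one high-level feature vector.
        return [
            [math.tanh(sum(w * x for w, x in zip(row, frame))) for row in weights]
            for frame in frames
        ]

    return encode


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(stream_features, stream_scores):
    """Stage-2 stand-in: weighted sum of per-stream features, frame by frame.

    Only `stream_scores` would be learned in this sketch; the shared
    encoder is reused for every stream, so no parallel encoders are needed.
    """
    weights = softmax(stream_scores)
    n_frames = len(stream_features[0])
    dim = len(stream_features[0][0])
    fused = [
        [sum(w * feats[t][d] for w, feats in zip(weights, stream_features))
         for d in range(dim)]
        for t in range(n_frames)
    ]
    return fused, weights


# Two synthetic streams of 5 frames with 4 input dimensions each.
rng = random.Random(1)
encoder = make_frozen_encoder(in_dim=4, out_dim=3)
streams = [[[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)] for _ in range(2)]

# The same frozen encoder produces features for every stream.
ufe_features = [encoder(stream) for stream in streams]

# Equal scores give equal fusion weights.
fused, weights = fuse_streams(ufe_features, stream_scores=[0.0, 0.0])
```

The point of the design is that the per-stream cost in Stage-2 is limited to the fusion parameters, because every stream passes through the same pretrained encoder.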

Data and Resources

Original Metadata JSON
The json representation of the dataset with its distributions based on DCAT.

Cite this as

Ruizhi Li, Gregory Sell, Xiaofei Wang, Shinji Watanabe, Hynek Hermansky (2025). Dataset: A Practical Two-Stage Training Strategy for Multi-Stream End-to-End Speech Recognition. https://doi.org/10.57702/ejhan9lf

DOI retrieved: January 3, 2025

Additional Info

Field	Value
Created	January 3, 2025
Last update	January 3, 2025
Defined In	https://doi.org/10.48550/arXiv.1910.10671
Author	Ruizhi Li
More Authors	Gregory Sell, Xiaofei Wang, Shinji Watanabe, Hynek Hermansky