A Practical Two-Stage Training Strategy for Multi-Stream End-to-End Speech Recognition

doi:10.57702/ejhan9lf

The multi-stream paradigm of audio processing, in which several sources are considered simultaneously, has been an active research area for information fusion. This work proposes a practical two-stage training strategy with the following highlights. In Stage-1, a single-stream model is trained on all data for better model generalization. In Stage-2, a set of high-level UFE (Universal Feature Extractor) features is used to train a multi-stream model without requiring highly parameterized parallel encoders.
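The Stage-2 idea can be illustrated with a minimal sketch. It is not the authors' implementation: the one-layer `shared_encoder`, the scoring vector `v`, and the softmax-weighted fusion in `fuse_frame` are all assumptions chosen for illustration, as are the toy dimensions. The point it shows is that a single encoder, fixed after Stage-1, can produce features for every stream, so only a small fusion component remains to be trained.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def shared_encoder(frame, w_enc):
    """Hypothetical stand-in for the Stage-1 encoder: one tanh layer.

    The same weights are reused for every stream, so no parallel
    per-stream encoders are needed.
    """
    return [math.tanh(sum(x * w for x, w in zip(frame, col))) for col in w_enc]


def fuse_frame(stream_feats, v):
    """Fuse one frame's features across streams (assumed fusion rule).

    stream_feats: one feature vector per stream.
    v: scoring vector, the only Stage-2 parameter in this sketch.
    """
    scores = [sum(f * w for f, w in zip(feat, v)) for feat in stream_feats]
    weights = softmax(scores)
    fused = [
        sum(wt * feat[d] for wt, feat in zip(weights, stream_feats))
        for d in range(len(v))
    ]
    return fused, weights


random.seed(0)
in_dim, dim, n_streams, n_frames = 8, 4, 2, 5

# Stage-1 result (assumed already trained and kept fixed): encoder weights.
w_enc = [[random.gauss(0, 1) for _ in range(in_dim)] for _ in range(dim)]
# Stage-2 parameter (would be learned; random here for illustration).
v = [random.gauss(0, 1) for _ in range(dim)]

# Toy input: n_streams parallel recordings, each n_frames x in_dim.
streams = [
    [[random.gauss(0, 1) for _ in range(in_dim)] for _ in range(n_frames)]
    for _ in range(n_streams)
]

# High-level features for every stream come from the same encoder.
features = [[shared_encoder(frame, w_enc) for frame in s] for s in streams]

fused_seq = []
for t in range(n_frames):
    fused, weights = fuse_frame([features[s][t] for s in range(n_streams)], v)
    fused_seq.append(fused)

print(len(fused_seq), len(fused_seq[0]))
```

In this sketch the per-frame stream weights sum to one, so each fused vector is a convex combination of the per-stream features.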

BibTeX:

@dataset{Ruizhi_Li_and_Gregory_Sell_and_Xiaofei_Wang_and_Shinji_Watanabe_and_Hynek_Hermansky_2025,
    abstract = {The multi-stream paradigm of audio processing, in which several sources are considered simultaneously, has been an active research area for information fusion. This work proposes a practical two-stage training strategy with the following highlights. In Stage-1, a single-stream model is trained on all data for better model generalization. In Stage-2, a set of high-level UFE (Universal Feature Extractor) features is used to train a multi-stream model without requiring highly parameterized parallel encoders.},
    author = {Ruizhi Li and Gregory Sell and Xiaofei Wang and Shinji Watanabe and Hynek Hermansky},
    doi = {10.57702/ejhan9lf},
    institution = {No Organization},
    keyword = {attention-based model, end-to-end speech recognition, multi-stream},
    month = {jan},
    publisher = {TIB},
    title = {A Practical Two-Stage Training Strategy for Multi-Stream End-to-End Speech Recognition},
    url = {https://service.tib.eu/ldmservice/dataset/a-practical-two-stage-training-strategy-for-multi-stream-end-to-end-speech-recognition},
    year = {2025}
}