A Practical Two-Stage Training Strategy for Multi-Stream End-to-End Speech Recognition

The multi-stream paradigm of audio processing, in which several sources are considered simultaneously, has been an active research area for information fusion. The proposed two-stage training strategy has the following highlights. In Stage-1, a single-stream model is trained on all available data for better generalization. In Stage-2, a set of high-level UFE (Universal Feature Extractor) features is used to train a multi-stream model without requiring highly parameterized parallel encoders.
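
The two stages can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the Stage-1 model acts as one frozen feature extractor shared by every stream, and that Stage-2 fuses the per-stream features with a dot-product attention over streams; the abstract does not specify the fusion mechanism. All names (`shared_extractor`, `fuse_streams`, `UFE_WEIGHTS`) and all numeric values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def shared_extractor(frame, weights):
    """Stand-in for the frozen Stage-1 model: one linear layer plus tanh.

    Assumption: the same extractor is applied to every stream, so no
    per-stream encoder parameters are needed in Stage-2.
    """
    return [math.tanh(sum(w * x for w, x in zip(row, frame))) for row in weights]


def fuse_streams(stream_feats, query):
    """Assumed Stage-2 fusion: attention weights over streams, then a weighted sum."""
    # One dot-product relevance score per stream.
    scores = [sum(q * f for q, f in zip(query, feats)) for feats in stream_feats]
    alphas = softmax(scores)
    dim = len(stream_feats[0])
    fused = [
        sum(a * feats[d] for a, feats in zip(alphas, stream_feats))
        for d in range(dim)
    ]
    return fused, alphas


# Hypothetical toy values: a 3-dim input frame per stream, 2-dim features.
UFE_WEIGHTS = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
stream_a = [1.0, 0.0, 2.0]
stream_b = [0.2, 0.1, 0.0]

feats = [shared_extractor(s, UFE_WEIGHTS) for s in (stream_a, stream_b)]
fused, alphas = fuse_streams(feats, query=[1.0, 1.0])
print("stream weights:", alphas)
print("fused feature:", fused)
```

In this sketch only `query` (and whatever decoder would consume `fused`) would be trained in Stage-2, which is one way to read the claim that parallel encoders are not required.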
