HYPOTHESIS STITCHER FOR END-TO-END SPEAKER-ATTRIBUTED ASR ON LONG-FORM MULTI-...
An end-to-end (E2E) speaker-attributed automatic speech recognition (SA-ASR) model was proposed recently to jointly perform speaker counting, speech recognition and speaker...
WSJ and Switchboard datasets
The 80-hour WSJ and 300-hour Switchboard datasets are used for end-to-end speech recognition.