Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video
The proposed Deep Visual Forced Alignment (DVFA) time-aligns the input transcription with the input talking face video without using speech audio.