Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video

doi:10.57702/pyhelu8l

The proposed Deep Visual Forced Alignment (DVFA) time-aligns the input transcription with the input talking face video without using speech audio.

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset with its distributions based on DCAT.

Cite this as

Minsu Kim, Chae Won Kim, Yong Man Ro (2024). Dataset: Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video. https://doi.org/10.57702/pyhelu8l

DOI retrieved: December 16, 2024

Additional Info

Field	Value
Created	December 16, 2024
Last update	December 16, 2024
Defined In	https://doi.org/10.48550/arXiv.2303.08670
Author	Minsu Kim
More Authors	Chae Won Kim, Yong Man Ro