4 datasets found
Tags: audio-visual

VGGSound
The VGGSound dataset is a large-scale audio-visual dataset containing 10,000 10-second video clips with corresponding audio files.

Visually Indicated Sounds
A dataset of audio-visual pairs in which the audio is visually indicated.

Vggsound: A large-scale audio-visual dataset
A large-scale audio-visual dataset containing audio-visual pairs.

CREMA-D
The CREMA-D dataset is an audio-visual dataset for emotion recognition; each video contains both facial and acoustic emotional expressions.