Audio-Visual - Groups

VGGSound

The VGGSound dataset is a large-scale audio-visual dataset containing 10,000 ten-second video clips with corresponding audio files.

Dataset
JSON

CREMA-D

The CREMA-D dataset is an audio-visual dataset for emotion recognition. Each video contains both facial and acoustic emotional expressions.

Dataset
JSON

VoxCeleb

Speaker verification systems experience significant performance degradation when tasked with short-duration trial recordings. To address this challenge, a multi-scale feature...

Dataset
JSON

3 datasets found
