Audio-Visual Datasets - Groups

VoxCeleb2

The VoxCeleb2 dataset is a large-scale speaker recognition dataset containing 2,442 hours of raw speech from 6,112 speakers.
- Dataset
- JSON
MEAD and HDTF datasets

The MEAD and HDTF datasets are used to train and test the proposed SAAS model.
- Dataset
- JSON
HDTF

The dataset used in the paper for 3D head avatar reconstruction from monocular RGB videos.
- Dataset
- JSON
MEAD

The MEAD dataset is a large-scale, high-quality emotional audio-visual dataset featuring 60 actors, 8 basic emotions, and 3 different emotional-intensity...
- Dataset
- JSON

4 datasets found