2 datasets found

Tags: Visual

  • VoxCeleb2

    The VoxCeleb2 dataset is a large-scale speaker recognition dataset containing 2,442 hours of raw speech from 6,112 speakers.
  • VGGSound

    The VGGSound dataset is a large-scale audio-visual dataset containing over 200,000 10-second video clips, each with its corresponding audio track.
You can also access this registry using the API (see API Docs).