2 datasets found

Tags: Audio-Visual Learning

  • VGGSound

    VGGSound is a large-scale audio-visual dataset of roughly 200,000 10-second video clips, each with its corresponding audio.
  • Extended VGG-SS/SoundNet-Flickr

    The Extended VGG-SS/SoundNet-Flickr dataset extends the VGG-SS dataset with additional samples and non-audible frames.
You can also access this registry using the API (see API Docs).