1 dataset found

Groups: Audio-Visual Speech Recognition
Formats: JSON

  • VGGSound

    The VGGSound dataset is a large-scale audio-visual dataset of more than 200,000 ten-second video clips, each with its corresponding audio.