VGGSound

The VGGSound dataset is a large-scale audio-visual dataset containing more than 200,000 10-second video clips with corresponding audio.

Data and Resources

This dataset has no data

Honglie Chen, Weidi Xie, Andrea Vedaldi, Andrew Zisserman (2024). Dataset: VGGSound. https://doi.org/10.57702/9irceuon

Private DOI: This DOI is not yet resolvable. It is available for use in manuscripts and will be published when the dataset is made public.

Field	Value
Created	December 16, 2024
Last update	December 16, 2024
Defined In	https://doi.org/10.48550/arXiv.2402.17723
Citation	https://doi.org/10.48550/arXiv.2012.10852 https://doi.org/10.48550/arXiv.2402.15985 https://doi.org/10.48550/arXiv.2401.08415 https://doi.org/10.48550/arXiv.2308.05037 https://doi.org/10.48550/arXiv.2308.09300 https://doi.org/10.48550/arXiv.2110.04599 https://doi.org/10.48550/arXiv.2311.04066
Author	Honglie Chen
More Authors	Weidi Xie Andrea Vedaldi Andrew Zisserman
Homepage	https://www.robots.ox.ac.uk/~vgg/data/vggsound/