Hear The Flow: Optical Flow-Based Self-Supervised Visual Sound Source Localization
Learning to localize the sound source in videos without explicit annotations is a novel area of audio-visual research. Existing work in this area focuses on creating attention...
Extended VGG-SS/SoundNet-Flickr
The Extended VGG-SS/SoundNet-Flickr dataset is an extension of the VGG-SS dataset, containing additional samples and non-audible frames.