Dataset - LDM

Cross-modal Cognitive Consensus guided Network (C3N) for Audio-Visual Segment...

Audio-Visual Segmentation (AVS) aims to extract the sounding object from a video frame, which is represented by a pixel-wise segmentation mask for application scenarios such as...
- Dataset
- JSON

AVSBench

Audio-visual segmentation (AVS) aims to segment sound sources in a video sequence, requiring a pixel-level understanding of audio-visual correspondence.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

2 datasets found