Cross-modal Cognitive Consensus guided Network (C3N) for Audio-Visual Segmentation

Audio-Visual Segmentation (AVS) aims to extract the sounding object from a video frame and represent it as a pixel-wise segmentation mask. Application scenarios include multi-modal video editing, augmented reality, and intelligent robot systems.

Cite this as

Zhaofeng Shi, Qingbo Wu, Fanman Meng, Hongliang Li (2024). Dataset: Cross-modal Cognitive Consensus guided Network (C3N) for Audio-Visual Segmentation. https://doi.org/10.57702/aifu6v21

DOI retrieved: December 16, 2024

Additional Info

Field Value
Created December 16, 2024
Last update December 16, 2024
Defined In https://doi.org/10.48550/arXiv.2310.06259
Author Zhaofeng Shi
More Authors Qingbo Wu, Fanman Meng, Hongliang Li
Homepage https://github.com/ZhaofengSHI/AVS-C3N