
Cross-modal Cognitive Consensus guided Network (C3N) for Audio-Visual Segmentation

Audio-Visual Segmentation (AVS) aims to extract the sounding object from a video frame as a pixel-wise segmentation mask, supporting application scenarios such as multi-modal video editing, augmented reality, and intelligent robot systems.

Data and Resources

This dataset has no data files.

Cite this as

Zhaofeng Shi, Qingbo Wu, Fanman Meng, Hongliang Li (2024). Dataset: Cross-modal Cognitive Consensus guided Network (C3N) for Audio-Visual Segmentation. https://doi.org/10.57702/aifu6v21

Private DOI: This DOI is not yet resolvable. It is available for use in manuscripts and will be published when the dataset is made public.

Additional Info

Created: December 16, 2024
Last update: December 16, 2024
Defined In: https://doi.org/10.48550/arXiv.2310.06259
Author: Zhaofeng Shi
More Authors: Qingbo Wu, Fanman Meng, Hongliang Li
Homepage: https://github.com/ZhaofengSHI/AVS-C3N