Cross-modal Cognitive Consensus guided Network (C3N) for Audio-Visual Segment...
Audio-Visual Segmentation (AVS) aims to extract the sounding object from a video frame, represented as a pixel-wise segmentation mask, for application scenarios such as...