1 dataset found
Groups: Multimodal Learning

End-to-End Referring Video Object Segmentation with Multimodal Transformers
The referring video object segmentation (RVOS) task involves segmenting a text-referred object instance in the frames of a given video.