Hybrid-S2S: Video Object Segmentation with Recurrent Networks and Correspondence Matching

One-shot Video Object Segmentation (VOS) is the task of tracking an object of interest pixel-wise throughout a video sequence, given the object's segmentation mask for the first frame at inference time.

BibTeX: