DAVIS and WebVid datasets

The paper does not describe its dataset in detail. It states only that the authors used 26 text-video pairs drawn from the public DAVIS and WebVid datasets.

BibTeX: