FiGCLIP: Fine-Grained CLIP Adaptation via Densely Annotated Videos
Fine-grained adaptation of the popular CLIP model across multiple datasets.
Motion-Guided Masking for Spatiotemporal Representation Learning
The authors used several video benchmarks, including Kinetics-400 and Something-Something V2, to evaluate their proposed motion-guided masking algorithm.
Kinetics-400, UCF101, HMDB51, Something-Something V1, and Something-Something V2
The Kinetics-400, UCF101, HMDB51, Something-Something V1, and Something-Something V2 datasets are used to evaluate the performance of Bi-Calibration Networks.
Kinetics-400
Motion has been shown to be useful for video understanding, where motion is typically represented by optical flow. However, computing flow from video frames is very time-consuming....
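To make the flow-based motion cue concrete, the sketch below computes dense Farneback optical flow between two grayscale frames with OpenCV and then ranks non-overlapping patches by mean motion magnitude to decide which ones to mask. This is a minimal, hypothetical illustration of motion-guided patch selection, not the implementation from any of the papers listed above; the 16x16 patch size, 75% mask ratio, and choice of Farneback flow are assumptions.

```python
import cv2
import numpy as np

def dense_flow_magnitude(prev_gray, next_gray):
    """Compute dense Farneback optical flow between two grayscale frames
    and return the per-pixel motion magnitude (this step is the costly one)."""
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    return mag

def motion_guided_patch_mask(magnitude, patch=16, mask_ratio=0.75):
    """Rank non-overlapping patches by mean motion magnitude and mask the
    highest-motion ones. Patch size and mask ratio are hypothetical defaults."""
    h, w = magnitude.shape
    gh, gw = h // patch, w // patch
    patch_motion = (magnitude[:gh * patch, :gw * patch]
                    .reshape(gh, patch, gw, patch)
                    .mean(axis=(1, 3)))
    n_mask = int(mask_ratio * gh * gw)
    order = np.argsort(patch_motion.ravel())[::-1]  # highest motion first
    mask = np.zeros(gh * gw, dtype=bool)
    mask[order[:n_mask]] = True
    return mask.reshape(gh, gw)
```

Because the flow computation dominates the cost, approaches in this space often replace it with cheaper motion proxies (e.g., frame differences or codec motion vectors); the masking step itself is inexpensive.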