General-context dataset
The general-context dataset contains diverse image-text pairs (top three rows); the bottom two rows show DVP-presented images with a targeted translation of the region of interest (RoI).
Something-Something
The Something-Something dataset spans 174 fine-grained action categories, depicting humans performing everyday actions with common objects. Recognizing actions in the...