OpenImages

Large-scale vision-and-language models trained on curated and web-scrapped data have led to significant improvements over task-specific models when transferred to downstream tasks in terms of aggregate metrics.

BibTex: