Large-scale vision-and-language models trained on curated and web-scrapped data have led to significant improvements over task-specific models when transferred to downstream tasks in terms of aggregate metrics.
BibTex:
Before browse our site, please accept our cookies policy