-
Visual ChatGPT
Visual ChatGPT is a system that integrates different Visual Foundation Models to understand visual information and generate corresponding answers. -
VisualBERT
VisualBERT is a pre-trained model for vision-and-language tasks, built on top of PyTorch. -
Task Driven Image Understanding Challenge (TDIUC)
The Task Driven Image Understanding Challenge (TDIUC) dataset is a large VQA dataset with 12 finer-grained question categories, proposed to compensate for the bias in distribution of... -
Common objects in context
Common objects in context. -
Cityscapes dataset for semantic urban scene understanding
The Cityscapes dataset is a large-scale urban scene dataset containing over 25,000 images. -
BigEarthNet
BigEarthNet is a large-scale Sentinel-2 dataset collected from a total of 125 Sentinel-2 tiles covering areas of 10 countries in Europe. The dataset was prepared with data from... -
Microsoft COCO
The Microsoft COCO dataset was used for training and evaluating the CNNs because it has become a standard benchmark for testing algorithms aimed at scene understanding and...