20 datasets found

  • Corel5k

    The dataset used in this paper for image annotation, consisting of 4,999 annotated images with a vocabulary of up to 200 keywords.
  • SPMDataset

    A dataset of images annotated with semantic tuples, including predicates, actors, and locatives.
  • Microsoft COCO 2017 dataset

    This dataset contains images paired with multiple human-annotated descriptions in the form of sentences.
  • Heudiasyc dataset

    A dataset for autonomous driving.
  • ApolloScape Dataset

    The ApolloScape dataset is a large-scale dataset for autonomous driving, containing images and annotations.
  • BLIP2

    A vision-language pre-training dataset of 100 million image-text pairs.
  • Inria Aerial Image Labeling dataset

    Inria Aerial Image Labeling dataset contains aerial orthorectified color imagery of 5000 × 5000 pixels with a spatial resolution of 0.3 m.
  • AICrowd Mapping Challenge dataset

    AICrowd Mapping Challenge dataset contains 300 × 300 pixels RGB images and corresponding annotations in MS-COCO format.
  • MSRC

    A multi-view clustering dataset containing 2,000 samples, each represented in 6 views. The dataset is used to evaluate the performance of the proposed...
  • ReferItGame

    A benchmark for visual grounding, the task of localizing a language query in an image; the output is typically a bounding box.
  • Flickr30K Entities

    The Flickr30K Entities dataset consists of 31,783 images each matched with 5 captions. The dataset links distinct sentence entities to image bounding boxes, resulting in 70K...
  • Broden

    The dataset used in the paper is Broden, a dataset containing pixel-level concept annotations.
  • Pothole Detection Dataset

    A dataset of images with pothole annotations from various sources, including Google Earth Pro, AUTOPILOT videos, and GoPro camera images.
  • RefCOCOg

    The RefCOCOg dataset is built on top of MS-COCO, containing 85,474 referring expressions for 54,822 objects in 26,711 images.
  • RefCOCO

    The dataset used in the paper is a benchmark for referring expression grounding, containing 142,210 referring expressions for 50,000 referents in 19,994 images.
  • LabelMe dataset

    The LabelMe dataset is a natural scene dataset used for testing the performance of the IBTM model on image classification tasks.
  • COCO Dataset

    The COCO dataset is a large-scale dataset for object detection, semantic segmentation, and captioning. It contains 80 object categories and 1,000 image instances per category,...
  • Visual Genome

    The Visual Genome dataset is a large-scale dataset connecting vision and language, containing over 100,000 images, each densely annotated with entities, attributes, and relationships.
  • Cityscapes

    The Cityscapes dataset is a large and widely used city street scene semantic segmentation dataset; 19 of its 30 annotated classes are considered for training and...
  • COCO

    Large-scale datasets [18, 17, 27, 6] have boosted text-conditional image generation quality. However, in some domains it can be difficult to build such datasets, and usually it could...
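Several of the datasets above (e.g. the AICrowd Mapping Challenge, COCO, and the RefCOCO variants) ship annotations in MS-COCO JSON format. As a minimal sketch of how that format is typically consumed, the toy dictionary below mirrors the standard COCO layout (`images`, `annotations`, `categories`; the field names and values here are hypothetical examples, not taken from any dataset above):

```python
# Toy, in-memory COCO-style annotation structure (hypothetical example
# mirroring the MS-COCO JSON layout: images, annotations, categories).
coco = {
    "images": [
        {"id": 1, "file_name": "000001.jpg", "width": 300, "height": 300},
    ],
    "annotations": [
        # bbox follows the COCO convention: [x, y, width, height]
        {"id": 10, "image_id": 1, "category_id": 100, "bbox": [50, 60, 120, 80]},
    ],
    "categories": [
        {"id": 100, "name": "building"},
    ],
}

def annotations_for(coco_dict, image_id):
    """Return (category_name, bbox) pairs for one image,
    resolving category ids via the categories table."""
    names = {c["id"]: c["name"] for c in coco_dict["categories"]}
    return [
        (names[a["category_id"]], a["bbox"])
        for a in coco_dict["annotations"]
        if a["image_id"] == image_id
    ]

print(annotations_for(coco, 1))  # → [('building', [50, 60, 120, 80])]
```

In practice the same structure is read from a JSON file (e.g. via `json.load`) or through the `pycocotools` API, but the id-resolution step shown here is the core of working with COCO-format annotations.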