-
CrowdHuman dataset
The CrowdHuman dataset is a benchmark dataset for human detection, consisting of 15,000 images for training, 4,370 images for validation, and 5,000 images for testing. -
COCO Captions
Object detection is a fundamental task in computer vision, requiring large annotated datasets that are difficult to collect. -
Pascal VOC
Semantic segmentation is a crucial and challenging task for image understanding. It aims to predict a dense labeling map for the input image, which assigns each pixel a unique... -
PSU dataset
The PSU dataset was collected from two sources: an open dataset of aerial images available on GitHub and our own images acquired by flying a 3DR SOLO drone equipped with a... -
Stanford dataset
The Stanford dataset consists of a large-scale collection of aerial images and videos of a university campus containing various agents (cars, buses, bicycles, golf carts,... -
MS COCO captions
The MS COCO captions dataset contains captions for images in the Microsoft COCO dataset. -
PASCAL VOC 2010
The PASCAL VOC 2010 dataset is an extension of the PASCAL VOC dataset, containing additional images and categories. -
Faster-LTN: a neuro-symbolic, end-to-end object detection architecture
The detection of semantic relationships between objects represented in an image is one of the fundamental challenges in image interpretation. Neural-Symbolic techniques, such as... -
ImageNet Dataset
Object recognition is arguably the most important problem at the heart of computer vision. Recently, Barbu et al. introduced a dataset called ObjectNet which includes objects in... -
COCO Dataset
The COCO dataset is a large-scale dataset for object detection, semantic segmentation, and captioning. It contains 80 object categories and 1,000 image instances per category,... -
OpenImages
Large-scale vision-and-language models trained on curated and web-scraped data have led to significant improvements over task-specific models when transferred to downstream... -
Visual Genome
The Visual Genome dataset is a large-scale visual question answering dataset, containing 1.5 million images, each with 15-30 annotated entities, attributes, and relationships. -
COCO, PASCAL VOC, Cityscapes, and LVIS
The datasets used in the paper for instance segmentation: COCO, PASCAL VOC, Cityscapes, and LVIS. -
Cityscapes
The Cityscapes dataset is a large, widely used semantic segmentation dataset of city street scenes. Of the dataset's 30 classes, 19 are considered for training and...