-
CIFAR-100 and ImageNet-1k
The paper does not describe its datasets explicitly, but it mentions that the authors used CIFAR-100 and ImageNet-1k for image classification and semantic... -
PASCAL Visual Object Classes Challenge
The PASCAL Visual Object Classes Challenge (VOC) is a benchmark dataset for object detection and semantic segmentation. -
ImageNet, CIFAR-10, and Cityscapes
The datasets used in this paper are ImageNet and CIFAR-10 for image classification and Cityscapes for semantic segmentation. -
Pascal VOC 2012
The dataset used in the paper is Pascal VOC 2012, a benchmark for instance segmentation. The dataset consists of 1,464 images with 20 class categories and... -
ImageNet-1K, ADE20K, and COCO 2017
The datasets used in the paper are ImageNet-1K, ADE20K, and COCO 2017. -
COCO Stuff
The COCO Stuff dataset is an extension of the COCO dataset in which 164,000 images covering 171 classes are annotated with segmentation masks. -
PASCAL VOC 2007
Multi-label image recognition is a practical and challenging task compared to single-label image classification. -
ImageNet, MS COCO, and Pascal VOC datasets
The datasets used in the paper are ImageNet, MS COCO, and Pascal VOC. -
Pascal VOC
Semantic segmentation is a crucial and challenging task for image understanding. It aims to predict a dense labeling map for the input image, which assigns each pixel a unique... -
COCO Dataset
The COCO dataset is a large-scale dataset for object detection, semantic segmentation, and captioning. It contains 80 object categories and 1,000 image instances per category,... -
Cityscapes
The Cityscapes dataset is a large, well-known semantic segmentation dataset of city street scenes. Of the dataset's 30 classes, 19 are considered for training and... -
Microsoft COCO
The Microsoft COCO dataset was used for training and evaluating the CNNs because it has become a standard benchmark for testing algorithms aimed at scene understanding and... -
ImageNet-1k
The paper does not describe the dataset explicitly, but it mentions that the authors used it for language modeling and image classification tasks.