-
Dogs vs Cats
A dataset used for image recognition with deep convolutional neural networks. -
Deep residual learning for image recognition
ResNet-50 and ResNet-101 are used as the backbone image feature extractors. -
mini-ImageNet
The mini-ImageNet dataset is a subset of the ImageNet dataset, containing 60,000 images from 100 classes. -
Conformer: Local Features Coupling Global Representations
Conformer is a dual network structure that combines CNN-based local features with transformer-based global representations for enhanced representation learning. -
PASCAL VOC Dataset
The PASCAL VOC dataset contains 20 classes grouped into four categories (person, animal, vehicle, and indoor), with 9,963 images containing 24,640 annotated objects. -
WIDER-Attribute
Human Attribute Recognition (HAR) is a challenging task due to large variations in body gestures, external occlusions, lighting conditions, image resolution, and blurriness. -
MNIST and MNIST-rot
The MNIST dataset is a large dataset of handwritten digits, and the MNIST-rot dataset is a rotated version of the MNIST dataset. -
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Alpha-CLIP is an enhanced version of CLIP with an auxiliary alpha channel that indicates attentive regions, fine-tuned on millions of constructed RGBA region-text pairs. -
Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with ...
The dataset used in the paper is a large-scale dataset for unsupervised learning on diffusion-generated images. -
Ship Discrimination Dataset
The ship discrimination dataset contains SAR images of ships and non-ships. -
MSTAR Dataset
The MSTAR dataset contains SAR images of ten classes of ground vehicles. -
Graph Neural Network for Accurate and Low-complexity SAR ATR
A GNN model proposed for SAR automatic target recognition (ATR). -
CIFAR10 dataset
The CIFAR10 dataset contains 60,000 32x32 color images in 10 classes, with 6,000 images per class. -
ImageNet Large Scale Visual Recognition Challenge 2012
This dataset is used to evaluate the performance of a Convolutional Neural Network (CNN) on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC2012). -
Visual and Semantic Similarity in ImageNet
This dataset is used to evaluate the performance of a Convolutional Neural Network (CNN) on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC2012). -
MIT Indoor Scene Recognition
The MIT Indoor Scene Recognition dataset contains 67 categories of indoor scenes. -
VGG Network E
VGG Network E is a deep convolutional neural network for image recognition. -
Sprites dataset
The dataset consists of binary images of sprites with variations in the shape (oval, square, and heart) and four geometric factors: scale (6 variation modes), rotation (40), and... -
Deep Image: Scaling up image recognition
Deep Image: Scaling up image recognition
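
The image and class counts quoted in the listing above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the helper name `images_per_class` and the `DATASETS` table are hypothetical, and it assumes each dataset is perfectly balanced across classes. The CIFAR10 entry states this explicitly (6,000 images per class); for mini-ImageNet it is an assumption made here.

```python
# Figures quoted in the listing above: (total images, number of classes).
# Balanced classes are assumed; only the CIFAR10 entry states this explicitly.
DATASETS = {
    "CIFAR10": (60_000, 10),
    "mini-ImageNet": (60_000, 100),
}


def images_per_class(total_images: int, num_classes: int) -> int:
    """Return the per-class image count for a perfectly balanced dataset."""
    if num_classes <= 0:
        raise ValueError("num_classes must be positive")
    if total_images % num_classes != 0:
        raise ValueError("total is not evenly divisible by the class count")
    return total_images // num_classes


for name, (total, classes) in DATASETS.items():
    print(f"{name}: {images_per_class(total, classes)} images per class")
```

Under these assumptions the CIFAR10 result agrees with the 6,000 images per class given in its entry.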