-
ScanNet-v2
Learning from bounding-box annotations has shown great potential in weakly-supervised 3D point cloud instance segmentation. However, we observed that existing methods would... -
YouTube Faces (YTF)
The YouTube Faces (YTF) dataset contains 3,425 videos belonging to 1,595 different identities. -
Labeled Faces in the Wild (LFW)
The Labeled Faces in the Wild (LFW) dataset contains 13,233 facial images belonging to 5,749 different individuals. -
CIFAR-10, Tiny ImageNet, and ImageNet
The datasets used in the paper are CIFAR-10, Tiny ImageNet, and ImageNet. -
REalistic Single Image DEhazing (RESIDE) dataset
The RESIDE dataset is a large-scale dataset for benchmarking single image dehazing algorithms, and it includes both indoor and outdoor hazy images. -
Pyramid VisionLLaMA: A versatile backbone for dense prediction without convol...
Pyramid VisionLLaMA: A versatile backbone for dense prediction without convolutions. -
Conditional positional encodings for vision transformers
Conditional positional encodings for vision transformers. -
Twins: Revisiting the design of spatial attention in vision transformers
Twins: Revisiting the design of spatial attention in vision transformers. -
MobileVLM: A fast, reproducible and strong vision language assistant for mobi...
MobileVLM: A fast, reproducible and strong vision language assistant for mobile devices. -
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
VisionLLaMA is a unified and generic modeling framework for solving most vision tasks. -
Argoverse2
Argoverse2 is an open-source evolution of the original Argoverse. -
Video Object of Interest Segmentation
A new computer vision task named video object of interest segmentation (VOIS). Given a video and a target image of interest, the objective is to simultaneously segment and track... -
SUN Attribute Dataset
The SUN attribute dataset is a collection of images of scenes. -
Multi-source multi-scale counting in extremely dense crowd images
The UCF_CC_50 dataset contains 50 images collected from publicly available web images. -
MNIST, USPS, and CIFAR10
The datasets used in this paper are MNIST, USPS, and CIFAR10. They are used for privacy-preserving CNN training. -
SUN database
SUN database: Large-scale scene recognition from abbey to zoo. -
CARLA simulator datasets for pedestrian detection
Three datasets (training, calibration, and evaluation) for the pedestrian detection task.