Yahoo and Yelp corpora
The Yahoo and Yelp corpora contain 100k sentences with greater average length. -
Youtube-VIS 2019
Unsupervised video object segmentation has made significant progress in recent years, but the manual annotation of video mask datasets is expensive and limits the diversity of... -
Video object segmentation is a crucial task in computer vision that involves segmenting primary objects in a video sequence. -
UVOSAM: A Mask-free Paradigm for Unsupervised Video Object Segmentation via S...
Unsupervised video object segmentation has made significant progress in recent years, but the manual annotation of video mask datasets is expensive and limits the diversity of... -
LV interventricular septum (IVS), internal diameter (LVID), and posterior wal...
LV interventricular septum (IVS), internal diameter (LVID), and posterior wall (LVPW) dimensions were annotated in parasternal long-axis 2DE scans. -
PubMed Central Open Access Subset
PubMed Central Open Access Subset is a collection of biomedical papers. -
BioMedClip: A CLIP model pretrained on image-text pairs extracted from the PubMed Central repository. -
Training CLIP models on Data from Scientific Papers
Contrastive Language-Image Pretraining (CLIP) models are trained with datasets extracted from web crawls, which are of large quantity but limited quality. This paper explores... -
Temporal action localization (TAL) is a prevailing task due to its great application potential. Existing works in this field mainly suffer from two weaknesses: (1) They often... -
TemporalMaxer: Maximize Temporal Context with only Max Pooling
Temporal action localization (TAL) is a challenging task in video understanding that aims to identify and localize actions within a video sequence. -
Swedish traffic-sign dataset (STSD)
The Swedish traffic-sign dataset (STSD) contains 10 categories of traffic signs. -
DFG traffic-sign dataset
The DFG traffic-sign dataset consists of 200 categories, including a large number of traffic signs with high intra-category appearance variation. -
LSUN Churches
The dataset used for training and testing the Conditionally-Independent Pixel Synthesis (CIPS) generator. -
Flickr-Faces-HQ contains 70,000 face images at 1024 × 1024 resolution, which were originally crawled from Flickr, manually checked to discard low-quality samples, and then... -
MVSEC dataset
A real-world dataset collected in indoor and outdoor scenarios with sparse optical flow labels. -
Multi-Density Rendered (MDR) event optical flow dataset
A rendered event-flow dataset created using computer graphics models, with accurate events and flow labels. -
Symptom Evolution in Cancer Patients
The dataset used in the paper for predicting the evolution of symptoms for cancer patients. -
Deep High-Resolution Representation Learning for Human Pose Estimation
Human pose estimation dataset