-
Audiovision-MNIST
The Audiovision-MNIST dataset is a multi-modal dataset consisting of 1500 samples of audio and image files, with images for digits 0 to 9 and audio files with mel-frequency... -
M-HalDetect
M-HalDetect is a dataset for hallucination detection in large vision language models. -
Flexible-Modal Face Anti-Spoofing: A Benchmark
Face anti-spoofing (FAS) plays a vital role in securing face recognition systems from presentation attacks. -
Anti-UAV: A Large Multi-Modal Benchmark for UAV Tracking
A large multi-modal benchmark for UAV tracking, containing high-quality and high-definition video sequences of both RGB and IR, each annotated with bounding boxes, attributes,... -
CASIA-SURF CeFA
A multi-modal dataset used in the paper for the face anti-spoofing task. -
CASIA-SURF
Face anti-spoofing (FAS) plays a vital role in securing face recognition systems from presentation attacks. -
FB15K-YAGO15K
The FB15K-YAGO15K dataset is a benchmark for multi-modal entity alignment. -
FB15K-DB15K
The FB15K-DB15K dataset is an entity alignment dataset built on the FB15K and DB15K multi-modal knowledge graphs (MMKGs). -
Multi Visual Modality Fall Detection Dataset (MUVIM)
The Multi Visual Modality Fall Detection Dataset (MUVIM) was used for anomaly detection of falls. It includes data from six vision-based sensors of different modalities including thermal,... -
SSL4EO-S12
A large-scale, globally distributed, multi-temporal and multi-sensor dataset for self-supervised learning in Earth observation.