Visual Commonsense Reasoning (VCR)

VCR consists of 290k questions derived from 110k movie scenes, focusing on visual commonsense reasoning.

US-CT Dataset

A synthetic dataset developed for ultrasound and CT image registration experiments, leveraging CT images to simulate ultrasound data for matching and localization.

Human Face Database

A human face dataset used for evaluating image alignment techniques, containing altered and deformed images of human faces for testing alignment accuracy.

MNIST Handwritten Digits Dataset

The MNIST handwritten digits dataset is a widely used benchmark consisting of 60,000 training images and 10,000 test images of handwritten digits, allowing...

PF-PASCAL Benchmark

The PF-PASCAL benchmark comprises 1,351 image pairs across 20 object categories, with keypoint annotations for evaluating semantic correspondence.

PF-WILLOW Benchmark

The PF-WILLOW benchmark contains 10 object sub-classes, each with 10 keypoint annotations for performance evaluation in semantic correspondence tasks.

TSS Benchmark

The TSS benchmark consists of 400 image pairs divided into three groups for evaluating semantic correspondence methods.

English Wikipedia

The English Wikipedia is widely used as a text corpus for NLP tasks.

BooksCorpus

The BooksCorpus dataset consists of 11,038 books and has been used for text-only training.

Visual Question Answering

Visual Question Answering (VQA) requires a model to answer open-ended questions regarding images.

Image-Grounded Conversations

Image-Grounded Conversations (IGC) consists of dialogues between human participants over images.

Image Chat

Image Chat consists of complete dialogues grounded in images, with assigned styles intended to make the conversations more natural.

Personality Captions

The Personality Captions dataset contains image-caption pairs with attributes describing 215 different speech styles.

Instagram Images

A dataset of 3.5 billion Instagram images collected to explore the limits of weakly supervised pretraining.

Frequent Russian Words Dataset

This dataset contains the 10,000 and 100,000 most frequent words used in training word embedding models for the Russian language, derived from Wikipedia and other...

Word Embedding Models for Russian Language

The dataset consists of publicly available word embedding models for the Russian language, including RusVectores, fastText, and Russian Distributional Thesaurus.

Tox21 Toxicity Dataset

The Tox21 dataset contains information about the toxicity of various compounds, used for toxicity prediction tasks.

ChEMBL Bioactivity Dataset

The ChEMBL dataset is used for drug bioactivity prediction across multiple tasks involving human protein targets.

13C NMR Spectra Dataset

The dataset consists of 13C NMR spectra from NMRShiftDB, used for predicting NMR peaks for carbon atoms in various molecules.

Covid-Chestxray-Dataset

The Covid-Chestxray-Dataset is a collection of chest X-ray images of COVID-19 patients, used for training and testing in the study.

24,167 datasets found