BooksCorpus

The BooksCorpus dataset consists of 11,038 books and has been used for text-only training.

Visual Question Answering

Visual Question Answering (VQA) requires a model to answer open-ended questions regarding images.

Image-Grounded Conversations

Image-Grounded Conversations (IGC) consists of dialogues between human participants that are grounded in images.

Image Chat

Image Chat contains complete dialogues grounded in images and introduces conversational styles to make the conversations more natural.

Personality Captions

The Personality Captions dataset contains image-caption pairs annotated with attributes describing 215 different speech styles.

Instagram Images

A dataset of 3.5 billion Instagram images collected to explore the limits of weakly supervised pretraining.

Frequent Russian Words Dataset

This dataset lists the 10,000 and 100,000 most frequent words used to train word embedding models for the Russian language, derived from Wikipedia and other...

Word Embedding Models for Russian Language

The dataset consists of publicly available word embedding models for the Russian language, including RusVectores, fastText, and Russian Distributional Thesaurus.

Tox21 Toxicity Dataset

The Tox21 dataset contains information about the toxicity of various compounds, used for toxicity prediction tasks.

ChEMBL Bioactivity Dataset

The ChEMBL dataset is used for drug bioactivity prediction across multiple tasks involving human protein targets.

13C NMR Spectra Dataset

The dataset consists of 13C NMR spectra from NMRShiftDB, used for predicting NMR peaks for carbon atoms in various molecules.

Covid-Chestxray-Dataset

The Covid-Chestxray-Dataset is a collection of chest X-ray images of COVID-19 patients, used for training and testing in the study.

Indoor Object Dataset

A synthetic dataset consisting of rendered indoor scene images with masked objects as the foreground and real-world photographs to validate whether ST-GAN generalizes to real...

Authentic Paintings and Sketches Dataset

A dataset of realistic face images and sketches collected from art galleries to evaluate the method's robustness to various styles.

Synthesized Stylized Face and Ground Truth Dataset

The dataset consists of pairs of stylized face images and their corresponding ground truth photorealistic faces, created using the CelebA dataset and various style transfer...

WMT 2014 English → German

The WMT 2014 dataset contains 4.5M sentence pairs for machine translation from English to German.

WMT 2016 Romanian → English

The WMT 2016 dataset comprises 600K sentence pairs for machine translation from Romanian to English.

KFTT Japanese → English

The KFTT dataset includes 300K sentence pairs for machine translation from Japanese to English.

IWSLT 2017 German → English

The IWSLT 2017 dataset consists of 200K sentence pairs for machine translation from German to English.

Cadaver X-ray Images

The dataset includes 10 real X-ray images collected from a cadaver specimen, with ground truth poses obtained by injecting metallic BBs into the surface of the bone, manually...