No Organization - Organizations

DRCD

DRCD is a span-extraction machine reading comprehension dataset written in Traditional Chinese.

Dataset
JSON

CMRC 2018

CMRC 2018 is a span-extraction machine reading comprehension dataset similar to SQuAD: for each question, the answer must be extracted as a span of the given passage.

Dataset
JSON
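
As a rough illustration of the span-extraction format described for CMRC 2018, here is a minimal sketch in Python. It assumes a SQuAD-style record with `context`, `question`, and `answers` fields; the record contents and the helper names `extract_span` and `answer_is_consistent` are hypothetical and are not taken from the dataset itself.

```python
# Hypothetical SQuAD-style record. The field names follow the SQuAD
# convention; the passage and question are invented for illustration.
record = {
    "context": "The museum opened in 1998 and moved to its current site in 2010.",
    "question": "When did the museum open?",
    "answers": [{"text": "1998", "answer_start": 21}],
}


def extract_span(context: str, answer_start: int, length: int) -> str:
    """Return the substring of `context` that begins at a character offset."""
    return context[answer_start:answer_start + length]


def answer_is_consistent(rec: dict) -> bool:
    """Check that every gold answer is literally a span of the passage."""
    return all(
        extract_span(rec["context"], a["answer_start"], len(a["text"])) == a["text"]
        for a in rec["answers"]
    )


if __name__ == "__main__":
    gold = record["answers"][0]
    print(extract_span(record["context"], gold["answer_start"], len(gold["text"])))
    print(answer_is_consistent(record))
```

The point of the check is that a span-extraction answer is always recoverable from the passage by offset and length, which is what distinguishes this task from free-form answer generation.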

DSTC7

The DSTC7 dataset is used to evaluate model performance on audio-visual scene-aware dialog, which requires generating appropriate responses...

Dataset
JSON

DSTC8

The DSTC8 dataset targets the audio-visual scene-aware dialog task, in which responses are generated from multimodal inputs including video, audio,...

Dataset
JSON

WMT19 News Translation Dataset

The dataset includes authentic parallel data with and without document boundaries, as well as back-translated data to enhance the training of document-level translation models.

Dataset
JSON

NIST Chinese-English Test Dataset

NIST test sets used as evaluation benchmarks for Chinese-to-English translation.

Dataset
JSON

WMT14 English-French and English-German Dataset

WMT14 English-to-French and English-to-German test sets, used to evaluate the robustness of machine translation systems.

Dataset
JSON

Parallel Translation Dataset for NMT

The dataset includes parallel translation data used to train victim models for evaluating adversarial attacks in neural machine translation tasks.

Dataset
JSON

GOCS Technology for Geostationary Orbit Complex Satellite

This dataset covers geostationary orbit complex satellite technology and comprises patents confirmed as valid through expert validation.

Dataset
JSON

MRRG Technology for Micro Radar Rain Gauge

This dataset covers micro radar rain gauge technology, with patents put through a thorough filtering process to identify the valid ones.

Dataset
JSON

1MWDFS Technology for 1MW Dual Frequency System

A dataset detailing the technology for 1MW dual frequency systems, containing valid patents that have been curated based on expert recommendations.

Dataset
JSON

MPUART Marine Plant Using Augmented Reality Technology

A dataset focused on marine plant technologies using augmented reality. It includes a comprehensive list of patents related to this technology, filtered for validity based on...

Dataset
JSON

PAWS-X

PAWS-X is a cross-lingual adversarial dataset for paraphrase identification consisting of 23,659 human-translated pairs in six languages (French, Spanish, German, Chinese,...

Dataset
JSON

CUB-200-2011 Dataset

CUB-200-2011 is a fine-grained image dataset containing 11,788 images of birds across 200 species, used for few-shot learning and fine-grained classification.

Dataset
JSON

Yahoo Reviews Dataset

The Yahoo dataset provides user-generated textual reviews for building models that require review data.

Dataset
JSON

Stanford Natural Language Inference (SNLI)

The SNLI (Stanford Natural Language Inference) dataset is used to evaluate language understanding and consists of sentence pairs annotated for their entailment...

Dataset
JSON

WMT English-German dataset

The WMT English-German dataset is used to evaluate machine translation models.

Dataset
JSON

Filtered OpenSubtitles (fOST)

The Filtered OpenSubtitles dataset contains high-coherence context-response pairs extracted from the main OpenSubtitles corpus, aimed at ensuring better quality in conversational...

Dataset
JSON

OpenSubtitles

The OpenSubtitles corpus is used for training and evaluating conversational response generation models, providing context-response pairs drawn from dialogue turn segments.

Dataset
JSON

Stochastic Sequential MNIST (ssMNIST)

The Stochastic Sequential MNIST (ssMNIST) dataset consists of higher-order sequences of randomly chosen MNIST digits that are drawn according to a predetermined list of labels,...

Dataset
JSON

20,499 datasets found