WMT19 News Translation Dataset
The dataset includes authentic parallel data with and without document boundaries, as well as back-translated data to enhance the training of document-level translation models.
NIST Chinese-English Test Dataset
NIST test sets used as benchmarks for evaluating Chinese-to-English translation performance.
WMT14 English-French and English-German Dataset
The WMT14 dataset, consisting of English-French and English-German translation pairs, used as test sets for evaluating the robustness of machine translation systems.
Parallel Translation Dataset for NMT
The dataset includes parallel translation data used to train victim models for evaluating adversarial attacks on neural machine translation (NMT).
GOCS Technology for Geostationary Orbit Complex Satellite
This dataset covers geostationary orbit complex satellite (GOCS) technology and comprises valid patents that have undergone expert validation.
MRRG Technology for Micro Radar Rain Gauge
This dataset covers micro radar rain gauge (MRRG) technology and contains patents that passed a thorough filtering process to identify valid ones.
1MWDFS Technology for 1MW Dual Frequency System
A dataset on 1MW dual-frequency system technology, containing valid patents curated based on expert recommendations.
MPUART Marine Plant Using Augmented Reality Technology
A dataset focused on marine plant technologies that use augmented reality. It includes a comprehensive list of related patents, filtered for validity based on...
CUB-200-2011 Dataset
CUB-200-2011 is a fine-grained image dataset containing 11,788 images of birds across 200 species, used for few-shot learning and fine-grained classification.
Yahoo Reviews Dataset
The Yahoo dataset provides user-generated textual reviews and is used for building models that require review data.
Stanford Natural Language Inference (SNLI)
The SNLI (Stanford Natural Language Inference) dataset is used for evaluating language understanding and comprises sentence pairs annotated for their entailment...
WMT English-German Dataset
The WMT English-German dataset is used for evaluating machine translation models.
Filtered OpenSubtitles (fOST)
The Filtered OpenSubtitles (fOST) dataset contains high-coherence context-response pairs extracted from the main OpenSubtitles corpus, aimed at ensuring better quality in conversational...
OpenSubtitles
The OpenSubtitles corpus is used for training and evaluating conversational response generation models, providing context-response pairs from dialogue turn segments.
Stochastic Sequential MNIST (ssMNIST)
The Stochastic Sequential MNIST (ssMNIST) dataset consists of higher-order sequences of randomly chosen MNIST digits that are drawn according to a predetermined list of labels,...