No Organization - Organizations

LAPTOP dataset

The LAPTOP dataset is used for aspect-based sentiment analysis and contains review sentences with gold-standard aspect sentiment annotations.

Dataset
JSON

OCNLI

OCNLI is a Chinese natural language inference dataset consisting of premise-hypothesis pairs.

Dataset
JSON

BQ Corpus

BQ Corpus is a large-scale dataset for sentence semantic equivalence identification in Chinese.

Dataset
JSON

LCQMC

LCQMC is a large-scale Chinese question matching corpus used for determining the semantic equivalence of question pairs.

Dataset
JSON

TNEWS

TNEWS is a short-text classification dataset of news titles and keywords, each of which must be classified into one of 15 categories.

Dataset
JSON

THUCNews

THUCNews is a news categorization dataset containing 50K news articles across ten domains.

Dataset
JSON

ChnSentiCorp

ChnSentiCorp is a Chinese sentiment classification dataset in which each document is labeled as either positive or negative.

Dataset
JSON

CJRC

CJRC is a machine reading comprehension dataset built on Chinese legal judgments. It contains yes/no questions, unanswerable questions, and span-extraction questions.

Dataset
JSON

DRCD

DRCD is a span-extraction machine reading comprehension dataset in Traditional Chinese.

Dataset
JSON

CMRC 2018

CMRC 2018 is a SQuAD-style span-extraction machine reading comprehension dataset: each question must be answered with a span extracted from the given passage.

Dataset
JSON

DSTC7

The DSTC7 dataset is used to evaluate model performance on audio-visual scene-aware dialog, a task that challenges models to generate appropriate responses...

Dataset
JSON

DSTC8

The DSTC8 dataset is used for the audio-visual scene-aware dialog task, which involves generating responses from multimodal inputs including video, audio,...

Dataset
JSON

WMT19 News Translation Dataset

The dataset includes authentic parallel data, both with and without document boundaries, as well as back-translated data for improving the training of document-level translation models.

Dataset
JSON

NIST Chinese-English Test Dataset

The NIST test sets are used as evaluation benchmarks for Chinese-to-English translation.

Dataset
JSON

WMT14 English-French and English-German Dataset

The WMT14 dataset consists of English-to-French and English-to-German translation data, used as test sets for evaluating the robustness of machine translation systems.

Dataset
JSON

Parallel Translation Dataset for NMT

The dataset includes parallel translation data used to train victim models for evaluating adversarial attacks on neural machine translation.

Dataset
JSON

GOCS Technology for Geostationary Orbit Complex Satellite

This dataset covers geostationary orbit complex satellite technology and comprises valid patents that have undergone expert validation.

Dataset
JSON

MRRG Technology for Micro Radar Rain Gauge

This dataset covers micro radar rain gauge technology; a thorough filtering process was applied to identify valid patents.

Dataset
JSON

1MWDFS Technology for 1MW Dual Frequency System

This dataset covers 1MW dual-frequency system technology and contains valid patents curated based on expert recommendations.

Dataset
JSON

MPUART Marine Plant Using Augmented Reality Technology

This dataset focuses on marine plant technologies that use augmented reality. It includes a comprehensive list of related patents, filtered for validity based on...

Dataset
JSON

24,167 datasets found