24,167 datasets found

  • LAPTOP dataset

    The LAPTOP dataset is used for aspect-based sentiment analysis, containing review sentences along with gold standard aspect sentiment annotations.

  • OCNLI

    OCNLI is a dataset for natural language inference adapted for the Chinese language, consisting of premise-hypothesis pairs.
  • BQ Corpus

    BQ Corpus is a large-scale dataset for sentence semantic equivalence identification in Chinese.

  • LCQMC

    LCQMC is a large-scale Chinese question matching corpus used for determining the semantic equivalence of question pairs.

  • TNEWS

    TNEWS is a short text classification dataset consisting of news titles and keywords, each to be classified into one of 15 classes.
  • THUCNews

    THUCNews is a dataset used for news categorization tasks in different genres, containing 50K news articles in ten domains.
  • ChnSentiCorp

    ChnSentiCorp is a dataset used for sentiment classification in Chinese documents, where the text is classified into positive or negative labels.
  • CJRC

    CJRC is a dataset for machine reading comprehension specializing in Chinese legal judgments, containing yes/no questions, no-answer questions, and span-extraction questions.
  • DRCD

    DRCD is a span-extraction machine reading comprehension dataset that is written in Traditional Chinese.
  • CMRC 2018

    CMRC 2018 is a span-extraction machine reading comprehension dataset similar to SQuAD, which requires the extraction of a passage span for given questions.
  • DSTC7

    The DSTC7 dataset is used for evaluating model performance in audio visual scene-aware dialog, challenging the generation of appropriate responses...
  • DSTC8

    The DSTC8 dataset is used for addressing the audio visual scene-aware dialog task, specifically involving generating responses based on multimodal inputs including video, audio,...
  • WMT19 News Translation Dataset

    The dataset includes authentic parallel data with and without document boundaries, as well as back-translated data to enhance the training of document-level translation models.
  • NIST Chinese-English Test Dataset

    NIST test sets used as evaluation benchmarks for Chinese-to-English translation performance.
  • WMT14 English-French and English-German Dataset

    The WMT14 dataset consists of English-to-French and English-to-German translations, used as test sets for evaluating the robustness of machine translation systems.
  • Parallel Translation Dataset for NMT

    The dataset includes parallel translation data used to train victim models for evaluating adversarial attacks in neural machine translation tasks.
  • GOCS Technology for Geostationary Orbit Complex Satellite

    This dataset pertains to geostationary orbit complex satellite technology, comprising valid patents that have undergone expert validation.
  • MRRG Technology for Micro Radar Rain Gauge

    This dataset includes technology focused on micro radar rain gauge systems, with a thorough filtering process to identify valid patents.
  • 1MWDFS Technology for 1MW Dual Frequency System

    A dataset detailing the technology for 1MW dual frequency systems, containing valid patents that have been curated based on expert recommendations.
  • MPUART Marine Plant Using Augmented Reality Technology

    A dataset focused on marine plant technologies using augmented reality. It includes a comprehensive list of patents related to this technology, filtered for validity based on...