  • Coronary Arteriography Reports

    The dataset consists of coronary arteriography reports collected from Shuguang Hospital, annotated with five entity types and five relation types relevant to medical text processing.
  • Stanford Natural Language Inference Corpus (SNLI)

    The Stanford Natural Language Inference Corpus (SNLI) is a collection of English sentence pairs labeled as entailment, contradiction, or neutral, used for natural language inference tasks.
  • Stanford Sentiment Treebank (SST-5)

    The SST-5 dataset is a sentiment analysis dataset of movie review sentences, each labeled with one of five sentiment classes ranging from very negative to very positive.
  • WNUT16 NER

    WNUT16 is a shared task dataset for named entity recognition on Twitter, consisting of annotated tweets for identifying named entities in informal, user-generated text.
  • GENIA NER

    The GENIA NER dataset consists of annotated Medline abstracts that contain information on biological entities such as proteins and genes, used for named entity recognition in...
  • CoNLL 2003 NER dataset

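    The CoNLL 2003 shared task dataset is used for named entity recognition, with English and German newswire text annotated for four entity types: persons, locations, organizations, and miscellaneous names.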
  • CoNLL 2000 chunking dataset

    The CoNLL 2000 shared task dataset is used for text chunking (shallow parsing) and is built from Wall Street Journal sections of the Penn Treebank.
  • Universal Dependencies v. 1.3

    This dataset contains English part-of-speech tags derived from the Universal Dependencies corpus; the training set is reduced to the first 500 sentences to make the task harder.
  • ACE Entities/Events

    The ACE 2005 dataset consists of documents annotated for entity and event detection, covering several domains including newswire and weblogs.
  • MSRA

    The MSRA dataset comes from the news domain and is widely used for Chinese named entity recognition.
  • Weibo

    The Weibo NER dataset is built from Chinese social media text and contains various types of named entities.
  • IMDB Movie Reviews

    The IMDB dataset consists of 54,000 movie reviews used as a background corpus for evaluating spell correction models; its large vocabulary supports robust word recognition.
  • Stanford Sentiment Treebank (SST)

    The Stanford Sentiment Treebank (SST) dataset contains 8,544 movie reviews used to evaluate spell correctors on a downstream sentiment classification task.
  • NewsQA

    NewsQA is a machine comprehension dataset of questions and answers based on news articles, aimed at developing models that can understand and reason about their content.
  • QA-RE

    QA-RE is a dataset that frames relation extraction as question answering, enabling research on how QA data can be leveraged to extract relational information.
  • Large QA-SRL

    The Large QA-SRL dataset is a large-scale resource for semantic role labeling, capturing a diverse set of question-answer pairs that are representative of predicate-argument...
  • QAMR

    The QAMR dataset captures a broad range of questions and answers that relate to predicate-argument structures, focusing on implicit arguments and inferred relations not captured...
  • WNED-CWEB

    The WNED-CWEB dataset is a benchmark for named entity disambiguation.
  • WNED-WIKI

    The WNED-WIKI dataset is designed for evaluating entity disambiguation systems.
  • ACE2004

    The ACE2004 dataset is used for various information extraction tasks including entity recognition and disambiguation.