20,499 datasets found

  • Coronary Arteriography Reports

    The dataset consists of coronary arteriography reports collected from Shuguang Hospital, including five types of entities and five relations relevant to medical text processing.
  • Stanford Natural Language Inference Corpus (SNLI)

    The Stanford Natural Language Inference Corpus (SNLI) dataset is used for natural language inference tasks.
  • Stanford Sentiment Treebank (SST-5)

    The SST-5 dataset is a sentiment analysis dataset consisting of movie reviews with five labels for sentiment classification.
  • WNUT16 NER

    WNUT16 is a shared task dataset for named entity recognition over Twitter, consisting of annotated tweets used for identifying named entities in informal digital text.
  • GENIA NER

    The GENIA NER dataset consists of annotated Medline abstracts that contain information on biological entities such as proteins and genes, used for named entity recognition in...
  • CoNLL 2003 NER dataset

    The CoNLL 2003 shared task dataset is focused on named entity recognition tasks.
  • CoNLL 2000 chunking dataset

    The CoNLL 2000 shared task dataset is used for chunking tasks in natural language processing.
  • Universal Dependencies v. 1.3

    This dataset contains English part-of-speech tags from the Universal Dependencies corpus. The training set is reduced to the first 500 sentences to make the task harder.
  • ACE Entities/Events

    The ACE 2005 dataset consists of documents annotated for event and entity detection, covering several domains including newswire and blogs.
  • MSRA

    The MSRA dataset comes from the news domain and is widely used for Chinese named entity recognition.
  • Weibo

    Weibo NER is built from Chinese social media text and contains various types of named entities.
  • IMDB Movie Reviews

    The IMDB dataset consists of 54,000 movie reviews used as a background corpus for evaluating spell-correction models. Its larger vocabulary supports robust word recognition.
  • Stanford Sentiment Treebank (SST)

    The Stanford Sentiment Treebank (SST) dataset contains 8,544 movie reviews used to evaluate spell correctors on sentiment classification tasks.
  • NewsQA

    NewsQA is a machine comprehension dataset featuring questions and answers derived from news articles, aimed at developing models that can understand and reason about the content.
  • QA-RE

    QA-RE is a dataset that formats relation extraction as question-answer pairs, allowing research on how to leverage QA data for extracting relational information.
  • Large QA-SRL

    The Large QA-SRL dataset is a large-scale dataset designed for semantic role labeling, capturing a diverse set of question-answer pairs that are representative of predicate-argument...
  • QAMR

    The QAMR dataset captures a broad range of questions and answers that relate to predicate-argument structures, focusing on implicit arguments and inferred relations not captured...
  • WNED-CWEB

    The WNED-CWEB dataset is a benchmark for named entity disambiguation.
  • WNED-WIKI

    The WNED-WIKI dataset is designed for evaluating entity disambiguation systems.
  • ACE2004

    The ACE2004 dataset is used for various information extraction tasks including entity recognition and disambiguation.