110 datasets found

Tags: Text Classification

  • Text8

    Word2Vec is a distributed word embedding generator that uses an artificial neural network to learn dense vector representations of words.
  • PubMed, ArXiv, and Movies datasets

    The datasets used in the paper are PubMed, ArXiv, and Movies. PubMed is a medical dataset consisting of research articles from the PubMed repository. The articles' subheadings...
  • 20NewsGroups

    The dataset used in this paper is a collection of documents from various domains, including news, articles, and emails.
  • CORD-19 Research Challenge

    COVID-19 research challenge dataset
  • Penn Treebank

    The Penn Treebank dataset contains one million words of 1989 Wall Street Journal material annotated in Treebank II style, with 42k sentences of varying lengths.
  • Wikitext-103

    The dataset used in this paper is Wikitext-103, a general English language corpus containing good and featured Wikipedia articles.
  • SNLI

    The dataset used in the paper is the Stanford Natural Language Inference (SNLI) dataset, which consists of 549,367 premise-hypothesis pairs for train/dev/test sets and target...
  • IMDB

    The dataset used in the paper is not explicitly described, but it is mentioned that the authors tested the proposed method on three real data sets for the most relevant security...
  • TREC

    A dataset used for sentiment analysis, question type classification, and subjectivity classification tasks.
  • Training Language Models to Perform Tasks

    A dataset for training language models to perform tasks such as question answering and text classification.
You can also access this registry using the API (see API Docs).