110 datasets found

Tags: Text Classification

  • SlimPajama

    The SlimPajama dataset is used to evaluate the performance of the xLSTM architecture on various tasks, including language modeling, question answering, and text classification.
  • 20-Newsgroups dataset

    The 20-Newsgroups dataset is a collection of text documents (newsgroup posts) partitioned across 20 different newsgroups.
  • REDDIT-BINARY dataset

    The REDDIT-BINARY dataset contains 2,000 graphs, each labeled as belonging to either a question/answer-based or a discussion-based community on the content-aggregation website Reddit.
  • Yahoo

    The Yahoo dataset, which contains leaked passwords, is used for training and testing the proposed model.
  • BERT

    This entry refers to a pre-trained BERT model rather than a dataset; the model was pre-trained on English Wikipedia and BooksCorpus.
  • Reuters-21578

    Reuters-21578 is a collection of Reuters newswire documents widely used as a text classification benchmark. Text classification has long been an active research field; its aim is to develop algorithms that assign categories to given documents.
  • Amazon Review

    The Amazon Review dataset is a widely used benchmark dataset for cross-domain sentiment analysis.
  • Text Classification based on Multiple Block Convolutional Highways
  • OpenWebText Corpus

    A dataset for language modeling, where the goal is to predict the next word in a sequence given the previous words.
  • SQuAD

    The dataset used in the paper is a multiple-choice reading comprehension dataset in which each example includes a passage, a question, and an answer. The passage is a script, and the question is a...
  • COPD

    The dataset is used in the paper for missing-value imputation with feature-specific generative adversarial networks.
  • Disin dataset

    The Disin dataset is a fake news dataset on Kaggle, including 12,600 fake news articles and 12,600 truthful news articles.
  • Natural Questions

    The Natural Questions dataset consists of questions extracted from web queries, with each question accompanied by a corresponding Wikipedia article containing the answer.
  • TriviaQA

    The TriviaQA dataset is a collection of questions sourced from trivia and quiz-league websites, with sentence-level supporting-fact annotations.
  • SST-2

    The dataset is used for experiments across ten models, ranging from bag-of-words models to pre-trained transformers, which find that a model having higher AUC does not necessarily...
  • Clothing Dataset

    The Clothing dataset contains metadata, text descriptions, and images of the clothing items, with the review score as the label.
  • COVID-19 Research Articles Classification

    The dataset is used for text classification to support Epistemonikos' effort to filter and categorize research articles related to COVID-19.
  • Stanford Alpaca

    The dataset used in the paper is not explicitly described, but it is mentioned that the authors used CIFAR-10 and CIFAR-100 datasets for image classification, and ImageNet-100...
  • AG News

    The dataset used in the paper is a language-domain dataset for news topic classification, named AG News. The dataset is used to evaluate the performance of...
  • AGNews Dataset

    The AGNews dataset is a collection of news articles, each labeled with a topic (e.g., sports or business).
You can also access this registry using the API (see API Docs).