211 datasets found

Groups: Text Classification

  • rcv1

    The rcv1 dataset is a multiclass text classification dataset.
  • WebKB

    A probabilistic logic programming dataset: a probabilistic version of the WebKB dataset.
  • Reuters-8

    The Reuters-8 dataset is a collection of news articles from Reuters.
  • 20Newsgrp

    The 20Newsgrp dataset is a collection of posts from 20 different newsgroups.
  • iPosts dataset

    The independently posted tweets dataset (henceforth: iPosts), used for contradiction detection between independently emerging claim-initiating tweets.
  • Threads RTE dataset

    The dataset on which the authors run disagreement reply detection (henceforth: Threads) was converted by us to RTE format based on the threaded conversations labeled in this...
  • Wikipedia Neutrality Corpus

    This dataset is used to test the ability of large language models to detect and correct biased Wikipedia edits according to Wikipedia's Neutral Point of View (NPOV) policy.
  • Yelp reviews polarity dataset

    The Yelp reviews polarity dataset is a sentiment classification dataset of Yelp reviews labeled as positive or negative.
  • News

    The News dataset consists of 5000 randomly sampled news articles from the NY Times corpus. It simulates the opinions of media consumers on news items. The units are different...
  • X-FORMAL

    The X-FORMAL dataset contains pairs of formal and informal texts in four languages: Brazilian Portuguese, French, Italian, and English.
  • GYAFC

    The GYAFC dataset is a formality transfer dataset for English that contains aligned formal and informal sentences from two domains: Entertainment & Music and Family &...
  • MARC

    The MARC dataset is a multilingual text classification dataset that contains 6 languages.
  • Yahoo Answer and Yelp15 review

    Two large-scale document classification datasets, Yahoo Answer and Yelp15 review, representing topic classification and sentiment classification respectively.
  • Cnews dataset

    The Cnews dataset is a collection of news articles from Sina News, filtered from 2005 to 2011. The dataset contains 10 categories of news, including sports, entertainment, home...
  • M10

    Used in a paper on multi-objective Bayesian optimization for hyperparameter transfer in topic models.
  • 20 NewsGroups

    Used in a paper on multi-objective Bayesian optimization for hyperparameter transfer in topic models.
  • MR, Subj, SST-1, SST-2, MPQA

    The datasets used in this paper for the text classification task.
  • 20NEWS Dataset

    The dataset used in the paper is the 20NEWS dataset, consisting of 18,845 text documents with 20 topic labels.
  • TEL-NLP

    The TEL-NLP dataset is a collection of Telugu text data for four NLP tasks: sentiment analysis, emotion identification, hate speech detection, and sarcasm detection.
  • Yelp Dataset

    The Yelp Dataset contains 1.6M reviews and 500K tips by 366K users for 61K businesses; 481K business attributes, such as hours, parking availability, ambience; and check-ins for...