182 datasets found

Formats: JSON

  • X-FORMAL

    The X-FORMAL dataset contains pairs of formal and informal texts in four languages: Brazilian Portuguese, French, Italian, and English.
  • GYAFC

    The GYAFC dataset is a formality transfer dataset for English that contains aligned formal and informal sentences from two domains: Entertainment & Music and Family &...
  • MARC

    The MARC dataset is a multilingual text classification dataset that contains 6 languages.
  • Yahoo Answer and Yelp15 review

    Two large-scale document classification datasets, Yahoo Answer and Yelp15 review, representing topic classification and sentiment classification respectively.
  • Cnews dataset

    The Cnews dataset is a collection of news articles from Sina News, filtered from 2005 to 2011. The dataset contains 10 categories of news, including sports, entertainment, home...
  • M10

    The paper discusses the use of multi-objective Bayesian optimization for hyperparameter transfer in topic models.
  • 20 NewsGroups

    The paper discusses the use of multi-objective Bayesian optimization for hyperparameter transfer in topic models.
  • MR, Subj, SST-1, SST-2, MPQA

    The datasets used in this paper for the text classification task.
  • 20NEWS Dataset

    The dataset used in the paper is the 20NEWS dataset, consisting of 18,845 text documents with 20 topic labels.
  • TEL-NLP

    The TEL-NLP dataset is a collection of Telugu text data for four NLP tasks: sentiment analysis, emotion identification, hate speech detection, and sarcasm detection.
  • Yelp Dataset

    The Yelp Dataset contains 1.6M reviews and 500K tips by 366K users for 61K businesses; 481K business attributes, such as hours, parking availability, ambience; and check-ins for...
  • IMDB Sentiment Classification

    The IMDB sentiment classification dataset is used for text classification tasks.
  • CNN/DailyMail

    A bus driver who was seriously injured when he was hit by a steam engine is making good progress, his wife has said.
  • Ren-CECps

    Multi-label text classification dataset Ren-CECps
  • RCV1-v2

    Multi-label text classification dataset RCV1-v2, Reuters Corpus Volume I
  • 20-Newsgroups dataset

    The 20-Newsgroups dataset is a collection of text documents.
  • Twitter and Pinterest dataset

    The dataset used for the experiments on Twitter and Pinterest.
  • REDDIT-BINARY dataset

    The REDDIT-BINARY dataset contains 2,000 graphs, each labeled as a question/answer-based or discussion-based community on the content-aggregation website Reddit.
  • Full

    The dataset used for sentiment analysis and topic classification tasks.
  • Polarity

    The dataset used for sentiment analysis and topic classification tasks.