8 datasets found

Tags: text analysis

  • NeurIPS dataset

    The NeurIPS dataset is a collection of 7241 papers published in NeurIPS from 1987 to 2016.
  • 20News

    Topic modeling has been a widely used tool for unsupervised text analysis. However, comprehensive evaluations of a topic model remain challenging.
  • 20Newsgroups dataset

    The 20Newsgroups dataset contains 18,846 newsgroup documents.
  • News

    The News dataset consists of 5000 randomly sampled news articles from the NY Times corpus. It simulates the opinions of media consumers on news items. The units are different...
  • Yelp Dataset

    The Yelp Dataset contains 1.6M reviews and 500K tips by 366K users for 61K businesses; 481K business attributes, such as hours, parking availability, ambience; and check-ins for...
  • Yelp Dataset Challenge

    The Yelp dataset challenge contains reviews and images of restaurants, with the goal of recommending images for each review.
  • Penn Treebank

    The Penn Treebank dataset contains one million words of 1989 Wall Street Journal material annotated in Treebank II style, with 42k sentences of varying lengths.
  • BookCorpus

    BookCorpus is the dataset used in this paper for unsupervised sentence representation learning. It consists of paragraphs from unlabeled text.