19 datasets found

Tags: topic modeling

  • DBLP papers

    The dataset used in this paper is a collection of papers from the DBLP conferences between 2004 and 2014.
  • NIPS papers

    The dataset used in this paper is a collection of papers from the NIPS conferences between 1987 and 1999.
  • CAL500

    In text categorization, a document may be associated with a range of topics, such as science, entertainment, and news.
  • NIPS dataset

    The NIPS dataset is used to test the proposed Hierarchical Latent Word Clustering algorithm.
  • Topic Labeling with Images

    The dataset consists of 300 topics generated using Wikipedia articles and news articles taken from the New York Times. Each topic is represented by ten terms with the highest...
  • Search Snippets dataset

    The Search Snippets dataset is a collection of search snippets from Google.
  • Pascal Flickr dataset

    The Pascal Flickr dataset is a collection of captions for images from Flickr.
  • Tweet dataset

    The dataset used in this paper is a collection of short texts, including tweets, Pascal Flickr captions, and search snippets.
  • New York Times and 20Newsgroups datasets

    The paper uses two datasets: the New York Times dataset and the 20Newsgroups dataset.
  • 20News

    Topic modeling has been a widely used tool for unsupervised text analysis. However, comprehensive evaluations of a topic model remain challenging.
  • 20Newsgroups dataset

    The 20Newsgroups dataset contains 18,846 newsgroup documents.
  • Japanese Election Manifesto Data

    The Japanese election manifesto data contains texts of Japanese election manifestos.
  • Congressional Bills Project

    The Congressional bills project dataset contains texts of congressional bills.
  • AGNews Dataset

    The AGNews dataset is a collection of news articles, where each article is labeled with a topic (e.g., politics or sports).
  • GoogleNews

    The dataset used in this paper is a collection of news articles from Google News.
  • Wiki20K

    The dataset used in this paper is a collection of English Wikipedia abstracts from DBpedia.
  • 20NewsGroups

    The dataset used in this paper is a collection of documents from various domains, including news, articles, and emails.
  • Wikitext-103

    The dataset used in this paper is Wikitext-103, a general English language corpus containing good and featured Wikipedia articles.
  • Reuters RCV1-v2

    The Reuters RCV1-v2 contains 804,414 newswire articles. There are 103 topics which form a tree hierarchy. Thus documents typically have multiple labels. The data was randomly...
You can also access this registry using the API (see API Docs).