Dataset - LDM

DBLP papers

The dataset used in this paper is a collection of conference papers indexed in DBLP, published between 2004 and 2014.
NIPS papers

The dataset used in this paper is a collection of papers from the NIPS conferences between 1987 and 1999.
CAL500

In text categorization, a document may be associated with a range of topics, such as science, entertainment, and news.
NIPS dataset

The NIPS dataset is used to evaluate the proposed Hierarchical Latent Word Clustering algorithm.
Topic Labeling with Images

The dataset consists of 300 topics generated using Wikipedia articles and news articles taken from the New York Times. Each topic is represented by ten terms with the highest...
Search Snippets dataset

The Search Snippets dataset is a collection of search snippets from Google.
Pascal Flickr dataset

The Pascal Flickr dataset is a collection of captions for images from Flickr.
Tweet dataset

The dataset used in this paper is a collection of short texts, including tweets, Pascal Flickr captions, and search snippets.
New York Times and 20Newsgroups datasets

The paper uses two datasets: the New York Times dataset and the 20Newsgroups dataset.
20News

Topic modeling is a widely used tool for unsupervised text analysis. However, comprehensive evaluation of topic models remains challenging.
20Newsgroups dataset

The 20Newsgroups dataset contains 18,846 newsgroup documents.
Japanese Election Manifesto Data

The Japanese election manifesto data contains the texts of Japanese election manifestos.
Congressional Bills Project

The Congressional Bills Project dataset contains the texts of congressional bills.
AGNews Dataset

The AGNews dataset is a collection of news articles, each labeled with a topic (e.g., politics, sports).
GoogleNews

The dataset used in this paper is a collection of news articles from Google News.
Wiki20K

The dataset used in this paper is a collection of English Wikipedia abstracts from DBpedia.
20NewsGroups

The dataset used in this paper is a collection of documents from various domains, including news, articles, and emails.
Wikitext-103

The dataset used in this paper is Wikitext-103, a general English language corpus containing good and featured Wikipedia articles.
Reuters RCV1-v2

The Reuters RCV1-v2 corpus contains 804,414 newswire articles. Its 103 topics form a tree hierarchy, so documents typically have multiple labels. The data was randomly...
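Why a tree-shaped topic set leads to multiple labels per document can be sketched in Python. This is a minimal sketch that assumes labels are closed under ancestry: a document tagged with a subtopic is also tagged with every parent of that subtopic, so one specific assignment expands into several labels. The topic codes and the `expand_with_ancestors` helper are hypothetical illustrations, not the real RCV1-v2 code list or any official tooling.

```python
from __future__ import annotations

# Hypothetical topic tree (NOT the actual RCV1-v2 codes):
# each topic maps to its parent, and None marks a root topic.
PARENT: dict[str, str | None] = {
    "CORP": None,
    "CORP.PERF": "CORP",
    "CORP.PERF.EARN": "CORP.PERF",
    "ECON": None,
    "ECON.TRADE": "ECON",
}


def expand_with_ancestors(labels: set[str], parent: dict[str, str | None]) -> set[str]:
    """Return the given labels together with all of their ancestors in the tree."""
    expanded: set[str] = set()
    for label in labels:
        node = label
        # Walk toward the root; stop early if this node (and therefore
        # its whole ancestor chain) has already been added.
        while node is not None and node not in expanded:
            expanded.add(node)
            node = parent[node]
    return expanded


if __name__ == "__main__":
    # Two specific assignments expand into five labels.
    doc_labels = expand_with_ancestors({"CORP.PERF.EARN", "ECON.TRADE"}, PARENT)
    print(sorted(doc_labels))
    # ['CORP', 'CORP.PERF', 'CORP.PERF.EARN', 'ECON', 'ECON.TRADE']
```

Stopping the upward walk at an already-expanded node avoids re-walking shared ancestor chains when several labels share a branch.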


19 datasets found