-
NIPS dataset
The NIPS dataset is used to test the proposed Hierarchical Latent Word Clustering algorithm. -
Topic Labeling with Images
The dataset consists of 300 topics generated using Wikipedia articles and news articles taken from the New York Times. Each topic is represented by ten terms with the highest... -
Search Snippets dataset
The Search Snippets dataset is a collection of search snippets from Google. -
Pascal Flickr dataset
The Pascal Flickr dataset is a collection of captions for images from Flickr. -
Tweet dataset
The dataset used in this paper is a collection of short texts, including tweets, Pascal Flickr captions, and search snippets. -
New York Times and 20Newsgroups datasets
The paper uses two datasets: the New York Times dataset and the 20Newsgroups dataset. -
20Newsgroups dataset
The 20Newsgroups dataset contains 18,846 newsgroup documents. -
Japanese Election Manifesto Data
The Japanese election manifesto data contains texts of Japanese election manifestos. -
Congressional Bills Project
The Congressional Bills Project dataset contains the texts of congressional bills. -
AGNews Dataset
The AGNews dataset is a collection of news articles, each labeled with a topic (e.g., politics or sports). -
GoogleNews
The dataset used in this paper is a collection of news articles from Google News. -
20NewsGroups
The dataset used in this paper is a collection of documents from various domains, including news, articles, and emails. -
Wikitext-103
The dataset used in this paper is Wikitext-103, a general English language corpus containing good and featured Wikipedia articles. -
Reuters RCV1-v2
The Reuters RCV1-v2 dataset contains 804,414 newswire articles. Its 103 topics form a tree hierarchy, so documents typically carry multiple labels. The data was randomly...
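
A topic tree can lead to several labels per document if a document tagged with a topic is also counted under each of that topic's ancestors. The minimal Python sketch below illustrates that reading; it assumes ancestor expansion, which the description above does not confirm. The topic names, the `PARENT` mapping, and the `with_ancestors` helper are hypothetical and are not actual RCV1-v2 topic codes or tooling.

```python
# Toy topic tree stored as child -> parent (None marks a top-level topic).
# These names are made up for illustration; they are NOT real RCV1-v2 codes.
PARENT = {
    "business": None,
    "business/earnings": "business",
    "business/earnings/forecasts": "business/earnings",
    "sports": None,
}


def with_ancestors(labels, parent=PARENT):
    """Return the given labels plus every ancestor of each label."""
    expanded = set()
    for label in labels:
        node = label
        # Walk up the tree until we pass a top-level topic.
        while node is not None:
            expanded.add(node)
            node = parent[node]
    return expanded


# A document tagged with a single deep topic ends up with three labels.
doc_labels = with_ancestors({"business/earnings/forecasts"})
print(sorted(doc_labels))
```

Under this assumption, only documents tagged with a top-level topic alone would keep a single label, which would be consistent with most documents having several.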