-
NIPS dataset
The NIPS dataset is used to test the proposed Hierarchical Latent Word Clustering algorithm. -
Topic Labeling with Images
The dataset consists of 300 topics generated using Wikipedia articles and news articles taken from the New York Times. Each topic is represented by ten terms with the highest... -
Search Snippets dataset
The Search Snippets dataset is a collection of search snippets from Google. -
Pascal Flickr dataset
The Pascal Flickr dataset is a collection of captions for images from Flickr. -
Tweet dataset
The dataset used in this paper is a collection of short texts, including tweets, Pascal Flickr captions, and search snippets. -
New York Times and 20Newsgroups datasets
The paper uses two datasets: the New York Times dataset and the 20Newsgroups dataset. -
20Newsgroups dataset
The 20Newsgroups dataset contains 18,846 newsgroup documents. -
Japanese Election Manifesto Data
The Japanese election manifesto data contains texts of Japanese election manifestos. -
Congressional Bills Project
The Congressional Bills Project dataset contains the texts of congressional bills. -
AGNews Dataset
The AGNews dataset is a collection of news articles, each labeled with a topic (e.g., politics or sports).