Topic Modeling - Groups

LDA2Net: Digging under the surface of COVID-19 topics in literature

The LDA2Net dataset is a collection of COVID-19-related research papers, including both peer-reviewed articles and preprints.

DBLP papers

The DBLP papers dataset is a collection of papers from conferences indexed in DBLP, published between 2004 and 2014.

NIPS papers

The NIPS papers dataset is a collection of papers from the NIPS conference, published between 1987 and 1999.

Novel Word Detection in Separable Topic Models

A dataset used for novel word detection in separable topic models.

Exact slice sampler for Hierarchical Dirichlet Processes

A dataset used with a Hierarchical Dirichlet Process (HDP) mixture model, which models a hierarchy of groups of data.

NOISE dataset

The NOISE dataset is a semi-synthetic dataset constructed from the matrix A∗: each sample is generated as y = A∗x + ζ, where ζ is a noise term.
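The generative step y = A∗x + ζ can be sketched in plain Python as follows. This is a minimal illustration, not the dataset's actual construction code: the 3 × 2 matrix `A_star`, the mixing vector `x`, and the choice of zero-mean Gaussian noise with standard deviation `sigma` are all assumptions for the example, since the description does not specify them.

```python
import random

def matvec(A, x):
    # Compute the product A x, with A stored as a list of rows.
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def noisy_sample(A, x, sigma, rng):
    """Return y = A x + zeta.

    Assumption: zeta ~ N(0, sigma^2 I); the source only says zeta is noise.
    """
    clean = matvec(A, x)
    return [v + rng.gauss(0.0, sigma) for v in clean]

rng = random.Random(0)
# Hypothetical 3-word x 2-topic matrix; each column sums to one.
A_star = [[0.5, 0.1],
          [0.3, 0.2],
          [0.2, 0.7]]
x = [0.6, 0.4]  # hypothetical topic proportions for one sample
y = noisy_sample(A_star, x, sigma=0.01, rng=rng)
```

Setting `sigma=0.0` recovers the noiseless product A∗x (here [0.34, 0.26, 0.40]), which makes the construction easy to check by hand.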

NEG dataset

The NEG dataset is a semi-synthetic dataset constructed from the matrix A∗, where the entries of A∗ are i.i.d. samples from the uniform distribution on [−0.5, 0.5).
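Drawing such a matrix takes one uniform sample per entry. The plain-Python sketch below assumes a hypothetical 1000 × 20 shape (the description gives no dimensions) and relies on `random.Random.random()` returning values in [0.0, 1.0), so subtracting 0.5 yields exactly the half-open interval [−0.5, 0.5).

```python
import random

def neg_matrix(n_rows, n_cols, rng):
    """Draw an n_rows x n_cols matrix with i.i.d. Uniform[-0.5, 0.5) entries."""
    # rng.random() lies in [0.0, 1.0); shifting by -0.5 gives [-0.5, 0.5).
    return [[rng.random() - 0.5 for _ in range(n_cols)] for _ in range(n_rows)]

rng = random.Random(42)
A_star = neg_matrix(1000, 20, rng)  # hypothetical size: 1000 words x 20 topics
```

Because the entries are symmetric around zero, roughly half of them are negative, so unlike a topic-word matrix, A∗ has negative entries and its columns need not sum to one.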

CTM dataset

The CTM dataset is a semi-synthetic dataset constructed from the matrix X, whose columns are drawn from the logistic normal prior in the correlated topic model.
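In the correlated topic model, a logistic normal draw is obtained by sampling η ~ N(μ, Σ) and mapping it onto the probability simplex with the softmax function. The plain-Python sketch below shows that step for building columns of X; the mean `mu`, the covariance `Sigma`, the number of columns, and the helper names are hypothetical, and a hand-written Cholesky factorization stands in for a linear-algebra library.

```python
import math
import random

def cholesky(S):
    """Lower-triangular L with L L^T = S, for symmetric positive-definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(t - m) for t in v]
    z = sum(e)
    return [t / z for t in e]

def logistic_normal_column(mu, L, rng):
    """Draw eta ~ N(mu, L L^T), then map it onto the simplex with softmax."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    eta = [m + sum(L[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mu)]
    return softmax(eta)

def ctm_matrix(mu, Sigma, n_docs, rng):
    """Return X as a list of columns, one logistic normal draw per column."""
    L = cholesky(Sigma)
    return [logistic_normal_column(mu, L, rng) for _ in range(n_docs)]

rng = random.Random(0)
mu = [0.0, 0.0, 0.0]        # hypothetical prior mean over 3 topics
Sigma = [[1.0, 0.8, 0.0],   # hypothetical covariance: topics 0 and 1 co-vary
         [0.8, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
X_cols = ctm_matrix(mu, Sigma, n_docs=5, rng=rng)
```

The off-diagonal 0.8 in `Sigma` makes topics 0 and 1 tend to receive high or low weight together, a positive correlation that a Dirichlet prior cannot express.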

DIR dataset

The DIR dataset is a semi-synthetic dataset constructed from the matrix X, whose columns are drawn from a Dirichlet distribution with parameters (0.05, 0.05, ..., 0.05).
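A standard way to draw from a Dirichlet distribution is to sample independent Gamma(α_i, 1) variables and normalize them to sum to one. The plain-Python sketch below applies that to build columns of X; the 20-topic, 3-column shape and the function names are hypothetical choices for illustration.

```python
import random

def dirichlet(alpha, rng):
    """Sample Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    while True:
        g = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(g)
        if total > 0.0:  # guard against the (extremely unlikely) all-underflow case
            return [v / total for v in g]

def dir_matrix(n_topics, n_docs, rng, concentration=0.05):
    """Return X as a list of columns, each drawn from Dirichlet(concentration, ...)."""
    alpha = [concentration] * n_topics
    return [dirichlet(alpha, rng) for _ in range(n_docs)]

rng = random.Random(0)
X_cols = dir_matrix(n_topics=20, n_docs=3, rng=rng)
```

With every parameter equal to 0.05, well below 1, each sampled column typically concentrates most of its mass on a handful of topics, so the columns of X are close to sparse.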

NIPS dataset

The NIPS dataset is used to evaluate the proposed Hierarchical Latent Word Clustering algorithm.

Topic Labeling with Images

The dataset consists of 300 topics generated using Wikipedia articles and news articles taken from the New York Times. Each topic is represented by ten terms with the highest...

Search Snippets dataset

The Search Snippets dataset is a collection of search snippets from Google.

Pascal Flickr dataset

The Pascal Flickr dataset is a collection of captions for images from Flickr.

Tweet dataset

The Tweet dataset is a collection of short texts, including tweets, Pascal Flickr captions, and search snippets.

Enrico

Enrico: A dataset for topic modeling of mobile UI designs

New York Times and 20Newsgroups datasets

This entry covers two datasets: the New York Times dataset and the 20Newsgroups dataset.

20Newsgroups dataset

The 20Newsgroups dataset contains 18,846 newsgroup documents.

Japanese Election Manifesto Data

The Japanese election manifesto data contains texts of Japanese election manifestos.

Congressional Bills Project

The Congressional Bills Project dataset contains the texts of congressional bills.

SearchSnippets

The SearchSnippets dataset is used in work on multi-objective Bayesian optimization for hyperparameter transfer in topic models.
