LDA2Net: Digging under the surface of COVID-19 topics in literature
The LDA2Net dataset is a collection of COVID-19-related research papers, including both peer-reviewed articles and preprints.
DBLP papers
The dataset used in this paper is a collection of papers from conferences indexed in DBLP, published between 2004 and 2014.
NIPS papers
The dataset used in this paper is a collection of papers from the NIPS conference, published between 1987 and 1999.
Novel Word Detection in Separable Topic Models
This dataset is used in the paper to evaluate novel word detection in separable topic models.
Exact slice sampler for Hierarchical Dirichlet Processes
This dataset is used with a Hierarchical Dirichlet Process (HDP) mixture model, which models data organized as a hierarchy of groups.
NOISE dataset
The NOISE dataset is a semi-synthetic dataset constructed from the matrix A∗: the data are generated as y = A∗x + ζ, where ζ is a noise term.
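The generative step y = A∗x + ζ can be sketched in a few lines of Python. This is a minimal sketch, not the construction used in the original paper: the source does not state the noise distribution, so the code assumes ζ has i.i.d. Gaussian entries with standard deviation `sigma`, and the function name `noisy_observation` and its arguments are hypothetical.

```python
import random


def noisy_observation(A, x, sigma, rng=random):
    """Return y = A x + zeta for one vector x (A given as a list of rows).

    Assumption (not from the source): zeta has i.i.d. Gaussian entries
    with mean 0 and standard deviation `sigma`.
    """
    if any(len(row) != len(x) for row in A):
        raise ValueError("every row of A must have len(x) entries")
    y = []
    for row in A:
        clean = sum(a_ij * x_j for a_ij, x_j in zip(row, x))  # (A x)_i
        y.append(clean + rng.gauss(0.0, sigma))  # add the noise term zeta_i
    return y
```

Setting `sigma=0.0` recovers the noiseless product A∗x, which is a convenient sanity check before adding noise.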
NEG dataset
The NEG dataset is a semi-synthetic dataset constructed from the matrix A∗, where the entries of A∗ are i.i.d. samples from the uniform distribution on [−0.5, 0.5).
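Sampling a matrix with i.i.d. Uniform[−0.5, 0.5) entries is straightforward with Python's standard library. The sketch below is illustrative only: the matrix dimensions are not given in the source, and `sample_neg_matrix` is a hypothetical helper name.

```python
import random


def sample_neg_matrix(n_rows, n_cols, rng=random):
    """Sample A* (as a list of rows) with i.i.d. Uniform[-0.5, 0.5) entries.

    rng.random() lies in [0.0, 1.0), so subtracting 0.5 keeps the upper
    endpoint 0.5 excluded, matching the half-open interval.
    """
    return [[rng.random() - 0.5 for _ in range(n_cols)] for _ in range(n_rows)]
```

Passing a seeded `random.Random` instance as `rng` makes the generated matrix reproducible.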
CTM dataset
The CTM dataset is a semi-synthetic dataset constructed from the matrix X, whose columns are drawn from the logistic normal prior in the correlated topic model.
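In the correlated topic model, a column is drawn by sampling η ~ N(μ, Σ) and mapping it onto the probability simplex with the softmax θ_i = exp(η_i) / Σ_j exp(η_j). The following pure-Python sketch draws columns this way; the values of μ and Σ are not given in the source, so they are caller-supplied assumptions, and the helper names are hypothetical. A hand-written Cholesky factorization keeps the example dependency-free.

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with L L^T = S, for symmetric positive definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L


def softmax(v):
    """Map a real vector onto the simplex (shifted by max for stability)."""
    m = max(v)
    e = [math.exp(t - m) for t in v]
    z = sum(e)
    return [t / z for t in e]


def logistic_normal_columns(mu, Sigma, n_cols, rng=random):
    """Draw n_cols columns theta = softmax(eta) with eta ~ N(mu, Sigma)."""
    L = cholesky(Sigma)
    k = len(mu)
    cols = []
    for _ in range(n_cols):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]  # z ~ N(0, I)
        # eta = mu + L z gives eta ~ N(mu, Sigma); L is lower-triangular.
        eta = [mu[i] + sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(k)]
        cols.append(softmax(eta))
    return cols
```

Off-diagonal entries of Σ are what make topic proportions correlated; a diagonal Σ would give independent log-proportions before normalization.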
DIR dataset
The DIR dataset is a semi-synthetic dataset constructed from the matrix X, whose columns are drawn from a Dirichlet distribution with parameters (0.05, 0.05, ..., 0.05).
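A standard way to draw from a Dirichlet distribution is to sample independent Gamma(α_i, 1) variables and normalize them to sum to one. The sketch below applies this to the symmetric parameter 0.05; the number of entries per column `k` and the number of columns are not stated in the source and are hypothetical inputs, as are the helper names.

```python
import random


def dirichlet(alpha, rng=random):
    """One draw from Dirichlet(alpha) by normalizing Gamma(alpha_i, 1) variables."""
    while True:
        g = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(g)
        if total > 0.0:  # guard against every tiny gamma draw underflowing to 0
            return [gi / total for gi in g]


def dir_columns(k, n_cols, concentration=0.05, rng=random):
    """Columns of X, each drawn from Dirichlet(0.05, ..., 0.05) with k entries."""
    alpha = [concentration] * k
    return [dirichlet(alpha, rng) for _ in range(n_cols)]
```

With a concentration as small as 0.05, most draws put nearly all of their mass on a few entries, so the columns of X are close to sparse.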
NIPS dataset
The NIPS dataset is used to evaluate the proposed Hierarchical Latent Word Clustering algorithm.
Topic Labeling with Images
The dataset consists of 300 topics generated from Wikipedia articles and New York Times news articles. Each topic is represented by ten terms with the highest...
Search Snippets dataset
The Search Snippets dataset is a collection of search snippets from Google.
Pascal Flickr dataset
The Pascal Flickr dataset is a collection of captions for images from Flickr.
Tweet dataset
The dataset used in this paper is a collection of short texts, including tweets, Pascal Flickr captions, and search snippets.
New York Times and 20Newsgroups datasets
The paper uses the New York Times dataset and the 20Newsgroups dataset.
20Newsgroups dataset
The 20Newsgroups dataset contains 18,846 newsgroup documents.
Japanese Election Manifesto Data
The Japanese election manifesto data contains the texts of Japanese election manifestos.
Congressional Bills Project
The Congressional Bills Project dataset contains the texts of congressional bills.
SearchSnippets
The paper discusses the use of multi-objective Bayesian optimization for hyperparameter transfer in topic models.