LDA2Net: Digging under the surface of COVID-19 topics in literature
The LDA2Net dataset is a collection of COVID-19 related research papers, including peer-reviewed and pre-peer-reviewed articles.
Novel Word Detection in Separable Topic Models
The dataset used in the paper on novel word detection in separable topic models.
Exact slice sampler for Hierarchical Dirichlet Processes
Hierarchical Dirichlet Process (HDP) mixture model for modeling the hierarchy of groups of data.
NOISE dataset
The NOISE dataset is a semi-synthetic dataset constructed from the matrix A∗, with observations generated as y = A∗x + ζ, where ζ is the noise term.
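The generative step y = A∗x + ζ can be sketched as follows. This is only an illustration under stated assumptions: the entry does not say how ζ is distributed, so i.i.d. Gaussian noise is assumed here, and the function name, matrix sizes, and noise level are hypothetical.

```python
import numpy as np

def generate_noisy_observations(A, X, noise_std, rng):
    """Return Y = A @ X + zeta, one observation per column of X.

    Assumption: zeta is i.i.d. Gaussian with standard deviation noise_std.
    The catalog entry does not specify the noise distribution.
    """
    zeta = rng.normal(0.0, noise_std, size=(A.shape[0], X.shape[1]))
    return A @ X + zeta

rng = np.random.default_rng(0)
A = rng.random((50, 5))    # stand-in for A*, sizes chosen for illustration
X = rng.random((5, 200))   # stand-in for the latent columns x
Y = generate_noisy_observations(A, X, noise_std=0.01, rng=rng)
```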
NEG dataset
The NEG dataset is a semi-synthetic dataset constructed from the matrix A∗, where the entries of A∗ are i.i.d. samples from the uniform distribution on [−0.5, 0.5).
CTM dataset
The CTM dataset is a semi-synthetic dataset constructed from the matrix X, whose columns are drawn from the logistic normal prior in the correlated topic model.
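Drawing columns from a logistic normal prior amounts to sampling a Gaussian vector and mapping it onto the simplex with a softmax. A minimal sketch, assuming a mean `mu` and covariance `sigma` that the entry does not specify (the values below and the function name are hypothetical):

```python
import numpy as np

def sample_logistic_normal_columns(mu, sigma, n_cols, rng):
    """Return a (k, n_cols) matrix whose columns lie on the probability simplex."""
    eta = rng.multivariate_normal(mu, sigma, size=n_cols)  # shape (n_cols, k)
    eta = eta - eta.max(axis=1, keepdims=True)             # stabilize the softmax
    weights = np.exp(eta)
    return (weights / weights.sum(axis=1, keepdims=True)).T

rng = np.random.default_rng(0)
k = 5
mu = np.zeros(k)       # assumed mean
sigma = np.eye(k)      # assumed covariance; off-diagonal terms would correlate topics
X_ctm = sample_logistic_normal_columns(mu, sigma, n_cols=100, rng=rng)
```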
DIR dataset
The DIR dataset is a semi-synthetic dataset constructed from the matrix X, whose columns are drawn from a Dirichlet distribution with parameters (0.05, 0.05, ..., 0.05).
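A symmetric Dirichlet with concentration 0.05 produces sparse columns, each dominated by a few components. The sketch below uses NumPy's `Generator.dirichlet`; the function name and the matrix dimensions are hypothetical, since the entry gives only the concentration parameter.

```python
import numpy as np

def sample_dirichlet_columns(n_topics, n_cols, alpha, rng):
    """Return an (n_topics, n_cols) matrix with Dirichlet(alpha, ..., alpha) columns."""
    concentration = np.full(n_topics, alpha)
    return rng.dirichlet(concentration, size=n_cols).T

rng = np.random.default_rng(0)
X_dir = sample_dirichlet_columns(n_topics=5, n_cols=100, alpha=0.05, rng=rng)
```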
20 NewsGroups
The paper discusses the use of multi-objective Bayesian optimization for hyperparameter transfer in topic models.
20NEWS Dataset
The dataset used in the paper is the 20NEWS dataset, consisting of 18,845 text documents with 20 topic labels.
Graphbtm Dataset
The dataset used with Graphbtm, a biterm topic model.
RCV1 Dataset
The RCV1 dataset is a corpus of Reuters news articles.
Reddit News Topical Interactions
The dataset used in this study was gathered from the Pushshift Reddit repository, which archives all Reddit posts and comments up to June 2021.
News Articles Dataset
The dataset used in this paper is a collection of news articles from an international news website, covering a time span from September 2012 to April 2014.
Wikipedia Comparable Corpora
Multilingual dataset for topic modeling, based on aligned Wikipedia articles extracted from the Wikipedia Comparable Corpora.
Synthetic Dataset
The dataset used in this work is a custom synthetic dataset generated using the liquid-dsp library, containing 600000 examples of each of 13.8 million examples, with SNRs...