Dataset - LDM

AGNews

The associated paper does not describe this dataset explicitly; it mentions only that the authors used a variety of datasets for semi-supervised learning tasks.
Exact slice sampler for Hierarchical Dirichlet Processes

A Hierarchical Dirichlet Process (HDP) mixture model for modeling data organized into a hierarchy of groups.
NOISE dataset

The NOISE dataset is a semi-synthetic dataset constructed from the matrix A∗: the data are generated as y = A∗x + ζ, where ζ is a noise term.
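A minimal Python sketch of the generative step y = A∗x + ζ follows. It is an illustration under stated assumptions, not the construction used in the paper: the entry does not say how A∗ and x are obtained or how ζ is distributed, so the function takes A∗ and x as inputs and assumes i.i.d. Gaussian noise with a hypothetical standard deviation `sigma`. The helper name `noisy_observation` is likewise hypothetical.

```python
import random


def noisy_observation(a_star, x, sigma, rng):
    """Return y = A* x + zeta for one vector x.

    a_star: matrix as a list of rows (m x k); x: length-k vector.
    ASSUMPTION: zeta has i.i.d. N(0, sigma^2) entries; the dataset
    description does not state the noise distribution.
    """
    if any(len(row) != len(x) for row in a_star):
        raise ValueError("A* must have as many columns as x has entries")
    # Noise-free part: the matrix-vector product A* x.
    clean = [sum(a * xi for a, xi in zip(row, x)) for row in a_star]
    # Add independent Gaussian noise to each coordinate.
    return [c + rng.gauss(0.0, sigma) for c in clean]


if __name__ == "__main__":
    rng = random.Random(0)
    a_star = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # hypothetical 3 x 2 matrix
    x = [0.3, 0.7]                                 # hypothetical input vector
    print(noisy_observation(a_star, x, sigma=0.01, rng=rng))
```

Setting `sigma=0.0` recovers the noise-free product A∗x, which is a convenient check.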
NEG dataset

The NEG dataset is a semi-synthetic dataset constructed from the matrix A∗, where the entries of A∗ are i.i.d. samples from the uniform distribution on [−0.5, 0.5).
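Sampling a matrix with this entry distribution takes one draw per entry. The sketch below uses Python's standard `random` module; the dimensions `m` and `k` and the function name `sample_neg_matrix` are hypothetical, since the entry does not state the size of A∗. Because `random.random()` returns values in [0.0, 1.0), subtracting 0.5 yields exactly the half-open interval [−0.5, 0.5).

```python
import random


def sample_neg_matrix(m, k, rng):
    """Return an m x k matrix (list of rows) with i.i.d. Uniform[-0.5, 0.5) entries.

    m and k are hypothetical dimensions chosen by the caller.
    """
    # rng.random() lies in [0.0, 1.0), so each entry lies in [-0.5, 0.5).
    return [[rng.random() - 0.5 for _ in range(k)] for _ in range(m)]


if __name__ == "__main__":
    a_star = sample_neg_matrix(4, 3, random.Random(42))
    for row in a_star:
        print(["%.3f" % v for v in row])
```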
CTM dataset

The CTM dataset is a semi-synthetic dataset constructed from the matrix X, whose columns are drawn from the logistic normal prior in the correlated topic model.
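In one common parameterization of the correlated topic model, a logistic normal draw is obtained by sampling η ~ N(μ, Σ) and mapping η onto the probability simplex with a softmax. The stdlib-only Python sketch below generates columns that way, using a Cholesky factor of Σ to correlate standard normal draws. The mean μ, the covariance Σ, the number of columns, and all function names are hypothetical inputs, since the entry does not list the parameters used to build X.

```python
import math
import random


def cholesky(sigma):
    """Return lower-triangular L with L L^T = sigma (sigma must be positive definite)."""
    n = len(sigma)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][p] * lower[j][p] for p in range(j))
            if i == j:
                d = sigma[i][i] - s
                if d <= 0.0:
                    raise ValueError("covariance matrix must be positive definite")
                lower[i][j] = math.sqrt(d)
            else:
                lower[i][j] = (sigma[i][j] - s) / lower[j][j]
    return lower


def logistic_normal_column(mu, lower, rng):
    """Draw eta = mu + L z with z ~ N(0, I), then map eta onto the simplex via softmax."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    eta = [m + sum(lower[i][p] * z[p] for p in range(i + 1)) for i, m in enumerate(mu)]
    top = max(eta)  # subtract the maximum for numerical stability
    w = [math.exp(e - top) for e in eta]
    total = sum(w)
    return [v / total for v in w]


def sample_ctm_matrix(mu, sigma, n_cols, rng):
    """Return X as a list of k rows; each of its n_cols columns is one logistic normal draw."""
    lower = cholesky(sigma)  # factor Sigma once and reuse it for every column
    cols = [logistic_normal_column(mu, lower, rng) for _ in range(n_cols)]
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    mu = [0.0, 0.5, -0.5]  # hypothetical mean
    sigma = [[1.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.0]]  # hypothetical covariance
    for row in sample_ctm_matrix(mu, sigma, 4, random.Random(0)):
        print(["%.3f" % v for v in row])
```

Unlike a Dirichlet, the full covariance Σ lets the sketch express correlations between components, which is the motivation for the logistic normal prior.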
DIR dataset

The DIR dataset is a semi-synthetic dataset constructed from the matrix X, whose columns are drawn from a Dirichlet distribution with parameters (0.05, 0.05, ..., 0.05).
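A Dirichlet draw can be generated by normalizing independent Gamma(α, 1) variates. The stdlib-only Python sketch below builds columns with α = 0.05 for every component, matching the parameters stated for this dataset; the number of components `k`, the number of columns, and the function names are hypothetical. With such a small α, most of each column's mass usually falls on a few components.

```python
import random


def dirichlet_column(k, alpha, rng):
    """One draw from the symmetric Dirichlet(alpha, ..., alpha) over k components.

    Standard construction: draw g_i ~ Gamma(alpha, 1) independently and
    divide each g_i by their sum.
    """
    while True:
        g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
        total = sum(g)
        # Redraw in the (extremely unlikely) case that every variate underflowed to 0.0.
        if total > 0.0:
            return [v / total for v in g]


def sample_dir_matrix(k, n_cols, alpha=0.05, rng=None):
    """Return X as a list of k rows whose columns are i.i.d. Dirichlet draws."""
    if rng is None:
        rng = random.Random()
    cols = [dirichlet_column(k, alpha, rng) for _ in range(n_cols)]
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    x = sample_dir_matrix(k=5, n_cols=3, rng=random.Random(0))  # hypothetical sizes
    for row in x:
        print(["%.3f" % v for v in row])
```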
Enrico

Enrico: A dataset for topic modeling of mobile UI designs
SearchSnippets

This dataset is used in a paper on multi-objective Bayesian optimization for hyperparameter transfer in topic models.
M10

This dataset is used in a paper on multi-objective Bayesian optimization for hyperparameter transfer in topic models.
20 NewsGroups

This dataset is used in a paper on multi-objective Bayesian optimization for hyperparameter transfer in topic models.
20NEWS Dataset

The 20NEWS dataset, as used in the associated paper, consists of 18,845 text documents labeled with 20 topics.
Graphbtm Dataset

The Graphbtm dataset is associated with GraphBTM, a biterm topic model.
RCV1 Dataset

The RCV1 dataset is a corpus of Reuters news articles.
Reddit News Topical Interactions

The dataset was gathered from the Pushshift Reddit repository, which archives all Reddit posts and comments up to June 2021.
News Articles Dataset

The dataset is a collection of news articles from an international news website, spanning September 2012 to April 2014.
StackOverflow

This dataset is used in a paper on multi-objective Bayesian optimization for hyperparameter transfer in topic models.
Wikipedia Comparable Corpora

A multilingual dataset for topic modeling, built from aligned Wikipedia articles extracted from the Wikipedia Comparable Corpora.
Synthetic Dataset

The dataset used in this work is a custom synthetic dataset generated with the liquid-dsp library, containing 600,000 examples per class (13.8 million examples in total), with SNRs...
Topic modeling of multimodal data: an autoregressive approach

Topic modeling of multimodal data: an autoregressive approach
Subjectivity Dataset

The Subjectivity dataset is provided by [Pang and Lee, 2004].

You can also access this registry using the API (see API Docs).

20 datasets found