Dataset - LDM

SSD-LM

Semi-autoregressive simplex-based diffusion language model for text generation and modular control
- Dataset
- JSON
Wikitext-103

The dataset used in this paper is Wikitext-103, a general English-language corpus of Good and Featured Wikipedia articles.
- Dataset
- JSON
SeqDiffuSeq

The dataset used in the SeqDiffuSeq paper for sequence-to-sequence text generation.
- Dataset
- JSON
BookCorpus

The dataset used in this paper for unsupervised sentence representation learning consists of paragraphs of unlabeled text.
- Dataset
- JSON
PatentEval Dataset

The PatentEval dataset is a comprehensive dataset for evaluating patent text generation.
- Dataset
- JSON
WikiText-103 dataset

The dataset used in this paper is WikiText-103, a large text corpus.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

26 datasets found