Text Generation - Groups

WebText

The paper uses WebText, a widely used dataset for natural language processing tasks.
- Dataset
- JSON
News-to-Report Dataset

A dataset for automatically generating macro research reports from economic news.
- Dataset
- JSON
DrawTextExt

The dataset is used to train the GlyphDraw model for visual text generation. It contains 792k images with 3.3M characters in images and more than 4.8k common unique Chinese...
- Dataset
- JSON
Content Preserving Text Generation with Attribute Controls

The dataset used in this paper for text generation with attribute controls.
- Dataset
- JSON
Linear-time minimum Bayes risk decoding with reference aggregation

Linear-time minimum Bayes risk decoding with reference aggregation
- Dataset
- JSON
BERTScore: Evaluating text generation with BERT

BERTScore: Evaluating text generation with BERT
- Dataset
- JSON
Improving Minimum Bayes Risk Decoding with Multi-Prompt

Multi-prompt decoding for conditional text generation
- Dataset
- JSON
TextLogo3K

TextLogo3K dataset is a large-scale dataset of text logos, consisting of 3,470 text logo images with various styles and annotated with pixel-level segmentation, bounding boxes,...
- Dataset
- JSON
Vicuna dataset

Diffusion-based language models are emerging as a promising alternative to autoregressive LMs: they approach the competence of autoregressive LMs while offering nuanced...
- Dataset
- JSON
DOLLY dataset

Diffusion-based language models are emerging as a promising alternative to autoregressive LMs: they approach the competence of autoregressive LMs while offering nuanced...
- Dataset
- JSON
RateMyProfessor Dataset

The RateMyProfessor dataset, a collection of student-written reviews of professors.
- Dataset
- JSON
Bias in Bios Dataset

Bias in Bios dataset, a personal biography dataset with information extracted from Wikipedia.
- Dataset
- JSON
Reference Letter Dataset

Reference letter dataset generated under the Context-Based Generation (CBG) setting.
- Dataset
- JSON
AI Wiki

The AI Wiki dataset, used for testing the author-stylized text generation model.
- Dataset
- JSON
Mark Twain Books

A dataset of Mark Twain's books, used for testing the author-stylized text generation model.
- Dataset
- JSON
Opinosis Review Dataset

The Opinosis Review dataset, used for testing the author-stylized text generation model.
- Dataset
- JSON
Wikipedia Corpus

The dataset used in the paper is a subset of the Wikipedia corpus, consisting of 7,500 English Wikipedia articles belonging to one of the following categories: People, Cities,...
- Dataset
- JSON
Gutenberg Corpus

A dataset of 2,857 books written by 141 authors, used for pre-training and fine-tuning a language model for author-stylized text generation.
- Dataset
- JSON
ChatGPT model data

ChatGPT model data, used to generate text.
- Dataset
- JSON
Adding A Filter Based on The Discriminator to Improve Unconditional Text Gene...

The dataset is used for unconditional text generation, and the authors propose a novel mechanism to improve the generator by adding a filter which has the same input as the...
- Dataset
- JSON

43 datasets found