Text Generation - Groups

Posterior Control of Blackbox Generation

Text generation often requires high-precision output that obeys task-specific rules. This fine-grained control is difficult to enforce with off-the-shelf deep learning models.
- Dataset
- JSON
SimCTG

Open-domain dialogue generation task on LCCC and DailyDialog datasets.
- Dataset
- JSON
Pt-Corpus-Instruct

The dataset used for training the TeenyTinyLlama pair consists of a concatenation of open-source Brazilian Portuguese datasets, including Wikipedia, CulturaX, OSCAR, Common...
- Dataset
- JSON
Pt-Corpus

The dataset used for training the TeenyTinyLlama pair consists of a concatenation of open-source Brazilian Portuguese datasets, including Wikipedia, CulturaX, OSCAR, Common...
- Dataset
- JSON
Rotowire

The dataset used in the paper for Rotowire
- Dataset
- JSON
WebText

WebText, a dataset widely used for natural language processing tasks.
- Dataset
- JSON
News-to-Report Dataset

A dataset for automatically generating macro research reports from economic news.
- Dataset
- JSON
DrawTextExt

The dataset is used to train the GlyphDraw model for visual text generation. It contains 792k images with 3.3M characters in images and more than 4.8k common unique Chinese...
- Dataset
- JSON
Content Preserving Text Generation with Attribute Controls

The dataset used in this paper for text generation with attribute controls.
- Dataset
- JSON
Linear-time minimum Bayes risk decoding with reference aggregation

Linear-time minimum Bayes risk decoding with reference aggregation
- Dataset
- JSON
BERTScore: Evaluating text generation with BERT

BERTScore: Evaluating text generation with BERT
- Dataset
- JSON
Improving Minimum Bayes Risk Decoding with Multi-Prompt

Multi-prompt decoding for conditional text generation
- Dataset
- JSON
TextLogo3K

TextLogo3K is a large-scale dataset of 3,470 text logo images in various styles, annotated with pixel-level segmentation, bounding boxes,...
- Dataset
- JSON
Vicuna dataset

Diffusion-based language models are emerging as a promising alternative to autoregressive LMs: they approach the competence of autoregressive LMs while offering nuanced...
- Dataset
- JSON
DOLLY dataset

Diffusion-based language models are emerging as a promising alternative to autoregressive LMs: they approach the competence of autoregressive LMs while offering nuanced...
- Dataset
- JSON
RateMyProfessor Dataset

A dataset of student-written reviews of professors from RateMyProfessor.
- Dataset
- JSON
Bias in Bios Dataset

Bias in Bios dataset, a personal biography dataset with information extracted from Wikipedia.
- Dataset
- JSON
Reference Letter Dataset

Reference letter dataset generated under the Context-Based Generation (CBG) setting.
- Dataset
- JSON
AI Wiki

A dataset of AI Wiki, used for testing the author-stylized text generation model.
- Dataset
- JSON
Mark Twain Books

A dataset of Mark Twain's books, used for testing the author-stylized text generation model.
- Dataset
- JSON

48 datasets found