-
Posterior Control of Blackbox Generation
Text generation often requires high-precision output that obeys task-specific rules. This fine-grained control is difficult to enforce with off-the-shelf deep learning models. -
Pt-Corpus-Instruct
The dataset used for training the TeenyTinyLlama pair consists of a concatenation of open-source Brazilian Portuguese datasets, including Wikipedia, CulturaX, OSCAR, Common... -
News-to-Report Dataset
A dataset for automatically generating macro research reports from economic news. -
DrawTextExt
The dataset is used to train the GlyphDraw model for visual text generation. It contains 792k images with 3.3M characters in images and more than 4.8k common unique Chinese... -
Content Preserving Text Generation with Attribute Controls
The dataset used in this paper for text generation with attribute controls. -
Linear-time minimum Bayes risk decoding with reference aggregation
Linear-time minimum Bayes risk decoding with reference aggregation -
BERTScore: Evaluating text generation with BERT
BERTScore: Evaluating text generation with BERT -
Improving Minimum Bayes Risk Decoding with Multi-Prompt
Multi-prompt decoding for conditional text generation -
TextLogo3K
The TextLogo3K dataset is a large-scale collection of text logos, consisting of 3,470 text logo images with various styles and annotated with pixel-level segmentation, bounding boxes,... -
Vicuna dataset
Diffusion-based language models are emerging as a promising alternative to autoregressive LMs: they approach the competence of autoregressive LMs while offering nuanced... -
DOLLY dataset
Diffusion-based language models are emerging as a promising alternative to autoregressive LMs: they approach the competence of autoregressive LMs while offering nuanced... -
RateMyProfessor Dataset
RateMyProfessor dataset, a dataset of student-written reviews for professors. -
Bias in Bios Dataset
Bias in Bios dataset, a personal biography dataset with information extracted from Wikipedia. -
Reference Letter Dataset
Reference letter dataset generated under the Context-Based Generation (CBG) setting. -
Mark Twain Books
A dataset of Mark Twain's books, used for testing the author-stylized text generation model.