News-to-Report Dataset
A dataset for automatically generating macro research reports from economic news.
DrawTextExt
The dataset is used to train the GlyphDraw model for visual text generation. It contains 792k images with 3.3M rendered characters and more than 4.8k common unique Chinese...
Content Preserving Text Generation with Attribute Controls
The dataset used in this paper for text generation with attribute controls.
Linear-time minimum Bayes risk decoding with reference aggregation
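Minimum Bayes risk (MBR) decoding picks, from a set of sampled hypotheses, the one with the highest average utility against the other samples used as pseudo-references, which costs a quadratic number of utility calls. The minimal sketch below illustrates the reference-aggregation idea named in the title above under a simplifying assumption: instead of the real metrics the paper targets, it swaps in a hypothetical bag-of-words dot product as the utility. Because that toy utility is linear in the reference, scoring each hypothesis once against the averaged reference features gives exactly the same expected utility as the pairwise loop, in linear time. The names `bow`, `dot`, `mbr_pairwise` and `mbr_aggregated` are illustrative, not taken from the paper.

```python
from collections import Counter
from typing import Dict, List


def bow(text: str) -> Counter:
    # Hypothetical toy feature map: lowercase bag of words.
    return Counter(text.lower().split())


def dot(u: Dict[str, float], v: Dict[str, float]) -> float:
    # Toy utility: sparse dot product, linear in each argument.
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def mbr_pairwise(hyps: List[str], refs: List[str]) -> str:
    # Standard MBR: every hypothesis is scored against every pseudo-reference,
    # i.e. len(hyps) * len(refs) utility calls.
    def expected_utility(h: str) -> float:
        return sum(dot(bow(h), bow(r)) for r in refs) / len(refs)

    return max(hyps, key=expected_utility)


def mbr_aggregated(hyps: List[str], refs: List[str]) -> str:
    # Reference aggregation: average the reference features once ...
    agg: Dict[str, float] = {}
    for r in refs:
        for t, c in bow(r).items():
            agg[t] = agg.get(t, 0.0) + c / len(refs)
    # ... then score each hypothesis against that single aggregate reference,
    # i.e. len(hyps) utility calls.
    return max(hyps, key=lambda h: dot(bow(h), agg))


# Usage: in MBR the samples usually serve as both hypotheses and pseudo-references.
samples = ["the cat sat on the mat", "a cat sat on the mat", "the dog ran away", "the cat sat"]
print(mbr_pairwise(samples, samples))    # the cat sat on the mat
print(mbr_aggregated(samples, samples))  # the cat sat on the mat
```

For a nonlinear utility, scoring against the aggregate is only an approximation of the pairwise expectation, and this unnormalized dot product favors longer outputs, so the sketch is meant only to illustrate the complexity argument.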
BERTScore: Evaluating text generation with BERT
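BERTScore compares a candidate with a reference by greedily matching each token embedding to its most cosine-similar token in the other sentence: recall averages the best matches over reference tokens, precision averages them over candidate tokens, and F1 combines the two. The sketch below shows only that matching step, assuming the token embeddings are already computed; the toy 2-D vectors stand in for the contextual BERT embeddings the real metric uses, and refinements such as idf weighting and baseline rescaling are omitted. The function name `bertscore_from_embeddings` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _unit(v: Vector) -> List[float]:
    # L2-normalize so that a dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def bertscore_from_embeddings(
    ref: Sequence[Vector], cand: Sequence[Vector]
) -> Tuple[float, float, float]:
    """Greedy-matching precision, recall and F1 over precomputed token embeddings."""
    ref_u = [_unit(v) for v in ref]
    cand_u = [_unit(v) for v in cand]
    # sim[i][j] = cosine similarity between reference token i and candidate token j.
    sim = [[sum(a * b for a, b in zip(ri, cj)) for cj in cand_u] for ri in ref_u]
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(row) for row in sim) / len(ref_u)
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(
        max(sim[i][j] for i in range(len(ref_u))) for j in range(len(cand_u))
    ) / len(cand_u)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1


# Usage with hypothetical 2-D "embeddings": the candidate covers only one of the
# two reference tokens, so precision is perfect but recall is halved.
reference = [[1.0, 0.0], [0.0, 1.0]]
candidate = [[1.0, 0.0], [2.0, 0.0]]
p, r, f = bertscore_from_embeddings(reference, candidate)
print(round(p, 3), round(r, 3), round(f, 3))  # 1.0 0.5 0.667
```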
Improving Minimum Bayes Risk Decoding with Multi-Prompt
Multi-prompt decoding for conditional text generation.
TextLogo3K
The TextLogo3K dataset is a large-scale dataset of text logos, consisting of 3,470 text logo images in various styles, annotated with pixel-level segmentation, bounding boxes,...
Vicuna dataset
Diffusion-based language models are emerging as a promising alternative to autoregressive LMs: they approach the competence of autoregressive LMs while offering nuanced...
DOLLY dataset
Diffusion-based language models are emerging as a promising alternative to autoregressive LMs: they approach the competence of autoregressive LMs while offering nuanced...
RateMyProfessor Dataset
A dataset of student-written reviews of professors.
Bias in Bios Dataset
A dataset of personal biographies with information extracted from Wikipedia.
Reference Letter Dataset
Reference letter dataset generated under the Context-Based Generation (CBG) setting.
Mark Twain Books
A dataset of Mark Twain's books, used for testing the author-stylized text generation model.
Opinosis Review Dataset
The Opinosis review dataset, used for testing the author-stylized text generation model.
Wikipedia Corpus
The dataset used in the paper is a subset of the Wikipedia corpus, consisting of 7,500 English Wikipedia articles belonging to one of the following categories: People, Cities,...
Gutenberg Corpus
A dataset of 2,857 books written by 141 authors, used for pre-training and fine-tuning a language model for author-stylized text generation.
ChatGPT model data
Data from the ChatGPT model, used for text generation.
Adding A Filter Based on The Discriminator to Improve Unconditional Text Gene...
The dataset is used for unconditional text generation, and the authors propose a novel mechanism to improve the generator by adding a filter which has the same input as the...