-
Diverse and Specific Clarification Question Generation with Keywords
Product descriptions on e-commerce websites often suffer from missing important aspects. Clarification question generation (CQ-Gen) can be a promising approach to help alleviate... -
DrawTextExt
The dataset is used to train the GlyphDraw model for visual text generation. It contains 792k images with 3.3M characters in images and more than 4.8k common unique Chinese... -
Linear-time minimum Bayes risk decoding with reference aggregation
Linear-time minimum Bayes risk decoding with reference aggregation -
BERTScore: Evaluating text generation with BERT
BERTScore: Evaluating text generation with BERT -
Improving Minimum Bayes Risk Decoding with Multi-Prompt
Multi-prompt decoding for conditional text generation -
TextLogo3K
TextLogo3K is a large-scale dataset of text logos, consisting of 3,470 text logo images with various styles and annotated with pixel-level segmentation, bounding boxes,... -
Reference Letter Dataset
Reference letter dataset generated under the Context-Based Generation (CBG) setting. -
Mark Twain Books
A dataset of Mark Twain's books, used for testing the author-stylized text generation model. -
Opinosis Review Dataset
The Opinosis Review dataset, used for testing the author-stylized text generation model. -
Wikipedia Corpus
The dataset used in the paper is a subset of the Wikipedia corpus, consisting of 7500 English Wikipedia articles belonging to one of the following categories: People, Cities,... -
Gutenberg Corpus
A dataset of 2,857 books written by 141 authors, used for pre-training and fine-tuning a language model for author-stylized text generation. -
ChatGPT model data
ChatGPT model data, used to generate text -
TESS: Text-to-Text Self-Conditioned Simplex Diffusion
Diffusion models have emerged as a powerful paradigm for generation, obtaining strong performance in various continuous domains. However, applying continuous diffusion models... -
Text-to-image generation via masked generative transformers
Text-to-image generation via masked generative transformers. -
OpenWebText Corpus
A dataset for language modeling, where the goal is to predict the next word in a sequence given the previous words. -
One Billion Words Dataset
A dataset for language modeling, where the goal is to predict the next word in a sequence given the previous words.