-
DrawTextExt
The dataset is used to train the GlyphDraw model for visual text generation. It contains 792k images with 3.3M characters in images and more than 4.8k common unique Chinese... -
Linear-time minimum Bayes risk decoding with reference aggregation
Linear-time minimum Bayes risk decoding with reference aggregation -
BERTScore: Evaluating text generation with BERT
BERTScore: Evaluating text generation with BERT -
Improving Minimum Bayes Risk Decoding with Multi-Prompt
Multi-prompt decoding for conditional text generation -
TextLogo3K
TextLogo3K is a large-scale dataset of text logos, consisting of 3,470 text logo images with various styles and annotated with pixel-level segmentation, bounding boxes,... -
Reference Letter Dataset
Reference letter dataset generated under the Context-Based Generation (CBG) setting. -
Mark Twain Books
A dataset of Mark Twain's books, used for testing the author-stylized text generation model. -
Opinosis Review Dataset
The Opinosis Review dataset, used for testing the author-stylized text generation model. -
Wikipedia Corpus
The dataset used in the paper is a subset of the Wikipedia corpus, consisting of 7500 English Wikipedia articles belonging to one of the following categories: People, Cities,... -
Gutenberg Corpus
A dataset of 2,857 books written by 141 authors, used for pre-training and fine-tuning a language model for author-stylized text generation. -
ChatGPT model data
ChatGPT model data, used to generate text -
TESS: Text-to-Text Self-Conditioned Simplex Diffusion
Diffusion models have emerged as a powerful paradigm for generation, obtaining strong performance in various continuous domains. However, applying continuous diffusion models... -
Wikitext-103
The dataset used in this paper is Wikitext-103, a general English language corpus containing good and featured Wikipedia articles. -
SeqDiffuSeq
The dataset used in the SeqDiffuSeq paper for sequence-to-sequence text generation. -
BookCorpus
The dataset used in this paper for unsupervised sentence representation learning, consisting of paragraphs from unlabeled text.