-
Crowd Video Captioning Dataset
A crowd video captioning dataset based on the WorldExpo'10 dataset, with 98 videos selected and captions generated for them.
-
Famous Keyword Twitter Replies
The Famous Keyword Twitter Replies dataset is a collection of Twitter data covering popular keywords and the replies associated with them.
-
Visual Storytelling Dataset (VIST)
The Visual Storytelling Dataset (VIST) consists of 10,117 Flickr albums and 210,819 unique images. Each sample is one sequence of 5 photos selected from the same album paired...
-
1-billion-word
1-billion-word dataset
-
Chinese poetry generation
Chinese poetry generation dataset
-
Diffusion-LM Improves Controllable Text Generation
Controlling the behavior of language models (LMs) without re-training is a major open problem in natural language generation. We develop a new non-autoregressive language model...
-
IMAGINE: An Imagination-Based Automatic Evaluation Metric for Natural Languag...
Automatic evaluations for natural language generation (NLG) conventionally rely on token-level or embedding-level comparisons with the text references. This is different from...