Dataset - LDM

Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs fro...

The dataset used in this paper is a collection of natural language generation tasks, including general knowledge, biology and medicine, general domain questions from Google...
- Dataset
- JSON
KiUT: Knowledge-injected U-Transformer for Radiology Report Generation

Radiology report generation aims to automatically generate a clinically accurate and coherent paragraph from the X-ray image, which could relieve radiologists from the heavy...
- Dataset
- JSON
SciNews

A dataset for scientific news report generation, comprising a parallel compilation of academic publications and their corresponding scientific news reports across nine disciplines.
- Dataset
- JSON
Crowd Video Captioning Dataset

A crowd video captioning dataset built from 98 videos selected from the WorldExpo'10 dataset, with captions generated for each video.
- Dataset
- JSON
Famous Keyword Twitter Replies

The Famous Keyword Twitter Replies dataset is a collection of Twitter data covering popular keywords and the replies associated with them.
- Dataset
- JSON
Visual Storytelling Dataset (VIST)

The Visual Storytelling Dataset (VIST) consists of 10,117 Flickr albums and 210,819 unique images. Each sample is one sequence of 5 photos selected from the same album paired...
- Dataset
- JSON
1-billion-word

1-billion-word dataset
- Dataset
- JSON
CMU-SE

CMU-SE dataset
- Dataset
- JSON
Chinese poetry generation

Chinese poetry generation dataset
- Dataset
- JSON
Diffusion-LM Improves Controllable Text Generation

Controlling the behavior of language models (LMs) without re-training is a major open problem in natural language generation. We develop a new non-autoregressive language model...
- Dataset
- JSON
IMAGINE: An Imagination-Based Automatic Evaluation Metric for Natural Languag...

Automatic evaluations for natural language generation (NLG) conventionally rely on token-level or embedding-level comparisons with the text references. This is different from...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

11 datasets found