Dataset - LDM

Towards Efficient Dialogue Pre-training with Transferable and Interpretable L...

This paper proposes a novel dialogue model with a latent structure that is easily transferable from the general domain to downstream tasks in a lightweight and transparent way.
- Dataset
- JSON
Low-resource knowledge-grounded dialogue generation

The dataset is used for low-resource knowledge-grounded dialogue generation, where the goal is to generate responses to context based on external knowledge.
- Dataset
- JSON
Incremental Transformer with Deliberation Decoder for Document Grounded Conve...

The dataset is used for document-grounded conversation, where the goal is to generate responses to context based on external knowledge.
- Dataset
- JSON
Topical-Chat: Towards knowledge-grounded open-domain conversations

The dataset is used for knowledge-grounded dialogue generation, where the goal is to generate responses to context based on external knowledge.
- Dataset
- JSON
Wizard of Wikipedia: Knowledge-powered conversational agents

The dataset is used for knowledge-grounded dialogue generation, where the goal is to generate responses to context based on external knowledge.
- Dataset
- JSON
Zero-Resource Knowledge-Grounded Dialogue Generation

The dataset is used for knowledge-grounded dialogue generation, where the goal is to generate responses to context based on external knowledge.
- Dataset
- JSON
OpenSubtitles dataset

Open-domain neural dialogue generation (Vinyals and Le, 2015; Sordoni et al., 2015; Li et al., 2016a; Mou et al., 2016; Serban et al., 2016a; Asghar et al., 2016; Mei et al.,...
- Dataset
- JSON
ConvAI2

The ConvAI2 dialogue corpus is a dataset of personalized dialogues with corresponding persona descriptions.
- Dataset
- JSON
Wizard of Wikipedia

Wizard of Wikipedia is a recent, large-scale dataset of multi-turn knowledge-grounded dialogues between an “apprentice” and a “wizard”, who has access to information from...
- Dataset
- JSON
DailyDialog

The DailyDialog dataset is a large-scale multi-turn dialogue dataset, consisting of 10,000 conversations with 5 turns each.
- Dataset
- JSON
Commonsense Conversation Dataset

The Commonsense Conversation Dataset (CCD) is a dialogue generation dataset.
- Dataset
- JSON
Anthropic's HH-RLHF and OpenAI's summarization datasets

The datasets used in the paper are Anthropic's HH-RLHF and OpenAI's summarization datasets.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

12 datasets found