-
Ubuntu Corpus
The dataset used in the paper is the Ubuntu Corpus, which consists of dialogues from the Ubuntu technical support chat. -
Baidu TieBa Corpus
This dataset is used for the context-oriented response selection task, which is treated as a binary classification problem. -
Colors in Context (CIC) dataset
The Colors in Context (CIC) dataset comes from a referential communication task in which participants describe items in a visual display using a free-form chat interface. -
Cornell Movie Dialogue Corpus
The Cornell Movie Dialogue Corpus -
Reddit Politics
Reddit Politics -
Topical-Chat
The Topical-Chat dataset is a knowledge-grounded open-domain conversational dataset, which consists of dialogues between two Mechanical Turk workers (a.k.a. Turkers). -
SummEval and Topical-Chat
This paper uses the SummEval and Topical-Chat datasets to evaluate the quality of summaries and responses. -
Reddit conversation corpus
A Reddit conversation corpus consisting of data extracted from 95 top-ranked subreddits that discuss topics such as sports, news, education, and politics. -
Schema-Guided Dialogue
The Schema-Guided Dialogue (SGD) dataset contains over 20,000 multi-domain conversations between a human and a virtual assistant. -
Action-Based Conversations Dataset
The Action-Based Conversations Dataset (ABCD) contains over 10,000 human-to-human customer service dialogues across multiple domains. -
Schema-Guided Dataset (SGD)
Schema-Guided Dataset (SGD) is the official dataset for the schema-guided state tracking challenge at DSTC8. The schema-guided dataset consists of 20 domains with a total of 45... -
OpenSubtitles and DailyDialog
Open-domain dialogue datasets: OpenSubtitles and DailyDialog. OpenSubtitles is a collection of movie subtitles and originally contains over 2 billion utterances. DailyDialog... -
Colors in Context (CIC) task
The CIC dataset comes from a referential communication task in which a director describes a target color patch so that a matcher can identify it. The dataset is used to analyze the communication trade-offs... -
PersonaChat dataset
The PersonaChat dataset is a large persona-conditioned chit-chat style dialogue dataset. -
Blended Skill Talk (BST) dataset
Datasets used for training and testing dialogue models -
PersonaChat
Persona-Chat is sourced from authentic conversations between human annotators who are randomly matched and assigned persona information.