Learning to speak and act in a fantasy text adventure game
A dataset of text-adventure game dialogues, including fantasy and horror games. -
Doctor-Patient Conversations Corpus
The dataset used in this paper is a corpus of nearly 7,000 doctor-patient conversations. -
Dialogue Dataset for Detecting Sentences that Do Not Require Factual Correctn...
A dialogue dataset annotated with fact-check-needed labels (DDFC) for detecting sentences that do not require a factual correctness judgment. -
Ubuntu Dialogue Corpus (UDC)
The Ubuntu Dialogue Corpus (UDC) dataset was extracted from the Ubuntu Relay Chat Channel. Although the topics in the dataset are not as diverse as in the MTC, the dataset is... -
Movie Triples Corpus (MTC)
The Movie Triples Corpus (MTC) dataset was derived from the Movie-DiC dataset by Banchs (2012). Although this dataset spans a wide range of topics with few spelling mistakes,... -
Ubuntu Corpus
The dataset used in the paper is the Ubuntu Corpus, which consists of dialogues from the Ubuntu technical support chat. -
Baidu TieBa Corpus
The dataset is used for the context-oriented response selection task, which is treated as a binary classification problem. -
Colors in Context (CIC) dataset
The Colors in Context (CIC) dataset is a referential communication task where participants describe items in a visual display using a free-form chat interface. -
Cornell Movie Dialogue Corpus
The Cornell Movie Dialogue Corpus -
Reddit Politics
Reddit Politics -
Topical-Chat
The Topical-Chat dataset is a knowledge-grounded open-domain conversational dataset, which consists of dialogues between two Mechanical Turk workers (a.k.a. Turkers). -
SummEval and Topical-Chat
This paper uses SummEval and Topical-Chat datasets for evaluating the quality of summaries and responses. -
Reddit conversation corpus
The Reddit conversation corpus consists of data extracted from 95 top-ranked subreddits that discuss various topics such as sports, news, education, and politics. -
Schema-Guided Dialogue
The Schema-Guided Dialogue (SGD) dataset contains over 20,000 multi-domain conversations between a human and a virtual assistant. -
Action-Based Conversations Dataset
The Action-Based Conversations Dataset (ABCD) contains over 10,000 human-to-human customer service dialogues across multiple domains. -
Schema-Guided Dataset (SGD)
Schema-Guided Dataset (SGD) is the official dataset for the schema-guided state tracking challenge at DSTC8. The schema-guided dataset consists of 20 domains with a total of 45...