Google Assistant Dataset
The Google Assistant dataset is a large-scale conversation dataset generated by human evaluators, comprising a total of 20K conversations.
DBDC4 Japanese
The DBDC4 Japanese dataset contains dialogues from three dialogue systems, DCM, DIT, and IRS, together with dialogues from five further systems (IRS, MMK, MRK, TRF, and ZNK), which...
Ubuntu Dialogue Corpus (UDC)
The Ubuntu Dialogue Corpus (UDC) was extracted from chat logs of the Ubuntu Internet Relay Chat (IRC) channels. Although its topics are not as diverse as those in the MTC, the dataset is...
Movie Triples Corpus (MTC)
The Movie Triples Corpus (MTC) was derived from the Movie-DiC dataset of Banchs (2012). Although this dataset spans a wide range of topics and contains few spelling mistakes,...
Ubuntu Corpus
The Ubuntu Corpus consists of dialogues from Ubuntu technical support chats.
Baidu TieBa Corpus
The Baidu Tieba corpus is used for the context-oriented response selection task, which is framed as a binary classification problem: given a conversation context and a candidate response, decide whether the candidate is an appropriate continuation.
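To make the task formulation concrete, the sketch below shows how a single response selection instance can be represented and scored as a binary classification example. The field names, the toy word-overlap scorer, and the example turns are illustrative assumptions and do not reflect the actual format of the Baidu Tieba corpus.

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class ResponseSelectionExample:
    context: List[str]  # previous turns of the conversation, oldest first
    candidate: str      # candidate response to be judged against the context
    label: int          # 1 = appropriate continuation, 0 = sampled negative

def _tokens(text: str) -> List[str]:
    # Lowercase and keep only alphanumeric word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())

def word_overlap_score(example: ResponseSelectionExample) -> float:
    """Toy relevance score: fraction of candidate tokens that also appear in the context."""
    context_tokens = set(tok for turn in example.context for tok in _tokens(turn))
    candidate_tokens = _tokens(example.candidate)
    if not candidate_tokens:
        return 0.0
    return sum(tok in context_tokens for tok in candidate_tokens) / len(candidate_tokens)

positive = ResponseSelectionExample(
    context=["My phone keeps restarting.", "Which model do you have?"],
    candidate="It is the 2019 model, I bought it last year.",
    label=1,
)
negative = ResponseSelectionExample(
    context=["My phone keeps restarting.", "Which model do you have?"],
    candidate="The weather is great for a picnic today.",
    label=0,
)

for example in (positive, negative):
    print(example.label, round(word_overlap_score(example), 2))
```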
Topical-Chat
The Topical-Chat dataset is a knowledge-grounded open-domain conversation dataset consisting of dialogues between pairs of Amazon Mechanical Turk workers (Turkers).
Reddit conversation corpus
The Reddit conversation corpus consists of data extracted from 95 top-ranked subreddits covering various topics such as sports, news, education, and politics.
E-commerce Dialogue Corpus
The E-commerce Dialogue Corpus contains multi-turn conversations between customers and customer service staff collected from the Taobao e-commerce platform, and is used for training and testing response selection models.
Douban Conversation Corpus
The Douban Conversation Corpus consists of multi-turn Chinese conversations crawled from the Douban group, an open-domain social networking forum, and is likewise used for training and testing response selection models.
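For response selection corpora of this kind, test performance is commonly reported as recall at position k among n scored candidate responses (R_n@k). The sketch below illustrates that computation; the candidate scores are made-up numbers, not the output of any real model trained on these corpora.

```python
from typing import Sequence

def recall_at_k(scores: Sequence[float], correct_index: int, k: int) -> float:
    """Return 1.0 if the correct candidate is ranked within the top k by score, else 0.0."""
    ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return 1.0 if correct_index in ranking[:k] else 0.0

# One test instance with 10 candidate responses; scores are invented for illustration.
candidate_scores = [0.82, 0.11, 0.40, 0.05, 0.77, 0.31, 0.09, 0.56, 0.22, 0.14]

print(recall_at_k(candidate_scores, correct_index=0, k=1))  # 1.0: correct candidate ranked first
print(recall_at_k(candidate_scores, correct_index=4, k=1))  # 0.0: correct candidate ranked second
print(recall_at_k(candidate_scores, correct_index=4, k=2))  # 1.0: correct candidate within top 2
```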
ConvAI2 persona-chat dataset
The ConvAI2 persona-chat dataset is an extended version of the Persona-Chat dataset and contains conversations obtained from crowdworkers who were randomly paired and asked...
Wizard of Wikipedia
Wizard of Wikipedia is a recent, large-scale dataset of multi-turn knowledge-grounded dialogues between an “apprentice” and a “wizard”, where the wizard has access to information from...
DailyDialog
The DailyDialog dataset is a large-scale multi-turn dialogue dataset of conversations about daily life, consisting of 13,118 dialogues with roughly 8 speaker turns per dialogue on average.
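As an illustration of how multi-turn dialogues such as these are typically consumed, the sketch below unrolls a dialogue into (context, response) pairs for training a response generation or selection model. The toy dialogue and the helper function are invented for illustration and are not taken from DailyDialog.

```python
from typing import List, Tuple

def unroll_dialogue(turns: List[str]) -> List[Tuple[List[str], str]]:
    """Turn a dialogue [u1, ..., uT] into pairs ([u1, ..., u_{t-1}], u_t) for t >= 2."""
    return [(turns[:t], turns[t]) for t in range(1, len(turns))]

# A toy four-turn dialogue, invented for illustration only.
dialogue = [
    "Hi, have you finished the report?",
    "Almost, I just need to check the numbers.",
    "Great, can you send it to me by five?",
    "Sure, I will email it this afternoon.",
]

for context, response in unroll_dialogue(dialogue):
    print(f"context of {len(context)} turn(s) -> response: {response}")
```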
EmpatheticDialogues
The EmpatheticDialogues dataset is a text dataset for training empathetic chatbots, consisting of about 25k conversations grounded in emotional situations, each annotated with an emotion label.
Ubuntu Dialogue Corpus
The Ubuntu Dialogue Corpus is the largest freely available multi-turn dialogue corpus, consisting of almost one million two-way conversations extracted from the Ubuntu...