Google Assistant Dataset
The Google Assistant dataset is a large-scale conversation dataset generated by human evaluators, comprising a total of 20K conversations.
DBDC4 Japanese
The DBDC4 Japanese dataset contains dialogues from three dialogue systems, DCM, DIT, and IRS, together with dialogues from five further systems (IRS, MMK, MRK, TRF, and ZNK), which...
Ubuntu Dialogue Corpus (UDC)
The Ubuntu Dialogue Corpus (UDC) was extracted from chat logs of the Ubuntu Internet Relay Chat (IRC) channels. Although its topics are not as diverse as those in the MTC, the dataset is...
Movie Triples Corpus (MTC)
The Movie Triples Corpus (MTC) was derived from the Movie-DiC dataset of Banchs (2012). Although this dataset spans a wide range of topics and contains few spelling mistakes,...
Ubuntu Corpus
The Ubuntu Corpus consists of dialogues from Ubuntu technical support chats.
Baidu TieBa Corpus
The Baidu Tieba corpus is used for the context-oriented response selection task, which is framed as a binary classification problem: given a conversation context and a candidate response, decide whether the candidate is an appropriate continuation.
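To make the task formulation concrete, the sketch below shows how a single response selection instance can be represented and scored as a binary classification example. The field names, the toy word-overlap scorer, and the example turns are illustrative assumptions and do not reflect the actual format of the Baidu Tieba corpus.

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class ResponseSelectionExample:
    context: List[str]  # previous turns of the conversation, oldest first
    candidate: str      # candidate response to be judged against the context
    label: int          # 1 = appropriate continuation, 0 = sampled negative

def _tokens(text: str) -> List[str]:
    # Lowercase and keep only alphanumeric word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())

def word_overlap_score(example: ResponseSelectionExample) -> float:
    """Toy relevance score: fraction of candidate tokens that also appear in the context."""
    context_tokens = set(tok for turn in example.context for tok in _tokens(turn))
    candidate_tokens = _tokens(example.candidate)
    if not candidate_tokens:
        return 0.0
    return sum(tok in context_tokens for tok in candidate_tokens) / len(candidate_tokens)

positive = ResponseSelectionExample(
    context=["My phone keeps restarting.", "Which model do you have?"],
    candidate="It is the 2019 model, I bought it last year.",
    label=1,
)
negative = ResponseSelectionExample(
    context=["My phone keeps restarting.", "Which model do you have?"],
    candidate="The weather is great for a picnic today.",
    label=0,
)

for example in (positive, negative):
    print(example.label, round(word_overlap_score(example), 2))
```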
Topical-Chat
The Topical-Chat dataset is a knowledge-grounded open-domain conversation dataset consisting of dialogues between pairs of Amazon Mechanical Turk workers (Turkers).
Reddit conversation corpus
The Reddit conversation corpus consists of data extracted from 95 top-ranked subreddits covering various topics such as sports, news, education, and politics.
E-commerce Dialogue Corpus
The E-commerce Dialogue Corpus contains multi-turn conversations between customers and customer service staff collected from the Taobao e-commerce platform, and is used for training and testing response selection models.
Douban Conversation Corpus
The Douban Conversation Corpus consists of multi-turn Chinese conversations crawled from the Douban group, an open-domain social networking forum, and is likewise used for training and testing response selection models.
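For response selection corpora of this kind, test performance is commonly reported as recall at position k among n scored candidate responses (R_n@k). The sketch below illustrates that computation; the candidate scores are made-up numbers, not the output of any real model trained on these corpora.

```python
from typing import Sequence

def recall_at_k(scores: Sequence[float], correct_index: int, k: int) -> float:
    """Return 1.0 if the correct candidate is ranked within the top k by score, else 0.0."""
    ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return 1.0 if correct_index in ranking[:k] else 0.0

# One test instance with 10 candidate responses; scores are invented for illustration.
candidate_scores = [0.82, 0.11, 0.40, 0.05, 0.77, 0.31, 0.09, 0.56, 0.22, 0.14]

print(recall_at_k(candidate_scores, correct_index=0, k=1))  # 1.0: correct candidate ranked first
print(recall_at_k(candidate_scores, correct_index=4, k=1))  # 0.0: correct candidate ranked second
print(recall_at_k(candidate_scores, correct_index=4, k=2))  # 1.0: correct candidate within top 2
```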
ConvAI2 persona-chat dataset
The ConvAI2 persona-chat dataset is an extended version of the Persona-Chat dataset and contains conversations obtained from crowdworkers who were randomly paired and asked...
Wizard of Wikipedia
Wizard of Wikipedia is a recent, large-scale dataset of multi-turn knowledge-grounded dialogues between an “apprentice” and a “wizard”, where the wizard has access to information from...
DailyDialog
The DailyDialog dataset is a large-scale multi-turn dialogue dataset of conversations about daily life, consisting of 13,118 dialogues with roughly 8 speaker turns per dialogue on average.
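As an illustration of how multi-turn dialogues such as these are typically consumed, the sketch below unrolls a dialogue into (context, response) pairs for training a response generation or selection model. The toy dialogue and the helper function are invented for illustration and are not taken from DailyDialog.

```python
from typing import List, Tuple

def unroll_dialogue(turns: List[str]) -> List[Tuple[List[str], str]]:
    """Turn a dialogue [u1, ..., uT] into pairs ([u1, ..., u_{t-1}], u_t) for t >= 2."""
    return [(turns[:t], turns[t]) for t in range(1, len(turns))]

# A toy four-turn dialogue, invented for illustration only.
dialogue = [
    "Hi, have you finished the report?",
    "Almost, I just need to check the numbers.",
    "Great, can you send it to me by five?",
    "Sure, I will email it this afternoon.",
]

for context, response in unroll_dialogue(dialogue):
    print(f"context of {len(context)} turn(s) -> response: {response}")
```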
EmpatheticDialogues
The EmpatheticDialogues dataset is a text dataset for training empathetic chatbots, consisting of about 25k conversations grounded in emotional situations, each annotated with an emotion label.
Ubuntu Dialogue Corpus
The Ubuntu Dialogue Corpus is the largest freely available multi-turn dialogue corpus, consisting of almost one million two-way conversations extracted from the Ubuntu...