Dialogue Systems - Groups

Ubuntu Corpus

The Ubuntu Corpus consists of dialogues from the Ubuntu technical support chat.

Baidu TieBa Corpus

This dataset is used for the context-oriented response selection task, which is treated as a binary classification problem.
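
The binary-classification framing can be sketched as follows. This is a minimal illustration, assuming each example is a (context, candidate response, label) triple and that negatives are sampled from the final turns of other dialogues; the function name `build_examples`, the sampling scheme, and the toy dialogues are hypothetical and are not taken from the corpus itself.

```python
import random
from typing import List, Tuple

# (context turns, candidate response, label); label 1 = true reply, 0 = sampled negative
Example = Tuple[List[str], str, int]


def build_examples(dialogues: List[List[str]],
                   num_negatives: int = 1,
                   seed: int = 0) -> List[Example]:
    """Turn each dialogue into one positive example and up to
    `num_negatives` negative examples (hypothetical helper)."""
    rng = random.Random(seed)
    examples: List[Example] = []
    for i, turns in enumerate(dialogues):
        # The last turn is the true response; everything before it is context.
        context, true_response = turns[:-1], turns[-1]
        examples.append((context, true_response, 1))
        # Negative candidates: final turns of the other dialogues.
        pool = [d[-1] for j, d in enumerate(dialogues)
                if j != i and d[-1] != true_response]
        for negative in rng.sample(pool, min(num_negatives, len(pool))):
            examples.append((context, negative, 0))
    return examples


# Toy data, invented for illustration only.
toy_dialogues = [
    ["How do I update my packages?", "Which release are you on?",
     "Run the package manager's update command."],
    ["Any film recommendations?", "What genre do you like?",
     "Try a classic comedy."],
    ["Is the store open today?", "Yes, until nine."],
]

examples = build_examples(toy_dialogues, num_negatives=1)
for context, response, label in examples:
    print(label, "|", " / ".join(context), "->", response)
```

A classifier trained on such triples scores each (context, response) pair, and the highest-scoring candidate is selected as the reply.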

Topical-Chat

The Topical-Chat dataset is a knowledge-grounded open-domain conversational dataset, which consists of dialogues between two Mechanical Turk workers (a.k.a. Turkers).

Reddit conversation corpus

The Reddit conversation corpus consists of data extracted from 95 top-ranked subreddits that discuss topics such as sports, news, education, and politics.

STC

The STC dataset is a short-text conversation dataset collected from Sina Weibo, a Chinese social platform.

E-commerce Dialogue Corpus

The E-commerce Dialogue Corpus is used for training and testing response selection models for multi-turn conversations.

Douban Conversation Corpus

The Douban Conversation Corpus is used for training and testing response selection models for multi-turn conversations.

ConvAI2 persona-chat dataset

The ConvAI2 persona-chat dataset is an extended version of the persona-chat dataset, which contains conversations obtained from crowdworkers who were randomly paired and asked...

Wizard of Wikipedia

Wizard of Wikipedia is a recent, large-scale dataset of multi-turn knowledge-grounded dialogues between an “apprentice” and a “wizard”, who has access to information from...

DailyDialog

The DailyDialog dataset is a large-scale multi-turn dialogue dataset, consisting of 10,000 conversations with 5 turns each.

EmpatheticDialogues

The EmpatheticDialogues dataset is a text dataset for training empathetic AI chatbots, consisting of 25k conversations grounded in emotional situations with emotion labels.

Ubuntu Dialogue Corpus

The Ubuntu Dialogue Corpus is the largest freely available multi-turn dialogue corpus, consisting of almost one million two-way conversations extracted from the Ubuntu...

12 datasets found