Dialogue Systems - Groups

Ubuntu Dialogue Corpus (UDC)

The Ubuntu Dialogue Corpus (UDC) dataset was extracted from the Ubuntu Internet Relay Chat (IRC) channel. Although the topics in the dataset are not as diverse as in the MTC, the dataset is...

Movie Triples Corpus (MTC)

The Movie Triples Corpus (MTC) dataset was derived from the Movie-DiC dataset by Banchs (2012). Although this dataset spans a wide range of topics with few spelling mistakes,...

Ubuntu Corpus

The Ubuntu Corpus consists of dialogues from the Ubuntu technical support chat.

Baidu TieBa Corpus

This dataset is used for the context-oriented response selection task, which is treated as a binary classification problem.
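
To make the binary-classification framing concrete, here is a minimal sketch of how response selection examples are commonly built: each context is paired with its true next response (label 1) and with randomly sampled responses from other dialogues (label 0). The `Example` class, the `build_examples` function, and the toy dialogues are hypothetical illustrations, not the actual format or sampling scheme of the Baidu TieBa Corpus.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    context: List[str]  # preceding utterances
    response: str       # candidate response
    label: int          # 1 = true next response, 0 = sampled distractor


def build_examples(dialogues, negatives_per_positive=1, seed=0):
    """Turn dialogues (lists of utterances) into labeled context-response pairs.

    Assumption: the last utterance of each dialogue is the true response,
    and negatives are drawn uniformly from the other dialogues' responses.
    """
    rng = random.Random(seed)
    all_responses = [d[-1] for d in dialogues]
    examples = []
    for d in dialogues:
        context, response = d[:-1], d[-1]
        examples.append(Example(context, response, 1))
        pool = [r for r in all_responses if r != response]
        k = min(negatives_per_positive, len(pool))
        for negative in rng.sample(pool, k):
            examples.append(Example(context, negative, 0))
    return examples


# Toy data, invented for illustration only.
toy_dialogues = [
    ["hi", "hello there"],
    ["how do I install a package?", "use the package manager"],
    ["any plans today?", "not yet", "let's watch a movie"],
]
pairs = build_examples(toy_dialogues)
```

A classifier trained on such pairs scores each candidate response against its context, and the highest-scoring candidate is selected at inference time.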

Topical-Chat

The Topical-Chat dataset is a knowledge-grounded open-domain conversational dataset, which consists of dialogues between two Mechanical Turk workers (a.k.a. Turkers).

Reddit conversation corpus

The Reddit conversation corpus consists of data extracted from 95 top-ranked subreddits that discuss topics such as sports, news, education, and politics.

ConvAI2 persona-chat dataset

The ConvAI2 persona-chat dataset is an extended version of the persona-chat dataset, which contains conversations obtained from crowdworkers who were randomly paired and asked...

DailyDialog

The DailyDialog dataset is a large-scale multi-turn dialogue dataset, consisting of 10,000 conversations with 5 turns each.

EmpatheticDialogues

The EmpatheticDialogues dataset is a text dataset for training empathetic AI chatbots, consisting of 25k conversations grounded in emotional situations with emotion labels.

9 datasets found