Dialogue Systems - Groups

DIASPORA

The DIASPORA dataset is a human-human conversation dataset annotated with lexical aspect.

DialogZoo

A large-scale dialogue dataset with rich task diversity, collected to pre-train a unified dialogue foundation model.

GoRecDial

A conversational recommendation dataset released by Kang et al. This dataset was constructed using ParlAI to interface with Amazon Mechanical Turk (AMT) to reflect the movie...

Learning to speak and act in a fantasy text adventure game

A dataset of text-adventure game dialogues, including fantasy and horror games.

Doctor-Patient Conversations Corpus

A corpus of nearly 7,000 doctor-patient conversations.

Dialogue Dataset for Detecting Sentences that Do Not Require Factual Correctn...

A dialogue dataset annotated with fact-check-needed labels (DDFC), used to detect sentences that do not require a factual correctness judgment.

Ubuntu Dialogue Corpus (UDC)

The Ubuntu Dialogue Corpus (UDC) dataset was extracted from the Ubuntu Internet Relay Chat (IRC) channels. Although the topics in the dataset are not as diverse as in the MTC, the dataset is...

Movie Triples Corpus (MTC)

The Movie Triples Corpus (MTC) dataset was derived from the Movie-DiC dataset by Banchs (2012). Although this dataset spans a wide range of topics with few spelling mistakes,...

Ubuntu Corpus

The Ubuntu Corpus consists of dialogues from the Ubuntu technical support chat.

Baidu TieBa Corpus

A dataset for the context-oriented response selection task, which is framed as a binary classification problem.

Colors in Context (CIC) dataset

The Colors in Context (CIC) dataset comes from a referential communication task in which participants describe items in a visual display using a free-form chat interface.

Cornell Movie Dialogue Corpus

Reddit Politics

Topical-Chat

The Topical-Chat dataset is a knowledge-grounded open-domain conversational dataset consisting of dialogues between pairs of Amazon Mechanical Turk workers (Turkers).

SummEval and Topical-Chat

The SummEval and Topical-Chat datasets, used to evaluate the quality of summaries and dialogue responses, respectively.

Reddit conversation corpus

A Reddit conversation corpus consisting of data extracted from 95 top-ranked subreddits that discuss topics such as sports, news, education, and politics.

Schema-Guided Dialogue

The Schema-Guided Dialogue (SGD) dataset contains over 20,000 multi-domain conversations between a human and a virtual assistant.

Action-Based Conversations Dataset

The Action-Based Conversations Dataset (ABCD) contains over 10,000 human-to-human customer service dialogues across multiple domains.

BERT-DST

BERT-DST: scalable end-to-end dialogue state tracking with bidirectional encoder representations from transformer.

Schema-Guided Dataset (SGD)

The Schema-Guided Dataset (SGD) is the official dataset for the schema-guided state tracking challenge at DSTC8. It consists of 20 domains with a total of 45...

38 datasets found