-
Generation-based Conversation Dataset
The dataset used in the paper contains 1.6 million query-reply pairs for generation-based conversation systems. -
Retrieval-based Conversation Dataset
The dataset used in the paper is a database of 7 million query-reply pairs for retrieval-based conversation systems. -
Ubuntu Corpus
The dataset used in the paper is the Ubuntu Corpus, which consists of dialogues from the Ubuntu technical support chat. -
Baidu TieBa Corpus
A dataset used for the context-oriented response selection task, which is treated as a binary classification problem. -
Colors in Context (CIC) dataset
The Colors in Context (CIC) dataset comes from a referential communication task in which participants describe items in a visual display using a free-form chat interface. -
Cornell Movie Dialogue Corpus
The Cornell Movie Dialogue Corpus -
Reddit Politics
Reddit Politics -
BURCHAK corpus
A new freely available human-human dialogue data set for interactive learning of visually grounded word meanings through ostensive definition by a tutor to a learner. -
Improving multi-turn dialogue modeling with utterance ReWriter
Improving multi-turn dialogue modeling with utterance ReWriter. -
RAST: Domain-robust dialogue rewriting as sequence tagging
RAST: Domain-robust dialogue rewriting as sequence tagging. -
Improving open-domain dialogue systems via multi-turn incomplete utterance re...
Improving open-domain dialogue systems via multi-turn incomplete utterance restoration. -
Mining Clues from Incomplete Utterance: A Query-enhanced Network for Incomple...
Incomplete utterance rewriting has recently raised wide attention. However, previous works do not consider the semantic structural information between incomplete utterance and... -
RedPajama Dataset
The RedPajama dataset is used for a single-turn dialogue task. -
Stanford Multi-turn, Multi-domain Dialogue Dataset
The Stanford Multi-turn, Multi-domain Dialogue Dataset is a dataset for language understanding in task-oriented dialogue systems. It contains a large number of training... -
Airline Travel Information System dataset (ATIS)
The Airline Travel Information System dataset (ATIS) is a dataset for language understanding in task-oriented dialogue systems. It contains 4978 training utterances from Class A... -
Topical-Chat
The Topical-Chat dataset is a knowledge-grounded open-domain conversational dataset, which consists of dialogues between two Mechanical Turk workers (a.k.a. Turkers). -
SummEval and Topical-Chat
This paper uses SummEval and Topical-Chat datasets for evaluating the quality of summaries and responses. -
Reddit conversation corpus
Reddit conversation corpus, consisting of data extracted from 95 top-ranked subreddits that discuss various topics such as sports, news, education and politics. -
Airline Travel Information System (ATIS) dataset
The ATIS dataset contains a set of flight schedule query sessions, each of which consists of a sequence of spoken queries (utterances). Each query contains automatic speech...