17 datasets found

Groups: Conversational AI
Organizations: No Organization

  • Google Assistant Dataset

    The dataset used in this paper is a large-scale conversation dataset, generated by human evaluators, with a total of 20K conversations.
  • DBDC4 Japanese

    The DBDC4 Japanese dataset contains dialogues from three dialogue systems named DCM, DIT, and IRS, and five other dialogue systems (IRS, MMK, MRK, TRF, and ZNK) which...
  • DBDC4

    The Fourth Dialogue Breakdown Detection Challenge (DBDC4) dataset contains dialogues from a dialogue system named IRIS and six other dialogue systems (anonymised as Bot001 to...
  • Ubuntu Dialogue Corpus (UDC)

    The Ubuntu Dialogue Corpus (UDC) dataset was extracted from the Ubuntu Internet Relay Chat (IRC) channel. Although the topics in the dataset are not as diverse as in the MTC, the dataset is...
  • Movie Triples Corpus (MTC)

    The Movie Triples Corpus (MTC) dataset was derived from the Movie-DiC dataset by Banchs (2012). Although this dataset spans a wide range of topics with few spelling mistakes,...
  • Ubuntu Corpus

    The dataset used in the paper is the Ubuntu Corpus, which consists of dialogues from the Ubuntu technical support chat.
  • Baidu TieBa Corpus

    The dataset is used for a context-oriented response selection task, which is treated as a binary classification problem.
  • Topical-Chat

    The Topical-Chat dataset is a knowledge-grounded open-domain conversational dataset, which consists of dialogues between two Mechanical Turk workers (a.k.a. Turkers).
  • Reddit conversation corpus

    The Reddit conversation corpus consists of data extracted from 95 top-ranked subreddits that discuss various topics such as sports, news, education, and politics.
  • STC

    The STC dataset is a short-text conversation dataset collected from Sina Weibo, a Chinese social platform.
  • E-commerce Dialogue Corpus

    The dataset is used for training and testing response selection models for multi-turn conversations.
  • Douban Conversation Corpus

    The dataset is used for training and testing response selection models for multi-turn conversations.
  • ConvAI2 persona-chat dataset

    The ConvAI2 persona-chat dataset is an extended version of the persona-chat dataset, which contains conversations obtained from crowdworkers who were randomly paired and asked...
  • Wizard of Wikipedia

    Wizard of Wikipedia is a recent, large-scale dataset of multi-turn knowledge-grounded dialogues between an “apprentice” and a “wizard”, who has access to information from...
  • DailyDialog

    The DailyDialog dataset is a large-scale multi-turn dialogue dataset, consisting of 10,000 conversations with 5 turns each.
  • EmpatheticDialogues

    The EmpatheticDialogues dataset is a text dataset for training empathetic AI chatbots, consisting of 25k conversations grounded in emotional situations with emotion labels.
  • Ubuntu Dialogue Corpus

    The Ubuntu Dialogue Corpus is the largest freely available multi-turn dialogue corpus, consisting of almost one million two-way conversations extracted from the Ubuntu...