- Reddit Comments dataset
  The Reddit Comments dataset is constructed from publicly available user comments on submissions on the Reddit website.
- ATIS Intent Classification dataset
  The dataset used in this paper is a noisy annotated dataset obtained from a zero-shot learner based module.
- Conversational dataset
  The conversational dataset is used to evaluate the performance of the proposed algorithms. The dataset consists of 20,000 questions and answers, where each question is answered...
- Empathetic Dialogue dataset
  The Empathetic Dialogue dataset is a dataset of conversations related to daily life, each with an emotion label, a situation described in text, and a short two-party dialogue.
- SpeechBrain 1.0
  SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker...
- Image-Chat: Engaging Grounded Conversations
  Image-Chat dataset
- Polaris: A Safety-focused LLM Constellation for Healthcare
  The Polaris dataset is a collection of conversations between a patient and a healthcare agent, with the goal of developing a safety-focused Large Language Model (LLM)...
- SIMMC: Situated Interactive Multi-Modal Conversational Data Collection and Ev...
  SIMMC is an extension to ParlAI for multi-modal conversational data collection and system evaluation. It simulates an immersive setup, where crowd workers interact with...
- DailyDialog
  The DailyDialog dataset is a large-scale multi-turn dialogue dataset, consisting of 10,000 conversations with 5 turns each.
- EmpatheticDialogues
  The EmpatheticDialogues dataset is a text dataset for training empathetic AI chatbots, consisting of 25k conversations grounded in emotional situations with emotion labels.
- Ubuntu Dialogue Corpus
  The Ubuntu Dialogue Corpus is the largest freely available multi-turn based dialogue corpus which consists of almost one million two-way conversations extracted from the Ubuntu...