Dataset - LDM

Reddit Comments dataset

The Reddit Comments dataset is constructed from publicly available user comments on Reddit submissions.
- Dataset
- JSON
ATIS Intent Classification dataset

The dataset used in this paper is a noisily annotated dataset obtained from a module based on a zero-shot learner.
- Dataset
- JSON
Conversational dataset

The conversational dataset is used to evaluate the performance of the proposed algorithms. The dataset consists of 20,000 questions and answers, where each question is answered...
- Dataset
- JSON
LAMBADA

The dataset used in the paper is a corpus of text containing approximately 10,000 examples, each a sequence of sentences extracted from books.
- Dataset
- JSON
Empathetic Dialogue dataset

The Empathetic Dialogue dataset is a collection of conversations about daily life, each with an emotion label, a situation described in text, and a short two-party dialogue.
- Dataset
- JSON
SpeechBrain 1.0

SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker...
- Dataset
- JSON
Image-Chat: Engaging Grounded Conversations

Image-Chat dataset
- Dataset
- JSON
Polaris: A Safety-focused LLM Constellation for Healthcare

The Polaris dataset is a collection of conversations between a patient and a healthcare agent, with the goal of developing a safety-focused Large Language Model (LLM)...
- Dataset
- JSON
ESConv

The ESConv dataset is a collection of emotional support conversations, where the agent plays the role of a supporter and the user plays the role of a seeker. The dataset is used...
- Dataset
- JSON
SIMMC: Situated Interactive Multi-Modal Conversational Data Collection and Ev...

SIMMC is an extension to ParlAI for multi-modal conversational data collection and system evaluation. It simulates an immersive setup, where crowd workers interact with...
- Dataset
- JSON
DailyDialog

The DailyDialog dataset is a large-scale multi-turn dialogue dataset, consisting of 10,000 conversations with 5 turns each.
- Dataset
- JSON
EmpatheticDialogues

The EmpatheticDialogues dataset is a text dataset for training empathetic AI chatbots, consisting of 25k conversations grounded in emotional situations with emotion labels.
- Dataset
- JSON
Ubuntu Dialogue Corpus

The Ubuntu Dialogue Corpus is the largest freely available multi-turn dialogue corpus, consisting of almost one million two-way conversations extracted from the Ubuntu...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

13 datasets found