Conversational AI - Groups

DICES-350

The DICES-350 dataset is a curated sample from a corpus of 8k multi-turn conversations generated by human agents interacting with a generative AI chatbot (Thoppilan et al., 2022) in an...

Dataset
JSON

ChatGPT: A conversational AI model

The dataset used in the paper ChatGPT: A conversational AI model.

Dataset
JSON

MuTual: A dataset for multi-turn dialogue reasoning

A dataset for multi-turn dialogue reasoning.

Dataset
JSON

Enhancing chat language models by scaling high-quality instructional conversa...

Enhancing chat language models by scaling high-quality instructional conversations.

Dataset
JSON

Polaris: A Safety-focused LLM Constellation for Healthcare

The Polaris dataset is a collection of conversations between a patient and a healthcare agent, with the goal of developing a safety-focused Large Language Model (LLM)...

Dataset
JSON

Chatbot Arena

A large-scale dataset for evaluating LLMs, used to train and evaluate the Chatbot Arena model.

Dataset
JSON

Arena-Hard

A large-scale dataset for evaluating LLMs, used to train and evaluate the Arena-Hard model.

Dataset
JSON

LMSYS ChatBot Arena

A large-scale, real-world LLM conversation dataset, used to train and evaluate the LMSYS ChatBot Arena model.

Dataset
JSON

WizardArena

A large-scale conversational dataset, used to train and evaluate the WizardLM-β model.

Dataset
JSON

LSDSCC

The dataset is a large-scale conversational corpus for response generation with diversity-oriented evaluation metrics.

Dataset
JSON

OpenAssistant Conversations - Democratizing Large Language Model Alignment

Dataset
JSON

E-commerce Dialogue Corpus

The dataset is used for training and testing response selection models for multi-turn conversations.

Dataset
JSON

Douban Conversation Corpus

The dataset is used for training and testing response selection models for multi-turn conversations.

Dataset
JSON

ConvAI2 persona-chat dataset

The ConvAI2 persona-chat dataset is an extended version of the persona-chat dataset, which contains conversations obtained from crowdworkers who were randomly paired and asked...

Dataset
JSON

User Reported Scenarios (URS) dataset

The User Reported Scenarios (URS) dataset is a collection of real-world use cases with 15 LLMs from a user study with 712 participants from 23 countries.

Dataset
JSON

ConvAI2

The ConvAI2 dialogue corpus is a dataset of personalized dialogues with corresponding persona descriptions.

Dataset
JSON

Wizard of Wikipedia

Wizard of Wikipedia is a recent, large-scale dataset of multi-turn knowledge-grounded dialogues between an “apprentice” and a “wizard”, who has access to information from...

Dataset
JSON

STC dataset

The STC dataset is a short text conversation dataset used for evaluating the performance of conversation response generation models.

Dataset
JSON

OpenAssistant

The authors used the OpenAssistant dataset to construct evaluation datasets for their attacks.

Dataset
JSON

SIMMC: Situated Interactive Multi-Modal Conversational Data Collection and Ev...

SIMMC is an extension to ParlAI for multi-modal conversational data collection and system evaluation. It simulates an immersive setup, where crowd workers interact with...

Dataset
JSON

43 datasets found