Dataset - LDM

Douban Conversation Corpus

The dataset is used for training and testing response selection models for multi-turn conversations.
- Dataset
- JSON
Multi-Turn Dialogue Reasoning

A dataset for multi-turn dialogue reasoning.
- Dataset
- JSON
DialogConv: A Lightweight Fully Convolutional Network for Multi-view Response...

A lightweight fully convolutional network for multi-view response selection.
- Dataset
- JSON
BigBench

The BigBench dataset is a collection of 12 challenging language model reasoning tasks.
- Dataset
- JSON
TEL-NLP

The TEL-NLP dataset is a collection of Telugu text data for four NLP tasks: sentiment analysis, emotion identification, hate speech detection, and sarcasm detection.
- Dataset
- JSON
ReferItGame

Visual grounding is the task of localizing a language query in an image. The output is typically a bounding box.
- Dataset
- JSON
Flickr30K Entities

The Flickr30K Entities dataset consists of 31,783 images each matched with 5 captions. The dataset links distinct sentence entities to image bounding boxes, resulting in 70K...
- Dataset
- JSON
Vision-and-Language Navigation

The Vision-and-Language Navigation (VLN) task gives a global natural-language sentence I = {w_0, ..., w_l} as an instruction, where w_i is a word token and l is the length of the...
- Dataset
- JSON
From Detection of Toxic Spans in Online Discussions to Analysis of Toxic-to-C...

The ToxicSpans dataset is a subset of the Civil Comments dataset, containing toxic spans.
- Dataset
- JSON
Hate Speech Detection using Large Language Models

Datasets used to probe LLMs for hate speech detection, including HateXplain, implicit hate, and ToxicSpans.
- Dataset
- JSON
Visual instruction tuning

Visual instruction tuning.
- Dataset
- JSON
Flamingo: a visual language model for few-shot learning

Flamingo: a visual language model for few-shot learning.
- Dataset
- JSON
Audio-visual scene-aware dialog

Audio-visual scene-aware dialog.
- Dataset
- JSON
ChatBridge

ChatBridge is a multimodal language model capable of perceiving real-world multimodal information, as well as following instructions, thinking, and interacting with humans in...
- Dataset
- JSON
ClinicalLab: A Comprehensive Clinical Diagnosis Agent Alignment Suite

Large language models (LLMs) have achieved significant performance progress in various natural language processing applications. However, LLMs still struggle to meet the strict...
- Dataset
- JSON
TruthX: Alleviating Hallucinations by Editing Large Language Models

TruthX: Alleviating Hallucinations by Editing Large Language Models
- Dataset
- JSON
StrAE: Autoencoding for Pre-Trained Embeddings using Explicit Structure

This work presents StrAE: a Structured Autoencoder framework that through strict adherence to explicit structure, and use of a novel contrastive objective over tree-structured...
- Dataset
- JSON
ALFRED

The ALFRED benchmark includes 25,743 trajectory-instruction pairs, covering 7 different task types with varying levels of complexity.
- Dataset
- JSON
ShapeNeRF–Text

The ShapeNeRF–Text dataset consists of 40K paired NeRFs and language annotations for ShapeNet objects.
- Dataset
- JSON
Wikitext-2

The source paper does not describe its dataset explicitly, but it mentions using the Wikitext-2 dataset for text generation tasks.
- Dataset
- JSON


420 datasets found