Natural Language Processing - Groups

CIMT Argument Concreteness Dataset

The dataset is used for the evaluation of argument quality classification tasks, including concreteness, validity, and novelty.
- Dataset
- JSON
German Reviews Dataset

A dataset for sentiment analysis on German reviews.
- Dataset
- JSON
English Reviews Dataset

A dataset for sentiment analysis on English reviews.
- Dataset
- JSON
Spanish Reviews Dataset

A dataset for sentiment analysis on Spanish reviews.
- Dataset
- JSON
Universal and Unsupervised Sentiment Analysis

A novel model for universal and unsupervised sentiment analysis driven by a set of syntactic rules for semantic composition.
- Dataset
- JSON
ROCStories

The ROCStories corpus is a collection of crowdsourced five-sentence everyday stories rich in causal and temporal relations.
- Dataset
- JSON
Crowd-sourced Language Annotations Dataset

The dataset consists of 5,600 episode-instruction pairs, where each episode is labeled with two hindsight instructions.
- Dataset
- JSON
Data-driven Instruction Augmentation for Language-conditioned Control

Data-driven Instruction Augmentation for Language-conditioned Control (DIAL) is a method that uses pre-trained vision-language models (VLMs) to label offline datasets for...
- Dataset
- JSON
MuTual

A dataset for research in multi-turn dialogue systems.
- Dataset
- JSON
E-commerce Dialogue Corpus

The dataset is used for training and testing response selection models for multi-turn conversations.
- Dataset
- JSON
Douban Conversation Corpus

The dataset is used for training and testing response selection models for multi-turn conversations.
- Dataset
- JSON
Multi-Turn Dialogue Reasoning

A dataset for multi-turn dialogue reasoning.
- Dataset
- JSON
DialogConv: A Lightweight Fully Convolutional Network for Multi-view Response...

A lightweight fully convolutional network for multi-view response selection.
- Dataset
- JSON
BigBench

The BigBench dataset is a collection of 12 challenging language model reasoning tasks.
- Dataset
- JSON
TEL-NLP

The TEL-NLP dataset is a collection of Telugu text data for four NLP tasks: sentiment analysis, emotion identification, hate speech detection, and sarcasm detection.
- Dataset
- JSON
ReferItGame

Visual grounding is the task of localizing a language query in an image. The output is usually a bounding box.
- Dataset
- JSON
Flickr30K Entities

The Flickr30K Entities dataset consists of 31,783 images each matched with 5 captions. The dataset links distinct sentence entities to image bounding boxes, resulting in 70K...
- Dataset
- JSON
Vision-and-Language Navigation

The Vision-and-Language Navigation (VLN) task gives a global natural-language sentence I = {w_0, ..., w_l} as an instruction, where w_i is a word token and l is the length of the...
- Dataset
- JSON
From Detection of Toxic Spans in Online Discussions to Analysis of Toxic-to-C...

The ToxicSpans dataset is a subset of the Civil Comments dataset, containing toxic spans.
- Dataset
- JSON
Hate Speech Detection using Large Language Models

The datasets used for probing LLMs for hate speech detection, including the HateXplain, implicit hate, and ToxicSpans datasets.
- Dataset
- JSON

530 datasets found