CIMT Argument Concreteness Dataset
The dataset is used to evaluate argument quality classification tasks, including concreteness, validity, and novelty.
German Reviews Dataset
A dataset for sentiment analysis on German reviews.
English Reviews Dataset
A dataset for sentiment analysis on English reviews.
Spanish Reviews Dataset
A dataset for sentiment analysis on Spanish reviews.
Universal and Unsupervised Sentiment Analysis
A novel model for universal and unsupervised sentiment analysis driven by a set of syntactic rules for semantic composition.
ROCStories
The ROCStories corpus is a collection of crowdsourced five-sentence everyday stories rich in causal and temporal relations.
Crowd-sourced Language Annotations Dataset
The dataset consists of 5,600 episode-instruction pairs; each episode is labeled with two hindsight instructions.
Data-driven Instruction Augmentation for Language-conditioned Control
Data-driven Instruction Augmentation for Language-conditioned Control (DIAL) is a method that uses pre-trained vision-language models (VLMs) to label offline datasets for...
E-commerce Dialogue Corpus
The dataset is used for training and testing response selection models for multi-turn conversations.
Douban Conversation Corpus
The dataset is used for training and testing response selection models for multi-turn conversations.
Multi-Turn Dialogue Reasoning
A dataset for multi-turn dialogue reasoning.
DialogConv: A Lightweight Fully Convolutional Network for Multi-view Response...
A lightweight fully convolutional network for multi-view response selection.
ReferItGame
Visual grounding is the task of localizing a language query in an image. The output is often a bounding box.
Flickr30K Entities
The Flickr30K Entities dataset consists of 31,783 images, each matched with 5 captions. The dataset links distinct sentence entities to image bounding boxes, resulting in 70K...
Vision-and-Language Navigation
The Vision-and-Language Navigation (VLN) task provides a global natural-language sentence $I = \{w_0, \ldots, w_l\}$ as an instruction, where $w_i$ is a word token and $l$ is the length of the...
From Detection of Toxic Spans in Online Discussions to Analysis of Toxic-to-C...
The ToxicSpans dataset is a subset of the Civil Comments dataset in which toxic spans are marked.
Hate Speech Detection using Large Language Models
The datasets used to probe LLMs for hate speech detection, including HateXplain, implicit hate, and ToxicSpans.