- WikiText-103 and Enwik8 datasets
  WikiText-103 and Enwik8 datasets are used for language modeling tasks.
- Paper-Author
  This dataset contains papers crawled from the arXiv preprint database. Nodes U represent papers, while nodes V represent authors. An edge ⟨u, v⟩ indicates that the...
- Multimodal Attribute Extraction (MAE) dataset
  The Multimodal Attribute Extraction (MAE) dataset is a large dataset containing mixed-media data for over 2.2 million commercial product items, collected from a large number of...
- Equity Evaluation Corpus (EEC)
  The Equity Evaluation Corpus (EEC) is a balanced dataset of sentences with emotions, used in the paper for emotion prediction.
- SemEval-2023 Task 1: Visual Word Sense Disambiguation
  The SemEval-2023 Visual Word Sense Disambiguation (V-WSD) Task dataset consists of a silver dataset with 12,869 V-WSD instances. Each sample is a 4-tuple ⟨f, c, I, i∗ ∈ I⟩ where...
- SemEval-2017 Task 4
  The SemEval-2017 Task 4 dataset consists of tweets with sentiment labels.
- OpenSubtitles dataset
  Open-domain neural dialogue generation (Vinyals and Le, 2015; Sordoni et al., 2015; Li et al., 2016a; Mou et al., 2016; Serban et al., 2016a; Asghar et al., 2016; Mei et al.,...
- Schizophrenia Spectrum Dataset
  The dataset used for this study was collected for a mental health assessment project conducted at the University of Maryland School of Medicine in collaboration with the...
- The KIT Motion-Language Dataset
  The KIT Motion-Language Dataset consists of 3,911 motion sequences at 12.5 FPS and 6,278 language annotations.
- Text2Shape
  Text2Shape is a dataset of 8,447 table instances and 6,591 chair instances from the ShapeNet dataset, along with 75,344 natural language descriptions.