29 datasets found

Formats: JSON
Tags: Text

  • WikiText-103 and Enwik8 datasets

    WikiText-103 and Enwik8 are datasets used for language modeling tasks.
  • Paper-Author

    Paper-Author: This dataset contains papers crawled from the arXiv preprint database. Nodes in U represent papers, while nodes in V represent authors. An edge ⟨u, v⟩ indicates that the...
  • AGNews

    The paper does not describe this dataset explicitly; it states only that the authors used a variety of datasets for semi-supervised learning tasks.
  • Multimodal Attribute Extraction (MAE) dataset

    The Multimodal Attribute Extraction (MAE) dataset is a large dataset containing mixed-media data for over 2.2 million commercial product items, collected from a large number of...
  • EIT-1M

    A large-scale multi-modal dataset comprising 1 million EEG-image-text pairs.
  • Equity Evaluation Corpus (EEC)

    The paper uses the Equity Evaluation Corpus (EEC) for emotion prediction. The EEC is a balanced set of sentences expressing emotions.
  • CLIPfa

    The CLIPfa dataset is a multilingual image-text dataset.
  • SemEval-2023 Task 1: Visual Word Sense Disambiguation

    The SemEval-2023 Visual Word Sense Disambiguation (V-WSD) Task dataset consists of a silver dataset with 12,869 V-WSD instances. Each sample is a 4-tuple ⟨f, c, I, i∗ ∈ I⟩ where...
  • SemEval-2017 Task 4

    The SemEval-2017 Task 4 dataset consists of tweets with sentiment labels.
  • Sherlock

    The Sherlock dataset contains 103K images collected from the Visual Genome and Visual Common Sense Reasoning datasets. These images are split into 90K training, 6.6K validation,...
  • VQA

    The VQA dataset is a large-scale visual question answering dataset consisting of images paired with questions that require natural language answers.
  • Magazine

    Magazine: This dataset contains Amazon Reviews Data under the category of Magazine Subscriptions. We randomly sampled 100,000 records and removed nodes with degrees lower than...
  • OpenSubtitles dataset

    Open-domain neural dialogue generation (Vinyals and Le, 2015; Sordoni et al., 2015; Li et al., 2016a; Mou et al., 2016; Serban et al., 2016a; Asghar et al., 2016; Mei et al.,...
  • Schizophrenia Spectrum Dataset

    The dataset used for this study was collected for a mental health assessment project conducted at the University of Maryland School of Medicine in collaboration with the...
  • The KIT Motion-Language Dataset

    The KIT Motion-Language Dataset consists of 3,911 motion sequences at 12.5 FPS and 6,278 language annotations.
  • UCM

    A remote sensing image-text retrieval dataset.
  • RSITMD

    A remote sensing image-text retrieval dataset.
  • RSICD

    The RSICD dataset is a benchmark remote sensing text-image dataset. It contains a total of 10,921 aerial remote sensing images with various resolutions collected from Google...
  • Text2Shape

    Text2Shape is a dataset of 8,447 table instances and 6,591 chair instances from the ShapeNet dataset, along with 75,344 natural language descriptions.
  • Text8

    Text8 is commonly used to train Word2Vec, a distributed word embedding method that uses an artificial neural network to learn dense vector representations of words.

You can also access this registry using the API (see API Docs).