R4R Dataset
The R4R dataset is larger than R2R and has more complicated navigation paths. -
R2R Dataset
The R2R dataset is built from real photographs of indoor environments. It has attracted broad attention for its simply stated task, which nonetheless requires complex... -
Unsupervised alignment of embeddings with Wasserstein Procrustes
This study introduces a new method for unsupervised alignment of embeddings with Wasserstein Procrustes. -
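As a rough illustration of the alignment this entry describes: Wasserstein Procrustes alternates an optimal-transport matching of points with a closed-form orthogonal Procrustes rotation. The sketch below shows only the rotation step; the function name and the toy data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def procrustes_rotation(X, Y):
    """Closed-form orthogonal Procrustes: the orthogonal W minimizing
    ||X W - Y||_F, obtained from the SVD of X^T Y. In Wasserstein
    Procrustes this step alternates with an optimal-transport matching
    of the rows (not shown here)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy sanity check: when Y is X under a known orthogonal map Q and the
# rows are already matched, the recovered rotation equals Q.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # random orthogonal matrix
Y = X @ Q
W = procrustes_rotation(X, Y)
print(np.allclose(W, Q))  # True
```

The unsupervised difficulty handled by the full method is that the row correspondence between X and Y is unknown; the Procrustes step above assumes it is given.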
Discovering Universal Geometry in Embeddings with ICA
This study utilizes Independent Component Analysis (ICA) to unveil a consistent semantic structure within embeddings of words or images. -
One-stage Visual Grounding
A fast and accurate one-stage approach to visual grounding -
InstanceRefer
Cooperative holistic understanding for visual grounding on point clouds through instance multi-level contextual referring -
Free-form description guided 3D visual graph network for object grounding in ...
Free-form description guided 3D visual graph network for 3D object grounding in point clouds -
CIFAR-10, FEMNIST, and IMDB
The datasets used in the paper are CIFAR-10, FEMNIST, and IMDB. The authors used these datasets to evaluate the performance of the EmbracingFL framework. -
Room-to-Room (R2R) dataset
The Room-to-Room (R2R) dataset is a benchmark for vision-and-language navigation tasks. It consists of 7,189 paths sampled from the environments' navigation graphs, each with three... -
PipeTransformer: Automated Elastic Pipelining for Distributed Training of Tra...
The datasets used in this paper are ImageNet, SQuAD, and GLUE. -
Data-driven Instruction Augmentation for Language-conditioned Control
Data-driven Instruction Augmentation for Language-conditioned Control (DIAL) is a method that uses pre-trained vision-language models (VLMs) to label offline datasets for... -
Vision-and-Language Navigation
The Vision-and-Language Navigation (VLN) task provides a natural-language instruction I = {w_0, ..., w_l}, where w_i is a word token and l is the length of the... -
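To make the notation concrete, a minimal sketch of the instruction-as-token-sequence representation I = {w_0, ..., w_l}; the whitespace tokenizer here is an illustrative assumption, not the task's actual tokenization.

```python
# An instruction is a sequence of word tokens w_0, ..., w_l.
# Whitespace tokenization is assumed purely for illustration.
def tokenize_instruction(sentence: str) -> list[str]:
    return sentence.lower().split()

I = tokenize_instruction("Walk past the sofa and stop at the door")
l = len(I)
print(I[:2], l)  # first tokens w_0, w_1 and the length l
```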
PhotoBot: Reference-Guided Interactive Photography via Natural Language
PhotoBot is a framework for fully automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer. -
Training CLIP models on Data from Scientific Papers
Contrastive Language-Image Pretraining (CLIP) models are trained with datasets extracted from web crawls, which are of large quantity but limited quality. This paper explores... -
Validation Dataset
The Validation Dataset contains 1,428 images from nine distinct rooms. -
CIFAR-10, CIFAR-100, Stanford background dataset, VOC2012 dataset, Rotten Tom...
The datasets used in the paper are not explicitly enumerated. However, the authors used the CIFAR-10 and CIFAR-100 datasets for image classification, and the Stanford... -
Demystifying CLIP Data
Contrastive Language-Image Pre-training (CLIP) is an approach that has advanced research and applications in computer vision, fueling modern recognition systems and generative... -
Various Datasets
The datasets used in the paper are WikiMIA, BookMIA, Temporal Wiki, Temporal arXiv, ArXiv-1 month, Multi-Webdata, LAION-MI, and Gutenberg.