Donut: Hierarchical EMD-Space Planning for Zero-Shot Deformable Manipulation ...
The dataset used in the paper is a simulated dough-manipulation environment in which the goal is to shape a donut, a baguette, and two pancakes using a set of candidate tools. -
On The Ingredients of an Effective Zero-shot Semantic Parser
Semantic parsers map natural language utterances into meaning representations (e.g., programs). Such models are typically bottlenecked by the paucity of training data due to the... -
UT-Zappos50K
The UT-Zappos50K dataset is a fine-grained shoe catalog collected from Zappos.com; it is smaller in scale than general-purpose image datasets, with relatively uniform and simple content. -
Finetuned language models are zero-shot learners
Shows that instruction tuning, i.e., finetuning language models on collections of tasks phrased as natural-language instructions, substantially improves zero-shot performance on unseen tasks. -
Zero-1-to-3: Zero-shot one image to 3D object
Given a single RGB image of an object, Zero-1-to-3 uses a viewpoint-conditioned diffusion model to synthesize novel views, enabling zero-shot single-image 3D reconstruction. -
InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction
Text-conditioned human motion generation has experienced significant advancements with diffusion models trained on extensive motion capture data and corresponding textual... -
25 public datasets
The benchmark suite used to evaluate the MS-CLIP model, consisting of 25 public datasets covering zero-shot learning and linear probing. -
MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation
Semantic segmentation performs pixel-level classification to localize objects from different classes in the input image. Open-vocabulary semantic segmentation aims to... -
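The pixel-level classification described above can be sketched as nearest-prompt matching: each pixel's visual feature is compared against text embeddings of candidate class names, and the pixel is assigned the best-matching class. A minimal NumPy sketch, using random arrays as hypothetical stand-ins for CLIP-style pixel features and prompt embeddings (not MVP-SEG's actual pipeline):

```python
import numpy as np

def open_vocab_segment(pixel_feats, text_embeds):
    """Assign each pixel the class whose text embedding it matches best.

    pixel_feats: (H, W, D) L2-normalized per-pixel visual features
    text_embeds: (C, D)    L2-normalized class-prompt embeddings
    Returns an (H, W) map of class indices.
    """
    sims = pixel_feats @ text_embeds.T  # (H, W, C) cosine similarities
    return sims.argmax(axis=-1)

# Toy example with random placeholder features (no real image or text encoder).
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 4, 8))
feats /= np.linalg.norm(feats, axis=-1, keepdims=True)
texts = rng.normal(size=(3, 8))
texts /= np.linalg.norm(texts, axis=-1, keepdims=True)
seg = open_vocab_segment(feats, texts)
print(seg.shape)  # (4, 4)
```

In a real open-vocabulary system the placeholder arrays would come from a vision-language model's image and text encoders.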
Google Open Images
The Google Open Images dataset contains 19,958 categories and is used for zero-shot learning evaluation. -
Zero-shot video question answering via frozen bidirectional language models
Performs zero-shot video question answering by training lightweight modules that connect frozen visual features to a frozen bidirectional language model. -
HMDB51 and UCF101
The datasets used in the paper are HMDB51 and UCF101, two standard human action recognition benchmarks. -
Kinetics-400 and Something-Something-V2
The datasets used in the paper are Kinetics-400 and Something-Something-V2, two large-scale video action recognition benchmarks. -
Language-free Training for Zero-shot Video Grounding
Given an untrimmed video and a language query, video grounding aims to localize the time interval by understanding the text and video simultaneously. -
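Methods in this setting are typically scored by the temporal IoU between the predicted and ground-truth intervals. A minimal sketch (interval endpoints in seconds; an illustration, not any particular paper's evaluation code):

```python
def temporal_iou(pred, gt):
    """IoU between two (start, end) time intervals in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

print(temporal_iou((2.0, 8.0), (4.0, 10.0)))  # 0.5
```

Benchmarks commonly report Recall@1 at IoU thresholds such as 0.5 and 0.7, i.e., the fraction of queries whose top prediction exceeds the threshold.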
ZeroSearch Dataset
A custom dataset simulating a user's image directory, used to test the ZeroSearch algorithm. -
VGDiffZero: Text-to-Image Diffusion Models Can Be Zero-Shot Visual Grounders
VGDiffZero is a zero-shot visual grounding framework that leverages pre-trained text-to-image diffusion models' vision-language alignment abilities. -
Kinetics-600
The Kinetics-600 dataset consists of 392k training videos and 30k validation videos in 600 human action categories. -
Kinetics-400
Motion has shown to be useful for video understanding, where motion is typically represented by optical flow. However, computing flow from video frames is very time-consuming....