Multiple Subjects Generation
The paper does not explicitly describe its dataset; the authors mention using a large-scale text-to-image model to generate images with multiple subjects.
Stacked Wasserstein Autoencoder
The proposed model is built on the theoretical analysis presented in [30,14]. Similar to the ARAE [14], our model provides flexibility in learning an autoencoder from the input...
Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of ...
Dysca is a dynamic and scalable benchmark for evaluating the perception ability of Large Vision-Language Models (LVLMs) via various subtasks and scenarios.
MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
MARS is an innovative auto-regressive framework that not only retains the capabilities of pre-trained Large Language Models (LLMs) but also incorporates top-tier text-to-image...
DPM-Solver++
The dataset used in the paper is DPM-Solver++.
Exact Diffusion Inversion via Bi-directional Integration Approximation
The paper does not explicitly describe its dataset; the authors mention using a pre-trained model to generate images.
BigGAN-Deep
This dataset is used for training and testing the BigGAN model.
Degeneration-Tuning: Using Scrambled Grid shield Unwanted Concepts from Stabl...
The paper does not explicitly describe its dataset; the authors analyzed the generative mechanism of diffusion models and proposed a novel method...
Image Generation from Scene Graphs
Image generation from scene graphs is a computer vision task that requires generating images from graph-structured inputs.
Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-T...
Generating images from graph-structured inputs, such as scene graphs, is uniquely challenging due to the difficulty of aligning nodes and connections in graphs with objects and...
RECAP: Principled Recaptioning Improves Image Generation
A text-to-image diffusion model trained on a recaptioned dataset to improve image generation quality and semantic alignment.
ControlVAE: Controllable Variational Autoencoder
A dataset used for language modeling, disentangled representation learning, and image generation.
Relational CLEVR
Relational CLEVR is a synthetic dataset of rendered 3D objects of various colours, shapes, sizes and textures.
Positional CLEVR
Positional CLEVR is a synthetic dataset of rendered 3D objects of various colours, shapes, sizes and textures.
Improved Precision and Recall Metric for Assessing Generative Models
The paper does not explicitly describe its dataset, noting only that it is a generative model dataset.
Student’s t-Generative Adversarial Networks
Generative Adversarial Networks (GANs) perform well in image generation, but they need large-scale data to train the entire framework, and often result in...