BD807 (BLIP2-guided Dataset with 807 images)

NoiseCollage generates an image containing N objects from three conditions, L, S, and s∗. L = {l1, ..., lN} is the set of N layout conditions that control the placement of the individual objects; each layout condition ln is a region specified by a bounding box or a polygon. S = {s1, ..., sN} is the set of N text conditions that describe the visual content of the objects; each condition sn is given as a word sequence, for example, “A man wearing an orange jacket is sitting at a table.” s∗ is a global text condition that describes the whole image.
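As a rough illustration of how the three conditions fit together, the sketch below models L, S, and s∗ as plain Python data structures. The class names (ObjectCondition, CollageConditions), the normalized coordinate convention, and every example value except the quoted "orange jacket" sentence are assumptions made here for illustration; they are not taken from the NoiseCollage code or from the BD807 file format.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical types for illustration only; not the NoiseCollage or BD807 API.
BBox = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max), assumed normalized to [0, 1]
Polygon = List[Tuple[float, float]]        # list of (x, y) vertices, assumed normalized to [0, 1]


@dataclass
class ObjectCondition:
    """One object: its layout condition l_n paired with its text condition s_n."""
    layout: Union[BBox, Polygon]
    text: str


@dataclass
class CollageConditions:
    """The full input: N per-object conditions plus the global text condition s*."""
    objects: List[ObjectCondition]
    global_text: str

    @property
    def layouts(self) -> List[Union[BBox, Polygon]]:
        # L = {l_1, ..., l_N}
        return [obj.layout for obj in self.objects]

    @property
    def texts(self) -> List[str]:
        # S = {s_1, ..., s_N}
        return [obj.text for obj in self.objects]


# Example values are invented, apart from the first text condition,
# which is the sentence quoted in the description above.
conditions = CollageConditions(
    objects=[
        ObjectCondition(
            layout=(0.10, 0.20, 0.55, 0.95),  # bounding box
            text="A man wearing an orange jacket is sitting at a table.",
        ),
        ObjectCondition(
            layout=[(0.60, 0.50), (0.95, 0.50), (0.95, 0.90), (0.60, 0.90)],  # polygon
            text="A wooden table with a cup on it.",
        ),
    ],
    global_text="A man sitting at a table in a cafe.",
)

n_objects = len(conditions.objects)  # N
```

Pairing each layout with its text in one record keeps L and S the same length by construction, which matches the one-to-one correspondence between ln and sn in the description.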

Cite this as

Takahiro Shirakawa, Seiichi Uchida (2024). Dataset: BD807 (BLIP2-guided Dataset with 807 images). https://doi.org/10.57702/aaepfajg

DOI retrieved: December 3, 2024

Additional Info

Field           Value
Created         December 3, 2024
Last update     December 3, 2024
Defined In      https://doi.org/10.48550/arXiv.2403.03485
Author          Takahiro Shirakawa
More Authors    Seiichi Uchida
Homepage        https://github.com/univ-esuty/noisecollage