LatteGAN: Visually Guided Language Attention for Multi-Turn Text-Conditioned Image Manipulation

doi:10.57702/c1w5v3ns

Text-guided image manipulation tasks have recently gained attention in the vision-and-language community. The GeNeVA task is a multi-turn text-conditioned image manipulation (MTIM) task. It involves two participants: a Teller, who gives instructions on how to modify the image, and a Drawer, who draws the image according to the Teller’s instructions.

BibTeX:

@dataset{Shoya_Matsumori_and_Yuki_Abe_and_Kosuke_Shingyouchi_and_Komei_Sugiura_and_Michita_Imai_2024,
    abstract = {Text-guided image manipulation tasks have recently gained attention in the vision-and-language community. The GeNeVA task is a multi-turn text-conditioned image manipulation (MTIM) task. It involves two participants: a Teller, who gives instructions on how to modify the image, and a Drawer, who draws the image according to the Teller’s instructions.},
    author = {Shoya Matsumori and Yuki Abe and Kosuke Shingyouchi and Komei Sugiura and Michita Imai},
    doi = {10.57702/c1w5v3ns},
    institution = {No Organization},
    keyword = {'image manipulation', 'multi-turn text-conditioned image generation', 'text-to-image synthesis'},
    month = {dec},
    publisher = {TIB},
    title = {LatteGAN: Visually Guided Language Attention for Multi-Turn Text-Conditioned Image Manipulation},
    url = {https://service.tib.eu/ldmservice/dataset/lattegan--visually-guided-language-attention-for-multi-turn-text-conditioned-image-manipulation},
    year = {2024}
}