Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack

Training text-to-image models on web-scale image-text pairs enables the generation of a wide range of visual concepts from text. However, these pre-trained models often struggle to generate highly aesthetic images, which creates the need for aesthetic alignment after pre-training. In this paper, we propose quality-tuning to effectively guide a pre-trained model to exclusively generate highly visually appealing images while maintaining generality across visual concepts.

Cite this as

xiaoliangdai, jihou, cyma, sstsai, jialiangw, ruiw, stzpz (2024). Dataset: Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack. https://doi.org/10.57702/b5jlsdyr

DOI retrieved: December 2, 2024

Additional Info

Field Value
Created December 2, 2024
Last update December 2, 2024
Author xiaoliangdai
More Authors jihou, cyma, sstsai, jialiangw, ruiw, stzpz