LISA: Localized Image Stylization with Audio

LISA is a novel framework for audio-guided local image stylization. Its audio-visual sound source localizer produces a fine-grained localization map by leveraging the CLIP embedding space in a weakly supervised manner.
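
To make the idea of a localization map concrete, here is a minimal sketch in plain Python. It is not the LISA implementation. It assumes that per-patch image embeddings and an audio embedding already live in one shared space (CLIP in the description above), scores each patch by cosine similarity to the audio, and uses the rescaled scores as a soft mask to blend a stylized image into the original. The function names `localization_map` and `blend`, the min-max rescaling, and the grayscale pixel grids are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def localization_map(patch_embeddings, audio_embedding):
    """Score every patch against the audio embedding.

    patch_embeddings is an H x W grid of vectors (hypothetical input,
    assumed to share an embedding space with audio_embedding).
    Returns an H x W grid of scores min-max rescaled to [0, 1].
    """
    sims = [[cosine(p, audio_embedding) for p in row] for row in patch_embeddings]
    lo = min(min(row) for row in sims)
    hi = max(max(row) for row in sims)
    span = hi - lo
    if span == 0:
        # All patches match equally: nothing to localize.
        return [[0.0 for _ in row] for row in sims]
    return [[(s - lo) / span for s in row] for row in sims]


def blend(original, stylized, mask):
    """Per-pixel blend: mask 1 keeps the stylized pixel, mask 0 the original."""
    return [
        [m * s + (1.0 - m) * o for o, s, m in zip(orow, srow, mrow)]
        for orow, srow, mrow in zip(original, stylized, mask)
    ]


# Toy 2 x 2 example with 2-dimensional embeddings.
patches = [[[1.0, 0.0], [0.0, 1.0]],
           [[1.0, 1.0], [-1.0, 0.0]]]
audio = [1.0, 0.0]
mask = localization_map(patches, audio)

original = [[0.0, 0.0], [0.0, 0.0]]
stylized = [[1.0, 1.0], [1.0, 1.0]]
result = blend(original, stylized, mask)
```

In this toy case the patch whose embedding points the same way as the audio embedding receives the full stylization, and the patch pointing the opposite way is left untouched.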

BibTeX: