Extended MusicCaps
Extended MusicCaps is a music caption dataset extended to include images.
MELFUSION: Synthesizing Music from Image and Language Cues using Diffusion Mo...
MELFUSION is a text-to-music diffusion model that synthesizes music conditioned on both visual and textual modalities.