SpeechCLIP

SpeechCLIP is a framework that integrates self-supervised learning (SSL) speech models with a pre-trained vision-and-language model.

BibTeX: