InstructBLIP

InstructBLIP is an instruction-tuned vision-language model for comprehensive scene understanding and for generating textual descriptions of images.

BibTeX: