1 dataset found

Tags: multimodal question answering

  • InstructBLIP

    InstructBLIP is a vision-language model for comprehensive scene understanding and textual description.
You can also access this registry using the API (see API Docs).