- Visual instruction tuning
- Flamingo: a visual language model for few-shot learning
- Audio-visual scene-aware dialog
- ChatBridge
  ChatBridge is a multimodal language model capable of perceiving real-world multimodal information, as well as following instructions, thinking, and interacting with humans in...