1 dataset found

Tags: instruction-tuned audio-visual language model

  • Video-LLaMA

    Video-LLaMA: An instruction-tuned audio-visual language model for video understanding.

You can also access this registry using the API (see API Docs).