Zero-shot video question answering via frozen bidirectional language models

BibTeX: