Learning to Predict Situation Hyper-Graphs for Video Question Answering

The SHG-VQA model predicts a situation hyper-graph: a structure composed of the actions and relations present in the input video.
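
To make the idea of a hyper-graph of actions and relations concrete, here is a minimal sketch of one way such a structure could be held in memory. It is an illustration only, not the SHG-VQA implementation: the class names `Hyperedge` and `SituationHyperGraph`, the entity names, and the labels (`pick_up`, `holding`, `on`) are all hypothetical and do not come from the model or its data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Hyperedge:
    """One labelled hyperedge joining one or more entities (hypothetical type)."""
    kind: str               # "action" or "relation"
    label: str              # made-up label, e.g. "pick_up" or "holding"
    nodes: tuple[str, ...]  # entities the edge connects


@dataclass
class SituationHyperGraph:
    """Hypothetical container for the actions and relations found in one video."""
    edges: list[Hyperedge] = field(default_factory=list)

    def add(self, kind: str, label: str, *nodes: str) -> None:
        # Only the two edge kinds named in the description are accepted.
        if kind not in ("action", "relation"):
            raise ValueError(f"unknown edge kind: {kind}")
        self.edges.append(Hyperedge(kind, label, tuple(nodes)))

    def labels(self, kind: str) -> list[str]:
        """Labels of all edges of the given kind, in insertion order."""
        return [e.label for e in self.edges if e.kind == kind]


# Illustrative content only; these entities and labels are assumptions.
graph = SituationHyperGraph()
graph.add("action", "pick_up", "person", "cup")
graph.add("relation", "holding", "person", "cup")
graph.add("relation", "on", "cup", "table")
```

A hyperedge is stored as a tuple of nodes rather than a pair, so a single action or relation can connect any number of entities.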

BibTeX: