Cross-modal and hierarchical modeling of video and text
