Localizing moments in video with natural language

BibTeX: