Dense regression network for video grounding
Semantic conditioned dynamic modulation for temporal sentence grounding in videos
Multilevel language and vision integration for text-to-clip retrieval
TALL: Temporal activity localization via language query
Support-Set Based Cross-Supervision for Video Grounding
Localizing moments in video with natural language
Augmented 2D-TAN: A Two-stage Approach for Human-centric Spatio-Temporal Vide...
The human-centric spatio-temporal video grounding (HC-STVG) task aims to localize a spatio-temporal tube of the target person indicated by a language description.