Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization

Temporal action localization is the task of localizing the start and end timestamps of action instances and recognizing their categories. In recent years, many works have pursued fully supervised approaches with considerable success. However, these fully supervised methods require extensive manual frame- or snippet-level annotations. To address this problem, many weakly supervised temporal action localization (WS-TAL) methods have been proposed to detect action instances in videos using only video-level supervision, which is far easier for annotators to provide.

Cite this as

Fa-Ting Hong, Jia-Chang Feng, Dan Xu, Ying Shan, Wei-Shi Zheng (2024). Dataset: Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization. https://doi.org/10.57702/q8zmgrmi

DOI retrieved: December 16, 2024

Additional Info

Created: December 16, 2024
Last update: December 16, 2024
Defined In: https://doi.org/10.48550/arXiv.2107.12589
Author: Fa-Ting Hong
More Authors: Jia-Chang Feng, Dan Xu, Ying Shan, Wei-Shi Zheng
Homepage: https://doi.org/10.1145/3474085.3475298