Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization

Temporal action localization is the task of localizing the start and end timestamps of action instances in a video and recognizing their categories. In recent years, many works have focused on the fully supervised setting and achieved strong results. However, fully supervised methods require extensive manual frame-level or snippet-level annotations. To address this problem, many weakly supervised temporal action localization (WS-TAL) methods have been proposed to detect action instances in a given video using only video-level supervision, which is much easier for annotators to provide.

Data and Resources

This dataset has no data

Cite this as

Fa-Ting Hong, Jia-Chang Feng, Dan Xu, Ying Shan, Wei-Shi Zheng (2024). Dataset: Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization. https://doi.org/10.57702/q8zmgrmi

Private DOI: This DOI is not yet resolvable. It is available for use in manuscripts and will be published when the dataset is made public.

Additional Info

Field	Value
Created	December 16, 2024
Last update	December 16, 2024
Defined In	https://doi.org/10.48550/arXiv.2107.12589
Author	Fa-Ting Hong
More Authors	Jia-Chang Feng, Dan Xu, Ying Shan, Wei-Shi Zheng
Homepage	https://doi.org/10.1145/3474085.3475298