MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition

Sign language recognition (SLR) has long been limited by insufficient model representation capability. Although current pre-training approaches have alleviated this problem to some extent and yielded promising performance by employing various pretext tasks on sign pose data, these methods still suffer from two primary limitations: i) explicit motion information is usually disregarded in previous pretext tasks, leading to partial information loss and limited representation capability; ii) previous methods focus on the local context of a sign pose sequence, without incorporating guidance from the global meaning of lexical signs.

Cite this as

Weichao Zhao, Hezhen Hu, Wengang Zhou, Yunyao Mao, Min Wang, Houqiang Li (2024). Dataset: MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition. https://doi.org/10.57702/snvks7wc

DOI retrieved: December 16, 2024

Additional Info

Field          Value
Created        December 16, 2024
Last update    December 16, 2024
Author         Weichao Zhao
More Authors   Hezhen Hu, Wengang Zhou, Yunyao Mao, Min Wang, Houqiang Li