Align before Attend: Aligning Visual and Textual Features for Multimodal Hateful Content Detection

Multimodal hateful content detection is a challenging task that requires complex reasoning across visual and textual modalities.

BibTeX: