MM_Claims Dataset

This dataset was introduced in the paper "MM-Claims: A Dataset for Multimodal Claim Detection in Social Media".

If you use this dataset in your work, please cite:

@inproceedings{cheema-etal-2022-mm,
    title = "{MM}-Claims: A Dataset for Multimodal Claim Detection in Social Media",
    author = {Cheema, Gullal Singh and Hakimov, Sherzod and Sittar, Abdul and M{\"u}ller-Budack, Eric and Otto, Christian and Ewerth, Ralph},
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-naacl.72",
    pages = "962--979"
}

Information about columns in the files:

  1. claim_binary: {0: 'Not a claim', 1: 'claim'}

  2. claim_three: {0: 'Not a claim', 1: 'claim but not check-worthy', 2: 'check-worthy claim'}

  3. claim_vis: {0: 'Not a claim', 1: 'visually-irrelevant claim', 2: 'visually-relevant claim'}
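
As an illustration of how the three label columns can be turned into readable names, here is a minimal Python sketch. It assumes that all label ids are integers (or strings holding integers, as when read from a text file). The helper `decode_labels` and the `tweet_id` field in the example row are hypothetical and are not part of the dataset files or the official code.

```python
# Label mappings taken from the column description above.
# Assumption: every label id is an integer (0, 1 or 2).
CLAIM_BINARY = {0: "Not a claim", 1: "claim"}
CLAIM_THREE = {
    0: "Not a claim",
    1: "claim but not check-worthy",
    2: "check-worthy claim",
}
CLAIM_VIS = {
    0: "Not a claim",
    1: "visually-irrelevant claim",
    2: "visually-relevant claim",
}

LABEL_MAPS = {
    "claim_binary": CLAIM_BINARY,
    "claim_three": CLAIM_THREE,
    "claim_vis": CLAIM_VIS,
}


def decode_labels(row):
    """Return a copy of `row` with label ids replaced by label names.

    Hypothetical helper: `row` is any dict-like record. Columns that are
    not label columns are copied unchanged, and missing label columns
    are skipped.
    """
    decoded = dict(row)
    for column, mapping in LABEL_MAPS.items():
        if column in decoded:
            # int() lets the helper accept ids read as strings.
            decoded[column] = mapping[int(decoded[column])]
    return decoded


# Made-up example row; "tweet_id" is an assumed field name.
example = {"tweet_id": "123", "claim_binary": "1", "claim_three": 2, "claim_vis": 1}
print(decode_labels(example))
```

The helper accepts any dict-like record, so it can be applied to rows produced by a reader such as `csv.DictReader` once the actual file format and column layout have been checked.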

Official code repository: https://github.com/TIBHannover/MM_Claims

All files were updated on 5 May 2023. Some images were removed because they contained obscene content that was not automatically detected in the first phase.

If you are interested in the binary task of check-worthiness estimation in multimodal claims, a refined version of the dataset with new test data was released as part of the CLEF CheckThat! 2023 challenge: https://gitlab.com/checkthat_lab/clef2023-checkthat-lab/-/tree/main
