MM_Claims Dataset

doi:10.25835/2lg7peic

This dataset was introduced in the paper "MM-Claims: A Dataset for Multimodal Claim Detection in Social Media".

If you use this dataset in your work, please cite:

@inproceedings{cheema-etal-2022-mm,
    title = "{MM}-Claims: A Dataset for Multimodal Claim Detection in Social Media",
    author = {Cheema, Gullal Singh and Hakimov, Sherzod and Sittar, Abdul and M{\"u}ller-Budack, Eric and Otto, Christian and Ewerth, Ralph},
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-naacl.72",
    pages = "962--979"
}

Information about columns in the files:

claim_binary: {0: 'Not a claim', 1: 'claim'}
claim_three: {0: 'Not a claim', 1: 'claim but not check-worthy', 2: 'check-worthy claim'}
claim_vis: {0: 'Not a claim', 1: 'visually-irrelevant claim', 2: 'visually-relevant claim'}
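
As a rough illustration of how these label columns might be decoded, the sketch below maps the integer codes to their names. The column names and label strings are taken from the list above; the helper `decode_labels` and the example record are hypothetical and are not part of the official repository, and the actual file layout is not assumed here.

```python
# Label mappings as documented for the three annotation columns.
CLAIM_BINARY = {0: "Not a claim", 1: "claim"}
CLAIM_THREE = {
    0: "Not a claim",
    1: "claim but not check-worthy",
    2: "check-worthy claim",
}
CLAIM_VIS = {
    0: "Not a claim",
    1: "visually-irrelevant claim",
    2: "visually-relevant claim",
}

LABEL_MAPS = {
    "claim_binary": CLAIM_BINARY,
    "claim_three": CLAIM_THREE,
    "claim_vis": CLAIM_VIS,
}


def decode_labels(row):
    """Return a copy of `row` with known label columns replaced by label names.

    Hypothetical helper: codes are cast with int() so that values read
    from a text file (e.g. "1") and integer values (e.g. 1) both work.
    """
    decoded = dict(row)
    for column, mapping in LABEL_MAPS.items():
        if column in decoded:
            decoded[column] = mapping[int(decoded[column])]
    return decoded


# Made-up record for illustration only; not taken from the dataset.
example = {"claim_binary": "1", "claim_three": 2, "claim_vis": "1"}
print(decode_labels(example))
```

Casting with `int()` is a design choice for robustness against codes stored as strings; columns not listed in `LABEL_MAPS` are passed through unchanged.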

Official code repository: https://github.com/TIBHannover/MM_Claims

All files were updated on 5th May 2023. Some images were removed because they contained obscene content that was not automatically detected in the first phase.

If you are interested in the binary task of check-worthiness estimation in multimodal claims, a refined version of the dataset with new test data was released as part of the CLEF CheckThat! 2023 challenge: https://gitlab.com/checkthat_lab/clef2023-checkthat-lab/-/tree/main

BibTeX:

@dataset{Gullal_S_Cheema_and__Sherzod_Hakimov_and__Abdul_Sittar_and__Eric_Müller-Budack_and__Christian_Otto_and__Ralph_Ewerth_2022,
    author = {Gullal S. Cheema and  Sherzod Hakimov and  Abdul Sittar and  Eric Müller-Budack and  Christian Otto and  Ralph Ewerth},
    doi = {10.25835/2lg7peic},
    institution = {TIB},
    keyword = {'claim detection', 'deep learning', 'fake news', 'multimodality', 'social media', 'twitter'},
    month = {jul},
    publisher = {LUIS},
    title = {MM_Claims Dataset},
    url = {https://service.tib.eu/ldmservice/vdataset/luh-mm_claims},
    year = {2022}
}