Devil in the Number: Towards Robust Multi-modality Data Filter
The paper uses a web-scale dataset of text-image pairs for training vision-language models. The authors propose a novel filter that removes redundant information, such as numbers and bracketed content.
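As a rough illustration of what removing numbers and bracketed content from a caption could look like, here is a minimal Python sketch. It is not the authors' implementation: the function name `strip_redundant` and the regular expressions are assumptions made for this example, and the paper's actual rules may differ.

```python
import re

# Hypothetical patterns, chosen for illustration only.
# Matches one innermost (...), [...] or {...} group; nested brackets
# are not fully handled by a single pass.
BRACKETED = re.compile(r"\([^()]*\)|\[[^\[\]]*\]|\{[^{}]*\}")
# Matches digit runs, including forms such as 1,024 or 3.5.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")
WHITESPACE = re.compile(r"\s+")


def strip_redundant(caption: str) -> str:
    """Remove bracketed content and numbers, then collapse whitespace."""
    text = BRACKETED.sub(" ", caption)
    text = NUMBER.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip()


if __name__ == "__main__":
    print(strip_redundant("A red car (stock photo) 2021 [HD]"))
```

Under these assumptions, a caption such as "A red car (stock photo) 2021 [HD]" is reduced to its descriptive words. How the cleaned text is then used to keep or drop a text-image pair is not specified here.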
BibTeX: