POPE

The dataset used in this paper is POPE (Polling-based Object Probing Evaluation), a benchmark for evaluating object hallucination in multimodal large language models (MLLMs). POPE is not a model and has no parameter count of its own; it is a collection of binary yes/no polling questions that ask whether a given object appears in an image, with negative objects drawn via random, popular, and adversarial sampling strategies. Model answers are scored with standard metrics such as accuracy, precision, recall, and F1.
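To make the evaluation protocol concrete, below is a minimal sketch of scoring a model's yes/no answers against POPE-style annotations. The JSON-lines field names (`question_id`, `text`, `label`) and the file name are assumptions based on the commonly distributed annotation format and may differ in a given release.

```python
import json

def load_pope_annotations(path):
    """Load POPE-style yes/no questions from a JSON-lines file.
    Assumed fields per line: 'question_id', 'text' (the question), 'label' ('yes'/'no')."""
    with open(path) as f:
        return [json.loads(line) for line in f]

def score(predictions, annotations):
    """Compute accuracy, precision, recall, and F1, treating 'yes' as the positive class."""
    tp = fp = tn = fn = 0
    for ann in annotations:
        pred = predictions[ann["question_id"]].strip().lower()
        gold = ann["label"].strip().lower()
        if gold == "yes":
            tp += pred == "yes"
            fn += pred != "yes"
        else:
            fp += pred == "yes"
            tn += pred != "yes"
    accuracy = (tp + tn) / max(tp + tn + fp + fn, 1)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

if __name__ == "__main__":
    # Hypothetical file and predictions, for illustration only.
    anns = load_pope_annotations("coco_pope_adversarial.json")
    preds = {a["question_id"]: "yes" for a in anns}  # trivial always-"yes" baseline
    print(score(preds, anns))
```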

BibTex:

```
@inproceedings{li-etal-2023-evaluating,
  title     = {Evaluating Object Hallucination in Large Vision-Language Models},
  author    = {Li, Yifan and Du, Yifan and Zhou, Kun and Wang, Jinpeng and Zhao, Wayne Xin and Wen, Ji-Rong},
  booktitle = {Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
  year      = {2023}
}
```