CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans

doi:10.57702/s8xnnmwe

Medical Vision-Language Pretraining (Med-VLP) establishes a connection between visual content from medical images and the relevant textual descriptions. Existing Med-VLP methods primarily focus on 2D images depicting a single body part, notably chest X-rays. In this paper, we extend the scope of Med-VLP to encompass 3D images, specifically targeting full-body scenarios, by using a multimodal dataset of CT images and reports.

BibTeX:

@dataset{Jingyang_Lin_and_Yingda_Xia_and_Jianpeng_Zhang_and_Ke_Yan_and_Le_Lu_and_Jiebo_Luo_and_Ling_Zhang_2024,
    abstract = {Medical Vision-Language Pretraining (Med-VLP) establishes a connection between visual content from medical images and the relevant textual descriptions. Existing Med-VLP methods primarily focus on 2D images depicting a single body part, notably chest X-rays. In this paper, we extend the scope of Med-VLP to encompass 3D images, specifically targeting full-body scenarios, by using a multimodal dataset of CT images and reports.},
    author = {Jingyang Lin and Yingda Xia and Jianpeng Zhang and Ke Yan and Le Lu and Jiebo Luo and Ling Zhang},
    doi = {10.57702/s8xnnmwe},
    institution = {No Organization},
    keyword = {CT scans, Medical images, Radiology reports, Text descriptions},
    month = {dec},
    publisher = {TIB},
    title = {CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans},
    url = {https://service.tib.eu/ldmservice/dataset/ct-glip--3d-grounded-language-image-pretraining-with-ct-scans},
    year = {2024}
}