
CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans

Medical Vision-Language Pretraining (Med-VLP) establishes a connection between visual content from medical images and the relevant textual descriptions. Existing Med-VLP methods primarily focus on 2D images depicting a single body part, notably chest X-rays. In this paper, we extend the scope of Med-VLP to encompass 3D images, specifically targeting full-body scenarios, by using a multimodal dataset of CT images and reports.
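The dataset page does not include the authors' code; as an illustration of the general vision-language pretraining recipe the abstract refers to (aligning image and text embeddings), a minimal CLIP-style symmetric contrastive loss might look like the sketch below. The function name, the NumPy implementation, and the temperature value are illustrative assumptions, not the CT-GLIP method itself.

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss over a batch of paired
    image/text embeddings, in the style popularized by CLIP.
    img_emb, txt_emb: (N, D) arrays; row i of each is a matched pair.
    NOTE: illustrative sketch only, not the authors' implementation."""
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (N, N) similarity matrix
    labels = np.arange(len(img))                # diagonal entries are the positives

    def xent(l):
        # numerically stable log-softmax cross-entropy on the diagonal
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the image-to-text and text-to-image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Perfectly aligned pairs (identical embeddings) drive the loss toward zero, while misaligned pairs are penalized, which is the signal that pulls matched image and report embeddings together during pretraining.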

Data and Resources

This dataset has no data

Cite this as

Jingyang Lin, Yingda Xia, Jianpeng Zhang, Ke Yan, Le Lu, Jiebo Luo, Ling Zhang (2024). Dataset: CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans. https://doi.org/10.57702/s8xnnmwe

Private DOI: this DOI is not yet resolvable. It is available for use in manuscripts and will be published when the dataset is made public.

Additional Info

Created: December 3, 2024
Last update: December 3, 2024
Defined in: https://doi.org/10.48550/arXiv.2404.15272
Authors: Jingyang Lin, Yingda Xia, Jianpeng Zhang, Ke Yan, Le Lu, Jiebo Luo, Ling Zhang