A large-scale dataset of 5 million remote sensing (RS) images with English descriptions, built by filtering image-text pair datasets and generating captions for RS images.
Remote sensing images differ from ordinary images in their viewing angles and acquisition methods, which gives them an irreplaceable role in some areas.