The Flickr30K Entities dataset consists of 31,783 images, each paired with 5 captions. It links the distinct entities mentioned in the captions to image bounding boxes, yielding 70K unique entities and 276K unique bounding boxes.