Visual Story-Telling dataset (VIST)
The Visual Story-Telling dataset (VIST) is the only publicly accessible dataset for storytelling problems. It comprises 210,819 distinct images drawn from 10,117 different Flickr albums, arranged in sets of five images each.
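To make the five-image arrangement concrete, the following is a minimal sketch of grouping per-image records into ordered sets of five. The record layout (story id, position, image id), the function name group_into_stories, and the identifiers are hypothetical stand-ins for illustration, not the official VIST file format.

```python
from collections import defaultdict

STORY_LENGTH = 5  # each set holds five images, per the description above


def group_into_stories(records):
    """Group (story_id, position, image_id) records into ordered image lists.

    The record layout is a hypothetical stand-in, not the official VIST schema.
    """
    by_story = defaultdict(list)
    for story_id, position, image_id in records:
        by_story[story_id].append((position, image_id))

    stories = {}
    for story_id, items in by_story.items():
        if len(items) != STORY_LENGTH:
            continue  # skip sets that do not contain exactly five images
        # Sort by position so the images come back in story order.
        stories[story_id] = [image_id for _, image_id in sorted(items)]
    return stories


# Made-up identifiers, for illustration only.
records = [("s1", i, f"img_{i}") for i in range(5)] + [("s2", 0, "img_9")]
stories = group_into_stories(records)
print(stories)
```

Keying on a story id and sorting by position keeps the sketch independent of the order in which records are read; a real loader would need to map the dataset's actual fields onto these three values.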
BibTeX: