Show-and-Tell

Visual language grounding is widely studied with modern neural networks, which typically adopt an encoder-decoder framework consisting of a convolutional neural network (CNN) for image feature extraction and a recurrent neural network (RNN) for caption generation.
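The sketch below illustrates this encoder-decoder pattern: a CNN backbone produces an image embedding, which seeds an RNN that emits caption tokens. This is a minimal illustration in PyTorch, not the exact architecture behind this dataset; the backbone choice (ResNet-50), layer sizes, and vocabulary size are all illustrative assumptions.

```python
# Minimal CNN-RNN captioning sketch (assumed PyTorch; sizes are illustrative).
import torch
import torch.nn as nn
from torchvision import models


class EncoderCNN(nn.Module):
    """CNN encoder: image -> fixed-size embedding."""

    def __init__(self, embed_size: int):
        super().__init__()
        backbone = models.resnet50(weights=None)  # assumption: ResNet-50 backbone
        # Drop the classification head; keep the pooled feature extractor.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.fc = nn.Linear(backbone.fc.in_features, embed_size)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.features(images).flatten(1)  # (batch, 2048)
        return self.fc(feats)                     # (batch, embed_size)


class DecoderRNN(nn.Module):
    """RNN decoder: image embedding + caption tokens -> vocabulary logits."""

    def __init__(self, embed_size: int, hidden_size: int, vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, image_emb: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        # Prepend the image embedding as the first "token" of the sequence,
        # so the LSTM conditions every caption word on the image.
        tokens = self.embed(captions)                        # (batch, T, embed)
        inputs = torch.cat([image_emb.unsqueeze(1), tokens], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.fc(hidden)                               # (batch, T+1, vocab)


# Usage with dummy tensors (shapes only; no real images or vocabulary):
encoder = EncoderCNN(embed_size=256)
decoder = DecoderRNN(embed_size=256, hidden_size=512, vocab_size=10000)
images = torch.randn(2, 3, 224, 224)
captions = torch.randint(0, 10000, (2, 15))
logits = decoder(encoder(images), captions)
print(logits.shape)  # torch.Size([2, 16, 10000])
```

At training time, the logits would be scored against ground-truth caption tokens with cross-entropy; at inference, tokens are sampled or decoded greedily one step at a time.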

Cite this as

Hongge Chen, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Cho-Jui Hsieh (2024). Dataset: Show-and-Tell. https://doi.org/10.57702/krqufe4w

DOI retrieved: December 17, 2024

Additional Info

Field         Value
Created       December 17, 2024
Last update   December 17, 2024
Defined In    https://doi.org/10.48550/arXiv.1712.02051
Author        Hongge Chen
More Authors  Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Cho-Jui Hsieh
Homepage      https://arxiv.org/abs/1506.05981