- CAKES: Channel-wise Automatic KErnel Shrinking for Efficient 3D Networks
  3D Convolutional Neural Networks (CNNs) have been widely applied to 3D scene understanding, such as video analysis and volumetric image recognition.
- Places dataset
  The Places dataset is a large-scale dataset for scene recognition; its Places365-Standard release contains about 1.8 million training images from 365 scene categories.
- Visual Genome
  The Visual Genome dataset is a large-scale, densely annotated image dataset used for tasks such as visual question answering, containing over 100,000 images, each with roughly 15-30 annotated entities, attributes, and relationships.