ATR dataset

Human parsing has recently attracted a huge amount of interests and achieved great progress with the advance of deep convolutional neural networks and large-scale datasets. Most of the prior works focus on developing new structures and auxiliary information guidance to improve general feature representation, such as dilated convolution, LSTM structure, encoder-decoder architecture, and human pose constraints. Although these methods show promising results on each human parsing dataset, they directly use one flat prediction layer to classify all labels, which disregards the intrinsic semantic correlations across concepts and utilize the annotations in an inefficient way.

BibTex: