Attention-Based Pedestrian Attribute Analysis

Bibliographic Details
Published in: IEEE Transactions on Image Processing, Vol. 28, No. 12, pp. 6126-6140
Main Authors: Tan, Zichang; Yang, Yang; Wan, Jun; Hang, Hanyuan; Guo, Guodong; Li, Stan Z.
Format: Journal Article
Language: English
Published: United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.12.2019

Summary: Recognizing pedestrian attributes in surveillance scenes is an inherently challenging task, especially for pedestrian images with large pose variations, complex backgrounds, and varied camera viewing angles. To select important and discriminative regions or pixels despite these variations, three attention mechanisms are proposed: parsing attention, label attention, and spatial attention. These attention mechanisms aim to access effective information by approaching the problem from different perspectives. Specifically, the parsing attention extracts discriminative features by learning not only where to attend but also how to aggregate features from different semantic regions of the human body, e.g., the head and upper body. The label attention collects discriminative features for each attribute in a targeted manner. Unlike the parsing and label attention mechanisms, the spatial attention considers the problem from a global perspective, selecting a few important and discriminative image regions or pixels shared by all attributes. A joint learning framework, formulated in a multi-task-like way, then learns these three attention mechanisms concurrently to extract complementary and correlated features. This framework is named Joint Learning of Parsing attention, Label attention, and Spatial attention for Pedestrian Attributes Analysis (JLPLS-PAA for short). Extensive comparative evaluations on multiple large-scale benchmarks, including the PA-100K, RAP, PETA, Market-1501, and Duke attribute datasets, demonstrate the effectiveness of the proposed JLPLS-PAA framework for pedestrian attribute analysis.
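The spatial attention idea described in the summary, weighting spatial positions of a feature map so that a few discriminative regions dominate the pooled representation shared by all attributes, can be sketched as follows. This is a minimal NumPy illustration under our own assumptions (a single scoring vector standing in for a learned 1x1 convolution, softmax normalization over positions); it is not the authors' JLPLS-PAA implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(features, w):
    """Attention-weighted pooling over spatial positions.

    features: (H, W, C) feature map from a CNN backbone.
    w:        (C,) scoring vector (stand-in for a learned 1x1 conv).
    Returns the pooled (C,) feature and the (H, W) attention map.
    """
    H, W, C = features.shape
    flat = features.reshape(H * W, C)            # flatten the spatial grid
    scores = flat @ w                            # one relevance score per position
    attn = softmax(scores, axis=0)               # normalize over all positions
    pooled = (attn[:, None] * flat).sum(axis=0)  # attention-weighted sum
    return pooled, attn.reshape(H, W)

rng = np.random.default_rng(0)
feats = rng.normal(size=(7, 7, 16))   # e.g. a 7x7x16 backbone output
w = rng.normal(size=16)
pooled, attn_map = spatial_attention(feats, w)
print(pooled.shape, attn_map.shape)
```

In the full framework, the attention maps from the parsing, label, and spatial branches would be learned jointly rather than computed from a fixed scoring vector as above.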
Bibliography: ObjectType-Article-1; SourceType-Scholarly Journals-1; ObjectType-Feature-2
ISSN: 1057-7149
EISSN: 1941-0042
DOI: 10.1109/TIP.2019.2919199