Attentive Hierarchical Label Sharing for Enhanced Garment and Attribute Classification of Fashion Imagery

Bibliographic Details
Published in	Recommender Systems in Fashion and Retail, pp. 95-115
Main Authors	Papadopoulos, Stefanos-Iordanis; Koutlis, Christos; Sudheer, Manjunath; Pugliese, Martina; Rabiller, Delphine; Papadopoulos, Symeon; Kompatsiaris, Ioannis
Format	Book Chapter
Language	English
Published	Cham: Springer International Publishing
Series	Lecture Notes in Electrical Engineering

More Information
Summary:	Fine-grained information extraction from fashion imagery is a challenging task due to the inherent diversity and complexity of fashion categories and attributes. Additionally, fashion imagery often depicts multiple items, while fashion items tend to follow hierarchical relations among various object types, categories, and attributes. In this study, we address both issues with a 2-step hierarchical deep learning pipeline consisting of (1) a low-granularity object type detection module (upper body, lower body, full body, footwear) and (2) two classification modules for garment categories and attributes based on the outcome of the first step. For the category- and attribute-level classification stages, we examine a hierarchical label sharing (HLS) technique in two settings: (1) single-task learning (STL w/ HLS) and (2) multi-task learning with RNN and visual attention (MTL w/ RNN+VA). Our approach enables progressively focusing on appropriately detailed features for automatically learning the hierarchical relations of fashion and enabling predictions on images with complete outfits. Empirically, STL w/ HLS reached 93.99% top-3 accuracy while MTL w/ RNN+VA reached 97.57% top-5 accuracy for category classification on the DeepFashion benchmark, surpassing the current state of the art without requiring landmark annotations, mask annotations, or specialized domain expertise.
ISBN:	9783030940157, 3030940152
ISSN:	1876-1100, 1876-1119
DOI:	10.1007/978-3-030-94016-4_7
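
The Summary reports results as top-3 and top-5 accuracy for category classification. For readers unfamiliar with the metric, the following is a minimal sketch of how top-k accuracy is commonly computed: a prediction counts as correct when the true label is among the k highest-scoring classes. This is not the authors' evaluation code; the function name `top_k_accuracy` and the toy scores and labels are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; only the first k count.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Made-up scores for 3 images over 4 hypothetical garment categories.
toy_scores = [
    [0.10, 0.60, 0.20, 0.10],  # true class 1 is ranked first
    [0.40, 0.30, 0.20, 0.10],  # true class 2 is ranked third
    [0.50, 0.05, 0.15, 0.30],  # true class 1 is ranked fourth
]
toy_labels = [1, 2, 1]

for k in (1, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(toy_scores, toy_labels, k):.2f}")
```

Because a larger k accepts more candidate classes, top-5 accuracy is never lower than top-3 accuracy for the same model, so the two figures quoted in the Summary are not directly comparable with each other.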