Leveraging multiple cues for recognizing family photos

Bibliographic Details
Published in	Image and vision computing Vol. 58; pp. 61 - 75
Main Authors	Wang, Xiaolong; Guo, Guodong; Merler, Michele; Codella, Noel C. F.; MV, Rohith; Smith, John R.; Kambhamettu, Chandra
Format	Journal Article
Language	English
Published	Elsevier B.V. 01.02.2017
Subjects	Family photo recognition; Group photo analysis; Semantics; Social media
Summary:	Social relation analysis via images is a new research area that has attracted much interest recently. As social media usage increases, a wide variety of information can be extracted from the growing number of consumer photos shared online, such as the category of events captured or the relationships between individuals in a given picture. Family is one of the most important units in our society, so categorizing family photos constitutes an essential step toward image-based social analysis and content-based retrieval of consumer photos. We propose an approach that combines multiple unique and complementary cues for recognizing family photos. The first cue analyzes the geometric arrangement of people in the photograph, which characterizes scene-level information efficiently yet discriminatively. The second cue models facial appearance similarities to capture and quantify relevant pairwise relations between individuals in a given photo. The last cue investigates the semantics of the context in which the photo was taken. Experiments on a dataset containing thousands of family and non-family pictures collected from social media indicate that each individual model produces good recognition results. Furthermore, a combined approach incorporating appearance, geometric and semantic features significantly outperforms the state of the art in this domain, achieving 96.7% classification accuracy.
• A new geometry feature is proposed to capture people's standing pattern at the scene level.
• A deep convolutional neural network is incorporated into the appearance model to capture facial similarities in the group photo.
• Semantic information is applied and fused with the other information to discriminate between the two photo categories.
ISSN:	0262-8856 1872-8138
DOI:	10.1016/j.imavis.2016.07.006
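
The summary says that geometric, appearance and semantic cues are combined, but this record does not say how. As an illustration only, the minimal Python sketch below assumes each cue has already been reduced to a score in [0, 1] and fuses the three by a weighted average followed by a threshold. The names `CueScores`, `fuse_scores` and `is_family_photo`, the equal weights, and the 0.5 threshold are all hypothetical and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class CueScores:
    """Per-photo scores in [0, 1]; how each is computed is out of scope here."""
    geometry: float    # hypothetical score from the arrangement-of-people cue
    appearance: float  # hypothetical score from the facial-similarity cue
    semantics: float   # hypothetical score from the photo-context cue


def fuse_scores(scores, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three cue scores (weights are illustrative)."""
    values = (scores.geometry, scores.appearance, scores.semantics)
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * v for w, v in zip(weights, values)) / total


def is_family_photo(scores, threshold=0.5):
    """Label a photo as 'family' when the fused score reaches the threshold."""
    return fuse_scores(scores) >= threshold


if __name__ == "__main__":
    photo = CueScores(geometry=0.9, appearance=0.6, semantics=0.3)
    print(fuse_scores(photo), is_family_photo(photo))
```

Score-level averaging is only one possible design. The summary's wording ("incorporating appearance, geometric and semantic features") is equally compatible with concatenating features before a single classifier, so the sketch should not be read as the authors' method.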