Cross-Modality Feature Learning Through Generic Hierarchical Hyperlingual-Words
Published in: IEEE Transactions on Neural Networks and Learning Systems, Vol. 28, No. 2, pp. 451-463
Main Authors:
Format: Journal Article
Language: English
Published: United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.02.2017
Summary: Recognizing facial images captured under visible light has been studied for decades. However, many factors hinder its successful application in the real world, e.g., illumination and pose variations. Recent work has therefore turned to other spectra, e.g., near infrared, which can be perceived only by specially designed devices, to avoid the illumination problem. This, however, inevitably introduces a new problem, namely, cross-modality classification: images registered in the system are in one modality, while images captured momentarily as test inputs are in another. In addition, there can be many within-modality variations, such as pose and expression, which complicate the problem further. To address this, we propose a novel framework called hierarchical hyperlingual-words (Hwords). First, we design a novel structure, called generic Hwords, to capture the high-level semantics across different modalities and within each modality in a weakly supervised fashion, meaning that only modality-pair and variation labels are needed during training. Second, to improve the discriminative power of Hwords, we propose a novel distance metric based on their hierarchical structure. Extensive experiments on multimodality face databases demonstrate the superiority of our method over state-of-the-art approaches on face recognition tasks subject to pose and expression variations.
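The record itself gives no implementation details, but the two ideas named in the summary (a hierarchical codebook shared across modalities, and a distance metric defined through that hierarchy) can be illustrated with a minimal sketch. The toy below is written under loose assumptions and is not the authors' Hwords algorithm: it builds a plain two-level k-means codebook from pooled features of both modalities, and the level weights `w_top` and `w_leaf`, the helper names, and the synthetic visible/near-infrared features are all hypothetical.

```python
# Illustrative sketch only: a two-level "hierarchical words" codebook with a
# hierarchy-aware distance. Not the Hwords method from the paper; all names,
# weights, and data here are hypothetical stand-ins.
import numpy as np
from sklearn.cluster import KMeans

def build_hierarchy(features, k_top=4, k_leaf=3, seed=0):
    """Fit a coarse codebook, then refine each coarse cluster into leaf words."""
    top = KMeans(n_clusters=k_top, n_init=10, random_state=seed).fit(features)
    leaves = {}
    for c in range(k_top):
        members = features[top.labels_ == c]
        k = max(1, min(k_leaf, len(members)))  # never ask for more clusters than points
        leaves[c] = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(members)
    return top, leaves

def encode(x, top, leaves):
    """Map one feature vector to its (coarse word, leaf word) pair."""
    c = int(top.predict(x[None])[0])
    f = int(leaves[c].predict(x[None])[0])
    return c, f

def hierarchical_distance(a, b, w_top=0.3, w_leaf=0.7):
    """Zero if the leaf words agree; partial cost if only the coarse level agrees."""
    (ca, fa), (cb, fb) = a, b
    if ca != cb:
        return w_top + w_leaf   # disagreement already at the coarse level
    return 0.0 if fa == fb else w_leaf

# Toy usage: rows stand in for descriptors from two modalities (VIS vs. NIR).
rng = np.random.default_rng(0)
vis = rng.normal(0.0, 1.0, size=(60, 16))   # synthetic visible-light features
nir = rng.normal(0.2, 1.0, size=(60, 16))   # synthetic near-infrared features
top, leaves = build_hierarchy(np.vstack([vis, nir]))  # codebook pooled over modalities
d = hierarchical_distance(encode(vis[0], top, leaves),
                          encode(nir[0], top, leaves))
print("hierarchical distance:", d)
```

Pooling both modalities before clustering mirrors, in spirit, the summary's point that the codebook should capture semantics shared across modalities, and the hierarchy-weighted distance mirrors the idea of measuring similarity through the levels of the structure rather than in raw feature space.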
ISSN: 2162-237X
EISSN: 2162-2388
DOI: 10.1109/TNNLS.2016.2517014