Cross-Modality Feature Learning Through Generic Hierarchical Hyperlingual-Words
Published in: IEEE Transactions on Neural Networks and Learning Systems, Vol. 28, No. 2, pp. 451-463
Main Authors:
Format: Journal Article
Language: English
Published: United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.02.2017
Summary: Recognizing facial images captured under visible light has been studied for decades. However, many factors hinder its successful application in the real world, e.g., illumination and pose variations. Recent work has therefore turned to other spectra, e.g., near infrared, which can be perceived only by specially designed devices, to avoid the illumination problem. This, however, inevitably introduces a new problem, namely, cross-modality classification: images registered in the system are in one modality, while images captured momentarily as test inputs are in another. In addition, there can be many within-modality variations, such as pose and expression, which complicate the problem further. To address this, we propose a novel framework called hierarchical hyperlingual-words (Hwords). First, we design a novel structure, called generic Hwords, to capture the high-level semantics across different modalities and within each modality in a weakly supervised fashion, meaning that only modality-pair and variation labels are needed during training. Second, to improve the discriminative power of Hwords, we propose a novel distance metric based on their hierarchical structure. Extensive experiments on multimodality face databases demonstrate the superiority of our method over state-of-the-art approaches on face recognition tasks subject to pose and expression variations.
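The record itself gives no implementation details, but the two ideas named in the summary (a hierarchical codebook shared across modalities, and a distance metric defined through that hierarchy) can be illustrated with a minimal sketch. The toy below is written under loose assumptions and is not the authors' Hwords algorithm: it builds a plain two-level k-means codebook from pooled features of both modalities, and the level weights `w_top` and `w_leaf`, the helper names, and the synthetic visible/near-infrared features are all hypothetical.

```python
# Illustrative sketch only: a two-level "hierarchical words" codebook with a
# hierarchy-aware distance. Not the Hwords method from the paper; all names,
# weights, and data here are hypothetical stand-ins.
import numpy as np
from sklearn.cluster import KMeans

def build_hierarchy(features, k_top=4, k_leaf=3, seed=0):
    """Fit a coarse codebook, then refine each coarse cluster into leaf words."""
    top = KMeans(n_clusters=k_top, n_init=10, random_state=seed).fit(features)
    leaves = {}
    for c in range(k_top):
        members = features[top.labels_ == c]
        k = max(1, min(k_leaf, len(members)))  # never ask for more clusters than points
        leaves[c] = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(members)
    return top, leaves

def encode(x, top, leaves):
    """Map one feature vector to its (coarse word, leaf word) pair."""
    c = int(top.predict(x[None])[0])
    f = int(leaves[c].predict(x[None])[0])
    return c, f

def hierarchical_distance(a, b, w_top=0.3, w_leaf=0.7):
    """Zero if the leaf words agree; partial cost if only the coarse level agrees."""
    (ca, fa), (cb, fb) = a, b
    if ca != cb:
        return w_top + w_leaf   # disagreement already at the coarse level
    return 0.0 if fa == fb else w_leaf

# Toy usage: rows stand in for descriptors from two modalities (VIS vs. NIR).
rng = np.random.default_rng(0)
vis = rng.normal(0.0, 1.0, size=(60, 16))   # synthetic visible-light features
nir = rng.normal(0.2, 1.0, size=(60, 16))   # synthetic near-infrared features
top, leaves = build_hierarchy(np.vstack([vis, nir]))  # codebook pooled over modalities
d = hierarchical_distance(encode(vis[0], top, leaves),
                          encode(nir[0], top, leaves))
print("hierarchical distance:", d)
```

Pooling both modalities before clustering mirrors, in spirit, the summary's point that the codebook should capture semantics shared across modalities, and the hierarchy-weighted distance mirrors the idea of measuring similarity through the levels of the structure rather than in raw feature space.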
ISSN: 2162-237X
EISSN: 2162-2388
DOI: 10.1109/TNNLS.2016.2517014