A Composite Network Model for Face Super-Resolution with Multi-Order Head Attention Facial Priors
Published in | Pattern Recognition, Vol. 139, p. 109503 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.07.2023 |
Summary: | • The proposed composite network model seamlessly integrates the advantages of DCNNs and transformers to super-resolve LR face images. • The proposed Multi-Order Head Attention Network captures spatial and channel dependencies of facial priors and also models 2D information of face images. • The proposed model demonstrates competitive recovery performance, in both visual results and quantitative evaluation, compared with state-of-the-art FSR methods.
Face super-resolution (FSR) aims to reconstruct high-resolution (HR) face images from low-resolution (LR) ones. Despite the progress made by deep convolutional neural networks (DCNNs) on FSR, convolutions struggle to relate spatially distant concepts. Moreover, all image pixels and prior information (e.g., landmarks and facial component heatmaps) are treated equally regardless of importance, which introduces inaccuracy and degrades the quality of the recovered face images. To address these issues, we propose a composite network model for FSR with multi-order head attention facial priors. The proposed model contains a face hallucination transformer (FHT)-based network and a multi-order head attention (MOHA)-based DCNN. The FHT-based network can capture long-range dependencies and gradually increase resolution to achieve efficient and effective inference, while the MOHA-based DCNN exploits detailed and two-dimensional information of LR face images. In addition, the novel generic submodule of the MOHA-based DCNN, namely the Multi-Order Head Attention Network, can accurately model the relationship of facial components across the spatial and channel dimensions. The proposed composite network model seamlessly integrates the advantages of DCNNs and transformers to super-resolve LR face images. Compared with state-of-the-art FSR methods on public benchmark datasets, the proposed model shows competitive recovery performance. |
---|---|
ISSN: | 0031-3203, 1873-5142 |
DOI: | 10.1016/j.patcog.2023.109503 |
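
The summary mentions two ideas that a toy example can make concrete: weighting features along both the channel and spatial dimensions, and increasing resolution gradually rather than in one step. The sketch below is a minimal, untrained illustration in plain Python and is not the paper's FHT or MOHA design. All function names, the sigmoid gates computed from simple averages, and the nearest-neighbour upsampling are assumptions chosen for illustration only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """Rescale each channel by a gate derived from its global average.

    feat is a list of channels, each channel a list of rows (C x H x W).
    """
    gates = []
    for ch in feat:
        total = sum(sum(row) for row in ch)
        count = len(ch) * len(ch[0])
        gates.append(sigmoid(total / count))
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feat, gates)]


def spatial_attention(feat):
    """Rescale each pixel by a gate derived from its mean over channels."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    gate = [
        [sigmoid(sum(feat[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]
    return [
        [[feat[k][i][j] * gate[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of every channel."""
    out = []
    for ch in feat:
        rows = []
        for row in ch:
            wide = [v for v in row for _ in range(2)]
            rows.append(wide)
            rows.append(list(wide))
        out.append(rows)
    return out


def progressive_upsample(feat, steps):
    """Apply both attention gates, then double the resolution, `steps` times."""
    for _ in range(steps):
        feat = spatial_attention(channel_attention(feat))
        feat = upsample2x(feat)
    return feat


# Hypothetical 2-channel, 2x2 "low-resolution" feature map.
lr = [[[0.0, 1.0], [2.0, 3.0]], [[1.0, 1.0], [1.0, 1.0]]]
sr = progressive_upsample(lr, steps=3)
print(len(sr), len(sr[0]), len(sr[0][0]))  # 2 16 16
```

Three doubling steps take the 2x2 map to 16x16 while the channel count stays at 2. A real model would replace the fixed averages with learned layers and the nearest-neighbour step with a learned upsampler.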