Expression-Preserving Face Frontalization Improves Visually Assisted Speech Processing

Bibliographic Details
Published in: International Journal of Computer Vision, Vol. 131, no. 5, pp. 1122-1140
Main Authors: Kang, Zhiqi; Sadeghi, Mostafa; Horaud, Radu; Alameda-Pineda, Xavier
Format: Journal Article
Language: English
Published: New York, Springer US, 01.05.2023
Springer
Springer Nature B.V
Springer Verlag
Summary: Face frontalization consists of synthesizing a frontal view from a profile one. This paper proposes a frontalization method that preserves non-rigid facial deformations, i.e., facial expressions. It is shown that expression-preserving frontalization boosts the performance of visually assisted speech processing. The method alternates between the estimation of (i) the rigid transformation (scale, rotation, and translation) and (ii) the non-rigid deformation between an arbitrarily viewed face and a face model. The method has two important merits: it can deal with non-Gaussian errors in the data, and it incorporates a dynamical face deformation model. For that purpose, we use the Student’s t-distribution in combination with a Bayesian filter in order to account for both rigid head motions and time-varying facial deformations, e.g., those caused by speech production. The zero-mean normalized cross-correlation score is used to evaluate the ability of the method to preserve facial expressions. The method is thoroughly evaluated and compared with several state-of-the-art methods, based either on traditional geometric models or on deep learning. Moreover, we show that the method, when incorporated into speech processing pipelines, improves word recognition rates and speech intelligibility scores by a considerable margin.
ISSN: 0920-5691, 1573-1405
DOI: 10.1007/s11263-022-01742-1
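
The summary names the zero-mean normalized cross-correlation (ZNCC) score as the measure of expression preservation. The following is a minimal sketch of the standard ZNCC formula, not the authors' evaluation code. The function name `zncc`, the `eps` threshold, the use of flattened pixel lists, and the convention of returning 0 for a constant patch are all assumptions made for illustration; this record does not state which image regions the paper compares.

```python
import math
from statistics import fmean


def zncc(patch_a, patch_b, eps=1e-12):
    """Zero-mean normalized cross-correlation of two flattened patches.

    Hypothetical helper: both inputs are equal-length sequences of pixel
    intensities. The score lies in [-1, 1] and is unchanged by a positive
    gain or an offset applied to either patch.
    """
    if len(patch_a) != len(patch_b) or len(patch_a) == 0:
        raise ValueError("patches must be non-empty and of equal length")

    # Subtract each patch's mean (the "zero-mean" step).
    mean_a, mean_b = fmean(patch_a), fmean(patch_b)
    centered_a = [x - mean_a for x in patch_a]
    centered_b = [y - mean_b for y in patch_b]

    # Normalize by the product of the centered patches' Euclidean norms.
    norm_a = math.sqrt(sum(x * x for x in centered_a))
    norm_b = math.sqrt(sum(y * y for y in centered_b))
    if norm_a * norm_b < eps:
        # A constant patch has no variation, so the score is undefined;
        # returning 0.0 is an assumed convention, not taken from the paper.
        return 0.0

    return sum(x * y for x, y in zip(centered_a, centered_b)) / (norm_a * norm_b)
```

A score close to 1 indicates that two patches share the same intensity pattern up to brightness and contrast changes, which is why the measure is commonly used to compare a synthesized image region against a reference one.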