Video facial emotion recognition based on local enhanced motion history image and CNN-CTSLSTM networks

Bibliographic Details
Published in: Journal of Visual Communication and Image Representation, Vol. 59, pp. 176-185
Main Authors: Hu, Min; Wang, Haowen; Wang, Xiaohua; Yang, Juan; Wang, Ronggui
Format: Journal Article
Language: English
Published: Elsevier Inc., 01.02.2019

Summary: This paper focuses on the recognition of facial emotional expressions in video sequences and proposes an integrated framework of two networks: a local network and a global network, based on the local enhanced motion history image (LEMHI) and a CNN-LSTM cascaded network, respectively. In the local network, the frames of an input video are aggregated into a single frame by a novel method, LEMHI. This approach improves on MHI by using detected facial landmarks as attention areas that boost local values in the difference-image calculation, so that the actions of crucial facial units are captured effectively. The resulting single frame is then fed into a CNN for prediction. In the global network, an improved CNN-LSTM model serves as a global feature extractor and classifier for video facial emotion recognition. Finally, a random-search weighted-summation strategy is applied as a late-fusion step to produce the final prediction. The work also offers insight into the networks by visualizing the feature maps of each CNN layer to decipher which portions of the face influence the networks' predictions. Experiments on the AFEW, CK+ and MMI datasets under a subject-independent validation scheme demonstrate that the integrated framework achieves better performance than either network used alone. Compared with state-of-the-art methods, the proposed framework demonstrates superior performance.
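The LEMHI aggregation step described in the abstract is concrete enough to sketch. The Python fragment below shows one plausible reading of it: classical motion-history-image accumulation over thresholded frame differences, with the differences inside detected landmark regions amplified before thresholding. The function name, the box representation of landmark regions, and all parameter values are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def local_enhanced_mhi(frames, landmark_boxes, tau=None, alpha=2.0, threshold=25):
    """Aggregate a grayscale frame sequence into a single LEMHI-style image.

    frames:         list of HxW uint8 grayscale frames
    landmark_boxes: per-frame list of (x0, y0, x1, y1) boxes around detected
                    facial landmarks (eyes, brows, mouth, ...); assumed to come
                    from any off-the-shelf landmark detector
    alpha:          boost factor for motion inside landmark regions (assumed)
    threshold:      motion threshold on the (boosted) frame difference (assumed)
    """
    tau = tau if tau is not None else len(frames)  # decay horizon in frames
    h, w = frames[0].shape
    mhi = np.zeros((h, w), dtype=np.float32)
    for t in range(1, len(frames)):
        # Plain absolute frame difference, as in classical MHI.
        diff = np.abs(frames[t].astype(np.int16)
                      - frames[t - 1].astype(np.int16)).astype(np.float32)
        # Local enhancement: boost differences inside landmark regions so the
        # motion of crucial facial units dominates the accumulated image.
        for (x0, y0, x1, y1) in landmark_boxes[t]:
            diff[y0:y1, x0:x1] *= alpha
        moving = diff > threshold
        # Classical MHI update: refresh moving pixels, decay the rest.
        mhi[moving] = tau
        mhi[~moving] = np.maximum(mhi[~moving] - 1, 0)
    # Normalize to an 8-bit image suitable as input to the CNN branch.
    return (255.0 * mhi / tau).astype(np.uint8)
```

The late-fusion step can be sketched in the same spirit: a random search over a weight that combines the two networks' class-probability outputs, keeping the weight that maximizes accuracy on a validation split. The single scalar weight and the accuracy criterion are assumptions here; the paper's search may be parameterized differently.

```python
import numpy as np

def random_search_fusion(p_local, p_global, labels, trials=1000, seed=0):
    """Search a scalar weight w for the late fusion
    p = w * p_local + (1 - w) * p_global, keeping the w with the best
    validation accuracy. p_local / p_global are (N, C) softmax outputs."""
    rng = np.random.default_rng(seed)
    best_w, best_acc = 0.5, 0.0
    for _ in range(trials):
        w = rng.uniform(0.0, 1.0)
        fused = w * p_local + (1.0 - w) * p_global
        acc = np.mean(fused.argmax(axis=1) == labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc
```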
ISSN: 1047-3203, 1095-9076
DOI: 10.1016/j.jvcir.2018.12.039