Leveraging facial expressions as emotional context in image captioning

Bibliographic Details
Published in: Multimedia Tools and Applications, Vol. 83, No. 30, pp. 75195-75216
Main Authors: Das, Riju; Wu, Nan; Dev, Soumyabrata
Format: Journal Article
Language: English
Published: New York: Springer US (Springer Nature B.V.), 01.09.2024

Summary: Image captioning has emerged as a prominent approach for generating verbal descriptions of images that humans can read and understand. Numerous techniques and models in this domain have predominantly focused on analyzing the factual elements present within an image, employing convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to generate captions. However, an inherent limitation of these existing approaches is their failure to consider the emotional aspects exhibited by the main subject of an image, which can lead to captions that misrepresent the conveyed emotional content. Acknowledging this limitation, this paper endeavors to construct an improved model dedicated to extracting human emotions from images and seamlessly embedding emotional attributes into the accompanying captions. In our research, we employ the widely accessible benchmark image captioning dataset, Flickr8k. Our ultimate objective is to establish a more appropriate model for images containing human faces, one that provides more accurate and impactful captions.
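The abstract describes a CNN-encoder/LSTM-decoder captioning pipeline augmented with a facial-emotion signal, but this record does not give the paper's exact architecture. The following is a minimal sketch of one way such a model could be wired, assuming pre-extracted 2048-dimensional CNN image features, a one-hot emotion label produced by a separate facial-expression classifier, and Keras layers; the sizes VOCAB_SIZE, MAX_LEN, and NUM_EMOTIONS are illustrative assumptions, not values from the paper.

# Sketch: emotion-conditioned merge-style caption decoder (assumed design,
# not the authors' code). Image features and an emotion vector are fused
# into a conditioning context, which is merged with an LSTM-encoded
# partial caption to predict the next word.
from tensorflow.keras.layers import (Input, Dense, Embedding, LSTM,
                                     Dropout, Concatenate, add)
from tensorflow.keras.models import Model

VOCAB_SIZE = 8000    # assumption: tokenizer vocabulary size for Flickr8k
MAX_LEN = 34         # assumption: maximum caption length after tokenizing
NUM_EMOTIONS = 7     # assumption: basic expression classes (happy, sad, ...)

# Image branch: pre-extracted CNN features (e.g. from a pretrained encoder).
img_in = Input(shape=(2048,), name="cnn_features")
img_feat = Dense(256, activation="relu")(Dropout(0.5)(img_in))

# Emotion branch: one-hot expression label for the main subject's face.
emo_in = Input(shape=(NUM_EMOTIONS,), name="emotion")
emo_feat = Dense(256, activation="relu")(emo_in)

# Fuse visual and emotional context into a single conditioning vector.
context = Dense(256, activation="relu")(Concatenate()([img_feat, emo_feat]))

# Language branch: the partial caption generated so far, encoded by an LSTM.
seq_in = Input(shape=(MAX_LEN,), name="caption_so_far")
seq = Embedding(VOCAB_SIZE, 256, mask_zero=True)(seq_in)
seq = LSTM(256)(Dropout(0.5)(seq))

# Merge context with the language state and predict the next word.
decoder = Dense(256, activation="relu")(add([context, seq]))
out = Dense(VOCAB_SIZE, activation="softmax")(decoder)

model = Model(inputs=[img_in, emo_in, seq_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")

This mirrors the widely used merge architecture for CNN+LSTM captioning; the only deviation is the second input branch that fuses the emotion vector with the image features before decoding, which is the general idea the abstract attributes to the proposed model.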
ISSN: 1573-7721, 1380-7501
DOI: 10.1007/s11042-023-17904-3