From methods to datasets: A survey on Image-Caption Generators
Published in: Multimedia Tools and Applications, Vol. 83, No. 9, pp. 28077–28123
Main Authors:
Format: Journal Article
Language: English
Published: New York: Springer US, 01.03.2024 (Springer Nature B.V.)
Summary: Image caption generation is a popular artificial intelligence research area that combines image understanding and language generation. Creating well-structured sentences requires a thorough understanding of language, both syntactically and semantically. Describing the content of an image in well-formed phrases is a difficult task, but it can have a significant impact in helping visually impaired people better understand what images contain. Image captioning has gained considerable attention as a research topic for various computer vision and natural language processing (NLP) applications. The goal of image captioning is to generate coherent and accurate natural language sentences that describe an image. It relies on the captioning model to detect objects and correctly characterise their relationships. Intuitively, it is also difficult for a machine to perceive a typical image in the same way that humans do; nevertheless, the task provides a foundation for intelligent exploration in deep learning. In this review paper, we focus on the latest advanced techniques for image captioning. The paper highlights related methodologies, focuses on aspects that are crucial for computer recognition, and surveys the numerous strategies and procedures being developed for generating image captions. It was also observed that recurrent neural networks (RNNs) are used in the bulk of research works (45%), followed by attention-based models (30%), transformer-based models (15%) and other methods (10%). An overview of the approaches used in image captioning research is discussed, the benefits and drawbacks of these methodologies are explored, and the most commonly used datasets and evaluation procedures in this field are examined.
ISSN: 1380-7501 (print); 1573-7721 (electronic)
DOI: 10.1007/s11042-023-16560-x
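
To make the encoder-decoder pattern mentioned in the summary concrete, the sketch below shows a minimal image-captioning model of the RNN-based kind that the survey reports as most common: an image encoder produces a feature vector, and an LSTM decoder generates caption tokens conditioned on it. This is an illustrative assumption-laden example, not the method of the surveyed paper; the class name, dimensions, toy encoder, and dummy data are all invented for illustration, and PyTorch is assumed as the framework.

```python
# Minimal sketch of a CNN-encoder / LSTM-decoder caption generator.
# All names, dimensions, and the toy vocabulary are illustrative assumptions,
# not details taken from the surveyed paper.
import torch
import torch.nn as nn


class CaptionGenerator(nn.Module):
    """Encode an image into one feature vector, then decode a caption with an LSTM."""

    def __init__(self, vocab_size: int, embed_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        # Stand-in image encoder; in practice a pretrained CNN (e.g. ResNet) is used.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        # The image feature is prepended to the word embeddings, in the style of
        # "show and tell"-type encoder-decoder captioning models.
        img_feat = self.encoder(images).unsqueeze(1)      # (B, 1, E)
        word_emb = self.embed(captions)                   # (B, T, E)
        inputs = torch.cat([img_feat, word_emb], dim=1)   # (B, T+1, E)
        hidden, _ = self.decoder(inputs)                  # (B, T+1, H)
        return self.fc(hidden)                            # (B, T+1, V) vocabulary logits


if __name__ == "__main__":
    model = CaptionGenerator(vocab_size=1000)
    images = torch.randn(2, 3, 64, 64)             # two dummy RGB images
    captions = torch.randint(0, 1000, (2, 12))     # two dummy token sequences
    logits = model(images, captions)
    print(logits.shape)                            # torch.Size([2, 13, 1000])
```

Attention-based and transformer-based captioners, which the summary reports as the next most common approaches, replace the single pooled image vector with a grid or set of region features that the decoder attends to at each step.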