Image Captioning Using Scene Graph Generation
| Published in | 2025 International Conference on Wireless Communications Signal Processing and Networking (WiSPNET), pp. 1-5 |
|---|---|
| Main Authors | |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 20.03.2025 |
Summary: Understanding and generating natural language descriptions from images is a fundamental challenge in vision-language tasks within artificial intelligence. This paper introduces a novel image captioning framework that integrates scene graph generation to improve the semantic richness of generated captions. The proposed method employs the Relation Transformer (RelTR) model to extract structural representations of visual scenes in the form of subject-predicate-object triplets. A transformer-based captioning model then utilizes these structured scene graphs to produce fluent and contextually accurate captions. Experimental evaluations on the Visual Genome dataset demonstrate that our approach yields superior semantic coherence and captioning accuracy compared to traditional image-to-text models. The incorporation of relational scene understanding results in captions that are more contextually informed and descriptive.
DOI: 10.1109/WiSPNET64060.2025.11005338
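To make the pipeline described in the summary concrete, below is a minimal sketch of the captioning stage. It assumes a RelTR-style detector has already emitted subject-predicate-object triplets for an image; the `TripletCaptioner` class, the `linearize` helper, the toy vocabulary, and all hyperparameters are illustrative assumptions for this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TripletCaptioner(nn.Module):
    """Transformer mapping linearized scene-graph triplets to caption logits.

    Hypothetical stand-in for the paper's captioner; dimensions are arbitrary.
    """
    def __init__(self, vocab_size, d_model=256, nhead=8, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, graph_tokens, caption_tokens):
        # Causal mask: each caption position attends only to earlier positions.
        tgt_mask = self.transformer.generate_square_subsequent_mask(
            caption_tokens.size(1)
        )
        h = self.transformer(
            self.embed(graph_tokens),    # encoder input: the scene graph
            self.embed(caption_tokens),  # decoder input: caption so far
            tgt_mask=tgt_mask,
        )
        return self.out(h)

def linearize(triplets, token_to_id):
    """Flatten (subject, predicate, object) triplets into one token sequence."""
    ids = [token_to_id[w] for triplet in triplets for w in triplet]
    return torch.tensor([ids])  # shape (1, 3 * num_triplets)

# Toy vocabulary and a scene graph a RelTR-style detector might emit.
vocab = ["<bos>", "man", "riding", "horse", "wearing", "hat"]
tok = {w: i for i, w in enumerate(vocab)}
graph = linearize([("man", "riding", "horse"), ("man", "wearing", "hat")], tok)
prefix = torch.tensor([[tok["<bos>"]]])  # caption decoded left to right

model = TripletCaptioner(vocab_size=len(vocab))
logits = model(graph, prefix)  # (1, 1, vocab_size): scores for the next word
print(logits.shape)
```

At inference time one would decode autoregressively, appending each predicted token to the caption prefix and re-running the decoder; the paper's actual model presumably also fuses visual features alongside the graph tokens, which this sketch omits.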