Multilingual Text Enhancer (MLTE) - A LLaMA2 based Model for Prompt Generation

Bibliographic Details
Published in	2024 3rd International Conference on Applied Artificial Intelligence and Computing (ICAAIC), pp. 895-900
Main Authors	Teja, Nv Sai; Kumar, Kuldeep; Malarvel, Muthukumaran
Format	Conference Proceeding
Language	English
Published	IEEE 05.06.2024
Subjects	Accuracy; Boosting; Coherence; Image synthesis; Linguistics; LLaMA2; Multilingual Text Processing; Natural Language Enhancement; Production; Text Summarization; Text-to-Image Generation; Visualization
Summary:	This research work introduces MLTE (Multilingual Text Enhancer), a text enhancement model developed primarily to enhance the input text for text-to-image generation models. The existing text encoders in image generation models often have limited capabilities, and the quality of image generation depends on the prompt. If the prompt includes misspelled words, the model may generate an irrelevant image. Most encoders are primarily based on English. MLTE handles multilingual prompts, misspelled words, and overly verbose prompts, and creatively enhances the prompt to achieve improved results. MLTE employs natural language processing algorithms to establish a link between unprocessed textual input and the production of highly precise visual content. MLTE is based on LLaMA2, which can handle numerous languages and makes it simple to incorporate content from diverse linguistic origins. Additionally, its spellchecking and correction functions improve the quality and coherence of the prompt. Moreover, MLTE's scene and text augmentation features strengthen the visual richness and coherence of generated images, thereby enhancing their overall quality and realism. Its summarization capability condenses large paragraphs into concise yet informative summaries, which assists the image creation process by delivering more focused inputs. MLTE can be used with any text-to-image generation model. Through empirical evaluation, this paper demonstrates the effectiveness of MLTE in boosting text quality for text-to-image synthesis tasks, leading to significantly improved image generation results.
DOI:	10.1109/ICAAIC60222.2024.10575122