Democratizing cost-effective, agentic artificial intelligence to multilingual medical summarization through knowledge distillation

Bibliographic Details
Published in Scientific Reports Vol. 15; no. 1; pp. 27619 - 10
Main Authors Lee, Chanseo, Kumar, Sonu, Vogt, Kimon A., Munshi, Muhammad, Tallapudi, Panindhra, Vogt, Antonia, Awad, Hamzeh, Khan, Wasim
Format Journal Article
Language English
Published London Nature Publishing Group UK 29.07.2025
Nature Publishing Group
Nature Portfolio
Summary: The increasing demand for multilingual capabilities in healthcare technology highlights the critical need for AI solutions capable of handling underrepresented languages, such as Arabic, in clinical documentation. Arabic’s unique linguistic complexities (morphological richness, syntactic variations, and diglossia) present significant challenges for foundational large language models (LLMs), especially in domain-specific tasks like medical summarization. This study introduces AraSum, a domain-specific AI agent built using a novel knowledge distillation framework that transforms large multilingual LLMs into lightweight, task-optimized small language models (SLMs). Leveraging a synthetic dataset of Arabic medical dialogues, AraSum demonstrates superior performance over JAIS-30B, a foundational Arabic LLM, across key evaluation metrics, including BLEU and ROUGE scores. AraSum also outperforms JAIS in Arabic-speaking evaluator assessments of accuracy, comprehensiveness, and clinical utility while maintaining comparable linguistic performance as measured by a modified PDQI-9 inventory. Beyond accuracy, AraSum achieves these results with significantly lower computational and environmental costs, demonstrating the feasibility of deploying resource-efficient AI models in low-resource settings for domain-specific tasks. This work underscores the potential of SLM-based agentic architectures for advancing multilingual healthcare, encouraging sustainable artificial intelligence, and fostering equity in access to care.
ISSN: 2045-2322
DOI: 10.1038/s41598-025-10451-x