Democratizing cost-effective, agentic artificial intelligence to multilingual medical summarization through knowledge distillation

Bibliographic Details
Published in	Scientific Reports Vol. 15; no. 1; article 27619 (10 pp.)
Main Authors	Lee, Chanseo, Kumar, Sonu, Vogt, Kimon A., Munshi, Muhammad, Tallapudi, Panindhra, Vogt, Antonia, Awad, Hamzeh, Khan, Wasim
Format	Journal Article
Language	English
Published	London: Nature Publishing Group UK (Nature Portfolio), 29.07.2025
Subjects	639/705/1046; 639/705/117; 639/705/794; 692/700; 692/700/3934; Accuracy; AI Agents; Artificial Intelligence; Clinical Documentation; Cost-Benefit Analysis; Datasets; Distillation; Documentation; Health care; Human subjects; Humanities and Social Sciences; Humans; Knowledge; Knowledge Distillation; Language; Large language models; Learning; multidisciplinary; Multilingualism; Personal information; Regularization methods; Science; Science (multidisciplinary); Small Language Models (SLMs); Sustainability in AI; Terminology; Verbal communication
Summary:	The increasing demand for multilingual capabilities in healthcare technology highlights the critical need for AI solutions capable of handling underrepresented languages, such as Arabic, in clinical documentation. Arabic’s unique linguistic complexities—morphological richness, syntactic variations, and diglossia—present significant challenges for foundational large language models (LLMs), especially in domain-specific tasks like medical summarization. This study introduces AraSum, a domain-specific AI agent built using a novel knowledge distillation framework that transforms large multilingual LLMs into lightweight, task-optimized small language models (SLMs). Leveraging a synthetic dataset of Arabic medical dialogues, AraSum demonstrates superior performance over JAIS-30B, a foundational Arabic LLM, across key evaluation metrics, including BLEU and ROUGE scores. AraSum also outperforms JAIS in Arabic-speaking evaluator assessments of accuracy, comprehensiveness, and clinical utility while maintaining comparable linguistic performance as measured by a modified PDQI-9 inventory. Beyond accuracy, AraSum achieves these results with significantly lower computational and environmental costs, demonstrating the feasibility of deploying resource-efficient AI models in low-resource settings for domain-specific tasks. This work underscores the potential of SLM-based agentic architectures for advancing multilingual healthcare, encouraging sustainable artificial intelligence, and fostering equity in access to care.
ISSN:	2045-2322
DOI:	10.1038/s41598-025-10451-x