Artificial Intelligence-Based Automated Interpretation of Images of Electrocardiograms: Development and Multinational Validation of ECG-GPT

Bibliographic Details
Published in medRxiv: the preprint server for health sciences
Main Authors Khunte, Akshay, Sangha, Veer, Oikonomou, Evangelos K, Dhingra, Lovedeep S, Aminorroaya, Arya, Coppi, Andreas, Shankar, Sumukh Vasisht, Rockers, Elijah, Mortazavi, Bobak J, Bhatt, Deepak L, Krumholz, Harlan M, Al-Kindi, Sadeer, Nadkarni, Girish N, Vaid, Akhil, Khera, Rohan
Format Journal Article
Language English
Published United States 22.04.2025
Summary: Timely and accurate assessment of electrocardiograms (ECGs) is crucial for diagnosing, triaging, and clinically managing patients. Current workflows rely on computerized ECG interpretation tools built into ECG signal acquisition systems, which use rule-based algorithms that are unreliable and frequently unavailable in low-resource settings. We developed and validated a format-independent vision encoder-decoder model, ECG-GPT, that generates free-text, expert-level interpretations directly from 12-lead ECG images. Using 12-lead ECGs and their corresponding diagnosis statements collected at the Yale-New Haven Health System (YNHHS) between 2000 and 2022, we developed a vision-text transformer model to generate interpretation statements from images of ECGs. Using structured clinical assessment, semantic similarity, and conventional natural language generation metrics, we validated ECG-GPT across 7 geographically distinct health settings: (1) 3 large and diverse US health systems, (2) consecutive ECGs from a central reading system in Minas Gerais, Brazil, (3) the prospective cohort study UK Biobank, (4) a Germany-based, publicly available repository, PTB-XL, and (5) a community hospital in Missouri. Overall, 2.9 million ECGs were used for model development. The model performed well in clinical assessment across 26 extracted labels: for atrial fibrillation, sinus tachycardia, sinus bradycardia, premature atrial contractions, and premature ventricular contractions, AUROCs ranged from 0.80 to 0.95 and AUPRCs from 0.50 to 0.86. For left bundle branch block, right bundle branch block, first-degree atrioventricular block, left anterior fascicular block, and left posterior fascicular block, AUROCs ranged from 0.88 to 0.96 and AUPRCs from 0.23 to 0.86. Across all 26 conditions, diagnostic accuracy ranged from 0.93 to 0.99. ECG-GPT identified the full context of the diagnosis statements with allied conditions. It achieved a median pairwise cosine similarity of 0.90 (IQR 0.83-0.97), significantly greater than the median baseline similarity of 0.73 (IQR 0.67-0.78, p<0.001). This separation between median pairwise and baseline similarity remained consistent across all 26 condition-specific subsets. The results were comparable across external validation sites. In summary, we developed and extensively validated a vision encoder-decoder model that generates expert-level interpretations from ECG images. This represents a scalable and accessible strategy for automated ECG analysis, especially in low-resource settings.
DOI:10.1101/2024.02.17.24302976
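
The semantic-similarity comparison reported in the summary (median pairwise versus median baseline cosine similarity) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' pipeline: the record does not say how statements are embedded or how the baseline is built, so the sketch uses toy 3-dimensional vectors in place of real text embeddings and treats the baseline as the similarity between each generated statement and the reference statements of other ECGs. The function names are hypothetical.

```python
import math
from statistics import median


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def median_pairwise_similarity(generated, reference):
    """Median similarity between each generated vector and its own reference."""
    return median(cosine_similarity(g, r) for g, r in zip(generated, reference))


def median_baseline_similarity(generated, reference):
    """Median similarity between generated vectors and non-matching references.

    Assumption: the baseline is formed from mismatched pairs. The source
    record does not define the baseline, so this is only one plausible choice.
    """
    return median(
        cosine_similarity(g, r)
        for i, g in enumerate(generated)
        for j, r in enumerate(reference)
        if i != j
    )


# Toy stand-ins for sentence embeddings (hypothetical values, not study data).
generated = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
reference = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]

paired = median_pairwise_similarity(generated, reference)
baseline = median_baseline_similarity(generated, reference)
print(f"median pairwise similarity: {paired:.3f}")
print(f"median baseline similarity: {baseline:.3f}")
```

In this toy setup the matched pairs score well above the mismatched ones, which is the kind of separation the summary describes. A real evaluation would replace the toy vectors with embeddings of the generated and reference statements and add a statistical test for the difference.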