A progressive attention-based cross-modal fusion network for cardiovascular disease detection using synchronized electrocardiogram and phonocardiogram signals


Bibliographic Details
Published in: PeerJ Computer Science, Vol. 11, p. e3038
Main Authors: Li, Wei Peng; Chuah, Joon Huang; Tan, Guo Jeng; Liu, Chengyu; Ting, Hua-Nong
Format: Journal Article
Language: English
Published: PeerJ Ltd / PeerJ Inc, 25.07.2025
ISSN: 2376-5992
DOI: 10.7717/peerj-cs.3038

Summary: Synchronized electrocardiogram (ECG) and phonocardiogram (PCG) signals provide complementary diagnostic insights crucial for improving the accuracy of cardiovascular disease (CVD) detection. However, existing deep learning methods often utilize single-modal data or employ simplistic early or late fusion strategies, which inadequately capture the complex, hierarchical interdependencies between these modalities, thereby limiting detection performance. This study introduces PACFNet, a novel progressive attention-based cross-modal feature fusion network, for end-to-end CVD detection. PACFNet features a three-branch architecture: two modality-specific encoders for ECG and PCG, and a progressive selective attention-based cross-modal fusion encoder. A key innovation is its four-layer progressive fusion mechanism, which integrates multi-modal information from low-level morphological details to high-level semantic representations. This is achieved by selective attention-based cross-modal fusion (SACMF) modules at each progressive level, employing cascaded spatial and channel attention to dynamically emphasize salient feature contributions across modalities, thus significantly enhancing feature learning. Signals are pre-processed using a beat-to-beat segmentation approach to analyze individual cardiac cycles. Experimental validation on the public PhysioNet 2016 dataset demonstrates PACFNet’s state-of-the-art performance, with an accuracy of 97.7%, sensitivity of 98%, specificity of 97.3%, and an F1-score of 99.7%. Notably, PACFNet not only excels in multi-modal settings but also maintains robust diagnostic capabilities even with missing modalities, underscoring its practical effectiveness and reliability. The source code is publicly available on Zenodo (https://zenodo.org/records/15450169).
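
The abstract describes SACMF modules that fuse paired ECG and PCG feature maps through cascaded spatial and channel attention. The sketch below is a minimal, hypothetical PyTorch illustration of that idea only; the module name SACMFBlock, the additive merge of the two branches, the kernel size, reduction ratio, and tensor shapes are assumptions and not the authors' released implementation (see the Zenodo record above for the actual source code).

# Illustrative sketch (not the authors' code): fusing ECG and PCG feature maps
# with cascaded spatial then channel attention, as summarized in the abstract.
import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    """Weights each time step using avg- and max-pooled channel statistics."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv1d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                      # x: (batch, channels, time)
        avg = x.mean(dim=1, keepdim=True)      # (batch, 1, time)
        mx, _ = x.max(dim=1, keepdim=True)     # (batch, 1, time)
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style re-weighting of feature channels."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (batch, channels, time)
        w = self.fc(x.mean(dim=-1))            # global average pool over time
        return x * w.unsqueeze(-1)


class SACMFBlock(nn.Module):
    """Hypothetical fusion block: merge ECG and PCG features, then apply
    spatial attention followed by channel attention."""
    def __init__(self, channels: int):
        super().__init__()
        self.spatial = SpatialAttention()
        self.channel = ChannelAttention(channels)

    def forward(self, ecg_feat, pcg_feat):     # both: (batch, channels, time)
        fused = ecg_feat + pcg_feat            # assumed additive merge of the two branches
        fused = self.spatial(fused)            # emphasize salient time points
        fused = self.channel(fused)            # emphasize salient channels
        return fused


if __name__ == "__main__":
    block = SACMFBlock(channels=64)
    ecg = torch.randn(2, 64, 250)              # e.g., one segmented cardiac cycle per sample
    pcg = torch.randn(2, 64, 250)
    print(block(ecg, pcg).shape)               # torch.Size([2, 64, 250])

Cascading the spatial step before the channel step lets the block first highlight salient instants within a cardiac cycle and then re-weight whole feature channels, which is one plausible reading of the cascaded attention described in the abstract; the paper itself should be consulted for the exact ordering and fusion rule used at each of the four progressive levels.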