Classification of Short-Segment Pediatric Heart Sounds Based on a Transformer-Based Convolutional Neural Network

Bibliographic Details
Published in IEEE Access, Vol. 13, pp. 93852-93868
Main Authors Hassanuzzaman, Md; Ghosh, Samit Kumar; Hasan, Mohammad Nurul Akhtar; Mamun, Mohammad Abdullah Al; Ahmed, Khawza I.; Mostafa, Raqibul; Khandoker, Ahsan H.
Format Journal Article
Language English
Published Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2025
Summary: Congenital heart diseases (CHDs), caused by structural abnormalities in the heart and blood vessels, are a significant public health concern and add substantially to the socioeconomic burden, particularly in pediatric populations. Phonocardiograms (PCGs), a non-invasive and cost-effective diagnostic modality, capture acoustic signals that reflect the mechanical activity of the heart and can reveal pathological patterns associated with various CHD types. This study investigates the minimum signal duration required for accurate automatic classification of heart sounds and evaluates signal quality using the root mean square of successive differences (RMSSD) and the zero-crossing rate (ZCR). Mel-frequency cepstral coefficients (MFCCs) are extracted as features and fed into a transformer-based residual one-dimensional convolutional neural network (1D-CNN) for classification. Experimental results show that a threshold of 0.4 for RMSSD and ZCR yields optimal classification performance, and that a minimum signal length of 5 seconds is required for reliable results. Shorter segments (3 seconds) lack sufficient diagnostic information, while longer segments (15 seconds) may introduce additional noise. The proposed model achieves a maximum classification accuracy of 93.69% with 5-second signals.
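The two signal-quality measures named in the summary can be illustrated with a short sketch. This is not the authors' implementation: the summary does not say how the signal is normalized or in which direction the 0.4 threshold is applied, so the peak normalization, the "at or below threshold" rule, and the function names `split_segments`, `rmssd`, `zero_crossing_rate`, and `passes_quality` are all assumptions made for illustration.

```python
import numpy as np


def split_segments(signal, fs, seconds=5.0):
    """Cut a 1-D signal into non-overlapping segments of fixed duration.

    Trailing samples that do not fill a whole segment are discarded.
    """
    n = int(round(fs * seconds))
    count = len(signal) // n
    return signal[: count * n].reshape(count, n)


def rmssd(x):
    """Root mean square of successive differences between samples."""
    return float(np.sqrt(np.mean(np.diff(x) ** 2)))


def zero_crossing_rate(x):
    """Fraction of consecutive sample pairs whose signs differ."""
    negative = np.signbit(x)
    return float(np.mean(negative[1:] != negative[:-1]))


def passes_quality(segment, threshold=0.4):
    """Hypothetical quality gate (assumption, not taken from the paper).

    The segment is scaled to unit peak amplitude and accepted only when
    both RMSSD and ZCR are at or below the threshold.
    """
    peak = np.max(np.abs(segment))
    if peak == 0:
        return False  # a silent segment carries no usable information
    scaled = segment / peak
    return rmssd(scaled) <= threshold and zero_crossing_rate(scaled) <= threshold


if __name__ == "__main__":
    fs = 1000  # assumed sampling rate in Hz
    t = np.arange(10 * fs) / fs
    tone = np.sin(2 * np.pi * 50 * t)  # synthetic stand-in for a PCG recording
    for i, seg in enumerate(split_segments(tone, fs, seconds=5.0)):
        print(i, rmssd(seg), zero_crossing_rate(seg), passes_quality(seg))
```

Under these assumptions, a smooth low-frequency signal yields small RMSSD and ZCR values, while a segment dominated by rapid sample-to-sample fluctuation drives both values up and is rejected.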
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2025.3573870