Healthy Bio-Core: A Framework for Selection of Homogeneous Healthy Biomedical Multivariate Time Series Employing Classification Performance
Published in | IEEE Journal of Biomedical and Health Informatics, Vol. 29, no. 7, pp. 5205-5218 |
---|---|
Format | Journal Article |
Language | English |
Published | United States: IEEE, 01.07.2025 |
Summary: | Biomedical datasets for disease detection typically contain two classes: healthy and diseased. The diseased cohort often exhibits inherent heterogeneity due to clinical subtyping. The healthy cohort is presumed to be homogeneous, yet it also contains heterogeneity arising from inter-subject variation, which reduces classification effectiveness. To address this issue, we propose a novel methodology for multivariate time series data that identifies a homogeneous sub-cohort of healthy samples, referred to as the 'Healthy Bio-Core' (HBC). Using the HBC improves the discriminative capacity of classification models. The HBC selection process combines dynamic time warping (DTW) with the accuracy of the ROCKET (RandOm Convolutional KErnel Transform) classifier, treating each entire time series as a single instance. Empirical results indicate that using the HBC yields better classification performance than using the complete healthy dataset. We validate this approach with three classifiers: HIVE-COTE (Hierarchical Vote Collective of Transformation-based Ensembles), MUSE (Multivariate Unsupervised Symbols and Derivatives), and DTW-NN (DTW with Nearest Neighbor), assessing metrics such as accuracy, precision, recall, and F1-score. Because our approach relies on DTW, it is limited to cases where a DTW path can be identified; otherwise, another distance metric must be used. Currently, the efficiency depends on the classifier used. Future studies might investigate combining different classifiers for HBC sample selection and devise a method to synthesize their outcomes. Moreover, the assumption that the dataset is predominantly healthy may not hold in contexts with significant noise. Notwithstanding these limitations, our approach yields average accuracy increases of 5.49%, 14.28%, and 6.16% for the sepsis, gait, and EMO pain datasets, respectively. |
---|---|
ISSN: | 2168-2194, 2168-2208 |
DOI: | 10.1109/JBHI.2025.3546844 |
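
The summary names dynamic time warping (DTW) as one ingredient of HBC selection but does not spell out the procedure. As a rough illustration only, the sketch below computes a textbook DTW distance between two multivariate time series, using the Euclidean distance between time steps as the local cost. The helper `rank_by_mean_dtw` is hypothetical: it orders samples by their average DTW distance to the others, which is one simple way to think about "homogeneity" and is not claimed to be the authors' selection method (which also uses ROCKET accuracy).

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two multivariate series.

    Each series is a sequence of time steps; each time step is a
    sequence of floats with the same number of channels.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean local cost
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its step
                cost[i][j - 1],      # b advances, a repeats its step
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def rank_by_mean_dtw(samples):
    """Hypothetical helper (an assumption, not the paper's procedure).

    Returns sample indices ordered from most to least "central",
    where centrality is a low mean DTW distance to all other samples.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to compare")
    means = []
    for i in range(n):
        total = sum(
            dtw_distance(samples[i], samples[j]) for j in range(n) if j != i
        )
        means.append(total / (n - 1))
    return sorted(range(n), key=lambda i: means[i])
```

Under these assumptions, the samples at the front of the ranking are the ones most similar to the rest of the cohort; where such a cut-off would be placed, and how classifier accuracy feeds into it, is left to the article itself.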