Healthy Bio-Core: A Framework for Selection of Homogeneous Healthy Biomedical Multivariate Time Series Employing Classification Performance

Bibliographic Details
Published in	IEEE Journal of Biomedical and Health Informatics, Vol. 29, no. 7, pp. 5205-5218
Main Authors	Patharkar, Abhidnya; Al-Hindawi, Firas; Wu, Teresa
Format	Journal Article
Language	English
Published	United States: IEEE, 01.07.2025
Subjects	Accuracy; Algorithms; Bioinformatics; Biomedical Data; Classification algorithms; data heterogeneity; data homogeneity; Data mining; Databases, Factual; Diseases; DTW; DTW-NN; Gaussian distribution; HIVE-COTE; Humans; Measurement; Medical Informatics - methods; Multivariate Analysis; multivariate time series classification; MUSE; ROCKET; Rockets; Time series analysis; Training
Summary:	In biomedical datasets pertaining to disease detection, data typically fall into two classes: healthy and diseased. The diseased cohort often exhibits inherent heterogeneity due to clinical subtyping. Although the healthy cohort is presumed to be homogeneous, it contains heterogeneities arising from inter-subject variation, which affects the effectiveness of classification. To address this issue, we propose a novel methodology for multivariate time series data that discerns a homogeneous sub-cohort of healthy samples, referred to as the 'Healthy Bio-Core' (HBC). The use of HBC augments the discriminative capacity of classification models. The selection process for HBC integrates dynamic time warping (DTW) and the accuracy of the ROCKET (RandOm Convolutional KErnel Transform) classifier, treating the entire time series as a single instance. Empirical results indicate that utilizing HBC enhances classification performance in comparison to utilizing the complete healthy dataset. We substantiate this approach with three classifiers: HIVE-COTE (Hierarchical Vote Collective of Transformation-based Ensembles), MUSE (Multi-variate Unsupervised Symbols and Derivatives), and DTW-NN (DTW with Nearest Neighbor), assessing metrics such as accuracy, precision, recall, and F1-score. Because our approach relies on DTW, it is limited to cases where a DTW path can be identified; otherwise, another distance metric must be used. Currently, the efficiency depends on the classifier used. Future studies might investigate combining different classifiers for HBC sample selection and devise a method to synthesize their outcomes. Moreover, the assumption that the dataset is predominantly healthy may not hold in contexts with significant noise. Notwithstanding these limitations, our approach results in significant improvements in classification, with average accuracy increases of 5.49%, 14.28%, and 6.16% for the sepsis, gait, and EMO pain datasets, respectively.
ISSN:	2168-2194; 2168-2208
DOI:	10.1109/JBHI.2025.3546844
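
Illustrative note: the summary names dynamic time warping (DTW) as one ingredient of HBC selection but does not spell out the selection rule. The sketch below is not the authors' procedure. It only shows, under stated assumptions, what a DTW distance between two multivariate time series looks like and how such distances could be used to order healthy samples from most to least central. The function names `dtw_distance` and `rank_by_mean_dtw` are hypothetical, the per-time-step cost is assumed to be the Euclidean distance across channels, and the ROCKET accuracy step mentioned in the summary is omitted entirely.

```python
import math


def dtw_distance(a, b):
    """Plain DTW between two multivariate series (no window constraint).

    a, b: lists of time steps, each time step a list of channel values.
    The series may differ in length but must share the channel count.
    Assumption: local cost is the Euclidean distance across channels.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def rank_by_mean_dtw(samples):
    """Hypothetical helper: sample indices ordered from most to least central.

    Centrality here is the mean DTW distance to all other samples;
    this ordering criterion is an assumption made for illustration.
    """
    k = len(samples)
    mean_dist = []
    for i in range(k):
        total = sum(
            dtw_distance(samples[i], samples[j]) for j in range(k) if j != i
        )
        mean_dist.append(total / (k - 1))
    return sorted(range(k), key=lambda i: mean_dist[i])


if __name__ == "__main__":
    base = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    stretched = [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
    shifted = [[5.0, 5.0], [6.0, 6.0], [7.0, 7.0]]
    print(dtw_distance(base, stretched))
    print(rank_by_mean_dtw([base, stretched, shifted]))
```

In this toy input, the time-stretched copy aligns with the base series at zero cost, while the shifted series is far from both and is ranked last. A classifier-driven criterion such as the one the summary describes would replace or complement this purely distance-based ordering.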