Estimating individual minimum calibration for deep-learning with predictive performance recovery: An example case of gait surface classification from wearable sensor gait data

Bibliographic Details
Published in Journal of Biomechanics Vol. 154; p. 111606
Main Authors Lam, Guillaume, Rish, Irina, Dixon, Philippe C.
Format Journal Article
Language English
Published United States Elsevier Ltd 01.06.2023
Summary: Clinical datasets often comprise multiple data points or trials sampled from a single participant. When these datasets are used to train machine learning models, the method used to extract train and test sets must be carefully chosen. Using the standard machine learning approach (random-wise split), different trials from the same participant may appear in both training and test sets. This has led to schemes that keep all data points from the same participant in a single set (subject-wise split). Past investigations have demonstrated that models trained in this manner underperform compared to those trained using random-wise split schemes. Additional training of models on a small subset of trials, known as calibration, bridges the gap in performance across split schemes; however, the number of calibration trials required to achieve strong model performance is unclear. Thus, this study aims to investigate the relationship between calibration training set size and prediction accuracy on the calibration test set. A database of 30 young, healthy adults performing multiple walking trials across nine different surfaces while fitted with inertial measurement unit sensors on the lower limbs was used to develop a deep-learning classifier. For subject-wise trained models, calibration on a single gait cycle per surface yielded a 70% increase in F1-score, the harmonic mean of precision and recall, while 10 gait cycles per surface were sufficient to match the performance of a random-wise trained model. Code to generate calibration curves may be found at https://github.com/GuillaumeLam/PaCalC.
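The summary reports gains in F1-score, defined there as the harmonic mean of precision and recall. As a minimal illustration of that definition (not the authors' evaluation code), the Python sketch below computes F1 for a single class from hypothetical counts of true positives, false positives, and false negatives. The function name and the example counts are assumptions made for illustration; how per-surface scores were averaged across the nine surfaces is not stated in this record.

```python
def f1_from_counts(tp, fp, fn):
    """Return the F1-score for one class from raw prediction counts.

    tp, fp, fn are hypothetical counts of true positives, false
    positives, and false negatives for a single gait surface.
    """
    if tp == 0:
        # No correct positive predictions: precision and recall are
        # both zero (or undefined), so F1 is conventionally set to 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative counts only, not results from the study.
    print(f1_from_counts(tp=8, fp=2, fn=2))
```

Because the harmonic mean is dominated by the smaller of its two inputs, a classifier cannot reach a high F1 by trading precision for recall or the reverse.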
• In machine learning, data are split into training and testing sets.
• Random-wise split distributes participant data (trials) across both sets.
• Subject-wise split ensures trials from a given participant are present in only one set.
• Calibration improves performance of models trained with a subject-wise split.
• Ten gait trials per surface are needed to match performance of random-wise split.
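The difference between the two split schemes, and the calibration step that follows a subject-wise split, can be sketched in Python. This is a minimal illustration on toy data, not the authors' implementation (the PaCalC repository linked in the summary holds that). The function names, the trial dictionary layout, the 20% test fraction, and the 12 gait cycles per surface are all assumptions; only the 30 participants and nine surfaces come from the summary.

```python
import random
from collections import defaultdict


def make_toy_trials(n_subjects=30, n_surfaces=9, cycles_per_surface=12):
    """Build placeholder trial records (no sensor data, labels only)."""
    return [
        {"subject": s, "surface": f, "cycle": c}
        for s in range(n_subjects)
        for f in range(n_surfaces)
        for c in range(cycles_per_surface)
    ]


def random_wise_split(trials, test_fraction=0.2, seed=0):
    """Shuffle individual trials; one participant can land in both sets."""
    rng = random.Random(seed)
    shuffled = list(trials)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def subject_wise_split(trials, test_fraction=0.2, seed=0):
    """Shuffle participants; all trials of a participant stay in one set."""
    rng = random.Random(seed)
    subjects = sorted({t["subject"] for t in trials})
    rng.shuffle(subjects)
    n_test = max(1, int(round(len(subjects) * test_fraction)))
    test_subjects = set(subjects[:n_test])
    train = [t for t in trials if t["subject"] not in test_subjects]
    test = [t for t in trials if t["subject"] in test_subjects]
    return train, test


def calibration_split(participant_trials, cycles_per_surface, seed=0):
    """Split one held-out participant's trials into a calibration
    training set (a fixed number of gait cycles per surface) and a
    calibration test set (all remaining cycles)."""
    rng = random.Random(seed)
    by_surface = defaultdict(list)
    for t in participant_trials:
        by_surface[t["surface"]].append(t)
    calib_train, calib_test = [], []
    for surface in sorted(by_surface):
        cycles = list(by_surface[surface])
        rng.shuffle(cycles)
        calib_train.extend(cycles[:cycles_per_surface])
        calib_test.extend(cycles[cycles_per_surface:])
    return calib_train, calib_test


if __name__ == "__main__":
    trials = make_toy_trials()
    r_train, r_test = random_wise_split(trials)
    s_train, s_test = subject_wise_split(trials)
    shared_random = {t["subject"] for t in r_train} & {t["subject"] for t in r_test}
    shared_subject = {t["subject"] for t in s_train} & {t["subject"] for t in s_test}
    print("participants in both sets (random-wise):", len(shared_random))
    print("participants in both sets (subject-wise):", len(shared_subject))
```

Repeating `calibration_split` for increasing values of `cycles_per_surface`, fine-tuning the subject-wise model on each calibration training set, and scoring it on the matching calibration test set would trace out the kind of calibration curve the summary describes.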
ISSN: 0021-9290
1873-2380
DOI: 10.1016/j.jbiomech.2023.111606