FeSTwo, a two-step feature selection algorithm based on feature engineering and sampling for the chronological age regression problem

Bibliographic Details
Published in: Computers in Biology and Medicine, Vol. 125, p. 104008
Main Authors: Wei, Zhipeng; Ding, Shiying; Duan, Meiyu; Liu, Shuai; Huang, Lan; Zhou, Fengfeng
Format: Journal Article
Language: English
Published: United States, Elsevier Ltd (Elsevier Limited), 01.10.2020

Summary: Accurate determination of a sample's chronological age is an important forensic problem. This regression problem may be improved by selecting appropriate methylomic features. Most existing feature selection algorithms, however, optimize regression performance by considering only the original features. This study proposed four feature engineering strategies to transform the original methylomic features. The resampling-based feature selection algorithm FeSTwo, proposed in this study, improved the performance of the age regression model. FeSTwo outperformed the parallel algorithms used in previous studies, even on the electronic health record data. The age prediction performance of the FeSTwo-detected features was also confirmed on another independent dataset. The results demonstrated that FeSTwo reduced the root-mean-square error (RMSE) on the test dataset by more than 8% while using only 70 features.
Highlights:
• A two-step feature selection algorithm for methylomic biomarkers of age prediction.
• Optimization of age prediction using feature engineering and sampling.
• Gender variation, feature interaction, and squaring for engineering features.
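The summary names squaring and feature interaction among the feature engineering strategies and reports its result as a relative RMSE reduction. The following is a minimal Python sketch of those two transformations and of the RMSE arithmetic, written under stated assumptions. It is not FeSTwo itself. The function names (engineer_features, rmse, relative_reduction) and the toy values are hypothetical. The sketch does not model the gender-variation strategy or the two-step resampling-based selection.

```python
import math
from itertools import combinations


def engineer_features(rows):
    """Append squared terms and all pairwise products to each sample.

    Hypothetical illustration: `rows` is a list of samples, each a list of
    methylation values. The original features come first, then their
    squares, then the pairwise interaction terms.
    """
    engineered = []
    for row in rows:
        squares = [x * x for x in row]
        interactions = [a * b for a, b in combinations(row, 2)]
        engineered.append(list(row) + squares + interactions)
    return engineered


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted ages."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def relative_reduction(baseline, improved):
    """Fractional drop in error; a value of 0.08 means an 8% reduction."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    X = [[0.2, 0.5, 0.9], [0.4, 0.1, 0.7]]
    print(len(engineer_features(X)[0]))  # 3 originals + 3 squares + 3 pairs = 9
    ages = [30.0, 45.0, 60.0]
    base = rmse(ages, [34.0, 41.0, 66.0])  # toy baseline predictions
    new = rmse(ages, [33.0, 42.0, 64.0])   # toy improved predictions
    print(round(relative_reduction(base, new), 3))
```

Pairwise interactions grow quadratically with the number of input features. On methylome-scale data this is one reason a selection step is needed after the engineering step.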
ISSN: 0010-4825, 1879-0534
DOI: 10.1016/j.compbiomed.2020.104008