Performance of the supervised learning algorithms in sex estimation of the proximal femur: A comparative study in contemporary Egyptian and Turkish samples


Bibliographic Details
Published in	Science & justice Vol. 62; no. 3; pp. 288 - 309
Main Authors	H. Attia, MennattAllah; H. Attia, Mohamed; Tarek Farghaly, Yasmin; Ahmed El-Sayed Abulnoor, Bassam; Curate, Francisco
Format	Journal Article
Language	English
Published	England: Elsevier B.V., 01.05.2022
Subjects	Contemporary metapopulations skeletal database; Femur sexual dimorphism; Forensic anthropology; Regional sex estimation standards; Supervised machine learning algorithms

More Information
Summary:	• Overall and sex-specific accuracies of the classifiers are comparable at the 0.50 threshold.
• Conditional class probabilities vary among classifiers because of their different assumptions.
• The linear discriminant function is a simple and elegant method for binary classification.
• Naïve Bayes classified most cases at the 0.95 threshold, but calibration is required.
• Random forest is the best supervised learning method for sex estimation.
Sex estimation standards are population-specific; however, we argue that machine learning (ML) techniques may enhance biological sex estimation in trans-population applications. Linear discriminant analysis (LDA) was compared with nine ML algorithms, including quadratic discriminant analysis (QDA), support vector machine (SVM), decision tree (DT), Gaussian process classifier (GPC), naïve Bayes classifier (NBC), k-nearest neighbors (KNN), random forest (RFM), and adaptive boosting (Adaboost). The experiments involved two contemporary samples: a Turkish sample (n = 300) for training and an Egyptian sample (n = 100) for validation. Base models were calibrated using isotonic and sigmoid calibration schemes. Results were analyzed at posterior probability (pp) thresholds of >0.95 and >0.80. At pp = 0.5, the ML algorithms yielded comparable accuracies in the training (90% to 97%) and test (81% to 88%) sets, and these accuracies were not modified by calibration. At pp >0.95, the raw (uncalibrated) RFM, LDA, QDA, and SVM models showed the best performance; however, calibration improved the performance of several classifiers, especially NBC and Adaboost. By contrast, the performance of the GPC, KNN, and QDA models worsened after calibration. RFM showed the best performance among all models at both thresholds, whereas LDA benefited most from both calibration methods at pp >0.80. More complex ML models do not necessarily achieve better performance metrics; LDA and QDA remain the fastest and simplest classifiers.
We demonstrated that ML can enhance sex estimation on an independent population sample. However, we detected differences in the underlying probability distributions generated by the models, which warrant cautious application by forensic practitioners.
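The summary reports results at posterior probability thresholds and after isotonic calibration. The pure-Python sketch below illustrates both ideas on hypothetical toy numbers. It is not the authors' code. The data and the function names (`fit_isotonic`, `calibrate`, `evaluate_at_threshold`) are invented for illustration. So is the rule that a case stays unclassified when neither sex's posterior exceeds the threshold, which is assumed here. Isotonic calibration is implemented with the pool-adjacent-violators algorithm as a simple step function. Sigmoid (Platt) calibration is not shown.

```python
from bisect import bisect_right


def fit_isotonic(scores, labels):
    """Fit a non-decreasing step map from raw scores to P(male) via pool-adjacent-violators."""
    blocks = []  # each block: [sum_of_labels, count, lowest_score, highest_score]
    for s, y in sorted(zip(scores, labels)):
        if blocks and blocks[-1][3] == s:
            # Tied scores must share one calibrated value, so pool them.
            blocks[-1][0] += y
            blocks[-1][1] += 1
        else:
            blocks.append([float(y), 1, s, s])
        # Merge backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, _, hi = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][3] = hi
    lows = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return lows, values


def calibrate(score, lows, values):
    """Map a raw score to the value of the last block starting at or below it."""
    i = bisect_right(lows, score) - 1
    return values[max(i, 0)]  # scores below the first block take the first value


def evaluate_at_threshold(p_male, labels, t):
    """Return (share of cases classified, accuracy among classified) at posterior threshold t."""
    classified = correct = 0
    for p, y in zip(p_male, labels):
        if p > t:
            pred = 1  # male
        elif 1 - p > t:
            pred = 0  # female
        else:
            continue  # indeterminate: neither posterior exceeds t
        classified += 1
        correct += pred == y
    share = classified / len(labels)
    accuracy = correct / classified if classified else float("nan")
    return share, accuracy


# Hypothetical toy data (1 = male, 0 = female); not from the study.
train_p = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
train_y = [0, 0, 1, 0, 1, 0, 1, 1]
test_p = [0.97, 0.85, 0.65, 0.45, 0.15, 0.02]
test_y = [1, 1, 0, 1, 0, 0]

lows, values = fit_isotonic(train_p, train_y)
test_cal = [calibrate(p, lows, values) for p in test_p]
for t in (0.5, 0.80, 0.95):
    print(t, evaluate_at_threshold(test_p, test_y, t), evaluate_at_threshold(test_cal, test_y, t))
```

On these toy numbers, calibration raises the share of test cases classified at pp >0.95 from 2 of 6 to 4 of 6 without any misclassification. This shows on a small scale why calibration can change how many cases clear a strict threshold. Library implementations such as scikit-learn's `CalibratedClassifierCV` (with `method="isotonic"` or `method="sigmoid"`) fit the calibration map on held-out cross-validation folds. Fitting it on a separate training set here serves the same purpose.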
ISSN:	1355-0306; 1876-4452
DOI:	10.1016/j.scijus.2022.03.003