Machine learning in medicine: a practical introduction

Bibliographic Details
Published in	BMC medical research methodology Vol. 19; no. 1; p. 64
Main Authors	Sidey-Gibbons, Jenni A M; Sidey-Gibbons, Chris J
Format	Journal Article
Language	English
Published	England: BioMed Central Ltd (BMC), 19.03.2019
Subjects	Accuracy; Algorithms; Archives & records; Artificial intelligence; Artificial neural networks; Big Data; Breast cancer; Breast Neoplasms - diagnosis; Cancer diagnosis; Classification; Computer aided medical diagnosis; Computer-assisted; Decision making; Diabetes; Diagnosis; Diagnosis, Computer-Assisted - methods; Female; Humans; Image processing equipment; Informatics; Linear programming; Machine Learning; Medical diagnosis; Medical informatics; Medical research; Medical researchers; Methods; Natural language processing; Neural networks; Neural Networks, Computer; Open source software; Principal components analysis; Programming languages; Researchers; Science; Sensitivity and Specificity; Software; Statistical methods; Supervised machine learning; Support Vector Machine; United States
Summary:	Following visible successes on a wide range of predictive tasks, machine learning techniques are attracting substantial interest from medical researchers and clinicians. We address the need for capacity development in this area by providing a conceptual introduction to machine learning alongside a practical guide to developing and evaluating predictive algorithms using freely available open-source software and public domain data. We demonstrate the use of machine learning techniques by developing three predictive models for cancer diagnosis using descriptions of nuclei sampled from breast masses. These algorithms include regularized Generalized Linear Model regression (GLMs), Support Vector Machines (SVMs) with a radial basis function kernel, and single-layer Artificial Neural Networks. The publicly available dataset describing the breast mass samples (N=683) was randomly split into evaluation (n=456) and validation (n=227) samples. We trained the algorithms on data from the evaluation sample and then used them to predict the diagnostic outcome in the validation sample. We compared the predictions made on the validation sample with the real-world diagnostic decisions to calculate the accuracy, sensitivity, and specificity of the three models. We explored the use of averaging and voting ensembles to improve predictive performance. We provide a step-by-step guide to developing algorithms using the open-source R statistical programming environment. The trained algorithms were able to classify cell nuclei with high accuracy (0.94-0.96), sensitivity (0.97-0.99), and specificity (0.85-0.94). Maximum accuracy (0.96) and area under the curve (0.97) were achieved using the SVM algorithm. Prediction performance increased marginally (accuracy = 0.97, sensitivity = 0.99, specificity = 0.95) when the algorithms were arranged into a voting ensemble. We use a straightforward example to demonstrate the theory and practice of machine learning for clinicians and medical researchers. The principles which we demonstrate here can be readily applied to other complex tasks, including natural language processing and image recognition.
ISSN:	1471-2288
DOI:	10.1186/s12874-019-0681-4
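
Illustration:	The summary reports accuracy, sensitivity, and specificity for three classifiers and for a voting ensemble. The sketch below shows one way those quantities could be computed. It is written in Python for illustration only and is not the authors' code, which uses R. The function names, the label coding (1 = malignant, 0 = benign), the simple majority-vote rule, and the toy predictions are all assumptions made for this example and are not taken from the article or its dataset.

```python
from typing import Dict, List, Sequence


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity for binary labels.

    Assumed coding (hypothetical): 1 = malignant (positive), 0 = benign (negative).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),  # share of positives detected
        "specificity": _ratio(tn, tn + fp),  # share of negatives detected
    }


def majority_vote(model_predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine binary predictions from several models by simple majority.

    A case is labeled 1 when more than half of the models predict 1.
    With an odd number of models (three here) ties cannot occur.
    """
    n_models = len(model_predictions)
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*model_predictions)]


# Toy data, invented for this example (not results from the article).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
glm_pred = [1, 1, 0, 0, 0, 1, 1, 0]
svm_pred = [1, 0, 1, 0, 0, 0, 1, 0]
ann_pred = [1, 1, 1, 0, 1, 0, 0, 0]

ensemble_pred = majority_vote([glm_pred, svm_pred, ann_pred])
for name, pred in [("GLM", glm_pred), ("SVM", svm_pred), ("ANN", ann_pred), ("Vote", ensemble_pred)]:
    print(name, classification_metrics(y_true, pred))
```

In this toy case each model makes errors on different cases, so the majority vote corrects them. Whether an ensemble helps on real data depends on how correlated the individual models' errors are.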