Human Versus Machine: Comparing a Deep Learning Algorithm to Human Gradings for Detecting Glaucoma on Fundus Photographs

Bibliographic Details
Published in	American Journal of Ophthalmology, Vol. 211, pp. 123-131
Main Authors	Jammal, Alessandro A.; Thompson, Atalie C.; Mariottoni, Eduardo B.; Berchuck, Samuel I.; Urata, Carla N.; Estrela, Tais; Wakil, Susan M.; Costa, Vital P.; Medeiros, Felipe A.
Format	Journal Article
Language	English
Published	Elsevier Inc (United States), 01.03.2020; also listed as Elsevier Limited
Subjects	Aged; Algorithms; Area Under Curve; Cross-Sectional Studies; Datasets; Deep Learning; Defects; Diabetic retinopathy; Female; Fundus Oculi; Glaucoma; Glaucoma, Open-Angle - diagnosis; Glaucoma, Open-Angle - diagnostic imaging; Gonioscopy; Humans; Intraocular Pressure - physiology; Male; Middle Aged; Nerve Fibers - pathology; Neural networks; Optic nerve; Optic Nerve Diseases - diagnosis; Optic Nerve Diseases - diagnostic imaging; Optics; Photography; Physical Examination; Retinal Ganglion Cells - pathology; Retrospective Studies; ROC Curve; Software; Tomography; Tomography, Optical Coherence; Vision Disorders - diagnosis; Visual Field Tests - methods; Visual Fields - physiology
Summary:	To compare the diagnostic performance of human gradings vs predictions provided by a machine-to-machine (M2M) deep learning (DL) algorithm trained to quantify retinal nerve fiber layer (RNFL) damage on fundus photographs. Design: evaluation of a machine learning algorithm. An M2M DL algorithm trained with RNFL thickness parameters from spectral-domain optical coherence tomography was applied to a subset of 490 fundus photographs of 490 eyes from 370 subjects, which 2 glaucoma specialists graded for the probability of glaucomatous optic neuropathy (GON) and for estimated cup-to-disc (C/D) ratio. Spearman correlations with standard automated perimetry (SAP) global indices were compared between the human gradings and the M2M DL–predicted RNFL thickness values. The area under the receiver operating characteristic curve (AUC) and the partial AUC over the region of clinically meaningful specificity (85%-100%) were used to compare how well each output discriminated eyes with repeatable glaucomatous SAP defects from eyes with normal fields. The M2M DL–predicted RNFL thickness had a significantly stronger absolute correlation with SAP mean deviation (rho = 0.54) than the probability of GON given by human graders (rho = 0.48; P < .001). The partial AUC for the M2M DL algorithm was significantly higher than that for the probability of GON by human graders (partial AUC = 0.529 vs 0.411, respectively; P = .016). The M2M DL algorithm performed as well as, if not better than, human graders at detecting eyes with repeatable glaucomatous visual field loss. This DL algorithm could potentially replace human graders in population screening efforts for glaucoma.
ISSN:	0002-9394 (print); 1879-1891 (electronic)
DOI:	10.1016/j.ajo.2019.11.006
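The summary relies on two statistics, Spearman rank correlation and partial AUC restricted to specificity 85%-100%, but does not show how they are computed. The Python sketch below is a minimal illustration under stated assumptions, not the authors' analysis code. The function names (`average_ranks`, `spearman_rho`, `roc_points`, `partial_auc`) are hypothetical. Label 1 is assumed to mark an eye with a repeatable glaucomatous SAP defect, and higher scores are assumed to indicate more disease. Because a thinner RNFL indicates damage, predicted RNFL thickness would be negated before being passed in as a score. The partial AUC returned here is the raw area, which can be at most 0.15 for this specificity range. The reported values (0.529 and 0.411) therefore appear to be rescaled, and the summary does not say which rescaling was used.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i..j are 0-based sorted positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points from the strictest to the loosest threshold.

    Assumption: higher score = more likely diseased; label 1 = diseased.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Cases tied at one score cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def partial_auc(scores: Sequence[float], labels: Sequence[int],
                min_specificity: float = 0.85) -> float:
    """Raw (unscaled) area under the ROC curve for specificity >= min_specificity."""
    max_fpr = 1.0 - min_specificity
    pts = roc_points(scores, labels)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate TPR at the FPR cutoff, then stop there.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid rule
    return area
```

Calling `partial_auc(scores, labels, min_specificity=0.0)` gives the full AUC. Trapezoidal integration with linear interpolation at the FPR cutoff is one common convention. The P-values in the summary come from comparison tests that this sketch does not attempt to reproduce.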