Comparing deep learning-based auto-segmentation of organs at risk and clinical target volumes to expert inter-observer variability in radiotherapy planning

Bibliographic Details
Published in Radiotherapy and oncology Vol. 144; pp. 152-158
Main Authors Wong, Jordan; Fong, Allan; McVicar, Nevin; Smith, Sally; Giambattista, Joshua; Wells, Derek; Kolbeck, Carter; Giambattista, Jonathan; Gondara, Lovedeep; Alexander, Abraham
Format Journal Article
Language English
Published Ireland Elsevier B.V. 01.03.2020
Summary:
• Deep learning-based auto-segmented contours (DC) can provide significant time savings.
• DCs for organs at risk accurately reproduce expert contours.
• DCs for target volumes are less accurate but may serve as a template for manual edits.

Deep learning-based auto-segmented contours (DC) aim to alleviate labour-intensive contouring of organs at risk (OAR) and clinical target volumes (CTV). Most previous DC validation studies have a limited number of expert observers for comparison and/or use a validation dataset related to the training dataset. We determine if DC models are comparable to Radiation Oncologist (RO) inter-observer variability on an independent dataset. Expert contours (EC) were created by multiple ROs for central nervous system (CNS), head and neck (H&N), and prostate radiotherapy (RT) OARs and CTVs. DCs were generated using deep learning-based auto-segmentation software trained by a single RO on publicly available data. Contours were compared using Dice Similarity Coefficient (DSC) and 95% Hausdorff distance (HD). Sixty planning CT scans had 2–4 ECs, for a total of 60 CNS, 53 H&N, and 50 prostate RT contour sets. The mean DC and EC contouring times were 0.4 vs 7.7 min for CNS, 0.6 vs 26.6 min for H&N, and 0.4 vs 21.3 min for prostate RT contours. There were minimal differences in DSC and 95% HD involving DCs for OAR comparisons, but more noticeable differences for CTV comparisons. The accuracy of DCs trained by a single RO is comparable to expert inter-observer variability for the RT planning contours in this study. Use of deep learning-based auto-segmentation in clinical practice will likely lead to significant benefits to RT planning workflow and resources.
ISSN: 0167-8140, 1879-0887
DOI: 10.1016/j.radonc.2019.10.019
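
The summary names two contour comparison metrics, the Dice Similarity Coefficient (DSC) and the 95% Hausdorff distance (HD). The sketch below illustrates one common way to compute them and is not taken from the article: the function names `dice` and `hd95`, the `spacing` parameter, and the toy masks are all hypothetical. The 95% HD is assumed here to be the larger of the two directed 95th-percentile nearest-neighbour distances, computed over every voxel of each mask rather than over extracted surfaces; other definitions exist and the study may have used a different one.

```python
import numpy as np


def dice(mask_a, mask_b):
    """Dice Similarity Coefficient, 2|A ∩ B| / (|A| + |B|), for two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def hd95(points_a, points_b, spacing=1.0):
    """95th-percentile Hausdorff distance between two point sets (N x D arrays).

    `spacing` (hypothetical parameter) scales voxel indices to physical units,
    e.g. a per-axis voxel size in mm. Brute-force pairwise distances: fine for
    small contours, too memory-hungry for large volumes.
    """
    pa = np.asarray(points_a, dtype=float) * spacing
    pb = np.asarray(points_b, dtype=float) * spacing
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    a_to_b = d.min(axis=1)  # each point of A to its nearest point of B
    b_to_a = d.min(axis=0)  # each point of B to its nearest point of A
    return float(max(np.percentile(a_to_b, 95), np.percentile(b_to_a, 95)))


# Toy example: two 4x4 squares offset by one voxel in each direction.
a = np.zeros((10, 10), dtype=bool)
b = np.zeros((10, 10), dtype=bool)
a[2:6, 2:6] = True
b[3:7, 3:7] = True

print(dice(a, b))  # 0.5625 (9 shared voxels out of 16 + 16)
print(hd95(np.argwhere(a), np.argwhere(b)))
```

A DSC of 1 means perfect overlap and 0 means none, while a smaller 95% HD means the two contours lie closer together. In practice the distance is usually computed on contour surfaces in millimetres, so surface extraction and the CT voxel spacing would need to be supplied.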