A deep learning approach to automate refinement of somatic variant calling from cancer sequencing data

Bibliographic Details
Published in	Nature Genetics, Vol. 50, no. 12, pp. 1735-1743
Main Authors	Ainscough, Benjamin J., Barnell, Erica K., Ronning, Peter, Campbell, Katie M., Wagner, Alex H., Fehniger, Todd A., Dunn, Gavin P., Uppaluri, Ravindra, Govindan, Ramaswamy, Rohan, Thomas E., Griffith, Malachi, Mardis, Elaine R., Swamidass, S. Joshua, Griffith, Obi L.
Format	Journal Article
Language	English
Published	New York: Nature Publishing Group US, 01.12.2018
Subjects	45; 631/114/1314; 631/114/2785; 631/114/794; 631/208/514/2184; 692/699/67; Agriculture; Algorithms; Animal Genetics and Genomics; Artificial intelligence; Automation; Biomedical and Life Sciences; Biomedicine; Breast cancer; Cancer; Cancer genetics; Cancer Research; Computer Simulation; Data processing; Datasets; Deep Learning; DNA Mutational Analysis - instrumentation; DNA Mutational Analysis - methods; Electronic Data Processing - methods; Gene Function; Genetic variation; Genomes; Genomic analysis; Genomics; Handbooks; High-Throughput Nucleotide Sequencing - instrumentation; Human Genetics; Humans; Learning algorithms; Leukemia; Machine learning; Mutation; Neoplasms - genetics; Polymorphism, Single Nucleotide; Quality standards; Reproducibility of Results; Sequence Analysis, DNA - instrumentation; Sequence Analysis, DNA - methods; Software; technical-report; Technology application; Tumors

More Information
Summary:	Cancer genomic analysis requires accurate identification of somatic variants in sequencing data. Manual review to refine somatic variant calls is required as a final step after automated processing. However, manual variant refinement is time-consuming, costly, poorly standardized, and non-reproducible. Here, we systematized and standardized somatic variant refinement using a machine learning approach. The final model incorporates 41,000 variants from 440 sequencing cases. This model accurately recapitulated manual refinement labels for three independent testing sets (13,579 variants) and accurately predicted somatic variants confirmed by orthogonal validation sequencing data (212,158 variants). The model improves on manual somatic refinement by reducing bias on calls otherwise subject to high inter-reviewer variability. A machine learning approach for refinement of somatic variant calls automates this process and reduces bias stemming from inter-reviewer variability.
Bibliography:	Author contributions: B.J.A. designed the study, assembled and cleaned training data, performed feature engineering, designed model architecture, tuned hyperparameters, performed model training and analysis, performed manual review, assembled validation data, wrote code, created figures, and wrote the manuscript. E.K.B. designed the study, performed manual review, performed model training and analysis, performed clinical data analysis, assembled validation data, wrote code, created figures, and wrote the manuscript. P.R. and K.M.C. wrote code, performed manual review, and edited the manuscript. A.H.W. wrote code. T.E.R., R.G., R.U., G.P.D., and T.A.F. shared genomic data that was used in training the model and revised the paper. M.G., E.R.M., S.J.S., and O.L.G. designed the study, supervised the project, and revised the paper.
ISSN:	1061-4036, 1546-1718
DOI:	10.1038/s41588-018-0257-y