Reducing variability of breast cancer subtype predictors by grounding deep learning models in prior knowledge

Deep learning neural networks have improved performance in many cancer informatics problems, including breast cancer subtype classification. However, many networks experience underspecification where multiple combinations of parameters achieve similar performance, both in training and validation. Additionall...

Bibliographic Details
Published in	Computers in biology and medicine Vol. 138; p. 104850
Main Authors	Anderson, Paul; Gadgil, Richa; Johnson, William A.; Schwab, Ella; Davidson, Jean M.
Format	Journal Article
Language	English
Published	Oxford: Elsevier Ltd, 01.11.2021
Subjects	Algorithms; Applied computing; Bioinformatics; Biology; Biomarkers; Breast cancer; Cancer research; Datasets; Deep learning; Disease; Embedding; Gene expression; Genomics; Informatics; Knowledge; Learning algorithms; Lung cancer; Machine learning; Medical prognosis; Medical research; Neural networks; Ontology; Prediction models; Training; Transcriptomics; Tumors

Summary:	Deep learning neural networks have improved performance in many cancer informatics problems, including breast cancer subtype classification. However, many networks experience underspecification where multiple combinations of parameters achieve similar performance, both in training and validation. Additionally, certain parameter combinations may perform poorly when the test distribution differs from the training distribution. Embedding prior knowledge from the literature may address this issue by boosting predictive models that provide crucial, in-depth information about a given disease. Breast cancer research provides a wealth of such knowledge, particularly in the form of subtype biomarkers and genetic signatures. In this study, we draw on past research on breast cancer subtype biomarkers, label propagation, and neural graph machines to present a novel methodology for embedding knowledge into machine learning systems. We embed prior knowledge into the loss function in the form of inter-subject distances derived from a well-known published breast cancer signature. Our results show that this methodology reduces predictor variability on state-of-the-art deep learning architectures and increases predictor consistency, leading to improved interpretation. We find that pathway enrichment analysis is more consistent after embedding knowledge. This novel method applies to a broad range of existing studies and predictive models. Our method moves the traditional synthesis of predictive models from an arbitrary assignment of weights to genes toward a more biologically meaningful approach of incorporating knowledge.
• Deep learning neural networks are critical tools for translating large biological datasets into useful classifications.
• However, reproducibility, especially across datasets, can suffer, reducing confidence and clinical utility.
• Incorporating knowledge from literature and databases into classifiers improves reliability while maintaining efficiency.
• Improved abilities to parse reliable results from machine learning algorithms can uncover novel biological insights.
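
The summary describes embedding prior knowledge into the loss function as inter-subject distances, in the spirit of neural graph machines. The following is only an illustrative sketch of that general idea, not the authors' implementation: a classification loss plus a penalty that pulls the hidden representations of subjects together when they are close on a set of signature genes. The Gaussian kernel, the parameters `sigma` and `alpha`, and all function names are assumptions introduced here for illustration.

```python
import numpy as np

def signature_weights(sig_expr, sigma=1.0):
    """Convert inter-subject distances on signature genes into similarity weights.

    sig_expr: (n_subjects, n_signature_genes) expression matrix.
    The Gaussian kernel is an assumed choice, not taken from the paper.
    """
    diff = sig_expr[:, None, :] - sig_expr[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)          # squared Euclidean distances
    w = np.exp(-dist2 / (2.0 * sigma ** 2))     # close subjects get weight near 1
    np.fill_diagonal(w, 0.0)                    # ignore self-pairs
    return w

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true subtype labels."""
    n = labels.shape[0]
    return -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))

def graph_penalty(hidden, weights):
    """Weighted sum of squared distances between hidden representations."""
    diff = hidden[:, None, :] - hidden[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    return np.sum(weights * dist2) / hidden.shape[0]

def knowledge_loss(probs, labels, hidden, weights, alpha=0.1):
    """Supervised loss plus the knowledge-derived graph penalty (alpha is assumed)."""
    return cross_entropy(probs, labels) + alpha * graph_penalty(hidden, weights)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sig_expr = rng.normal(size=(6, 5))           # toy signature-gene expression
    hidden = rng.normal(size=(6, 3))             # toy hidden-layer activations
    logits = rng.normal(size=(6, 4))             # toy scores for 4 subtypes
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    labels = rng.integers(0, 4, size=6)
    w = signature_weights(sig_expr)
    print(knowledge_loss(probs, labels, hidden, w, alpha=0.1))
```

In a sketch like this, setting `alpha` to zero recovers the plain classification loss, so the knowledge term acts purely as a regularizer; how the published method weights or constructs the distances should be taken from the article itself.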
ISSN:	0010-4825 1879-0534
DOI:	10.1016/j.compbiomed.2021.104850