Prediction of Breast Cancer Treatment–Induced Fatigue by Machine Learning Using Genome-Wide Association Data


Bibliographic Details
Published in JNCI cancer spectrum Vol. 4; no. 5
Main Authors Lee, Sangkyu, Deasy, Joseph O, Oh, Jung Hun, Di Meglio, Antonio, Dumas, Agnès, Menvielle, Gwenn, Charles, Cecile, Boyault, Sandrine, Rousseau, Marina, Besse, Celine, Thomas, Emilie, Boland, Anne, Cottu, Paul, Tredan, Olivier, Levy, Christelle, Martin, Anne-Laure, Everhard, Sibille, Ganz, Patricia A, Partridge, Ann H, Michiels, Stefan, Deleuze, Jean-François, Andre, Fabrice, Vaz-Luis, Ines
Format Journal Article
Language English
Published Oxford University Press 01.10.2020

Summary: Background: We aimed at predicting fatigue after breast cancer treatment using machine learning on clinical covariates and germline genome-wide data. Methods: We accessed germline genome-wide data of 2799 early-stage breast cancer patients from the Cancer Toxicity study (NCT01993498). The primary endpoint was defined as scoring zero at diagnosis and higher than quartile 3 at 1 year after primary treatment completion on European Organization for Research and Treatment of Cancer quality-of-life questionnaires for Overall Fatigue and on the multidimensional questionnaire for Physical, Emotional, and Cognitive fatigue. First, we tested univariate associations of each endpoint with clinical variables and genome-wide variants. Then, using preselected clinical (false discovery rate < 0.05) and genomic (P < .001) variables, a multivariable preconditioned random-forest regression model was built and validated on a hold-out subset to predict fatigue. Gene set enrichment analysis identified key biological correlates (MetaCore). All statistical tests were 2-sided. Results: Statistically significant clinical associations were found only with Emotional and Cognitive Fatigue, including receipt of chemotherapy, anxiety, and pain. Some single nucleotide polymorphisms had some degree of association (P < .001) with the different fatigue endpoints, although there were no genome-wide statistically significant (P < 5.00 × 10⁻⁸) associations. Only for Cognitive Fatigue, the predictive ability of the genomic multivariable model was statistically significantly better than random (area under the curve = 0.59, P = .01) and marginally improved with clinical variables (area under the curve = 0.60, P = .005). Single nucleotide polymorphisms found to be associated (P < .001) with Cognitive Fatigue belonged to genes linked to inflammation (false discovery rate adjusted P = .03), cognitive disorders (P = 1.51 × 10⁻¹²), and synaptic transmission (P = 6.28 × 10⁻⁸). Conclusions: Genomic analyses in this large cohort of breast cancer survivors suggest a possible genetic role for severe Cognitive Fatigue that warrants further exploration.
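Illustrative note: the summary mentions preselecting variables at a false discovery rate threshold and reporting an area under the curve on a hold-out subset. The minimal Python sketch below shows how those two quantities are commonly computed. It is not the authors' code. The summary does not name the FDR method, so the Benjamini-Hochberg procedure is an assumption here. The function names `benjamini_hochberg` and `roc_auc` are hypothetical, and the numbers are toy values, not study data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the source only says "false discovery rate"; BH is one
    common choice and is used here purely for illustration.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy inputs (hypothetical, not from the study).
toy_p_values = [0.01, 0.04, 0.03, 0.20]
toy_q_values = benjamini_hochberg(toy_p_values)
# Keep variables whose adjusted p-value is below the 0.05 threshold.
selected = [i for i, q in enumerate(toy_q_values) if q < 0.05]

toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An area under the curve of 0.5 corresponds to random ranking, which is the baseline against which the reported values of 0.59 and 0.60 were tested.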
Bibliography: PMCID: PMC7583150
ISSN: 2515-5091
DOI: 10.1093/jncics/pkaa039