Quantifying predictive capability of electronic health records for the most harmful breast cancer
Published in | Proceedings of SPIE, the international society for optical engineering Vol. 10577 |
---|---|
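Main Authors | Wu, Yirong; Fan, Jun; Peissig, Peggy; Berg, Richard; Tafti, Ahmad Pahlavan; Yin, Jie; Yuan, Ming; Page, David; Cox, Jennifer; Burnside, Elizabeth S |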
Format | Journal Article |
Language | English |
Published | United States, 01.02.2018 |
ISSN | 0277-786X 1996-756X |
DOI | 10.1117/12.2293954 |
Abstract | Improved prediction of the "most harmful" breast cancers, those that cause the most substantive morbidity and mortality, would enable physicians to target more intense screening and preventive measures at the women who have the highest risk; however, prediction models for the "most harmful" breast cancers have rarely been developed. Electronic health records (EHRs) represent an underused data source that has great research and clinical potential. Our goal was to quantify the value of EHR variables in predicting the risk of the "most harmful" breast cancer. From an existing personalized medicine data repository, we identified 794 subjects who had breast cancer with primary non-benign tumors and an earliest diagnosis on or after 1/1/2004, including 395 "most harmful" breast cancer cases and 399 "least harmful" breast cancer cases. For these subjects, we collected EHR data comprising six components: demographics, diagnoses, symptoms, procedures, medications, and laboratory results. We developed two regularized prediction models, Ridge Logistic Regression (Ridge-LR) and Lasso Logistic Regression (Lasso-LR), to predict the "most harmful" breast cancer one year in advance. The area under the ROC curve (AUC) was used to assess model performance. The AUCs of the Ridge-LR and Lasso-LR models were 0.818 and 0.839, respectively. For both models, the predictive performance of the full set of EHR variables was significantly higher than that of any individual component (p<0.001). In conclusion, EHR variables can be used to predict the "most harmful" breast cancer, making it possible to personalize care in clinical practice for the women at the highest risk. |
---|---|
Author | Berg, Richard; Tafti, Ahmad Pahlavan; Fan, Jun; Yin, Jie; Page, David; Cox, Jennifer; Peissig, Peggy; Yuan, Ming; Wu, Yirong; Burnside, Elizabeth S |
AuthorAffiliation | a University of Wisconsin Madison, WI, USA; b Marshfield Clinic, Marshfield, WI, USA; c Jiangbei People’s Hospital, Jiangsu, China; d China Three Gorges University, Hubei, China |
Author_xml | 1. Wu, Yirong (University of Wisconsin Madison, WI, USA); 2. Fan, Jun (University of Wisconsin Madison, WI, USA); 3. Peissig, Peggy (Marshfield Clinic, Marshfield, WI, USA); 4. Berg, Richard (Marshfield Clinic, Marshfield, WI, USA); 5. Tafti, Ahmad Pahlavan (Marshfield Clinic, Marshfield, WI, USA); 6. Yin, Jie (China Three Gorges University, Hubei, China); 7. Yuan, Ming (University of Wisconsin Madison, WI, USA); 8. Page, David (University of Wisconsin Madison, WI, USA); 9. Cox, Jennifer (University of Wisconsin Madison, WI, USA); 10. Burnside, Elizabeth S (University of Wisconsin Madison, WI, USA) |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/29706685 (View this record in MEDLINE/PubMed) |
Discipline | Engineering |
EISSN | 1996-756X |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | true |
Keywords | least absolute shrinkage and selection operator (Lasso); regularized prediction model; breast cancer; electronic health records (EHRs) |
OpenAccessLink | https://www.ncbi.nlm.nih.gov/pmc/articles/5914175 |
PMID | 29706685 |
PublicationDateYYYYMMDD | 2018-02-01 |
PublicationTitle | Proceedings of SPIE, the international society for optical engineering |
PublicationTitleAlternate | Proc SPIE Int Soc Opt Eng |
URI | https://www.ncbi.nlm.nih.gov/pubmed/29706685 https://www.proquest.com/docview/2032795988 https://pubmed.ncbi.nlm.nih.gov/PMC5914175 |
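The abstract names the two models (Ridge-LR and Lasso-LR) and the evaluation metric (AUC) but does not state the software, penalty strength, or validation scheme used. As a rough illustration only, the sketch below fits ridge- and lasso-penalized logistic regression to synthetic data (not the study's EHR data) and scores each model on a held-out split using the rank-sum form of the AUC. Every function name, the penalty strength `lam`, the learning rate, the number of epochs, and the 300/100 split are assumptions made for this example, not details from the paper.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a weight toward zero."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def fit_logistic(X, y, penalty, lam, lr=0.5, epochs=200):
    """Fit penalized logistic regression by (proximal) gradient descent.

    penalty="ridge" adds (lam / 2) * ||w||^2; penalty="lasso" adds lam * ||w||_1.
    The intercept b is left unpenalized. All hyperparameters are illustrative.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            grad_b += err
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        for j in range(p):
            if penalty == "ridge":
                # Gradient step on the loss plus the smooth L2 penalty.
                w[j] -= lr * (grad_w[j] / n + lam * w[j])
            else:
                # Gradient step on the loss, then the L1 proximal step.
                w[j] = soft_threshold(w[j] - lr * grad_w[j] / n, lr * lam)
    return w, b


def predict(X, w, b):
    """Predicted probability of the positive class for each row."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]


def auc(scores, labels):
    """Area under the ROC curve via the rank-sum identity, with tied ranks averaged."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def make_synthetic(n, p, seed):
    """Synthetic stand-in data: only the first two features carry signal."""
    rng = random.Random(seed)
    true_w = [1.5, -1.0] + [0.0] * (p - 2)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        prob = sigmoid(sum(wj * xj for wj, xj in zip(true_w, row)))
        X.append(row)
        y.append(1 if rng.random() < prob else 0)
    return X, y


X, y = make_synthetic(n=400, p=6, seed=0)
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

results = {}
for penalty in ("ridge", "lasso"):
    w, b = fit_logistic(X_train, y_train, penalty, lam=0.05)
    results[penalty] = auc(predict(X_test, w, b), y_test)
    print(penalty, "held-out AUC:", round(results[penalty], 3))
```

The two penalties differ in how they treat weak predictors: the ridge update shrinks every weight proportionally, while the lasso's soft-thresholding step can set a weight exactly to zero, which is why lasso is often used for variable selection over large sets of candidate variables. The AUC values printed here describe the synthetic data only and say nothing about the 0.818 and 0.839 reported in the abstract.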