Quantifying predictive capability of electronic health records for the most harmful breast cancer

Bibliographic Details
Published in Proceedings of SPIE, the international society for optical engineering Vol. 10577
Main Authors Wu, Yirong; Fan, Jun; Peissig, Peggy; Berg, Richard; Tafti, Ahmad Pahlavan; Yin, Jie; Yuan, Ming; Page, David; Cox, Jennifer; Burnside, Elizabeth S
Format Journal Article
Language English
Published United States 01.02.2018
ISSN 0277-786X
EISSN 1996-756X
DOI 10.1117/12.2293954

Abstract Improved prediction of the "most harmful" breast cancers, those that cause the most substantive morbidity and mortality, would enable physicians to target more intense screening and preventive measures at the women who have the highest risk; however, prediction models for the "most harmful" breast cancers have rarely been developed. Electronic health records (EHRs) represent an underused data source that has great research and clinical potential. Our goal was to quantify the value of EHR variables in predicting the risk of the "most harmful" breast cancer. From an existing personalized medicine data repository, we identified 794 subjects who had breast cancer with primary non-benign tumors and an earliest diagnosis on or after 1/1/2004, including 395 "most harmful" breast cancer cases and 399 "least harmful" breast cancer cases. For these subjects, we collected EHR data comprising six components: demographics, diagnoses, symptoms, procedures, medications, and laboratory results. We developed two regularized prediction models, Ridge Logistic Regression (Ridge-LR) and Lasso Logistic Regression (Lasso-LR), to predict the "most harmful" breast cancer one year in advance. The area under the ROC curve (AUC) was used to assess model performance. The AUCs of the Ridge-LR and Lasso-LR models were 0.818 and 0.839, respectively. For both models, the predictive performance of the full set of EHR variables was significantly higher than that of each individual component (p<0.001). In conclusion, EHR variables can be used to predict the "most harmful" breast cancer, making it possible to personalize care in clinical practice for the women at the highest risk.
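The abstract names two regularized logistic regression models and an AUC-based evaluation but this record gives no implementation details. The following is a minimal sketch of that general technique, not the authors' code: it fits ridge (L2) and lasso (L1) penalized logistic regression by plain (proximal) gradient descent and scores held-out predictions with the rank-based AUC. The synthetic data, the function names, and the hyperparameters `lam`, `lr`, and `epochs` are all assumptions made for illustration; no EHR data or results from the study are reproduced.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrink toward zero by t.
    return math.copysign(max(abs(w) - t, 0.0), w)

def fit_logistic(X, y, penalty, lam=0.05, lr=0.1, epochs=300):
    """Penalized logistic regression (illustrative, assumed hyperparameters).
    penalty="ridge" adds lam * w to the gradient (L2 penalty);
    penalty="lasso" applies soft-thresholding after each step (L1 penalty).
    The intercept b is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        for j in range(d):
            if penalty == "ridge":
                w[j] -= lr * (grad_w[j] + lam * w[j])
            else:
                w[j] = soft_threshold(w[j] - lr * grad_w[j], lr * lam)
    return w, b

def predict(model, X):
    # Predicted probability of the positive class for each row of X.
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

def auc(y_true, scores):
    # Probability that a random positive outranks a random negative
    # (ties count as 0.5); this equals the area under the ROC curve.
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def make_data(n, d, seed):
    # Synthetic stand-in data: only the first 3 of d features carry signal.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        xi = [rng.gauss(0, 1) for _ in range(d)]
        logit = 1.5 * xi[0] - 1.0 * xi[1] + 0.8 * xi[2]
        y.append(1 if rng.random() < sigmoid(logit) else 0)
        X.append(xi)
    return X, y

X_train, y_train = make_data(400, 10, seed=0)
X_test, y_test = make_data(200, 10, seed=1)
ridge = fit_logistic(X_train, y_train, "ridge")
lasso = fit_logistic(X_train, y_train, "lasso")
auc_ridge = auc(y_test, predict(ridge, X_test))
auc_lasso = auc(y_test, predict(lasso, X_test))
print(f"Ridge-LR AUC: {auc_ridge:.3f}  Lasso-LR AUC: {auc_lasso:.3f}")
```

In this sketch the lasso penalty tends to drive the weights of uninformative features to exactly zero while ridge only shrinks them, which is the usual reason to compare the two on wide data such as EHR variables. In practice the penalty strength would be chosen by cross-validation rather than fixed as it is here.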
Author Wu, Yirong (University of Wisconsin Madison, WI, USA)
Fan, Jun (University of Wisconsin Madison, WI, USA)
Peissig, Peggy (Marshfield Clinic, Marshfield, WI, USA)
Berg, Richard (Marshfield Clinic, Marshfield, WI, USA)
Tafti, Ahmad Pahlavan (Marshfield Clinic, Marshfield, WI, USA)
Yin, Jie (China Three Gorges University, Hubei, China)
Yuan, Ming (University of Wisconsin Madison, WI, USA)
Page, David (University of Wisconsin Madison, WI, USA)
Cox, Jennifer (University of Wisconsin Madison, WI, USA)
Burnside, Elizabeth S (University of Wisconsin Madison, WI, USA)
AuthorAffiliation a University of Wisconsin Madison, WI, USA
b Marshfield Clinic, Marshfield, WI, USA
c Jiangbei People’s Hospital, Jiangsu, China
d China Three Gorges University, Hubei, China
Keywords least absolute shrinkage and selection operator (Lasso)
regularized prediction model
breast cancer
electronic health records (EHRs)
OpenAccessLink https://www.ncbi.nlm.nih.gov/pmc/articles/5914175
PMID 29706685