Methods to Develop an Electronic Medical Record Phenotype Algorithm to Compare the Risk of Coronary Artery Disease across 3 Chronic Disease Cohorts

Bibliographic Details
Published in	PloS one Vol. 10; no. 8; p. e0136651
Main Authors	Liao, Katherine P., Ananthakrishnan, Ashwin N., Kumar, Vishesh, Xia, Zongqi, Cagan, Andrew, Gainer, Vivian S., Goryachev, Sergey, Chen, Pei, Savova, Guergana K., Agniel, Denis, Churchill, Susanne, Lee, Jaeyoung, Murphy, Shawn N., Plenge, Robert M., Szolovits, Peter, Kohane, Isaac, Shaw, Stanley Y., Karlson, Elizabeth W., Cai, Tianxi
Format	Journal Article
Language	English
Published	United States: Public Library of Science (PLoS), 24.08.2015
Subjects	Adult Aged Algorithms Analysis Arthritis Arthritis, Rheumatoid - complications Arthritis, Rheumatoid - epidemiology Arthritis, Rheumatoid - physiopathology Cardiology Cardiovascular disease Cardiovascular diseases Chronic diseases Classification Computerized physician order entry Coronary artery Coronary artery disease Coronary Artery Disease - complications Coronary Artery Disease - epidemiology Coronary Artery Disease - physiopathology Coronary heart disease Coronary vessels Diabetes Diabetes mellitus Diabetes Mellitus - epidemiology Diabetes Mellitus - physiopathology Electronic Health Records Electronic medical records Electronic records Female Genetic aspects Genotype & phenotype Health informatics Health risks Heart Heart diseases Hospitals Humans Hyperlipidemias - complications Hyperlipidemias - epidemiology Hyperlipidemias - physiopathology Inflammatory bowel disease Inflammatory bowel diseases Inflammatory Bowel Diseases - complications Inflammatory Bowel Diseases - epidemiology Inflammatory Bowel Diseases - physiopathology Intestine Laboratories Male Medical electronics Medical records Medical schools Middle Aged Natural Language Processing Patients Phenotype Phenotypes Population studies Populations Rheumatic diseases Rheumatism Rheumatoid arthritis Rheumatoid factor Rheumatology Risk analysis Risk Factors Sensitivity Studies Systematic review Womens health United States United States > US Massachusetts
Summary:	Typically, algorithms to classify phenotypes using electronic medical record (EMR) data were developed to perform well in a specific patient population. There is increasing interest in analyses that allow the study of a specific outcome across different diseases. Such a study in the EMR would require an algorithm that can be applied across different patient populations. Our objectives were (1) to develop an algorithm that would enable the study of coronary artery disease (CAD) across diverse patient populations and (2) to study the impact of adding narrative data extracted using natural language processing (NLP) to the algorithm. Additionally, we demonstrate how to implement the CAD algorithm to compare risk across 3 chronic diseases in a preliminary study. We studied 3 established EMR-based patient cohorts: diabetes mellitus (DM, n = 65,099), inflammatory bowel disease (IBD, n = 10,974), and rheumatoid arthritis (RA, n = 4,453) from two large academic centers. We developed a CAD algorithm using NLP in addition to structured data (e.g., ICD9 codes) in the RA cohort and validated it in the DM and IBD cohorts. The CAD algorithm using NLP in addition to structured data achieved specificity >95% with a positive predictive value (PPV) of 90% in the training set (RA) and the validation sets (IBD and DM). The addition of NLP data improved the sensitivity for all cohorts, classifying an additional 17% of CAD subjects in IBD and 10% in DM while maintaining a PPV of 90%. The algorithm classified 16,488 DM (26.1%), 457 IBD (4.2%), and 245 RA (5.0%) with CAD. In a cross-sectional analysis, CAD risk was 63% lower in RA and 68% lower in IBD compared to DM (p<0.0001) after adjusting for traditional cardiovascular risk factors. We developed and validated a CAD algorithm that performed well across diverse patient populations. The addition of NLP into the CAD algorithm improved the sensitivity of the algorithm, particularly in cohorts where the prevalence of CAD was low. Preliminary data suggest that CAD risk was significantly lower in RA and IBD compared to DM.
Bibliography:	Conceived and designed the experiments: KPL ANA VK ZX SC SNM RMP PS IK SYS EWK TC. Performed the experiments: KPL ANA VK ZX AC VSG SG PC GKS DA JYL SYS TC. Analyzed the data: KPL ANA VK ZX DA SYS TC. Wrote the paper: KPL ANA VK ZX AC VSG SG PC GKS DA SC JYL SNM RMP PS IK SYS EWK TC. Competing Interests: Author RMP is currently employed by Merck Research Laboratories. The majority of this study was conducted while RMP was a faculty member at Harvard Medical School and Brigham and Women's Hospital. RMP was employed at Merck during the drafting of the manuscript only. No part of this study was funded by Merck Laboratories. There are no patents, products in development or marketed products to declare. This does not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0136651