Risk factor detection for heart disease by applying text analytics in electronic medical records

Bibliographic Details
Published in	Journal of Biomedical Informatics Vol. 58; Suppl.; pp. S164-S170
Main Authors	Torii, Manabu, Fan, Jung-wei, Yang, Wei-li, Lee, Theodore, Wiley, Matthew T., Zisook, Daniel S., Huang, Yang
Format	Journal Article
Language	English
Published	United States Elsevier Inc 01.12.2015
Subjects	Aged; California - epidemiology; Cardiovascular Diseases - diagnosis; Cardiovascular Diseases - epidemiology; Cohort Studies; Comorbidity; Computer Security; Confidentiality; Data Mining - methods; Diabetes Complications - diagnosis; Diabetes Complications - epidemiology; Electronic health records; Electronic Health Records - organization & administration; Female; Heart diseases; Humans; Hybrid systems; Incidence; Longitudinal Studies; Male; Medical records; Medical services; Middle Aged; Narration; Natural Language Processing; Pattern Recognition, Automated - methods; Risk analysis; Risk assessment; Risk Assessment - methods; Text classification; Texts; Tracking; Vocabulary, Controlled

More Information
Summary:	Highlights:
• Risk factor detection in electronic medical records (EMRs) was automated.
• Existing tools and techniques were leveraged to build detection systems.
• A general binary classification system was used to extract various risk factors.
• Additional classifiers were built for subsets of the target risk factors.
• The hybrid approach combining our systems achieved an F-score of 0.92.

In the United States, about 600,000 people die of heart disease every year. The annual cost of care services, medications, and lost productivity reportedly exceeds $108.9 billion. Effective disease risk assessment is critical to prevention, care, and treatment planning. Recent advances in text analytics have opened up new possibilities for using the rich information in EMRs to identify relevant risk factors. The 2014 i2b2/UTHealth Challenge brought together researchers and practitioners of clinical natural language processing (NLP) to tackle the identification of heart disease risk factors reported in EMRs. We participated in this track and developed an NLP system by leveraging existing tools and resources, both public and proprietary. Our system was a hybrid of several machine-learning and rule-based components. It achieved an overall F1 score of 0.9185, with a recall of 0.9409 and a precision of 0.8972.
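The reported scores are internally consistent if the overall F1 is the usual harmonic mean of the overall precision and recall, F1 = 2PR / (P + R). The short sketch below checks this under that assumption; the function name `f1_score` is a hypothetical helper written for illustration, not part of the authors' system or of any library. Because the published precision and recall are themselves rounded to four decimals, the recomputed value can only be expected to agree to that precision.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the F1 score: the harmonic mean of precision and recall.

    Hypothetical helper for illustration; returns 0.0 when both inputs are 0
    to avoid dividing by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported in the abstract.
reported_precision = 0.8972
reported_recall = 0.9409

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.4f}")  # prints "F1 = 0.9185"
```

The recomputed value rounds to 0.9185, matching the reported overall F1 score (and the rounded 0.92 in the highlights).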
ISSN:	1532-0464; 1532-0480
DOI:	10.1016/j.jbi.2015.08.011