Prevalence estimation by joint use of big data and health survey: a demonstration study using electronic health records in New York City

Bibliographic Details
Published in	BMC medical research methodology Vol. 20; no. 1; pp. 77 - 10
Main Authors	Kim, Ryung S., Shankar, Viswanathan
Format	Journal Article
Language	English
Published	England: BioMed Central Ltd, 06.04.2020
Subjects	Bias; Big data; Electronic health records; Electronic records; Estimates; Health surveillance; Health surveys; Immunization; Influenza; Measurement error; Medical records; Medical research; Multiple imputations; Population health surveillance; Primary care; Public health; Research methodology; Sample variance; Selection bias; Software; Victimization; United States

More Information
Summary:	Electronic health records (EHRs) have been increasingly used as a tool to monitor population health. However, subject-level errors in the records can yield biased estimates of health indicators. There is an urgent need for methods that estimate the prevalence of health indicators from large, real-time EHR data while correcting for potential bias. We demonstrate joint analyses of EHR data and a smaller gold-standard health survey. We first adopted Mosteller's method, which pools two estimators, one of which is potentially biased. It requires only the prevalence estimates from the two data sources and their standard errors. We then adopted the method of Schenker et al., which uses multiple imputation of the subject-level health outcomes that are missing for subjects in the EHR. This procedure requires information linking some subjects across the two sources, a model of the misclassification mechanism in the EHR, and models of the inclusion probabilities for both sources. In a simulation study, both estimators yielded negligible bias even when the EHR was biased. They performed as well as the health survey estimator when EHR bias was large, and better than the health survey estimator when EHR bias was moderate. For the subject-level imputation estimator, modeling the misclassification mechanism may be challenging in real data. We illustrated the methods by analyzing six health indicators from the 2013-14 NYC HANES and the 2013 NYC Macroscope, together with a study that linked some subjects across both data sources. When a small gold-standard health survey exists, it can serve as a safeguard against potential bias in EHR through the joint analysis of the two sources.
ISSN:	1471-2288
DOI:	10.1186/s12874-020-00956-6
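The two combination steps described in the summary can be sketched in Python. This is a minimal, hypothetical sketch, not the authors' code. `mosteller_pool` assumes a preliminary-test ("sometimes pool") reading of Mosteller's method: a two-sided z-test at level `alpha` checks whether the EHR and survey estimates differ, and if they do not, the two are pooled with inverse-variance weights. `rubin_combine` shows only the standard Rubin's-rules step for combining estimates from multiply imputed data sets. It leaves out the misclassification and inclusion-probability models that the Schenker et al. procedure needs to produce the imputations. All function names and example numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist


def mosteller_pool(p_survey, se_survey, p_ehr, se_ehr, alpha=0.05):
    """Preliminary-test pooling of a gold-standard and a possibly biased estimate.

    Assumption: one textbook reading of Mosteller-style pooling; the paper's
    exact estimator may differ. The survey estimate is treated as unbiased.
    """
    # Two-sided z-test of H0: both sources estimate the same prevalence.
    z = (p_ehr - p_survey) / math.sqrt(se_survey**2 + se_ehr**2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    if abs(z) > z_crit:
        # Evidence of EHR bias: fall back on the survey estimate alone.
        return p_survey, se_survey
    # No detectable bias: inverse-variance weighted average of the two.
    w_survey = 1 / se_survey**2
    w_ehr = 1 / se_ehr**2
    pooled = (w_survey * p_survey + w_ehr * p_ehr) / (w_survey + w_ehr)
    # Naive SE: ignores the extra variability introduced by the pre-test.
    pooled_se = math.sqrt(1 / (w_survey + w_ehr))
    return pooled, pooled_se


def rubin_combine(estimates, variances):
    """Rubin's rules for m >= 2 completed-data estimates and their variances."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = sum(estimates) / m                                # combined estimate
    u_bar = sum(variances) / m                                # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)    # between-imputation variance
    total_var = u_bar + (1 + 1 / m) * b
    return q_bar, math.sqrt(total_var)


if __name__ == "__main__":
    # Hypothetical inputs, not results from the paper.
    print(mosteller_pool(0.30, 0.02, 0.40, 0.002))   # large gap: survey estimate kept
    print(mosteller_pool(0.30, 0.02, 0.31, 0.002))   # small gap: estimates pooled
    print(rubin_combine([0.30, 0.32, 0.31], [0.0004] * 3))
```

With these hypothetical inputs, the first call detects the discrepancy and returns the survey estimate unchanged. The second call pools the two, and because the EHR estimate is far more precise, the result lands close to it (about 0.310). The pooled standard error is optimistic because it does not account for the preliminary test.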