Ensemble modelling in verbal autopsy: the Popular Voting method
Published in | The Lancet (British edition) Vol. 381; no. S2; p. S48 |
---|---|
Main Authors | , , , , , , , , |
Format | Journal Article |
Language | English |
Published | London: Elsevier Ltd (Elsevier Limited), 17.06.2013 |
Summary: | Background: Verbal autopsy (VA) is a highly valuable tool for assessing causes of death in resource-limited settings without medically certified death certificates. The Population Health Metrics Research Consortium (PHMRC) collected 12 535 VAs in four countries for which the true cause of death was reliably known. This project led to the development of three new computer algorithms to determine cause of death from these VAs, all of which predict underlying cause of death more accurately than the status quo, physician review. Concurrently, ensemble models (blends of well-performing models) have shown favourable predictive validity and have begun to be implemented in global health metrics settings. Methods: We developed a simple ensemble model based on the three top-performing PHMRC methods: the Simplified Symptom Pattern (SSP), the Tariff, and the Random Forest (RF). The ensemble operates at the individual-record level: it examines the cause of death predicted by each of the three component models and selects the cause chosen by a simple majority (Popular Voting). Sensitivity analyses showed that, when all three models disagreed, selecting the RF prediction was preferable, and the ensemble was adapted accordingly. Findings: The Popular Voting method performed better in cause-specific mortality fraction accuracy than did any individual model alone for adults, children, and neonates, and performed better in chance-corrected concordance than did any individual model except SSP in adults. The three component models all disagreed in 16% of cases and unanimously agreed in 47% of cases. Interpretation: As VA continues to be an effective source of data for estimating cause of death, accurate and inexpensive methods for analysing VA interview responses are increasingly important. The recent development of the three highly accurate PHMRC computational models allows for the option of a meta-model such as the ensemble introduced here. This ensemble model generally outperformed its component models and could be applied to other VA samples to assess the relative mortality burden from a variety of diseases and injuries. Funding: Population Health Metrics Research Consortium. |
---|---|
Bibliography: | http://dx.doi.org/10.1016/S0140-6736(13)61302-1 |
ISSN: | 0140-6736 1474-547X |
DOI: | 10.1016/S0140-6736(13)61302-1 |
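
The Popular Voting rule described in the Methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `popular_vote` and `agreement_level`, the argument order (SSP, Tariff, RF), and the cause labels in the sample records are all hypothetical and chosen only to show the majority rule and the RF fallback.

```python
from collections import Counter


def popular_vote(ssp, tariff, rf):
    """Pick one cause of death for a single record from three model predictions.

    Assumed argument order: SSP, Tariff, Random Forest.
    """
    counts = Counter([ssp, tariff, rf])
    cause, votes = counts.most_common(1)[0]
    # With three voters, a simple majority means at least two models agree.
    if votes >= 2:
        return cause
    # All three models disagree: fall back to the RF prediction,
    # as the abstract's sensitivity analyses suggested.
    return rf


def agreement_level(ssp, tariff, rf):
    """Classify a record by how many distinct causes the models predicted."""
    distinct = len({ssp, tariff, rf})
    return {1: "unanimous", 2: "majority", 3: "all_disagree"}[distinct]


# Hypothetical predictions, one tuple per record: (SSP, Tariff, RF).
records = [
    ("stroke", "stroke", "stroke"),        # all three agree
    ("pneumonia", "pneumonia", "malaria"),  # RF is outvoted
    ("stroke", "aids", "malaria"),          # no majority, RF decides
]

ensemble = [popular_vote(*r) for r in records]
levels = [agreement_level(*r) for r in records]
```

Counting the `levels` values over a real sample would give the kind of agreement breakdown the Findings report (share of records with unanimous agreement versus complete disagreement).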