Toward modernizing the systematic review pipeline in genetics: efficient updating via data mining

Bibliographic Details
Published in	Genetics in Medicine, Vol. 14, no. 7, pp. 663–669
Main Authors	Wallace, Byron C., Small, Kevin, Brodley, Carla E., Lau, Joseph, Schmid, Christopher H., Bertram, Lars, Lill, Christina M., Cohen, Joshua T., Trikalinos, Thomas A.
Format	Journal Article
Language	English
Published	July 2012; United States: Elsevier Inc; Elsevier Limited; Nature Publishing Group
Subjects	Alzheimer Disease - genetics; Alzheimer's disease; citation screening; Cost analysis; Cost-Benefit Analysis; Data mining; Data Mining - methods; Databases, Factual; Empirical Research; Humans; machine learning; meta-analysis; Meta-Analysis as Topic; Original; Parkinson Disease - genetics; Periodicals as Topic; Schizophrenia - genetics; Software; support vector machine; Systematic review; Systematic Reviews as Topic; Technology Assessment, Biomedical; text classification
Summary:	The aim of this study was to demonstrate that modern data mining tools can be used as one step in reducing the labor necessary to produce and maintain systematic reviews. We used four continuously updated, manually curated resources that summarize MEDLINE-indexed articles in entire fields using systematic review methods (PDGene, AlzGene, and SzGene for genetic determinants of Parkinson disease, Alzheimer disease, and schizophrenia, respectively; and the Tufts Cost-Effectiveness Analysis (CEA) Registry for cost-effectiveness analyses). In each data set, we trained a classification model on citations screened up until 2009. We then evaluated the ability of the model to classify citations published in 2010 as “relevant” or “irrelevant” using human screening as the gold standard. Classification models did not miss any of the 104, 65, and 179 eligible citations in PDGene, AlzGene, and SzGene, respectively, and missed only 1 of 79 in the CEA Registry (100% sensitivity for the first three and 99% for the fourth). The respective specificities were 90, 93, 90, and 73%. Had the semiautomated system been used in 2010, a human would have needed to read only 605/5,616 citations to update the PDGene registry (11%) and 555/7,298 (8%), 717/5,381 (13%), and 334/1,015 (33%) for the other three databases. Data mining methodologies can reduce the burden of updating systematic reviews, without missing more papers than humans. Genet Med advance online publication 5 April 2012
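The headline figures in the summary are simple ratios of the reported counts: sensitivity is the share of eligible 2010 citations the classifier kept, and the reading burden is the share of all 2010 citations a human would still have to read. The sketch below recomputes both from the numbers quoted above. It is an illustrative check, not the authors' code; the class name UpdateResult and its field names are hypothetical, and it assumes that "citations a human would need to read" equals the citations the classifier passed on for review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateResult:
    """Reported 2010 screening outcome for one database (hypothetical container)."""
    name: str
    eligible: int  # citations judged relevant by human screeners (gold standard)
    missed: int    # eligible citations the classifier labeled irrelevant
    flagged: int   # citations a human would still read (assumed = classifier positives)
    total: int     # all 2010 citations that would otherwise be screened manually

    @property
    def sensitivity(self) -> float:
        # Fraction of eligible citations the classifier retained.
        return (self.eligible - self.missed) / self.eligible

    @property
    def fraction_read(self) -> float:
        # Share of the 2010 citations left for human reading.
        return self.flagged / self.total


# Counts taken directly from the summary.
RESULTS = [
    UpdateResult("PDGene", 104, 0, 605, 5616),
    UpdateResult("AlzGene", 65, 0, 555, 7298),
    UpdateResult("SzGene", 179, 0, 717, 5381),
    UpdateResult("CEA Registry", 79, 1, 334, 1015),
]

if __name__ == "__main__":
    for r in RESULTS:
        print(f"{r.name:<13} sensitivity {r.sensitivity:.0%}  read {r.fraction_read:.0%}")
```

Rounded to whole percentages, this reproduces the reported 100% sensitivity for the three genetics databases, 99% (78/79) for the CEA Registry, and reading burdens of 11%, 8%, 13%, and 33%.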
ISSN:	1098-3600 (print); 1530-0366 (online)
DOI:	10.1038/gim.2012.7