Constructing and evaluating automated literature review systems

Bibliographic Details
Published in	Scientometrics, Vol. 125, no. 3, pp. 3233-3251
Main Authors	Portenoy, Jason; West, Jevin D.
Format	Journal Article
Language	English
Published	Cham: Springer International Publishing, 01.12.2020
Subjects	Computer Science; Information Storage and Retrieval; Library Science; Autoreview; Big scholarly data; Citation networks; Scholarly recommendation

Summary:	Automated literature reviews have the potential to accelerate knowledge synthesis and provide new insights. However, a lack of labeled ground-truth data has made it difficult to develop and evaluate these methods. We propose a framework that uses the reference lists from existing review papers as labeled data, which can then be used to train supervised classifiers, allowing for experimentation and testing of models and features at a large scale. We demonstrate our framework by training classifiers using different combinations of citation- and text-based features on 500 review papers. We use the R-Precision scores for the task of reconstructing the review papers’ reference lists as a way to evaluate and compare methods. We also extend our method, generating a novel set of articles relevant to the fields of misinformation studies and science communication. We find that our method can identify many of the most relevant papers for a literature review from a large set of candidate papers, and that our framework allows for development and testing of models and features to incrementally improve the results. The models we build are able to identify relevant papers even when starting with a very small set of seed papers. We also find that the methods can be adapted to identify previously undiscovered articles that may be relevant to a given topic.
ISSN:	0138-9130; 1588-2861
DOI:	10.1007/s11192-020-03490-w
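
Note on the evaluation measure: the summary reports R-Precision for the task of reconstructing a review paper's reference list. As a rough illustration only, R-Precision is conventionally the fraction of relevant items among the top R ranked results, where R is the total number of relevant items. The sketch below assumes the relevant items are the held-out references of one review paper and that a model has already ranked the candidate papers. The function name, the paper identifiers, and the toy data are hypothetical and are not taken from the article.

```python
from typing import Sequence, Set


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return the fraction of the top-R ranked items that are relevant,
    where R is the number of relevant items.

    Assumes `ranked` holds unique candidate identifiers, best first.
    """
    r = len(relevant)
    if r == 0:
        raise ValueError("relevant set must not be empty")
    top_r = ranked[:r]  # only the first R candidates are scored
    hits = sum(1 for paper_id in top_r if paper_id in relevant)
    return hits / r


# Hypothetical toy data: four held-out references and a model's ranking
# of six candidate papers (identifiers are made up for illustration).
held_out_refs = {"p1", "p2", "p3", "p4"}
ranked_candidates = ["p1", "x9", "p3", "p4", "p2", "x7"]

score = r_precision(ranked_candidates, held_out_refs)
print(score)  # 0.75: three of the top four candidates are held-out references
```

Because the cutoff equals the number of relevant items, a perfect ranking scores 1.0 regardless of how long the reference list is, which makes scores comparable across review papers with reference lists of different sizes.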