Error Detection of CRF-Based Bibliography Extraction from Reference Strings

Bibliographic Details
Published in	The Outreach of Digital Libraries: A Globalized Resource Network, pp. 229-238
Main Authors	Ohta, Manabu; Arauchi, Daiki; Takasu, Atsuhiro; Adachi, Jun
Format	Book Chapter
Language	English
Published	Berlin, Heidelberg: Springer Berlin Heidelberg
Series	Lecture Notes in Computer Science
Subjects	bibliography extraction; conditional random field (CRF); confidence measure; digital library; error detection; reference string

Summary:	We have proposed a method for parsing reference strings, which are usually listed at the end of research papers, to extract bibliographic elements such as the title. The method uses a conditional random field (CRF) to estimate the correct bibliographic label for each token in the token sequence generated from a reference string. Although it achieved reasonable parsing accuracy on a Japanese academic journal, errors are inevitable. This paper therefore proposes confidence measures for CRF-based bibliography parsing that help detect such parsing errors. It also reports an empirical evaluation of the parsing method in terms of both its accuracy and how easily its errors can be detected. The experiments showed that the proposed measures reasonably indicated parsing errors and could be used to improve the quality of extracted bibliographies at a moderate manual post-editing cost.
ISBN:	3642347517 9783642347511
ISSN:	0302-9743 1611-3349
DOI:	10.1007/978-3-642-34752-8_29