A systematic review and comparative analysis of cross-document coreference resolution methods and tools

Information extraction (IE) is the task of automatically extracting structured information from unstructured/semi-structured machine-readable documents. Among various IE tasks, extracting actionable intelligence from an ever-increasing amount of data depends critically upon cross-document coreferenc...

Full description

Saved in:
Bibliographic Details
Published inComputing Vol. 99; no. 4; pp. 313 - 349
Main Authors Beheshti, Seyed-Mehdi-Reza, Benatallah, Boualem, Venugopal, Srikumar, Ryu, Seung Hwan, Motahari-Nezhad, Hamid Reza, Wang, Wei
Format Journal Article
LanguageEnglish
Published Vienna Springer Vienna 01.04.2017
Springer Nature B.V
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Information extraction (IE) is the task of automatically extracting structured information from unstructured/semi-structured machine-readable documents. Among various IE tasks, extracting actionable intelligence from an ever-increasing amount of data depends critically upon cross-document coreference resolution (CDCR) - the task of identifying entity mentions across information sources that refer to the same underlying entity. CDCR is the basis of knowledge acquisition and is at the heart of Web search, recommendations, and analytics. Real time processing of CDCR processes is very important and have various applications in discovering must-know information in real-time for clients in finance, public sector, news, and crisis management. Being an emerging area of research and practice, the reported literature on CDCR challenges and solutions is growing fast but is scattered due to the large space, various applications, and large datasets of the order of peta-/tera-bytes. In order to fill this gap, we provide a systematic review of the state of the art of challenges and solutions for a CDCR process. We identify a set of quality attributes, that have been frequently reported in the context of CDCR processes, to be used as a guide to identify important and outstanding issues for further investigations. Finally, we assess existing tools and techniques for CDCR subtasks and provide guidance on selection of tools and algorithms.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:0010-485X
1436-5057
DOI:10.1007/s00607-016-0490-0