A systematic review and comparative analysis of cross-document coreference resolution methods and tools

Bibliographic Details
Published in Computing, Vol. 99, no. 4, pp. 313-349
Main Authors Beheshti, Seyed-Mehdi-Reza; Benatallah, Boualem; Venugopal, Srikumar; Ryu, Seung Hwan; Motahari-Nezhad, Hamid Reza; Wang, Wei
Format Journal Article
Language English
Published Vienna, Springer Vienna, 01.04.2017
Springer Nature B.V.

Abstract Information extraction (IE) is the task of automatically extracting structured information from unstructured or semi-structured machine-readable documents. Among the various IE tasks, extracting actionable intelligence from an ever-increasing amount of data depends critically upon cross-document coreference resolution (CDCR): the task of identifying entity mentions across information sources that refer to the same underlying entity. CDCR is the basis of knowledge acquisition and is at the heart of Web search, recommendations, and analytics. Real-time CDCR processing is very important and has various applications in discovering must-know information for clients in finance, the public sector, news, and crisis management. Because CDCR is an emerging area of research and practice, the reported literature on its challenges and solutions is growing fast but is scattered, owing to the large problem space, the variety of applications, and datasets on the order of terabytes to petabytes. To fill this gap, we provide a systematic review of the state of the art in challenges and solutions for a CDCR process. We identify a set of quality attributes that have been frequently reported in the context of CDCR processes, to be used as a guide for identifying important and outstanding issues for further investigation. Finally, we assess existing tools and techniques for CDCR subtasks and provide guidance on the selection of tools and algorithms.
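
Illustration (not taken from the article): the abstract defines CDCR as grouping entity mentions from different sources that refer to the same underlying entity. The toy Python sketch below shows one simple way such grouping could look, assuming mentions are already extracted and that token overlap (Jaccard similarity) is an adequate similarity signal. The names normalize, jaccard, resolve, MENTIONS and the 0.5 threshold are hypothetical choices made for this example; they do not represent any method or tool assessed in the review.

```python
from itertools import combinations


def normalize(mention: str) -> frozenset:
    """Lowercase a mention, replace punctuation with spaces, return its token set."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in mention.lower()
    )
    return frozenset(cleaned.split())


def jaccard(a: frozenset, b: frozenset) -> float:
    """Token-set overlap in [0, 1]; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def resolve(mentions, threshold=0.5):
    """Cluster (doc_id, surface_form) mentions whose similarity >= threshold.

    Returns a sorted list of clusters, each a list of indices into `mentions`.
    Linking is transitive (union-find), which is an assumption of this sketch.
    """
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tokens = [normalize(surface) for _, surface in mentions]
    for i, j in combinations(range(len(mentions)), 2):
        if jaccard(tokens[i], tokens[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(mentions)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical mentions drawn from four different documents.
MENTIONS = [
    ("doc1", "Barack Obama"),
    ("doc2", "Obama, Barack"),
    ("doc3", "President Barack Obama"),
    ("doc2", "Angela Merkel"),
    ("doc4", "Merkel"),
]

if __name__ == "__main__":
    print(resolve(MENTIONS))  # [[0, 1, 2], [3, 4]]
```

Comparing every pair of mentions is quadratic in the number of mentions, so a sketch like this would not scale to the dataset sizes the abstract mentions; it is meant only to make the task definition concrete.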
Author affiliations
1. Beheshti, Seyed-Mehdi-Reza (sbeheshti@cse.unsw.edu.au): School of Computer Science and Engineering, University of New South Wales
2. Benatallah, Boualem: School of Computer Science and Engineering, University of New South Wales
3. Venugopal, Srikumar: School of Computer Science and Engineering, University of New South Wales
4. Ryu, Seung Hwan: School of Computer Science and Engineering, University of New South Wales
5. Motahari-Nezhad, Hamid Reza: School of Computer Science and Engineering, University of New South Wales, IBM Almaden Research Center
6. Wang, Wei: School of Computer Science and Engineering, University of New South Wales
ContentType Journal Article
Copyright Springer-Verlag Wien 2016
Computing is a copyright of Springer, 2017.
DOI 10.1007/s00607-016-0490-0
Discipline Mathematics
Computer Science
EISSN 1436-5057
EndPage 349
Genre Feature
ISSN 0010-485X
IsPeerReviewed true
IsScholarly true
Issue 4
Keywords Information extraction
68 Computer Science
Large datasets
Cross-document coreference resolution
68U15 Text processing; mathematical typography
68-02 Research exposition (monographs, survey articles)
PageCount 37
PublicationDate 2017-04-01
PublicationPlace Vienna
PublicationSubtitle Archives for Scientific Computing
PublicationTitle Computing
PublicationTitleAbbrev Computing
PublicationYear 2017
Publisher Springer Vienna
Springer Nature B.V
StartPage 313
SubjectTerms Algorithms
Analysis
Artificial Intelligence
Comparative analysis
Computer Appl. in Administrative Data Processing
Computer Communication Networks
Computer Science
Datasets
Information retrieval
Information sources
Information systems
Information Systems Applications (incl.Internet)
Intelligence
Knowledge acquisition
Literature reviews
Management of crises
Mathematical models
Natural language
Public sector
Real time
Recommendations
Searching
Software Engineering
Studies
Systematic review
Tasks
Taxonomy
URI https://link.springer.com/article/10.1007/s00607-016-0490-0
https://www.proquest.com/docview/1880865925
https://search.proquest.com/docview/1893902701
Volume 99