Fulltext Geocoding Versus Spatial Metadata for Large Text Archives: Towards a Geographically Enriched Wikipedia

Bibliographic Details
Published in D-Lib Magazine Vol. 18; no. 9/10
Main Author Leetaru, Kalev H.
Format Journal Article
Language English
Published 01.09.2012
ISSN 1082-9873
DOI 10.1045/september2012-leetaru

Abstract Presents the basic workflow of a fulltext geocoding system, which uses software algorithms to parse a document, identify textual mentions of locations, and convert those mentions into mappable geographic coordinates using gazetteers (databases of places and their approximate locations). Overviews the two gazetteers that lie at the heart of nearly every global geocoding system: the United States National Geospatial-Intelligence Agency's (NGA) GEOnet Names Server (GNS) and the United States Geological Survey's Geographic Names Information System (GNIS). Provides a case study comparing manually specified geographic indexing terms with fulltext geocoding on the English-language edition of Wikipedia to demonstrate the significant advantages of automated approaches. Among its findings: previous studies of Wikipedia's spatial focus that relied on its human-provided spatial metadata erroneously identified Europe as its focal point because of bias in the underlying metadata. Source: National Library of New Zealand Te Puna Matauranga o Aotearoa, licensed by the Department of Internal Affairs for re-use under the Creative Commons Attribution 3.0 New Zealand Licence.
The rise of 'born geographic' information and the increasing creation and mediation of information in a spatial context have given rise to a demand for extracting and indexing the spatial information in large textual archives. Spatial indexing of archives has traditionally been a manual process, with human editors reading each document and assigning country-level metadata indicating its major spatial focus. The demand for subnational saturation indexing of all geographic mentions in a document, coupled with the need to scale to archives totaling hundreds of billions of pages or accessioning hundreds of millions of new items a day, requires automated approaches. Fulltext geocoding refers to the process of using software algorithms to parse a document, identify textual mentions of locations, and convert those mentions into mappable geographic coordinates using gazetteers, which are databases of places and their approximate locations. The basic workflow of a fulltext geocoding system is presented, together with an overview of the GNS and GNIS gazetteers that lie at the heart of nearly every global geocoding system. Finally, a case study comparing manually specified geographic indexing terms with fulltext geocoding on the English-language edition of Wikipedia demonstrates the significant advantages of automated approaches. It also finds that previous studies of Wikipedia's spatial focus that relied on its human-provided spatial metadata erroneously identified Europe as its focal point because of bias in the underlying metadata. Adapted from the source document.
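The workflow described in the abstract (parse the text, spot location mentions, look them up in a gazetteer, emit coordinates) can be illustrated with a minimal sketch. This is not the article's implementation: the `GAZETTEER` table, the `Place` type, and the `geocode` function are hypothetical names invented for illustration, the three entries and their rounded coordinates stand in for a real gazetteer such as GNS or GNIS, and the longest-match lookup is a deliberately simple substitute for the mention detection and disambiguation a production system would need.

```python
import re
from typing import List, NamedTuple


class Place(NamedTuple):
    name: str
    lat: float
    lon: float


# Toy gazetteer (hypothetical): lowercase place name -> approximate coordinates.
# A real system would load millions of entries from GNS or GNIS instead.
GAZETTEER = {
    "paris": Place("Paris", 48.86, 2.35),
    "new york": Place("New York", 40.71, -74.01),
    "cairo": Place("Cairo", 30.04, 31.24),
}

# Longest gazetteer entry, measured in words, bounds the lookup window.
MAX_WORDS = max(len(key.split()) for key in GAZETTEER)


def geocode(text: str) -> List[Place]:
    """Return gazetteer matches in the order they appear in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first so "new york" wins over "york".
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in GAZETTEER:
                matches.append(GAZETTEER[candidate])
                i += n
                break
        else:
            # No gazetteer entry starts at this token; move on.
            i += 1
    return matches


if __name__ == "__main__":
    for place in geocode("She flew from New York to Paris, then on to Cairo."):
        print(place.name, place.lat, place.lon)
```

The sketch ignores ambiguity entirely: a name shared by several places would need extra context to resolve, which is one reason real gazetteer-based systems are considerably more involved than a dictionary lookup.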
Notes Includes illustration, maps, references
Includes links to related electronic resources
Subjects Algorithms
Automatic indexing
Data processing
Gazetteers
Geographical location codes
Information storage and retrieval systems
Location
Subject indexing
Workflow software