Fulltext Geocoding Versus Spatial Metadata for Large Text Archives: Towards a Geographically Enriched Wikipedia
Published in | D-Lib magazine Vol. 18; no. 9/10 |
---|---|
Main Author | Leetaru, Kalev H. |
Format | Journal Article |
Language | English |
Published | 01.09.2012 |
ISSN | 1082-9873 |
DOI | 10.1045/september2012-leetaru |
Abstract | Presents the basic workflow of a fulltext geocoding system, which uses software algorithms to parse a document, identify textual mentions of locations, and use gazetteers (databases of places and their approximate locations) to convert those mentions into mappable geographic coordinates. Gives an overview of the United States National Geospatial-Intelligence Agency's (NGA) GEOnet Names Server (GNS) and the United States Geological Survey's Geographic Names Information System (GNIS), the gazetteers that lie at the heart of nearly every global geocoding system. Provides a case study comparing manually specified geographic indexing terms with fulltext geocoding on the English-language edition of Wikipedia to demonstrate the significant advantages of automated approaches, including the finding that previous studies of Wikipedia's spatial focus based on its human-provided spatial metadata erroneously identified Europe as its focal point because of bias in the underlying metadata. Source: National Library of New Zealand Te Puna Matauranga o Aotearoa, licensed by the Department of Internal Affairs for re-use under the Creative Commons Attribution 3.0 New Zealand Licence. |
---|---|
AbstractList | The rise of 'born geographic' information and the increasing creation and mediation of information in a spatial context have created a demand for extracting and indexing the spatial information in large textual archives. Spatial indexing of archives has traditionally been a manual process, with human editors reading each document and assigning country-level metadata indicating its major spatial focus. The demand for subnational saturation indexing of all geographic mentions in a document, coupled with the need to scale to archives totaling hundreds of billions of pages or accessioning hundreds of millions of new items a day, requires automated approaches. Fulltext geocoding refers to the process of using software algorithms to parse a document, identify textual mentions of locations, and use gazetteers (databases of places and their approximate locations) to convert those mentions into mappable geographic coordinates. The basic workflow of a fulltext geocoding system is presented, together with an overview of the GNS and GNIS gazetteers that lie at the heart of nearly every global geocoding system. Finally, a case study comparing manually specified geographic indexing terms with fulltext geocoding on the English-language edition of Wikipedia demonstrates the significant advantages of automated approaches, including the finding that previous studies of Wikipedia's spatial focus based on its human-provided spatial metadata erroneously identified Europe as its focal point because of bias in the underlying metadata. Adapted from the source document. |
Author | Leetaru, Kalev H. |
BackLink | https://natlib-primo.hosted.exlibrisgroup.com/primo-explore/search?query=any,contains,997417853602837&tab=innz&search_scope=INNZ&vid=NLNZ&offset=0$$DView this record in NLNZ |
CODEN | DLMAF7 |
CitedBy_id | crossref_primary_10_1007_s10708_015_9622_x crossref_primary_10_4018_IJSSMET_2020070101 crossref_primary_10_3390_ijgi4042246 crossref_primary_10_1080_08838151_2020_1796391 crossref_primary_10_2139_ssrn_2957362 crossref_primary_10_3390_fi16030087 |
ContentType | Journal Article |
DatabaseName | CrossRef Index New Zealand Index New Zealand (Open Access) Library & Information Sciences Abstracts (LISA) Library & Information Science Abstracts (LISA) |
DatabaseTitle | CrossRef Library and Information Science Abstracts (LISA) |
DatabaseTitleList | Library and Information Science Abstracts (LISA) |
Discipline | Library & Information Science |
EISSN | 1082-9873 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | true |
Issue | 9/10 |
Notes | Includes illustration, maps, references. Includes links to related electronic resources. |
OpenAccessLink | https://natlib-primo.hosted.exlibrisgroup.com/primo-explore/search?query=any,contains,997417853602837&tab=innz&search_scope=INNZ&vid=NLNZ&offset=0 |
SubjectTerms | Algorithms; Automatic indexing; Data processing; Gazetteers; Geographical location codes; Information storage and retrieval systems; Location; Subject indexing; Workflow software |
URI | https://natlib-primo.hosted.exlibrisgroup.com/primo-explore/search?query=any,contains,997417853602837&tab=innz&search_scope=INNZ&vid=NLNZ&offset=0 https://www.proquest.com/docview/1347770003 |
Volume | 18 |
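
The fulltext geocoding workflow summarized in the abstract (parse a document, identify textual mentions of locations, look them up in a gazetteer to obtain coordinates) can be illustrated with a minimal sketch. This is not the article's system: the function name `geocode_fulltext`, the `TOY_GAZETTEER` dictionary, and its three rounded coordinate entries are hypothetical stand-ins for a real gazetteer such as GNS or GNIS, and the longest-match lookup is one simple assumption about how mentions might be identified.

```python
import re

# Hypothetical toy gazetteer: lowercase place name -> approximate (lat, lon).
# Illustrative, rounded values only; not drawn from GNS or GNIS.
TOY_GAZETTEER = {
    "paris": (48.86, 2.35),
    "cairo": (30.04, 31.24),
    "new zealand": (-41.0, 174.0),
}


def geocode_fulltext(text, gazetteer=TOY_GAZETTEER, max_words=3):
    """Return a list of (mention, lat, lon) for gazetteer names found in text."""
    # Step 1: parse the document into word tokens.
    tokens = re.findall(r"[A-Za-z]+", text)
    matches = []
    i = 0
    while i < len(tokens):
        # Step 2: try the longest candidate phrase starting at token i first,
        # so that "New Zealand" is preferred over a shorter partial match.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            # Step 3: look the candidate up in the gazetteer.
            coords = gazetteer.get(phrase.lower())
            if coords is not None:
                matches.append((phrase, coords[0], coords[1]))
                i += n
                break
        else:
            # No phrase starting here is a known place; move on one token.
            i += 1
    return matches
```

A call such as `geocode_fulltext("She flew from Paris to Cairo, then on to New Zealand.")` returns one tuple per recognized mention. A production system would also have to disambiguate names shared by several places and reject false positives, which this sketch does not attempt.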