A Text Structural Analysis Model for Address Extraction

Bibliographic Details
Published in Natural Language Processing and Information Systems Vol. 13286; pp. 255-266
Main Authors Kumar, Rishabh; Jakhar, Kamal; Tiwari, Hemant; Purre, Naresh; Kumar, Priyanshu; Prakash, Jiban; Vala, Vanraj
Format Book Chapter
Language English
Published Switzerland, Springer International Publishing AG, 2022
Series Lecture Notes in Computer Science
Summary: Textual data is being generated at an enormous pace in today’s world. Analyzing this data to extract actionable information is one of the biggest challenges faced by researchers. In this paper we tackle the problem of extracting addresses from unstructured text. A postal address denotes the unique geographical information of a place, a person, or an organization. Extracting this information automatically with high precision is helpful for public administration and services, location-based service companies, geospatial mapping, delivery companies, recommendation systems for tourists, OCR-based event creation, etc. Address formats vary widely across countries and regions. Even within the same region, people may adopt different formats and notations, which makes extracting addresses from text a challenging and interesting task in NLP research. We propose a text structural analysis model consisting of a novel gazetteer-assisted CNN architecture. It uses the structural pattern detection capabilities of a CNN to show empirically that, for the address extraction task, structural analysis is more efficient than a purely semantic approach. We also conducted an ablation study to assess the importance of external knowledge for our architecture.
ISBN: 3031084721, 9783031084720
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/978-3-031-08473-7_23