A Text Structural Analysis Model for Address Extraction
Published in | Natural Language Processing and Information Systems Vol. 13286; pp. 255 - 266 |
---|---|
Main Authors | , , , , , , |
Format | Book Chapter |
Language | English |
Published | Springer International Publishing AG, Switzerland, 2022 |
Series | Lecture Notes in Computer Science |
Subjects | |
Summary: | Textual data is being generated at an enormous pace in today’s world. Analyzing this data to extract actionable information is one of the biggest challenges faced by researchers. In this paper we tackle the problem of extracting addresses from unstructured text. A postal address denotes the unique geographical information of a place, person, or organization. Extracting this information automatically with high precision is helpful for public administration and services, location-based service companies, geospatial mapping, delivery companies, recommendation systems for tourists, OCR-based event creation, etc. Address formats vary widely across countries and regions. Even within the same region, people may adopt different formats and notations, which makes extracting addresses from text a challenging and interesting task in NLP research. In this paper we propose a text structural analysis model consisting of a novel gazetteer-assisted CNN architecture. It uses the structural pattern-detection capabilities of a CNN to demonstrate empirically that, for the address extraction task, structural analysis is more efficient than a purely semantic approach. We further conducted an ablation study to determine the importance of external knowledge for our architecture. |
---|---|
ISBN: | 3031084721 9783031084720 |
ISSN: | 0302-9743 1611-3349 |
DOI: | 10.1007/978-3-031-08473-7_23 |
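The summary does not describe the architecture in detail. As a purely illustrative sketch (not the authors' model), the Python below shows the general idea of combining structural token features with a gazetteer signal under a convolution. Each token is mapped to hypothetical structural features (number, capitalized word, punctuation) plus a flag for membership in a toy gazetteer, and a single hand-set width-3 filter with ReLU responds to a "number, capitalized word, gazetteer term" pattern. The feature set, `GAZETTEER`, `ADDRESS_KERNEL`, and the bias value are all assumptions; a real CNN would learn many such filters from labeled data.

```python
import re

# Hypothetical toy gazetteer (assumption); the chapter's actual resource is not described here.
GAZETTEER = {"london", "paris", "berlin", "street", "road", "avenue"}

# Hypothetical hand-set width-3 filter over features [is_digit, is_capitalized, is_punct, in_gazetteer]:
# it rewards "number, then capitalized word, then gazetteer term" (e.g. "221 Baker Street").
ADDRESS_KERNEL = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def token_features(token):
    """Map a token to illustrative structural features plus a gazetteer flag."""
    return [
        1.0 if token.isdigit() else 0.0,                          # pure number (house no., postcode)
        1.0 if token[:1].isupper() else 0.0,                      # capitalized word
        1.0 if re.fullmatch(r"[,.;:]", token) else 0.0,           # punctuation separator
        1.0 if token.lower().strip(",.") in GAZETTEER else 0.0,   # gazetteer hit (external knowledge)
    ]


def conv1d(features, kernel, bias=0.0):
    """Valid 1-D convolution (cross-correlation, as in CNN layers) along the token axis, then ReLU."""
    width = len(kernel)
    out = []
    for start in range(len(features) - width + 1):
        s = bias
        for k in range(width):
            for f, w in enumerate(kernel[k]):
                s += w * features[start + k][f]
        out.append(max(0.0, s))
    return out


if __name__ == "__main__":
    tokens = tokenize("Visit us at 221 Baker Street, London today")
    scores = conv1d([token_features(t) for t in tokens], ADDRESS_KERNEL, bias=-2.0)
    best = max(range(len(scores)), key=scores.__getitem__)
    print(tokens[best:best + len(ADDRESS_KERNEL)], scores)
```

Only the window `221 Baker Street` activates this filter. Zeroing the gazetteer column of the features drops that window's activation to 0.0, a toy analogue of asking how much external knowledge contributes, which is the question the abstract's ablation study addresses.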