Transfer of Supervision for Improved Address Standardization


Bibliographic Details
Published in: 2010 20th International Conference on Pattern Recognition, pp. 2178-2181
Main Authors: Kothari, Govind; Faruquie, Tanveer A; Subramaniam, L Venkata; Prasad, K Hima; Mohania, Mukesh K
Format: Conference Proceeding
Language: English
Published: IEEE, 01.08.2010
Summary: Address cleansing is challenging, particularly for geographies where the way addresses are written varies widely. Supervised learners can be easily trained for different data sources. However, training requires labeling a large corpus for each data source, which is time-consuming and labor-intensive. We propose a method to automatically transfer supervision from a labeled source to an unlabeled target source using a hierarchical Dirichlet process. Each Dirichlet process models the data from one source. The component distributions shared across these Dirichlet processes capture the semantic relation between data sources. A feature projection onto the component distributions from multiple sources is used to transfer supervision.
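The idea of projecting tokens onto shared components and reusing source labels on a target can be illustrated with a toy sketch. This is not the authors' implementation: no hierarchical Dirichlet process is fitted here. The shared component distributions (`PHI`), the per-source mixing weights (`PI`), the tokens, the labels, and the nearest-centroid classifier are all hypothetical stand-ins chosen only to show how a feature projection could carry supervision across sources.

```python
from collections import defaultdict

# ASSUMPTION: hand-written shared components p(token | component).
# In the paper these would be inferred by a hierarchical Dirichlet process.
PHI = [
    {"12": 0.5, "7": 0.5},                  # component 0: house numbers
    {"road": 0.4, "rd": 0.3, "marg": 0.3},  # component 1: street words
    {"delhi": 0.5, "mumbai": 0.5},          # component 2: city names
]
# ASSUMPTION: hypothetical per-source mixing weights over the shared components.
PI = {"source": [0.3, 0.4, 0.3], "target": [0.2, 0.5, 0.3]}


def project(token, source):
    """Feature projection: posterior over shared components, p(k | token, source)."""
    scores = [w * phi.get(token, 0.0) for w, phi in zip(PI[source], PHI)]
    total = sum(scores)
    if total == 0.0:
        return [1.0 / len(PHI)] * len(PHI)  # unseen token: uniform fallback
    return [s / total for s in scores]


def fit_centroids(labeled_tokens, source):
    """Average the projected features of each label in the labeled source."""
    sums = {}
    counts = defaultdict(int)
    for token, label in labeled_tokens:
        acc = sums.setdefault(label, [0.0] * len(PHI))
        for k, f in enumerate(project(token, source)):
            acc[k] += f
        counts[label] += 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(token, source, centroids):
    """Label a token by the centroid with the largest dot product."""
    feats = project(token, source)
    return max(
        centroids,
        key=lambda lab: sum(f * c for f, c in zip(feats, centroids[lab])),
    )


# Labels exist only for the source; the target tokens are unlabeled.
labeled_source = [("12", "NUMBER"), ("road", "STREET"),
                  ("rd", "STREET"), ("delhi", "CITY")]
centroids = fit_centroids(labeled_source, "source")
predicted = [predict(t, "target", centroids) for t in ["7", "marg", "mumbai"]]
print(predicted)
```

In this sketch the target tokens "marg" and "mumbai" never appear in the labeled source, yet they can receive labels because they load on the same shared components as the labeled source tokens.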
ISBN: 1424475422, 9781424475421
ISSN: 1051-4651, 2831-7475
DOI: 10.1109/ICPR.2010.533