A Chinese Address Standardization Method Based on Finite State Machines (一种基于有限状态机的中文地址标准化方法)

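Because Chinese is semantically ambiguous and formally diverse, Chinese addresses have long been difficult to standardize, which poses a considerable obstacle to further work such as address geolocation, regional grid analysis, and locating social conditions and public opinion. To address this problem, the paper proposes a new method driven by an address hierarchy model and a finite state machine, and verifies the method's address recognition rate and matching accuracy through a software implementation. The experimental results show that the method achieves a recognition rate of about 96% on Chinese addresses and a matching accuracy of about 85%, and that it can also update the standard address library automatically. The authors conclude that the method effectively addresses the difficulty of standardizing Chinese addresses and has significant practical and research reference value.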
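The abstract names the two ingredients (an address hierarchy and a finite state machine) but does not describe the states or transitions. The following is only a minimal sketch of the general idea, not the authors' implementation. The level names, the suffix table `SUFFIX_TO_LEVEL`, and the function `segment_address` are all assumptions made for illustration, and the sample addresses are illustrative inputs.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical address hierarchy; each level doubles as an FSM state."""
    START = 0
    PROVINCE = 1
    CITY = 2
    DISTRICT = 3
    ROAD = 4
    NUMBER = 5


# Assumed suffix table: the character that closes a token at each level.
# A real system would need a far larger table and a dictionary of place names.
SUFFIX_TO_LEVEL = {
    "省": Level.PROVINCE,
    "市": Level.CITY,
    "区": Level.DISTRICT,
    "县": Level.DISTRICT,
    "路": Level.ROAD,
    "街": Level.ROAD,
    "号": Level.NUMBER,
}


def segment_address(address):
    """Split an address into (level, text) tokens with a forward-only FSM."""
    state = Level.START
    buffer = ""
    tokens = []
    for ch in address:
        buffer += ch
        target = SUFFIX_TO_LEVEL.get(ch)
        # Transition only to a deeper level than the current state, and only
        # when the suffix has at least one character in front of it.
        # Levels may be skipped (for example, a municipality has no province).
        if target is not None and target > state and len(buffer) > 1:
            tokens.append((target.name.lower(), buffer))
            state = target
            buffer = ""
    # Anything left over did not end in a recognized suffix.
    if buffer:
        tokens.append(("unparsed", buffer))
    return tokens


if __name__ == "__main__":
    for sample in ("上海市杨浦区国定路777号", "杭州市西湖区市府路1号"):
        print(segment_address(sample))
```

The state matters in the second sample: once the machine has reached the district level, the character 市 in 市府路 no longer triggers a city transition, so it is kept as part of the road name. Matching the resulting tokens against a standard address library, which the abstract also mentions, is outside the scope of this sketch.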

Bibliographic Details
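Published in: 计算机应用研究 (Application Research of Computers), Vol. 33, no. 12, pp. 3691-3695
Main Authors: 罗明 (Luo Ming), 黄海量 (Huang Hailiang)
Format: Journal Article
Language: Chinese
Published: 2016
Affiliations: College of Information Management & Engineering, Shanghai University of Finance & Economics, Shanghai 200433; Shanghai Key Laboratory of Financial Information Technology, Shanghai University of Finance & Economics, Shanghai 200433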
Bibliography: 51-1196/TP
Luo Ming (a, b), Huang Hailiang (a, b) (a. College of Information Management & Engineering; b. Shanghai Key Laboratory of Financial Information Technology, Shanghai University of Finance & Economics, Shanghai 200433, China)
Because ambiguity and diversity are inherent in Chinese, standardizing Chinese addresses has long been difficult, which has seriously hindered further work on precise locating, geographic grid analysis, and locating social situations and public sentiment. To solve this problem, this paper proposes a new method of Chinese address standardization based on address gradation and finite state machine theory. The recognition ratio and correct matching ratio of the method were verified through software development. The experiment shows that the method achieves a recognition ratio of more than 96% and a matching ratio of more than 85%, and that it can also update the standard address library automatically. So this new method can solve the difficult problem in Chi
ISSN: 1001-3695
DOI: 10.3969/j.issn.1001-3695.2016.12.038