Annotation of Geographical Spatial Relations in Chinese Text
Published in | 测绘学报 (Acta Geodaetica et Cartographica Sinica) Vol. 41; no. 3; pp. 468-474 |
---|---|
Main Author | |
Format | Journal Article |
Language | Chinese |
Published | 2012 |
Subjects | |
ISSN | 1001-1595 |
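Summary: | To address the current lack of relevant standards and standard data, this work analyzes the linguistic characteristics of geographical spatial relation descriptions in Chinese text and proposes an annotation scheme for geographical spatial relations in Chinese text. With GATE (General Architecture for Text Engineering) as the annotation tool and 《中国大百科全书中国地理》 (Encyclopedia of China Geography) as the text source, an annotated corpus of geographical spatial relations was built using cross-validation. The work achieves a structured representation of geographical spatial relation descriptions in Chinese text and provides standardized test data for geographical spatial relation information extraction. |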
---|---|
Bibliography: | ZHANG Xueying, ZHANG Chunju, ZHU Shaonan (Institute of Geographical Science, Nanjing Normal University, Nanjing 210046, China). 11-2089/P. Corpus annotation is a task that provides both reference and training material for method development and benchmark data sets annotated with a given annotation scheme. After an analysis of the linguistic characteristics, an annotation scheme is proposed for marking up linguistic expressions of spatial relations in Chinese text. GATE (General Architecture for Text Engineering), a natural language processing software package, is then introduced as the annotation tool. Based on the proposed annotation scheme, a corpus with "Encyclopedia of China Geography" as the source data is annotated by means of cross-validation to solve the problem of annotation inconsistency, in order to realize a structured representation of geographical spatial relations described in natural language and to provide standard training and test data for their extraction. Keywords: natural languages; Chinese texts; spatial relation |