Hierarchical Relation Mining in Chinese Text Based on Mixed Cosine Similarity
Published in | Application Research of Computers (计算机应用研究) Vol. 34; no. 5; pp. 1406-1409 |
---|---|
Main Author | Dong Yangyi, Li Weihua, Yu Hui |
Format | Journal Article |
Language | Chinese |
Published | School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, 2017 |
Subjects | |
ISSN | 1001-3695 |
DOI | 10.3969/j.issn.1001-3695.2017.05.029 |
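Summary: | Hierarchical relations are among the most important relations between concepts in Chinese text. Correctly identifying them is a basic research problem underlying information-processing tasks such as automatic domain-ontology construction and text data mining. The method first enumerates the candidate hierarchical relations that may hold between concepts and builds a kernel-function classifier that mixes the semantic cosine similarity of part-of-speech sequences with the cosine similarity of relation words, turning the mining of hierarchical relations between concepts into a classification problem. The classifier is then trained on text data annotated with templates. Finally, preprocessed Chinese text is fed in, and the kernel-function classifier decides which candidate hierarchical relations hold. Experiments on Chinese text from the Air Force weapons and equipment domain show that the method is simple and reliable, with good accuracy and recall. |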
---|---|
Bibliography: | 51-1196/TP Hierarchical relations are among the most important relationships between concepts in Chinese text. Correct determination of hierarchical relations is a basic research topic for domain ontology construction, text data mining, and similar tasks. Firstly, this paper lists the possible candidate hierarchical relations and builds a kernel function classifier based on a mix of the semantic cosine similarity of part-of-speech sequences and the cosine similarity of relation words, so that the hierarchical relation mining problem is transformed into a classification problem. Then it trains the classifier by template-annotating the text data. Finally, it inputs the preprocessed Chinese text and uses the kernel function classifier to determine the candidate hierarchical relations. Using Chinese text in the field of Air Force weapons and equipment as test data, experiments show that the method is simple and reliable, with good accuracy and recall rate. Dong Yangyi, Li Weihua, Yu Hui (School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China). Keywords: natural language processing; hierarchical relations; text mining; mixed cosine similarity; ontology construction |
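The summary above centres on a kernel that mixes a cosine similarity over part-of-speech sequences with a cosine similarity over relation words. The record specifies neither the feature representation nor the mixing weight nor the exact classifier, so the following Python sketch rests on loudly stated assumptions: POS sequences are represented as bags of tag bigrams, relation words as bags of words, the two similarities are combined with a hypothetical weight `alpha`, and a plain nearest-neighbour rule over template-annotated examples stands in for the paper's kernel classifier. All function names and the toy data are hypothetical and for illustration only.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pos_bigrams(tags: list[str]) -> Counter:
    """Assumed POS-sequence representation: a bag of adjacent tag pairs."""
    return Counter(zip(tags, tags[1:]))


def mixed_similarity(x: dict, y: dict, alpha: float = 0.5) -> float:
    """Weighted mix of POS-sequence similarity and relation-word similarity.

    alpha is a hypothetical mixing weight; the paper's value is not given.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    s_pos = cosine(pos_bigrams(x["pos"]), pos_bigrams(y["pos"]))
    s_rel = cosine(Counter(x["rel_words"]), Counter(y["rel_words"]))
    return alpha * s_pos + (1.0 - alpha) * s_rel


def classify(candidate: dict, training: list, alpha: float = 0.5):
    """Give a candidate relation the label of its most similar annotated example.

    A nearest-neighbour rule used as a stand-in for the paper's kernel classifier.
    """
    best_label, best_score = None, -1.0
    for example, label in training:
        score = mixed_similarity(candidate, example, alpha)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy template-annotated examples (invented for illustration).
training = [
    # "歼-20 是 一种 战斗机" ("the J-20 is a kind of fighter"): hierarchical
    ({"pos": ["n", "v", "m", "n"], "rel_words": ["是", "一种"]}, "hierarchical"),
    # "战斗机 在 基地 部署" ("fighters are deployed at the base"): no hierarchy
    ({"pos": ["n", "p", "n", "v"], "rel_words": ["在", "部署"]}, "none"),
]

# Candidate "运-20 是 一 种 运输机" ("the Y-20 is a kind of transport aircraft"),
# segmented with 一 and 种 as separate tokens; relation phrase matched as "一种".
candidate = {"pos": ["n", "v", "m", "q", "n"], "rel_words": ["是", "一种"]}
label, score = classify(candidate, training)
print(label, f"{score:.3f}")  # prints: hierarchical 0.789
```

Because each cosine term is a normalized linear kernel on non-negative count vectors, any convex combination with 0 ≤ `alpha` ≤ 1 is itself a valid positive semidefinite kernel, so the same `mixed_similarity` could in principle be passed to a kernel method such as an SVM instead of the nearest-neighbour rule used here.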