Semantic classification method for network Tibetan corpus


Bibliographic Details
Published in Cluster Computing Vol. 20; no. 1; pp. 155-165
Main Authors Xu, Gui-Xian; Wang, Chang-Zhi; Wang, Li-Hui; Zhou, Yu-Hong; Li, Wei-Kang; Xu, Hao; Huang, Qing
Format Journal Article
Language English
Published New York Springer US 01.03.2017
Springer Nature B.V
Summary: The number of Tibetan web pages is growing rapidly, and it is worthwhile to apply information processing technology to extract useful knowledge from them. A Tibetan semantic ontology can enrich Tibetan digital resources and help improve information processing performance. This paper studies the semantic classification of a Tibetan network corpus. First, Tibetan web pages are collected. Second, preprocessing extracts the useful information from the web pages. Third, word segmentation and text representation are introduced. Finally, a text similarity classification algorithm is proposed to classify the texts. The experiments compare semantic classification with non-semantic classification. The results show that semantic classification clearly outperforms non-semantic classification, which indicates that making full use of ontology semantic relationships can greatly improve classification accuracy. The research is useful for the study of Tibetan semantic information processing.
ISSN: 1386-7857, 1573-7543
DOI:10.1007/s10586-017-0742-6