Semantic classification method for network Tibetan corpus
Tibetan web pages appear enormously. It is meaningful that the information processing technology is utilized to find the useful knowledge from the Tibetan web information. Tibetan semantic ontology can enrich the Tibetan digital resource and is helpful to improve the information processing performan...
Saved in:
Published in | Cluster computing Vol. 20; no. 1; pp. 155 - 165 |
---|---|
Main Authors | , , , , , , |
Format | Journal Article |
Language | English |
Published |
New York
Springer US
01.03.2017
Springer Nature B.V |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Tibetan web pages appear enormously. It is meaningful that the information processing technology is utilized to find the useful knowledge from the Tibetan web information. Tibetan semantic ontology can enrich the Tibetan digital resource and is helpful to improve the information processing performance. In this paper, semantic classification of Tibetan network corpus is studied. Firstly Tibetan web pages are collected. Secondly preprocessing is conducted to extract the useful information from Web pages. Thirdly the word segmentation and text representation are introduced. Finally the text similarity classification algorithm is proposed to classify the text. During the experiment, the comparison between semantic classification and non semantic classification is conducted. The results show that the semantic classification performance is obviously superior to non semantic classification. This means that making full use of ontology semantic relationship can greatly enhance the classification accuracy. The research is useful and helpful to the study of Tibetan semantic information processing. |
---|---|
ISSN: | 1386-7857 1573-7543 |
DOI: | 10.1007/s10586-017-0742-6 |