Categorization of Bangla Medical Text Documents Based on Hybrid Internal Feature
Published in | Computational Intelligence, Communications, and Business Analytics Vol. 1031; pp. 181–192 |
---|---|
Main Authors | , , |
Format | Book Chapter |
Language | English |
Published | Singapore: Springer Singapore Pte. Limited, 2019 |
Series | Communications in Computer and Information Science |
Subjects | |
Summary: | This paper aims to develop an automatic text categorization system that classifies Bangla text documents as medical or non-medical based on two primary features: word length and the presence of English equivalent words in the documents. It is first shown that, based on word length and the number of English equivalent words in a text, Bangla medical documents can be distinguished from text documents of other domains. The SGD (Stochastic Gradient Descent) classification algorithm is used and achieves an accuracy of 97.75%. The system was also compared with other commonly used classifiers, and SGD was observed to perform better than those classifiers. |
---|---|
ISBN: | 9789811385803 (ISBN-13), 9811385807 (ISBN-10) |
ISSN: | 1865-0929 (print), 1865-0937 (electronic) |
DOI: | 10.1007/978-981-13-8581-0_15 |
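The summary names the two features (word length and the number of English equivalent words) and the classifier (SGD), but not how they are computed. The sketch below is one hypothetical way to wire them together in plain Python; it is not the authors' implementation. Assumptions: word length is averaged per document in Unicode code points; an "English equivalent word" is a token written entirely in Latin letters (such as `fever` in `জ্বর (fever)`), so English terms transliterated into Bangla script are not counted; features are z-score standardized; and SGD minimizes logistic loss (the paper's loss is not stated here, and scikit-learn's `SGDClassifier`, for instance, defaults to hinge loss). The function names and the six-document toy corpus are invented for illustration. They are not the paper's data and say nothing about the reported 97.75% accuracy.

```python
import math
import random
import re

# HYPOTHETICAL feature definitions (not confirmed by the paper):
# - word length: mean token length in Unicode code points
# - English equivalent word: a token made only of Latin letters, e.g. "fever"
LATIN_WORD = re.compile(r"[A-Za-z]+")
PUNCT = "()[]{},.;:!?\"'\u0964"  # \u0964 is the Bangla danda


def extract_features(text):
    """Return [average word length, number of Latin-script words]."""
    tokens = [t.strip(PUNCT) for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return [0.0, 0.0]
    avg_len = sum(len(t) for t in tokens) / len(tokens)
    english = sum(1 for t in tokens if LATIN_WORD.fullmatch(t))
    return [avg_len, float(english)]


def fit_scaler(X):
    """Per-feature mean and standard deviation (a zero std is replaced by 1)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        stds.append(math.sqrt(var) or 1.0)
    return means, stds


def scale(x, means, stds):
    """Z-score one feature vector with statistics from the training set."""
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_sgd(X, y, epochs=500, lr=0.1, seed=0):
    """Logistic-loss SGD: one shuffled pass per epoch, one update per sample."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            grad = sigmoid(z) - y[i]  # derivative of log-loss w.r.t. z
            w = [wj - lr * grad * xj for wj, xj in zip(w, X[i])]
            b -= lr * grad
    return w, b


def predict(x, w, b):
    """Return 1 for medical, 0 for non-medical."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if z >= 0 else 0


# Toy corpus (illustrative only, not the paper's dataset): 1 = medical, 0 = not.
DOCS = [
    "রোগীর ডায়াবেটিস (diabetes) এবং উচ্চ রক্তচাপ (hypertension) আছে",
    "এই ঔষধ (medicine) জ্বর (fever) ও সংক্রমণ (infection) কমায়",
    "ইনসুলিন (insulin) রক্তে গ্লুকোজ (glucose) নিয়ন্ত্রণ করে",
    "আজ সকালে বৃষ্টি হয়েছে",
    "সে বাজার থেকে মাছ কিনেছে",
    "আমরা বিকেলে মাঠে খেলব",
]
LABELS = [1, 1, 1, 0, 0, 0]

raw = [extract_features(d) for d in DOCS]
means, stds = fit_scaler(raw)
X = [scale(x, means, stds) for x in raw]
w, b = train_sgd(X, LABELS)

query = "রোগীর জ্বর (fever) ও কাশি (cough) হয়েছে"
print(predict(scale(extract_features(query), means, stds), w, b))
```

Standardizing is a deliberate choice here: average word length and English-word counts live on different numeric scales, and without rescaling a single fixed learning rate would update the two weights very unevenly.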