Evaluation of Naive Bayes and Support Vector Machines for Wikipedia

Bibliographic Details
Published in	Applied Artificial Intelligence, Vol. 31, no. 9-10, pp. 733-744
Main Authors	Mocherla, Sridhar; Danehy, Alexander; Impey, Christopher
Format	Journal Article
Language	English
Published	Philadelphia: Taylor & Francis, 26.11.2017
Subjects	Bayesian analysis; Categories; Classification; Random sampling; Support vector machines; Text editing
Summary:	Wikipedia has become the de facto source of information on the web and has grown exponentially since its inception. Text classification on Wikipedia, however, has seen little prior research aimed at studying and evaluating different classification techniques. To this end, we compare and illustrate the effectiveness of two standard classifiers from the text classification literature, Naive Bayes (Multinomial) and Support Vector Machines (SVM), on the full English Wikipedia corpus for six different categories. For each category, we build training sets using subject matter experts and Wikipedia portals and then evaluate Precision/Recall values using a random sampling approach. Our results show that SVM (linear kernel) performs exceptionally well across all categories, while Naive Bayes is less accurate in some categories but generalizes on par with SVM.
ISSN:	0883-9514; 1087-6545
DOI:	10.1080/08839514.2018.1440907
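
The summary names Multinomial Naive Bayes and Precision/Recall evaluation without showing what either computes. The sketch below is one minimal way to illustrate both in plain Python; it is not the authors' implementation. The category names, the toy token lists, the smoothing value alpha=1.0, and all function names are assumptions made up for this example, and a real experiment on Wikipedia would use a proper tokenizer and a much larger labeled sample.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha smoothing.

    docs is a list of token lists; labels is the matching list of classes.
    """
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {}
    for label, n_docs in class_doc_counts.items():
        total_words = sum(word_counts[label].values())
        denom = total_words + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(labels)),
            "log_likelihood": {
                w: math.log((word_counts[label][w] + alpha) / denom)
                for w in vocab
            },
        }
    return model


def predict(model, tokens):
    """Return the class with the highest log-posterior score.

    Tokens never seen in training are ignored.
    """
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            params["log_likelihood"][t]
            for t in tokens
            if t in params["log_likelihood"]
        )
    return max(model, key=score)


def precision_recall(y_true, y_pred, positive):
    """Compute precision and recall for one target category."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: one target category ("astronomy") versus the rest.
train_docs = [
    ["star", "galaxy", "telescope"],
    ["planet", "orbit", "star"],
    ["football", "goal", "league"],
    ["election", "vote", "party"],
]
train_labels = ["astronomy", "astronomy", "other", "other"]

# Hypothetical hand-labeled evaluation sample.
sample_docs = [["galaxy", "orbit"], ["vote", "league"], ["star", "goal"]]
sample_labels = ["astronomy", "other", "other"]

model = train_multinomial_nb(train_docs, train_labels)
predictions = [predict(model, doc) for doc in sample_docs]
precision, recall = precision_recall(sample_labels, predictions, "astronomy")
print(predictions, precision, recall)
```

In this toy sample the third document is a deliberate false positive (the token "star" pulls it toward "astronomy"), so precision drops while recall stays perfect. A linear-kernel SVM would be evaluated with the same precision_recall function applied to its own predictions.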