Evaluation of Naive Bayes and Support Vector Machines for Wikipedia

Bibliographic Details
Published in	Applied Artificial Intelligence, Vol. 31, no. 9-10, pp. 733-744
Main Authors	Mocherla, Sridhar; Danehy, Alexander; Impey, Christopher
Format	Journal Article
Language	English
Published	Philadelphia: Taylor & Francis, 26.11.2017
Subjects	Bayesian analysis; Categories; Classification; Random sampling; Support vector machines; Text editing
ISSN	0883-9514; 1087-6545 (EISSN)
DOI	10.1080/08839514.2018.1440907

Abstract	Wikipedia has become the de facto source of information on the web and has grown exponentially since its inception. Text classification on Wikipedia, however, has received limited research attention aimed at studying and evaluating different classification techniques. To this end, we compare and illustrate the effectiveness of two standard classifiers from the text classification literature, Naive Bayes (Multinomial) and Support Vector Machines (SVM), on the full English Wikipedia corpus for six different categories. For each category, we build training sets using subject matter experts and Wikipedia portals and then evaluate precision and recall using a random sampling approach. Our results show that SVM (linear kernel) performs exceptionally well across all categories, while Naive Bayes is less accurate in some categories, although its generalizing capability is on par with SVM.
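The evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the category names, the class `MultinomialNB`, and the function `precision_recall` are hypothetical names introduced here, and the linear SVM is omitted for brevity. The sketch trains a multinomial Naive Bayes classifier with add-one smoothing and estimates precision and recall on a random sample of labelled documents.

```python
import math
import random
from collections import Counter, defaultdict


class MultinomialNB:
    """Tiny multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        self.class_counts = Counter(labels)      # class -> number of docs
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[c] / n_docs)
            for w in doc.lower().split():
                if w in self.vocab:  # skip words never seen in training
                    score += math.log(
                        (self.word_counts[c][w] + 1) / (total + len(self.vocab))
                    )
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one category treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy training set standing in for expert-built category data.
train_docs = [
    "galaxy star telescope orbit",
    "planet orbit star nebula",
    "telescope galaxy nebula",
    "goal match team league",
    "team player goal season",
    "league season match player",
]
train_labels = ["astronomy"] * 3 + ["sports"] * 3
model = MultinomialNB().fit(train_docs, train_labels)

# Hypothetical labelled pool; a random sample of it is used for evaluation.
pool = [
    ("star galaxy orbit", "astronomy"),
    ("nebula telescope planet", "astronomy"),
    ("orbit of a planet", "astronomy"),
    ("team goal league", "sports"),
    ("player season match", "sports"),
    ("match of the season", "sports"),
]
rng = random.Random(0)  # fixed seed so the sample is reproducible
sample = rng.sample(pool, 4)
y_true = [label for _, label in sample]
y_pred = [model.predict(text) for text, _ in sample]
precision, recall = precision_recall(y_true, y_pred, positive="astronomy")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Sampling a fixed number of documents per category keeps manual labelling effort bounded, at the cost of estimates that carry sampling error; a larger sample narrows that error.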
Author Affiliations	Mocherla, Sridhar (ORCID 0000-0001-8311-0421; srmocher@email.arizona.edu), Steward Observatory, University of Arizona; Danehy, Alexander, Steward Observatory, University of Arizona; Impey, Christopher, College of Science, University of Arizona
Copyright	2017 Taylor & Francis
Discipline	Computer Science
Funding	Howard Hughes Medical Institute (grant ID HHMI-4215580; funder ID 10.13039/100000011)
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
OpenAccessLink	https://doaj.org/article/57786d7275f142ef87935f9c84ba3514
URI	https://www.tandfonline.com/doi/abs/10.1080/08839514.2018.1440907 https://www.proquest.com/docview/2012901972 https://doaj.org/article/57786d7275f142ef87935f9c84ba3514