Evaluation of Naive Bayes and Support Vector Machines for Wikipedia

Bibliographic Details
Published in Applied Artificial Intelligence, Vol. 31, no. 9-10, pp. 733-744
Main Authors Mocherla, Sridhar; Danehy, Alexander; Impey, Christopher
Format Journal Article
Language English
Published Philadelphia: Taylor & Francis, 26.11.2017
ISSN 0883-9514 (print); 1087-6545 (online)
DOI 10.1080/08839514.2018.1440907

Abstract Wikipedia has become the de facto source of information on the web, and it has grown exponentially since its inception. Text classification with Wikipedia has seen limited prior research aimed at studying and evaluating different classification techniques. To this end, we compare and illustrate the effectiveness of two standard classifiers from the text classification literature, Multinomial Naive Bayes and Support Vector Machines (SVM), on the full English Wikipedia corpus for six different categories. For each category, we build training sets with the help of subject matter experts and Wikipedia portals, and then estimate precision and recall using a random sampling approach. Our results show that SVM with a linear kernel performs exceptionally well across all categories, while Naive Bayes is less accurate in some categories but generalizes on par with SVM.
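The abstract compares Multinomial Naive Bayes with a linear-kernel SVM and scores each with precision and recall. As a rough, dependency-free illustration only, the sketch below implements both classifiers for a single binary task (one category vs. the rest). The whitespace tokenizer, the raw term-count features, the Pegasos-style subgradient training used for the SVM, and the toy documents are all assumptions for illustration; they are not the paper's features, training sets, or tooling, and the paper's random-sampling evaluation procedure is not reproduced here.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization into a bag of words.
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in tokenize(d)}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(w for d in class_docs for w in tokenize(d))
            denom = sum(counts.values()) + len(self.vocab)
            self.log_lik[c] = {w: math.log((counts[w] + 1) / denom) for w in self.vocab}
        return self

    def predict(self, doc):
        words = [w for w in tokenize(doc) if w in self.vocab]  # drop unseen words
        return max(self.classes,
                   key=lambda c: self.log_prior[c] + sum(self.log_lik[c][w] for w in words))


class LinearSVM:
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style
    stochastic subgradient descent. Labels must be +1 / -1; no bias term."""

    def __init__(self, lam=0.01, epochs=50, seed=0):
        self.lam, self.epochs, self.seed = lam, epochs, seed

    def decision(self, doc):
        return sum(self.w.get(f, 0.0) * v for f, v in Counter(tokenize(doc)).items())

    def fit(self, docs, labels):
        rng = random.Random(self.seed)
        data = [(Counter(tokenize(d)), y) for d, y in zip(docs, labels)]
        self.w, t = {}, 0
        for _ in range(self.epochs):
            rng.shuffle(data)
            for x, y in data:
                t += 1
                eta = 1.0 / (self.lam * t)  # Pegasos step size
                margin = y * sum(self.w.get(f, 0.0) * v for f, v in x.items())
                # Regularization step: shrink every weight by (1 - eta * lam).
                self.w = {f: wf * (1 - eta * self.lam) for f, wf in self.w.items()}
                if margin < 1:  # hinge loss is active: step toward y * x
                    for f, v in x.items():
                        self.w[f] = self.w.get(f, 0.0) + eta * y * v
        return self

    def predict(self, doc):
        return 1 if self.decision(doc) >= 0 else -1


def precision_recall(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: +1 = "astronomy" category, -1 = everything else.
TRAIN_DOCS = [
    "galaxy star telescope orbit", "planet star orbit comet",
    "telescope galaxy nebula planet", "election vote senate law",
    "court law judge vote", "senate election court policy",
]
TRAIN_LABELS = [1, 1, 1, -1, -1, -1]
TEST_DOCS = ["star galaxy telescope", "vote law senate",
             "orbit planet star", "court election senate"]
TEST_LABELS = [1, -1, 1, -1]

if __name__ == "__main__":
    for model in (MultinomialNB(), LinearSVM()):
        model.fit(TRAIN_DOCS, TRAIN_LABELS)
        preds = [model.predict(d) for d in TEST_DOCS]
        print(type(model).__name__, preds, precision_recall(TEST_LABELS, preds))
```

Run as a script, it prints each model's predictions and its (precision, recall) pair on the four toy test documents. Pegasos is used here only because it keeps a linear SVM self-contained; a real comparison at Wikipedia scale would more likely rely on an established library and a proper held-out or sampled evaluation set.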
Author affiliations Mocherla, Sridhar (ORCID 0000-0001-8311-0421; srmocher@email.arizona.edu): Steward Observatory, University of Arizona
Danehy, Alexander: Steward Observatory, University of Arizona
Impey, Christopher: College of Science, University of Arizona
Copyright 2017 Taylor & Francis
Discipline Computer Science
Funding Howard Hughes Medical Institute, grant HHMI-4215580
Access Open access; peer reviewed
Subjects Bayesian analysis; Categories; Classification; Random sampling; Support vector machines; Text editing
Online Access https://www.tandfonline.com/doi/abs/10.1080/08839514.2018.1440907
https://www.proquest.com/docview/2012901972
https://doaj.org/article/57786d7275f142ef87935f9c84ba3514