Evaluation of Naive Bayes and Support Vector Machines for Wikipedia
Published in | Applied Artificial Intelligence, Vol. 31, no. 9-10, pp. 733-744 |
---|---|
Main Authors | Mocherla, Sridhar; Danehy, Alexander; Impey, Christopher |
Format | Journal Article |
Language | English |
Published | Philadelphia: Taylor & Francis (Taylor & Francis Ltd; Taylor & Francis Group), 26.11.2017 |
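Subjects | Bayesian analysis; Categories; Classification; Random sampling; Support vector machines; Text editing |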
Online Access | https://doaj.org/article/57786d7275f142ef87935f9c84ba3514 |
ISSN | 0883-9514 (print); 1087-6545 (online) |
DOI | 10.1080/08839514.2018.1440907 |
Abstract | Wikipedia has become the de facto source for information on the web, and it has experienced exponential growth since its inception. Text Classification with Wikipedia has seen limited research in the past with the goal of studying and evaluating different classification techniques. To this end, we compare and illustrate the effectiveness of two standard classifiers in the text classification literature, Naive Bayes (Multinomial) and Support Vector Machines (SVM), on the full English Wikipedia corpus for six different categories. For each category, we build training sets using subject matter experts and Wikipedia portals and then evaluate Precision/Recall values using a random sampling approach. Our results show that SVM (linear kernel) performs exceptionally across all categories, and the accuracy of Naive Bayes is inferior in some categories, whereas its generalizing capability is on par with SVM. |
---|---|
Author | Impey, Christopher; Danehy, Alexander; Mocherla, Sridhar |
Author details | 1. Sridhar Mocherla (ORCID 0000-0001-8311-0421; srmocher@email.arizona.edu), Steward Observatory, University of Arizona; 2. Alexander Danehy, Steward Observatory, University of Arizona; 3. Christopher Impey, College of Science, University of Arizona |
CitedBy_id | crossref_primary_10_1177_0361198120967943 crossref_primary_10_1080_08839514_2019_1673037 crossref_primary_10_1002_srin_202200617 crossref_primary_10_12720_jait_15_4_519_531 |
Cites_doi | 10.7551/mitpress/1130.003.0016; 10.1017/CBO9780511809071; 10.1016/j.ipm.2016.07.003; 10.1007/s10115-008-0152-4; 10.1007/BF00994018; 10.14778/2536222.2536237 |
ContentType | Journal Article |
Copyright | © 2017 Taylor & Francis |
DatabaseName | CrossRef; Computer and Information Systems Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional; DOAJ Directory of Open Access Journals |
Discipline | Computer Science |
EISSN | 1087-6545 |
EndPage | 744 |
Genre | Article |
Funding | Howard Hughes Medical Institute, grant HHMI-4215580 (funder ID 10.13039/100000011) |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 9-10 |
ORCID | 0000-0001-8311-0421 |
OpenAccessLink | https://doaj.org/article/57786d7275f142ef87935f9c84ba3514 |
PageCount | 12 |
PublicationDate | 2017-11-26 |
PublicationPlace | Philadelphia |
PublicationTitle | Applied artificial intelligence |
PublicationYear | 2017 |
Publisher | Taylor & Francis; Taylor & Francis Ltd; Taylor & Francis Group |
References | CIT0011: Pedregosa F. (2011), JMLR 12, 2825; CIT0012: doi 10.7551/mitpress/1130.003.0016; CIT0007: doi 10.1017/CBO9780511809071; CIT0009: doi 10.1016/j.ipm.2016.07.003; CIT0014: doi 10.1007/s10115-008-0152-4; CIT0002: doi 10.1007/BF00994018; CIT0004: doi 10.14778/2536222.2536237 |
StartPage | 733 |
SubjectTerms | Bayesian analysis; Categories; Classification; Random sampling; Support vector machines; Text editing |
Title | Evaluation of Naive Bayes and Support Vector Machines for Wikipedia |
URI | https://www.tandfonline.com/doi/abs/10.1080/08839514.2018.1440907; https://www.proquest.com/docview/2012901972; https://doaj.org/article/57786d7275f142ef87935f9c84ba3514 |
Volume | 31 |
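The abstract compares two standard text classifiers, multinomial Naive Bayes and a linear-kernel SVM, scored by precision and recall. A minimal, dependency-free sketch of that kind of comparison follows, assuming bag-of-words features and a binary (in-category vs. not) task per category. It is not the authors' implementation: the six-document corpus, the helper names (`tokenize`, `MultinomialNB`, `LinearSVM`, `precision_recall`) and the hyperparameters (`alpha`, `lam`, `epochs`, `seed`) are hypothetical, the SVM is fitted with the Pegasos subgradient method (a swapped-in solver) without an intercept term, and the paper's random-sampling evaluation protocol, which this record does not describe, is not reproduced.

```python
"""Toy sketch: multinomial Naive Bayes vs. a linear SVM on bag-of-words text.

Hypothetical illustration only; data, names, and hyperparameters are invented
and do not come from the paper.
"""
import math
import random
from collections import Counter


def tokenize(text):
    # Lowercase whitespace split; a real Wikipedia pipeline would do far more.
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in tokenize(d)}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(w for d in class_docs for w in tokenize(d))
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {w: math.log((counts[w] + self.alpha) / denom)
                               for w in self.vocab}
        return self

    def predict(self, doc):
        words = [w for w in tokenize(doc) if w in self.vocab]  # skip unseen words
        return max(self.classes,
                   key=lambda c: self.log_prior[c] + sum(self.log_lik[c][w] for w in words))


class LinearSVM:
    """Binary linear SVM (labels 0/1, no intercept) trained with Pegasos."""

    def __init__(self, lam=0.01, epochs=20, seed=0):
        self.lam, self.epochs, self.seed = lam, epochs, seed

    def _score(self, x):
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def fit(self, docs, labels):
        rng = random.Random(self.seed)
        data = [(Counter(tokenize(d)), 1 if y == 1 else -1) for d, y in zip(docs, labels)]
        self.w, t = {}, 0
        for _ in range(self.epochs):
            rng.shuffle(data)
            for x, y in data:
                t += 1
                eta = 1.0 / (self.lam * t)  # Pegasos step size
                violated = y * self._score(x) < 1  # hinge loss is active
                for f in self.w:  # shrinkage from the L2 penalty
                    self.w[f] *= 1.0 - eta * self.lam
                if violated:
                    for f, v in x.items():
                        self.w[f] = self.w.get(f, 0.0) + eta * y * v
        return self

    def predict(self, doc):
        return 1 if self._score(Counter(tokenize(doc))) >= 0 else 0


def precision_recall(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Invented toy corpus: 1 = astronomy article, 0 = other.
    train = [("galaxy star telescope orbit", 1), ("planet orbit star nebula", 1),
             ("telescope galaxy cluster redshift", 1), ("recipe flour oven bake", 0),
             ("football match goal team", 0), ("bake bread oven recipe", 0)]
    docs, labels = zip(*train)
    held_out = [("star galaxy telescope", 1), ("oven bread flour", 0), ("team goal match", 0)]
    y_true = [y for _, y in held_out]
    for name, model in [("NB", MultinomialNB()), ("SVM", LinearSVM())]:
        model.fit(docs, labels)
        y_pred = [model.predict(d) for d, _ in held_out]
        print(name, precision_recall(y_true, y_pred))
```

On this cleanly separable toy corpus both models print `(1.0, 1.0)`, so the sketch only shows the mechanics; differences of the kind the abstract reports would need real, overlapping categories. Pegasos is used here solely to avoid a dependency; a library solver such as scikit-learn's `LinearSVC` would be the more usual choice.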