Classification of Text Documents based on Naive Bayes using N-Gram Features

Bibliographic Details
Published in 2018 International Conference on Artificial Intelligence and Data Processing (IDAP), pp. 1-5
Main Author BAYGIN, Mehmet
Format Conference Proceeding
Language English
Published IEEE, 2018-09-01
DOI 10.1109/IDAP.2018.8620853

Abstract Document classification is the process of assigning documents to the correct categories. This process, which is commonly used in the field of text mining, automatically classifies documents with large dimensions. In this paper, Turkish document classification was performed using the Naïve Bayes approach, which is one of the machine learning methods. With this approach, which uses 5 different categories, Turkish documents are classified quickly and automatically. In addition, the performance of the proposed approach was measured according to the basic evaluation criteria of precision, recall, accuracy and F-measure, and it achieved a success rate of 92%. The source code of the application developed in this paper is available as open source at https://drive.google.com/open?id=1Idp5VK1Q91vyqb940WjeoMpB9dVQuVC9.
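The approach named in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the abstract does not state which n-gram sizes, smoothing, or tokenization were used, so word unigrams plus bigrams, add-one (Laplace) smoothing, and whitespace tokenization are all assumptions here. The two categories and the English training sentences are hypothetical toy data, whereas the paper works with Turkish documents in 5 categories.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return word n-grams (sizes 1..n_max) of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-one smoothing (assumed)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.feat_counts = defaultdict(Counter)  # n-gram counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            feats = ngrams(doc)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for feat in ngrams(doc):
                if feat in self.vocab:  # n-grams never seen in training are skipped
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall and F-measure for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy training set (not data from the paper).
train_docs = [
    "the team won the match",
    "the striker scored a late goal",
    "the central bank raised interest rates",
    "inflation and interest rates rose",
]
train_labels = ["sports", "sports", "economy", "economy"]
model = NaiveBayes().fit(train_docs, train_labels)

if __name__ == "__main__":
    print(model.predict("the team scored a goal"))
    print(model.predict("interest rates rose again"))
```

Accuracy, the fourth criterion listed in the abstract, is simply the fraction of documents whose predicted category matches the true one; the per-class figures returned by `precision_recall_f1` would typically be averaged over all categories to report a single score.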
Author Details
  Given name: Mehmet
  Surname: BAYGIN
  Organization: Computer Engineering Department, Ardahan University, Ardahan, 75000, Turkey
EISBN 1538668785
9781538668788
Subject Terms Bayes methods
Data mining
Document classification
Feature extraction
Machine learning
Naïve Bayes
Sentiment analysis
Sports
Training
URI https://ieeexplore.ieee.org/document/8620853