Bangla news classification using naive Bayes classifier

Bibliographic Details
Published in: 16th Int'l Conf. Computer and Information Technology, pp. 366-371
Main Authors: Chy, Abu Nowshed; Seddiqui, Md Hanif; Das, Sowmitra
Format: Conference Proceeding
Language: English
Published: IEEE, 01.03.2014
DOI: 10.1109/ICCITechn.2014.6997369

Abstract: The Web is gigantic and constantly updated. Bangla news on the Web is growing rapidly in the information age, and each news site has its own layout and its own categorization for grouping news. This heterogeneity of layout and categorization cannot always satisfy an individual user's needs. Removing this heterogeneity and classifying news articles according to user preference is a formidable task. In this paper, we propose an approach that lets a user find news articles related to a specific category. We use our own web crawler to extract useful text from the HTML pages of news articles and construct a Full-Text-RSS. The content of each news article is tokenized with a modified lightweight Bangla stemmer. To achieve better classification results, we remove the less significant words, i.e., stop words, from the documents. We apply the naive Bayes classifier to classify Bangla news article content based on the IPTC news codes. Our experimental results show the effectiveness of our classification system.
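
Illustration (not part of the record): the abstract names stop-word removal followed by naive Bayes classification but gives no implementation details. The sketch below shows one common form of that idea, a multinomial naive Bayes classifier with add-one smoothing, and should not be read as the authors' method. The stop-word list, the transliterated sample words, the two category labels, and the names `tokenize` and `NaiveBayes` are all assumptions made for this example; crawling and stemming are omitted.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list (transliterated); the paper's list is not given.
STOP_WORDS = {"o", "ebong", "kintu"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words (no stemming)."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)    # token counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total_docs = len(labels)
        return self

    def predict(self, doc):
        tokens = tokenize(doc)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / self.total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data; labels are illustrative category names only.
train_docs = [
    "cricket dol khela jitlo",
    "football khela o dol",
    "bazar dor taka barlo",
    "bank taka o bazar",
]
train_labels = ["sport", "sport", "economy", "economy"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("dol khela"))
print(model.predict("bank bazar"))
```

Scores are summed in log space so that long articles do not underflow, and add-one smoothing keeps a word unseen in one category from forcing that category's probability to zero.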
Author Details
1. Chy, Abu Nowshed (nowshed@skeim.org), Dept. of Comput. Sci. & Eng., Univ. of Chittagong, Chittagong, Bangladesh
2. Seddiqui, Md Hanif (hanif@cu.ac.bd), Dept. of Comput. Sci. & Eng., Univ. of Chittagong, Chittagong, Bangladesh
3. Das, Sowmitra (sowmitra@skeim.org), Dept. of Comput. Sci. & Eng., Univ. of Chittagong, Chittagong, Bangladesh
EISBN: 9781479934973; 1479934976
Page Count: 6
Subject Terms: Computers; Dictionaries; Information technology; Layout; Taxonomy; Training; Vectors
URI: https://ieeexplore.ieee.org/document/6997369