Bangla news classification using naive Bayes classifier
Published in | 16th Int'l Conf. Computer and Information Technology pp. 366 - 371 |
---|---|
Main Authors | Chy, Abu Nowshed; Seddiqui, Md Hanif; Das, Sowmitra |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.03.2014 |
Subjects | Computers; Dictionaries; Information technology; Layout; Taxonomy; Training; Vectors |
DOI | 10.1109/ICCITechn.2014.6997369 |
Abstract | The web is gigantic and constantly updated. Bangla news on the web has grown rapidly in the information age, and each news site has its own layout and its own scheme for categorizing news. This heterogeneity of layout and categorization cannot always satisfy an individual user's needs. Removing the heterogeneity and classifying news articles according to user preference is a formidable task. In this paper, we propose an approach that lets a user find news articles related to a specific category. We use our own web crawler to extract useful text from the HTML pages of news articles and construct a Full-Text-RSS. The content of each news article is tokenized with a modified light-weight Bangla stemmer. To achieve better classification results, we remove the less significant words, i.e. stop words, from each document. We apply the naive Bayes classifier to classify Bangla news article content based on the IPTC news codes. Our experimental results show the effectiveness of our classification system. |
---|---|
Author | Das, Sowmitra; Seddiqui, Md Hanif; Chy, Abu Nowshed |
Author_xml | – sequence: 1 givenname: Abu Nowshed surname: Chy fullname: Chy, Abu Nowshed email: nowshed@skeim.org organization: Dept. of Comput. Sci. & Eng., Univ. of Chittagong, Chittagong, Bangladesh – sequence: 2 givenname: Md Hanif surname: Seddiqui fullname: Seddiqui, Md Hanif email: hanif@cu.ac.bd organization: Dept. of Comput. Sci. & Eng., Univ. of Chittagong, Chittagong, Bangladesh – sequence: 3 givenname: Sowmitra surname: Das fullname: Das, Sowmitra email: sowmitra@skeim.org organization: Dept. of Comput. Sci. & Eng., Univ. of Chittagong, Chittagong, Bangladesh |
ContentType | Conference Proceeding |
DOI | 10.1109/ICCITechn.2014.6997369 |
EISBN | 9781479934973 1479934976 |
EndPage | 371 |
ExternalDocumentID | 6997369 |
Genre | orig-research |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
PageCount | 6 |
PublicationCentury | 2000 |
PublicationDate | 2014-March |
PublicationDateYYYYMMDD | 2014-03-01 |
PublicationDate_xml | – month: 03 year: 2014 text: 2014-March |
PublicationDecade | 2010 |
PublicationTitle | 16th Int'l Conf. Computer and Information Technology |
PublicationTitleAbbrev | ICCITechn |
PublicationYear | 2014 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
StartPage | 366 |
SubjectTerms | Computers Dictionaries Information technology Layout Taxonomy Training Vectors |
Title | Bangla news classification using naive Bayes classifier |
URI | https://ieeexplore.ieee.org/document/6997369 |
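
The abstract describes a pipeline of tokenization, stop-word removal, and naive Bayes classification. The following is a minimal Python sketch of that idea, not the authors' implementation: the stop-word list, the training sentences, the category labels `sport` and `politics`, and the names `tokenize` and `NaiveBayes` are all assumptions made for illustration. The crawler and the Bangla stemmer are omitted, and add-one (Laplace) smoothing is an assumed choice that the record does not confirm.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list; the list used in the paper is not given in this record.
STOP_WORDS = {"এবং", "ও", "এই"}


def tokenize(text):
    """Split on whitespace and drop stop words (no stemming in this sketch)."""
    return [tok for tok in text.split() if tok not in STOP_WORDS]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, doc):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total_docs)
            total = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration only.
train_docs = [
    "ক্রিকেট খেলা এবং দল",
    "দল খেলা ক্রিকেট",
    "নির্বাচন ও সরকার ভোট",
    "সরকার এই নির্বাচন ভোট",
]
train_labels = ["sport", "sport", "politics", "politics"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("এই ক্রিকেট দল"))  # sport
print(model.predict("ভোট ও সরকার"))    # politics
```

Working in log space avoids numeric underflow when documents are long, and smoothing keeps a single unseen word from forcing a class probability to zero. A real system would replace the whitespace tokenizer with the stemmer-based one and train on crawled articles labeled with IPTC categories.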