Features extraction based on Naive Bayes algorithm and TF-IDF for news classification
Published in | PloS one Vol. 20; no. 7; p. e0327347 |
---|---|
Main Author | Zhang, Li |
Format | Journal Article |
Language | English |
Published | United States: Public Library of Science (PLoS), 30.07.2025 |
Abstract | The rapid proliferation of online news demands robust automated classification systems to enhance information organization and personalized recommendation. Although traditional methods like TF-IDF with Naive Bayes provide foundational solutions, their limitations in capturing semantic nuances and handling real-time demands hinder practical applications. This study proposes a hybrid news classification framework that integrates classical machine learning with modern advances in NLP to address these challenges. Our methodology introduces three key innovations: (1) Domain-Specific Feature Engineering, combining tailored n-grams and entity-aware TF-IDF weighting to amplify discriminative terms; (2) BERT-Guided Feature Selection, leveraging distilled BERT to identify contextually important words and resolve rare-term ambiguities; and (3) Computationally Efficient Deployment, achieving 95.2% of the accuracy of BERT at 1/52.4th of the inference cost. Evaluated on a balanced corpus of Sina News articles in 11 categories, the system demonstrates a test precision of 95.12% (vs. 84.43% for SVM+TF-IDF baseline), with statistically significant improvements confirmed by 5-fold cross-validation (p < 0.01). The critical findings reveal strong performance in distinguishing semantically distinct categories, while exposing challenges in fine-grained differentiation. The efficiency of the framework (2.1 inference latency) and scalability (linear utilization of CPU resources) validate its practicality for real-world deployment. This work bridges the gap between traditional feature engineering and transformer-based models, offering a cost-effective solution for news platforms. Future research will explore hierarchical classification and the adaptation of dynamic topics to further refine semantic boundaries. |
---|---|
Audience | Academic |
Author | Zhang, Li |
AuthorAffiliation | School of Artificial Intelligence, Zhejiang College of Security Technology, Wenzhou, Zhejiang, China; Philadelphia University, JORDAN |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/40737302 (View this record in MEDLINE/PubMed) |
ContentType | Journal Article |
Copyright | © 2025 Zhang. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
DOI | 10.1371/journal.pone.0327347 |
Discipline | Sciences (General) |
DocumentTitleAlternate | Naive Bayes-TF-IDF for news classification |
EISSN | 1932-6203 |
Genre | Journal Article |
GeographicLocations | China |
GeographicLocations_xml | – name: China |
GrantInformation_xml | – grantid: 26NDJC123YB – grantid: 243049 |
ISSN | 1932-6203 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 7 |
Language | English |
License | This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
Notes | Competing Interests: The authors have declared that no competing interests exist. |
ORCID | 0009-0003-1535-4987 |
OpenAccessLink | http://journals.scholarsportal.info/openUrl.xqy?doi=10.1371/journal.pone.0327347 |
PMID | 40737302 |
PublicationCentury | 2000 |
PublicationDate | 20250730 |
PublicationDateYYYYMMDD | 2025-07-30 |
PublicationDate_xml | – month: 7 year: 2025 text: 20250730 day: 30 |
PublicationDecade | 2020 |
PublicationPlace | United States |
PublicationPlace_xml | – name: United States – name: San Francisco – name: San Francisco, CA USA |
PublicationTitle | PloS one |
PublicationTitleAlternate | PLoS One |
PublicationYear | 2025 |
Publisher | Public Library of Science (PLoS) |
StartPage | e0327347 |
SubjectTerms | Accuracy; Algorithms; Bayes Theorem; Biology and Life Sciences; Classification; Classification systems; Computational linguistics; Computer and Information Sciences; Datasets; Deep learning; Documents; Electronic news gathering; Engineering and Technology; Humans; Identification and classification; Inference; Knowledge representation; Language processing; Latency; Machine Learning; Methods; Natural language interfaces; Natural Language Processing; Neural networks; News; Physical Sciences; Real time; Research and Analysis Methods; Semantics; Social Sciences; Statistical analysis; Support vector machines; Text categorization |
Title | Features extraction based on Naive Bayes algorithm and TF-IDF for news classification |
URI | https://www.ncbi.nlm.nih.gov/pubmed/40737302 https://www.proquest.com/docview/3234995210 https://www.proquest.com/docview/3235032876 https://pubmed.ncbi.nlm.nih.gov/PMC12310027 https://doaj.org/article/5f15c544ea3d4a0eaabee61308aa8f5a http://dx.doi.org/10.1371/journal.pone.0327347 |
Volume | 20 |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Features+extraction+based+on+Naive+Bayes+algorithm+and+TF-IDF+for+news+classification&rft.jtitle=PloS+one&rft.au=Zhang%2C+Li&rft.date=2025-07-30&rft.pub=Public+Library+of+Science&rft.issn=1932-6203&rft.eissn=1932-6203&rft.volume=20&rft.issue=7&rft.spage=e0327347&rft_id=info:doi/10.1371%2Fjournal.pone.0327347&rft.externalDBID=IOV&rft.externalDocID=A849816220 |