Features extraction based on Naive Bayes algorithm and TF-IDF for news classification

Bibliographic Details
Published in PLoS One Vol. 20; no. 7; p. e0327347
Main Author Zhang, Li
Format Journal Article
Language English
Published United States Public Library of Science 30.07.2025
Public Library of Science (PLoS)
Abstract The rapid proliferation of online news demands robust automated classification systems to enhance information organization and personalized recommendation. Although traditional methods like TF-IDF with Naive Bayes provide foundational solutions, their limitations in capturing semantic nuances and handling real-time demands hinder practical applications. This study proposes a hybrid news classification framework that integrates classical machine learning with modern advances in NLP to address these challenges. Our methodology introduces three key innovations: (1) Domain-Specific Feature Engineering, combining tailored n-grams and entity-aware TF-IDF weighting to amplify discriminative terms; (2) BERT-Guided Feature Selection, leveraging distilled BERT to identify contextually important words and resolve rare-term ambiguities; and (3) Computationally Efficient Deployment, achieving 95.2% of the accuracy of BERT at 1/52.4th of the inference cost. Evaluated on a balanced corpus of Sina News articles in 11 categories, the system demonstrates a test precision of 95.12% (vs. 84.43% for SVM+TF-IDF baseline), with statistically significant improvements confirmed by 5-fold cross-validation (p < 0.01). The critical findings reveal strong performance in distinguishing semantically distinct categories, while exposing challenges in fine-grained differentiation. The efficiency of the framework (2.1 inference latency) and scalability (linear utilization of CPU resources) validate its practicality for real-world deployment. This work bridges the gap between traditional feature engineering and transformer-based models, offering a cost-effective solution for news platforms. Future research will explore hierarchical classification and the adaptation of dynamic topics to further refine semantic boundaries.
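The TF-IDF with Naive Bayes baseline that the abstract builds on can be illustrated with a minimal sketch. This is not the author's implementation: the class name TfidfNaiveBayes, the whitespace tokenizer, the bigram features, the smoothed IDF formula, and the English toy corpus are all assumptions made for illustration, and the entity-aware weighting and BERT-guided feature selection described in the abstract are not reproduced. Feeding TF-IDF weights into multinomial Naive Bayes as fractional counts is a common heuristic rather than a strict probabilistic model.

```python
import math
from collections import Counter, defaultdict


def tokenize(text, n_max=2):
    """Lowercased whitespace tokens plus word n-grams up to length n_max."""
    words = text.lower().split()
    feats = list(words)
    for n in range(2, n_max + 1):
        feats += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return feats


class TfidfNaiveBayes:
    """Hypothetical multinomial Naive Bayes over TF-IDF-weighted n-gram features."""

    def fit(self, docs, labels, alpha=1.0):
        tokenized = [tokenize(d) for d in docs]
        n_docs = len(docs)
        # Document frequency: number of documents containing each feature.
        df = Counter(t for toks in tokenized for t in set(toks))
        # Smoothed inverse document frequency (one common variant, assumed here).
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
        self.classes = sorted(set(labels))
        # Accumulate TF-IDF mass per class and feature (fractional "counts").
        weights = {c: defaultdict(float) for c in self.classes}
        for toks, y in zip(tokenized, labels):
            for t, tf in Counter(toks).items():
                weights[y][t] += tf * self.idf[t]
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / n_docs) for c in self.classes}
        # Laplace-smoothed log-likelihood of each feature given the class.
        self.log_like = {}
        for c in self.classes:
            total = sum(weights[c].values()) + alpha * len(self.idf)
            self.log_like[c] = {
                t: math.log((weights[c].get(t, 0.0) + alpha) / total) for t in self.idf
            }
        return self

    def predict(self, text):
        # Features unseen during training are ignored.
        counts = Counter(t for t in tokenize(text) if t in self.idf)

        def score(c):
            return self.log_prior[c] + sum(
                tf * self.idf[t] * self.log_like[c][t] for t, tf in counts.items()
            )

        return max(self.classes, key=score)


# Toy corpus (illustrative only; the study itself uses Sina News articles).
docs = [
    "team wins the football match",
    "striker scores late goal in match",
    "stock market falls as rates rise",
    "central bank raises interest rates",
]
labels = ["sports", "sports", "finance", "finance"]

model = TfidfNaiveBayes().fit(docs, labels)
print(model.predict("late goal wins the match"))  # sports
print(model.predict("bank raises rates"))         # finance
```

In this sketch the n-gram features stand in for the "tailored n-grams" mentioned in the abstract; a real Chinese-language pipeline would also need a word segmenter in place of whitespace splitting.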
Audience Academic
Author Zhang, Li
AuthorAffiliation School of Artificial Intelligence, Zhejiang College of Security Technology, Wenzhou, Zhejiang, China
Philadelphia University, JORDAN
Author_xml – sequence: 1
  givenname: Li
  orcidid: 0009-0003-1535-4987
  surname: Zhang
  fullname: Zhang, Li
BackLink https://www.ncbi.nlm.nih.gov/pubmed/40737302 (View this record in MEDLINE/PubMed)
ContentType Journal Article
Copyright Copyright: © 2025 Zhang. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
COPYRIGHT 2025 Public Library of Science
2025 Zhang. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.1371/journal.pone.0327347
Discipline Sciences (General)
DocumentTitleAlternate Naive Bayes-TF-IDF for news classification
EISSN 1932-6203
ExternalDocumentID 3234995210
oai_doaj_org_article_5f15c544ea3d4a0eaabee61308aa8f5a
PMC12310027
A849816220
40737302
10_1371_journal_pone_0327347
Genre Journal Article
GeographicLocations China
GrantInformation_xml – grantid: 26NDJC123YB
– grantid: 243049
ISSN 1932-6203
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 7
Language English
License Copyright: © 2025 Zhang. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Creative Commons Attribution License
Competing Interests: The authors have declared that no competing interests exist.
ORCID 0009-0003-1535-4987
OpenAccessLink http://journals.scholarsportal.info/openUrl.xqy?doi=10.1371/journal.pone.0327347
PMID 40737302
PublicationCentury 2000
PublicationDate 20250730
PublicationDateYYYYMMDD 2025-07-30
PublicationDate_xml – month: 7
  year: 2025
  text: 20250730
  day: 30
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: San Francisco
– name: San Francisco, CA USA
PublicationTitle PloS one
PublicationTitleAlternate PLoS One
PublicationYear 2025
Publisher Public Library of Science
Public Library of Science (PLoS)
Publisher_xml – name: Public Library of Science
– name: Public Library of Science (PLoS)
StartPage e0327347
SubjectTerms Accuracy
Algorithms
Bayes Theorem
Biology and Life Sciences
Classification
Classification systems
Computational linguistics
Computer and Information Sciences
Datasets
Deep learning
Documents
Electronic news gathering
Engineering and Technology
Humans
Identification and classification
Inference
Knowledge representation
Language processing
Latency
Machine Learning
Methods
Natural language interfaces
Natural Language Processing
Neural networks
News
Physical Sciences
Real time
Research and Analysis Methods
Semantics
Social Sciences
Statistical analysis
Support vector machines
Text categorization
Title Features extraction based on Naive Bayes algorithm and TF-IDF for news classification
URI https://www.ncbi.nlm.nih.gov/pubmed/40737302
https://www.proquest.com/docview/3234995210
https://www.proquest.com/docview/3235032876
https://pubmed.ncbi.nlm.nih.gov/PMC12310027
https://doaj.org/article/5f15c544ea3d4a0eaabee61308aa8f5a
http://dx.doi.org/10.1371/journal.pone.0327347
Volume 20