Japanese Short Text Classification Based on CNN-BiLSTM-Attention

Bibliographic Details
Published in	Procedia Computer Science, Vol. 262, pp. 320-329
Main Authors	Chen, Tianyang; Xie, Zexian
Format	Journal Article
Language	English
Published	Elsevier B.V., 2025
Subjects	CNN-BiLSTM-Attention; Global Context Modeling; Japanese Short Text Classification; Local Feature Extraction; Self-Attention Mechanism
Abstract	Short texts carry little context, so traditional methods based on statistical features struggle to model semantic relationships in Japanese short text classification, which limits their accuracy. To address this, the paper introduces a CNN-BiLSTM-Attention fusion model that extracts both local and global features from short texts to improve classification accuracy. First, a Convolutional Neural Network (CNN) extracts local n-gram features and identifies phrase patterns. Next, a Bidirectional Long Short-Term Memory (BiLSTM) network models the global context of the text to capture the influence of special structures such as auxiliary words and honorifics. Finally, a self-attention mechanism assigns weights to individual words, so that the model focuses on the information most relevant to classification and is less affected by purely grammatical vocabulary. In addition, Dropout regularization and a Softmax classification layer are used to improve the model’s robustness and adaptability. Experimental results show that the CNN-BiLSTM-Attention model performs best on all sentence structures tested, and that its overall Word Order Sensitivity Score (WOSS) is higher than that of the other models. On the Subject-Verb-Object (SVO) structure, the model reaches 0.94, a relative improvement of 20.5% over the CNN’s 0.78, indicating a more accurate understanding of sentences with standard word order.
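The pipeline described in the abstract (CNN for local n-grams, BiLSTM for global context, attention weights over words, then Dropout and Softmax) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the use of PyTorch, the class name `CnnBiLstmAttention`, every hyperparameter, and the convention that token ID 0 is padding are assumptions. The attention step here is a simple attention-pooling layer that scores each time step, swapped in for whatever self-attention formulation the paper actually uses.

```python
import torch
import torch.nn as nn


class CnnBiLstmAttention(nn.Module):
    """Hypothetical CNN -> BiLSTM -> attention pooling -> dropout -> softmax classifier."""

    def __init__(self, vocab_size, num_classes, embed_dim=128, conv_channels=64,
                 kernel_size=3, lstm_hidden=64, dropout=0.5):
        super().__init__()
        # Token ID 0 is assumed to be padding (an assumption of this sketch).
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # 1-D convolution along the token axis: each filter responds to a local n-gram.
        # An odd kernel_size with this padding keeps the sequence length unchanged.
        self.conv = nn.Conv1d(embed_dim, conv_channels, kernel_size,
                              padding=kernel_size // 2)
        # The bidirectional LSTM reads the convolved sequence in both directions.
        self.bilstm = nn.LSTM(conv_channels, lstm_hidden,
                              batch_first=True, bidirectional=True)
        # One scalar relevance score per time step.
        self.attn_score = nn.Linear(2 * lstm_hidden, 1)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, token_ids):
        mask = token_ids != 0                                 # (batch, seq)
        x = self.embedding(token_ids)                         # (batch, seq, embed)
        x = torch.relu(self.conv(x.transpose(1, 2)))          # (batch, channels, seq)
        h, _ = self.bilstm(x.transpose(1, 2))                 # (batch, seq, 2 * hidden)
        scores = self.attn_score(torch.tanh(h)).squeeze(-1)   # (batch, seq)
        # Padding positions get zero attention weight after the softmax.
        scores = scores.masked_fill(~mask, float("-inf"))
        weights = torch.softmax(scores, dim=1)                # (batch, seq)
        context = (weights.unsqueeze(-1) * h).sum(dim=1)      # (batch, 2 * hidden)
        logits = self.classifier(self.dropout(context))
        return torch.softmax(logits, dim=-1), weights


torch.manual_seed(0)
model = CnnBiLstmAttention(vocab_size=50, num_classes=4)
model.eval()  # disable dropout for inference
batch = torch.tensor([[5, 8, 2, 9, 0, 0], [3, 7, 7, 1, 4, 6]])
with torch.no_grad():
    probs, weights = model(batch)
print(probs.shape, weights.shape)
```

Returning the per-token `weights` alongside the class probabilities makes it possible to inspect which words the model attends to. For training, one would typically return the raw logits instead and pass them to `nn.CrossEntropyLoss`, which applies the softmax internally.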
Author	Chen, Tianyang; Xie, Zexian
Author_xml	1. Chen, Tianyang (x2978043261@163.com), School of Japanese and International Studies, Beijing Foreign Studies University, Beijing 100089, China; 2. Xie, Zexian, School of Cyberspace Security, University of International Relations, Beijing 100091, China
Cites_doi	10.1007/s12325-022-02397-7 10.1111/jwip.12285 10.3390/make5030059 10.1007/s11042-022-13937-2 10.1007/s11604-023-01413-2 10.1007/s00521-023-08629-3 10.1007/s11042-022-14112-3 10.1007/s11063-022-10990-8 10.1007/s10462-023-10393-8 10.1109/TETCI.2023.3301774
ContentType	Journal Article
Copyright	2025
DOI	10.1016/j.procs.2025.05.059
DatabaseName	ScienceDirect Open Access Titles Elsevier:ScienceDirect:Open Access CrossRef
DatabaseTitle	CrossRef
Discipline	Computer Science
EISSN	1877-0509
EndPage	329
ExternalDocumentID	10_1016_j_procs_2025_05_059 S1877050925019064
ISSN	1877-0509
IngestDate	Thu Jul 24 01:54:10 EDT 2025 Sat Aug 16 17:01:11 EDT 2025
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Keywords	CNN-BiLSTM-Attention; Self-Attention Mechanism; Japanese Short Text Classification; Local Feature Extraction; Global Context Modeling
Language	English
License	This is an open access article under the CC BY-NC-ND license.
OpenAccessLink	https://www.sciencedirect.com/science/article/pii/S1877050925019064
PageCount	10
PublicationCentury	2000
PublicationDate	2025
PublicationDateYYYYMMDD	2025-01-01
PublicationDate_xml	– year: 2025 text: 2025
PublicationDecade	2020
PublicationTitle	Procedia computer science
PublicationYear	2025
Publisher	Elsevier B.V
Publisher_xml	– name: Elsevier B.V
SourceID	crossref elsevier
SourceType	Index Database Publisher
StartPage	320
SubjectTerms	CNN-BiLSTM-Attention; Global Context Modeling; Japanese Short Text Classification; Local Feature Extraction; Self-Attention Mechanism
Title	Japanese Short Text Classification Based on CNN-BiLSTM-Attention
URI	https://dx.doi.org/10.1016/j.procs.2025.05.059
Volume	262