AB-LaBSE: Uyghur Sentiment Analysis via the Pre-Training Model with BiLSTM

Bibliographic Details
Published in Applied sciences Vol. 12; no. 3; p. 1182
Main Authors Pei, Yijie; Chen, Siqi; Ke, Zunwang; Silamu, Wushour; Guo, Qinglang
Format Journal Article
Language English
Published Basel MDPI AG 01.02.2022
Abstract In recent years, increasing attention has been paid to text sentiment analysis, which has gradually become a research hotspot in information extraction, data mining, Natural Language Processing (NLP), and other fields. With the continued spread of the Internet, sentiment analysis of Uyghur texts has great research and application value for online public opinion. For low-resource languages, most state-of-the-art systems require tens of thousands of annotated sentences to achieve high performance, yet very little annotated data is available for Uyghur sentiment analysis tasks. Each task also has its own specificities; differences in words and word order across languages make this a challenging problem. In this paper, we present an effective solution that provides a meaningful and easy-to-use feature extractor for sentiment analysis tasks: a pre-trained language model with a BiLSTM layer. First, data augmentation is carried out with AEDA (An Easier Data Augmentation), and the augmented dataset is constructed to improve the performance of text classification. Then, the pre-trained model LaBSE is used to encode the input data, and a BiLSTM layer is used to learn additional context information. Finally, the validity of the model is verified on a two-category dataset for sentiment analysis and a five-category dataset for emotion analysis. On both datasets, our approach performed strongly compared with several strong baselines. We close with an overview of resources for sentiment analysis tasks and some open research questions. Overall, we propose a combined deep learning and cross-lingual pre-training model for these two low-resource tasks.
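To make the pipeline described in the abstract concrete, the sketch below shows one plausible way to assemble it in Python with PyTorch and Hugging Face Transformers: an AEDA-style helper that inserts random punctuation marks for augmentation, a LaBSE encoder, and a BiLSTM head feeding a linear classifier for the two-class and five-class settings. This is an illustrative reconstruction under stated assumptions, not the authors' released code; the model identifier "sentence-transformers/LaBSE", the hidden size, the mean-pooling step, and all helper names are assumptions.

import random
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

PUNCT = [".", ";", "?", ":", "!", ","]

def aeda_augment(sentence, ratio=0.3):
    # AEDA-style augmentation: insert 1..ratio*len(words) random punctuation
    # marks at random positions, leaving the original words untouched.
    words = sentence.split()
    n_insert = random.randint(1, max(1, int(ratio * len(words))))
    for _ in range(n_insert):
        words.insert(random.randint(0, len(words)), random.choice(PUNCT))
    return " ".join(words)

class LaBSEBiLSTM(nn.Module):
    # LaBSE token embeddings -> BiLSTM -> mean pooling -> linear classifier.
    def __init__(self, num_classes, hidden_size=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("sentence-transformers/LaBSE")
        self.bilstm = nn.LSTM(self.encoder.config.hidden_size, hidden_size,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        # Token-level hidden states from the pre-trained multilingual encoder.
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # BiLSTM over the token sequence to capture extra context.
        lstm_out, _ = self.bilstm(hidden)
        # Mask-aware mean pooling over tokens, then classify.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (lstm_out * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return self.classifier(pooled)

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/LaBSE")
model = LaBSEBiLSTM(num_classes=2)  # 2 classes for sentiment, 5 for emotion
texts = [aeda_augment("placeholder training sentence")]
batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])  # (1, num_classes)

In practice the encoder can be fine-tuned jointly with the BiLSTM head or kept frozen; the paper's exact training setup is not reproduced here.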
Author Ke, Zunwang
Chen, Siqi
Guo, Qinglang
Pei, Yijie
Silamu, Wushour
Author_xml – sequence: 1
  givenname: Yijie
  orcidid: 0000-0002-0974-4378
  surname: Pei
  fullname: Pei, Yijie
– sequence: 2
  givenname: Siqi
  surname: Chen
  fullname: Chen, Siqi
– sequence: 3
  givenname: Zunwang
  orcidid: 0000-0002-2589-8377
  surname: Ke
  fullname: Ke, Zunwang
– sequence: 4
  givenname: Wushour
  surname: Silamu
  fullname: Silamu, Wushour
– sequence: 5
  givenname: Qinglang
  surname: Guo
  fullname: Guo, Qinglang
ContentType Journal Article
Copyright 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.3390/app12031182
Discipline Engineering
Sciences (General)
EISSN 2076-3417
ISSN 2076-3417
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 3
Language English
License https://creativecommons.org/licenses/by/4.0
ORCID 0000-0002-0974-4378
0000-0002-2589-8377
OpenAccessLink https://doaj.org/article/8e9df65bfde048fe91fb751cb2da4ad3
PublicationCentury 2000
PublicationDate 2022-02-01
PublicationDecade 2020
PublicationPlace Basel
PublicationTitle Applied sciences
PublicationYear 2022
Publisher MDPI AG
StartPage 1182
SubjectTerms Accuracy
BiLSTM
cross-lingual pre-trained language model
data augmentation
Datasets
Deep learning
Dictionaries
Language
low-resource
Machine learning
Methods
Neural networks
Semantics
Sentiment analysis
Sparsity
Title AB-LaBSE: Uyghur Sentiment Analysis via the Pre-Training Model with BiLSTM
URI https://www.proquest.com/docview/2636123452
https://doaj.org/article/8e9df65bfde048fe91fb751cb2da4ad3
Volume 12