A Decision Tree Based Approach for Pashto Coreference Resolution: The Case of Person Name Aliases
Published in | VFAST Transactions on Software Engineering Vol. 13; no. 2; pp. 161 - 169 |
---|---|
Main Authors | Zuhra, Fatima Tuz; Ali, Hina; Naz, Surayya |
Format | Journal Article |
Language | English |
Published | 06.06.2025 |
ISSN | 2411-6246 2309-3978 |
DOI | 10.21015/vtse.v13i2.2143 |
Abstract | Coreference resolution is an important problem in fields such as natural language understanding, natural language generation, named entity recognition, text summarization, and anaphora resolution. Determining whether two proper nouns are aliases of each other (i.e. alias identification) is a classification problem: a binary classifier is needed that returns “Yes” if the two input nouns are aliases and “No” otherwise. This paper proposes a binary decision-tree classifier, augmented with a cosine similarity measure, for identifying personal name aliases in Pashto. The classifier is trained on alias records containing feature vectors. A total of 10,000 proper-noun pairs were extracted from the Pashto corpus and a collection of crawled Pashto text, and their features were recorded, yielding 10,000 example records with 12 attributes each. The dataset covers different genres of the corpus, e.g. novels, dramas, news, sports, letters, and essays, and contains 5,000 positive instances (class “Yes”) and 5,000 negative instances (class “No”). The records are divided into a training part and a testing part in the ratio 7:3. The 7,000 training examples are used to induce the decision tree, which is built with RapidMiner, a data mining tool. First-order logic rules are then derived from the decision tree and transformed into an algorithm implemented in Python. The rules are evaluated on the testing part, which contains 3,000 labeled examples. Of these, 2,794 are classified correctly, an accuracy of approximately 93%. An error analysis of the roughly 7% misclassified examples is performed to guide future improvements to the system. |
Author | Naz, Surayya Ali, Hina Zuhra, Fatima Tuz |
Author_xml | 1. Zuhra, Fatima Tuz (ORCID 0000-0003-2427-9483); 2. Ali, Hina (ORCID 0009-0006-5690-5089); 3. Naz, Surayya (ORCID 0009-0004-9015-3488) |
ContentType | Journal Article |
DBID | AAYXX CITATION |
DOI | 10.21015/vtse.v13i2.2143 |
DatabaseName | CrossRef |
DatabaseTitle | CrossRef |
DatabaseTitleList | CrossRef |
DeliveryMethod | fulltext_linktorsrc |
EISSN | 2309-3978 |
EndPage | 169 |
ExternalDocumentID | 10_21015_vtse_v13i2_2143 |
GroupedDBID | AAYXX CITATION M~E |
ISSN | 2411-6246 |
IngestDate | Thu Jul 03 08:45:19 EDT 2025 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Issue | 2 |
Language | English |
LinkModel | OpenURL |
ORCID | 0000-0003-2427-9483 0009-0004-9015-3488 0009-0006-5690-5089 |
OpenAccessLink | https://vfast.org/journals/index.php/VTSE/article/download/2143/1707 |
PageCount | 9 |
ParticipantIDs | crossref_primary_10_21015_vtse_v13i2_2143 |
PublicationCentury | 2000 |
PublicationDate | 2025-06-06 |
PublicationDateYYYYMMDD | 2025-06-06 |
PublicationDate_xml | – month: 06 year: 2025 text: 2025-06-06 day: 06 |
PublicationDecade | 2020 |
PublicationTitle | VFAST Transactions on Software Engineering |
PublicationYear | 2025 |
SSID | ssib044764537 |
Score | 1.9129322 |
SourceID | crossref |
SourceType | Index Database |
StartPage | 161 |
Title | A Decision Tree Based Approach for Pashto Coreference Resolution: The Case of Person Name Aliases |
Volume | 13 |
hasFullText | 1 |
inHoldings | 1 |
linkProvider | ISSN International Centre |
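
The abstract describes a rule-based "Yes"/"No" alias classifier augmented with a cosine similarity measure and evaluated on a 7:3 split. The record does not list the paper's 12 attributes, its induced rules, or how cosine similarity is computed, so the following is only a minimal sketch under stated assumptions: the character-bigram feature, the single threshold rule, the 0.5 threshold, and the Latin-script example names are all hypothetical and are not the authors' method. The last function simply reproduces the accuracy arithmetic reported in the abstract.

```python
from collections import Counter
from math import sqrt


def char_bigrams(name: str) -> Counter:
    """Count overlapping character bigrams of a name (hypothetical feature)."""
    text = name.strip()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as having no similarity
    return dot / (norm_a * norm_b)


def is_alias(name_a: str, name_b: str, threshold: float = 0.5) -> str:
    """Toy one-rule classifier; the threshold is an assumption, not a
    value taken from the paper's decision tree."""
    sim = cosine_similarity(char_bigrams(name_a), char_bigrams(name_b))
    return "Yes" if sim >= threshold else "No"


def accuracy(correct: int, total: int) -> float:
    """Fraction of test examples classified correctly."""
    return correct / total


if __name__ == "__main__":
    # Illustrative names only; the paper works on Pashto-script text.
    print(is_alias("Ahmad Khan", "Ahmad Khan"))
    print(is_alias("abc", "xyz"))
    # Figures reported in the abstract: 2,794 of 3,000 test examples.
    print(f"{accuracy(2794, 3000):.2%}")
```

A real decision tree would combine several such features in nested conditions rather than a single threshold; this sketch only shows how a similarity score can feed a binary "Yes"/"No" rule.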