Urdu Word Segmentation using Machine Learning Approaches
Published in | International journal of advanced computer science & applications Vol. 9; no. 6 |
---|---|
Main Authors | Khan, Sadiq Nawaz; Khan, Khairullah; Khan, Wahab; Khan, Asfandyar; Subhan, Fazali; Ullah, Aman; Ullah, Burhan |
Format | Journal Article |
Language | English |
Published | West Yorkshire: Science and Information (SAI) Organization Limited, 2018 |
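Subjects | Conditional random fields; Data mining; Effectiveness; Languages; Machine learning; Segmentation; Sentiment analysis |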
Abstract | Word segmentation is a basic NLP task that plays a significant role in diverse NLP areas. The main areas that can benefit from word segmentation include information retrieval (IR), part-of-speech (POS) tagging, named entity recognition (NER), and sentiment analysis. Urdu word segmentation is a challenging task. There are a number of reasons for this, but the space insertion problem and the space omission problem are the major ones. Compared to Urdu, the tools and resources developed for word segmentation of English and other English-like Western languages achieve record-setting performance. Some languages, such as English, give a clear indication of word boundaries through spaces or capitalization of the first character of a word. Many other languages, however, do not have proper delimitation between words, e.g., Thai, Lao, and Urdu. The objective of this research is to present a machine-learning-based approach to Urdu word segmentation. We use conditional random fields (CRF) for this task. Other challenges in Urdu text are compound words and reduplicated words. In this paper, we try to overcome these challenges with a machine learning methodology. |
---|---|
Author | Khan, Sadiq Nawaz; Khan, Khairullah; Khan, Wahab; Khan, Asfandyar; Subhan, Fazali; Ullah, Aman; Ullah, Burhan |
Copyright | 2018. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DOI | 10.14569/IJACSA.2018.090628 |
Discipline | Computer Science |
EISSN | 2156-5570 |
ISSN | 2158-107X |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | true |
Issue | 6 |
Language | English |
OpenAccessLink | https://www.proquest.com/docview/2656413552?pq-origsite=%requestingapplication% |
PublicationDate | 2018 |
PublicationPlace | West Yorkshire |
PublicationTitle | International journal of advanced computer science & applications |
PublicationYear | 2018 |
Publisher | Science and Information (SAI) Organization Limited |
SubjectTerms | Conditional random fields; Data mining; Effectiveness; Languages; Machine learning; Segmentation; Sentiment analysis |
Title | Urdu Word Segmentation using Machine Learning Approaches |
URI | https://www.proquest.com/docview/2656413552 |
Volume | 9 |
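
The abstract describes Urdu word segmentation with conditional random fields but gives no detail on the label scheme or features. The sketch below shows one common way such a task can be framed, as character-level sequence labeling: each character is tagged `B` (begins a word) or `I` (inside a word), and a CRF would be trained to predict those tags from per-character features. The function names, the `B`/`I` scheme, and the neighbouring-character features are assumptions made for illustration, not the authors' published method.

```python
from typing import Dict, List, Tuple


def words_to_labels(words: List[str]) -> Tuple[List[str], List[str]]:
    """Flatten gold-segmented words into characters with B/I labels.

    Assumed scheme: "B" marks the first character of a word, "I" the rest.
    """
    chars: List[str] = []
    labels: List[str] = []
    for word in words:
        for i, ch in enumerate(word):
            chars.append(ch)
            labels.append("B" if i == 0 else "I")
    return chars, labels


def labels_to_words(chars: List[str], labels: List[str]) -> List[str]:
    """Rebuild words from characters and (predicted) B/I labels."""
    words: List[str] = []
    for ch, label in zip(chars, labels):
        if label == "B" or not words:
            words.append(ch)   # start a new word
        else:
            words[-1] += ch    # extend the current word
    return words


def char_features(chars: List[str], i: int, window: int = 1) -> Dict[str, object]:
    """Hypothetical feature set: the character and its neighbours."""
    feats: Dict[str, object] = {"bias": 1.0, "char": chars[i]}
    for off in range(1, window + 1):
        feats[f"char-{off}"] = chars[i - off] if i - off >= 0 else "<s>"
        feats[f"char+{off}"] = chars[i + off] if i + off < len(chars) else "</s>"
    return feats


# Two Urdu words written without a space between them (space omission).
chars, labels = words_to_labels(["اردو", "زبان"])
features = [char_features(chars, i) for i in range(len(chars))]
print(labels)                          # one B/I tag per character
print(labels_to_words(chars, labels))  # recovers the two words
```

A trained CRF would take the place of the gold labels here: it would receive `features` for an unsegmented string and emit the `B`/`I` sequence that `labels_to_words` turns back into words. Compound and reduplicated words, which the abstract names as further challenges, would need additional features or labels that this sketch does not attempt.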