Kurdish Text Segmentation using Projection-Based Approaches
Published in | UHD Journal of Science and Technology Vol. 5; no. 1; pp. 56 - 65 |
---|---|
Main Authors | Tofiq, Tofiq Ahmed; Hussein, Jamal Ali |
Format | Journal Article |
Language | English |
Published | University of Human Development, 16.05.2021 |
ISSN | 2521-4209 2521-4217 |
DOI | 10.21928/uhdjst.v5n1y2021.pp56-65 |
Abstract | An optical character recognition (OCR) system can address data-entry problems by converting printed documents into soft copies. OCR systems are therefore being developed for many languages, and Kurdish is no exception. Kurdish presents special challenges to OCR, chiefly because its script is mostly cursive. A segmentation process must therefore be able to locate the beginning and end of each character, a step that is essential for character recognition. This paper presents an algorithm for Kurdish character segmentation. The proposed algorithm uses projection-based concepts to separate lines, words, and characters: it computes the vertical projection of a word and then identifies the areas where the word's characters can be split. A post-processing stage then handles the over-segmentation that occurs in the initial segmentation stage. The proposed method is tested on a data set of text images that vary in font size, type, and style and contain more than 63,000 characters. The experiments show that the proposed algorithm can segment Kurdish words with an average accuracy of 98.6%. |
---|---|
Author | Hussein, Jamal Ali; Tofiq, Tofiq Ahmed |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
License | http://creativecommons.org/licenses/by-nc-nd/4.0 |
OpenAccessLink | https://doaj.org/article/8df35040bd304ec68ce1a2591c832e22 |
PageCount | 10 |
SubjectTerms | character segmentation; cursive writing; Kurdish text segmentation; optical character recognition; projection-based approach |
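
The abstract describes projection-based segmentation followed by a post-processing stage for over-segmentation. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a binarized image stored as nested lists (1 = ink, 0 = background), and all function names, the baseline-thickness threshold, and the minimum-width merging rule are hypothetical choices made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]          # assumed binary image: 1 = ink, 0 = background
Span = Tuple[int, int]           # half-open index range (start, end)


def horizontal_projection(img: Image) -> List[int]:
    """Ink count per row; gaps in this profile separate text lines."""
    return [sum(row) for row in img]


def vertical_projection(img: Image) -> List[int]:
    """Ink count per column; low values mark candidate split areas."""
    return [sum(col) for col in zip(*img)]


def runs(profile: List[int], threshold: int = 0) -> List[Span]:
    """Return the spans where the profile exceeds the threshold."""
    spans: List[Span] = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a segment begins
        elif value <= threshold and start is not None:
            spans.append((start, i))       # a segment ends
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def merge_narrow(spans: List[Span], min_width: int) -> List[Span]:
    """Hypothetical post-processing rule: a segment narrower than
    min_width is treated as over-segmentation and merged leftwards."""
    merged: List[Span] = []
    for start, end in spans:
        if merged and end - start < min_width:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Toy cursive "word": the bottom row is the stroke joining the letters.
    word = [
        [0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0],
        [0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    ]
    profile = vertical_projection(word)
    # Columns holding only the joining stroke (count <= 1) are split areas.
    candidates = runs(profile, threshold=1)
    print(candidates)
    print(merge_narrow(candidates, min_width=2))
```

In cursive text the vertical projection rarely drops to zero inside a word, so the sketch thresholds at an assumed stroke thickness rather than at zero. Lines and words, which are separated by blank gaps, can be found by calling `runs` with the default threshold on the horizontal and vertical profiles respectively.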