Kurdish Text Segmentation using Projection-Based Approaches

Bibliographic Details
Published in: UHD Journal of Science and Technology, Vol. 5; no. 1; pp. 56-65
Main Authors: Tofiq, Tofiq Ahmed; Hussein, Jamal Ali
Format: Journal Article
Language: English
Published: University of Human Development, 16.05.2021
ISSN: 2521-4209
EISSN: 2521-4217
DOI: 10.21928/uhdjst.v5n1y2021.pp56-65

Abstract
An optical character recognition (OCR) system can solve data entry problems by converting printed documents into soft copies. OCR systems are therefore being developed for all languages, and Kurdish is no exception. Kurdish is one of the languages that present special challenges to OCR, chiefly because its script is mostly cursive. A segmentation process must therefore be able to identify where each character begins and ends, a step that is important for character recognition. This paper presents an algorithm for Kurdish character segmentation. The proposed algorithm uses projection-based concepts to separate lines, words, and characters. It computes the vertical projection of a word and then identifies the splitting areas between the word's characters. A post-processing stage then handles the over-segmentation problems that arise in the initial segmentation stage. The proposed method is tested on a data set of text images that vary in font size, type, and style and contain more than 63,000 characters. The experiments show that the proposed algorithm can segment Kurdish words with an average accuracy of 98.6%.
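The abstract names the technique but not its rules, so the following is only a minimal sketch of a generic projection-based segmenter, not the authors' algorithm. It assumes a binarized image stored as lists of 0/1 values, and every name in it (`to_image`, `runs_above`, `merge_narrow`, the baseline-thickness threshold, and the `min_width` rule) is hypothetical. The idea it illustrates: lines are found from the horizontal projection, and in cursive text the columns whose vertical projection does not exceed the thickness of the connecting stroke are candidate splitting areas. A simple width-based merge stands in for the post-processing stage that handles over-segmentation.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def to_image(rows: List[str]) -> Image:
    """Build a binary image from strings where '#' marks an ink pixel."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def horizontal_projection(img: Image) -> List[int]:
    """Count ink pixels in each row (used to separate text lines)."""
    return [sum(row) for row in img]


def vertical_projection(img: Image) -> List[int]:
    """Count ink pixels in each column (used to split words and characters)."""
    return [sum(col) for col in zip(*img)]


def runs_above(profile: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs where the profile exceeds threshold."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def merge_narrow(segments: List[Tuple[int, int]], min_width: int) -> List[Tuple[int, int]]:
    """Assumed post-processing rule: a segment narrower than min_width is
    treated as an over-segmented fragment and merged into its left neighbour."""
    merged: List[Tuple[int, int]] = []
    for start, end in segments:
        if merged and end - start < min_width:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


# Toy "word": three strokes joined by a one-pixel-thick connecting line.
word = to_image([
    "..##....#....",
    "..##...##..#.",
    "#############",
])
profile = vertical_projection(word)
initial = runs_above(profile, threshold=1)   # 1 = assumed connecting-stroke thickness
final = merge_narrow(initial, min_width=2)
print(profile)   # [1, 1, 3, 3, 1, 1, 1, 2, 3, 1, 1, 2, 1]
print(initial)   # [(2, 4), (7, 9), (11, 12)]
print(final)     # [(2, 4), (7, 12)]
```

With a threshold of 0, the same `runs_above` helper applied to `horizontal_projection` separates text lines, and applied to the vertical projection of a line it separates words at blank columns. A real system would need to estimate the connecting-stroke thickness and the minimum character width from the image rather than hard-code them as done here.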
Open Access: yes
Peer Reviewed: yes
License: http://creativecommons.org/licenses/by-nc-nd/4.0
Open Access Link: https://doaj.org/article/8df35040bd304ec68ce1a2591c832e22
Page Count: 10
Subject Terms
character segmentation
cursive writing optical character recognition
Kurdish text segmentation
optical character recognition
projection-based approach