A New Method to Improve Multi Font Farsi/Arabic Character Segmentation Results: Using Extra Classes of Some Character Combinations

Bibliographic Details
Published in	Advances in Multimedia Modeling, pp. 670-679
Main Authors	Omidyeganeh, Mona; Azmi, Reza; Nayebi, Kambiz; Javadtalab, Abbas
Format	Book Chapter; Conference Proceeding
Language	English
Published	Berlin, Heidelberg: Springer Berlin Heidelberg, 2007
Series	Lecture Notes in Computer Science
Subjects	Applied sciences | Arabic Text | Artificial intelligence | Character Segmentation | Computer science; control theory; systems | Exact sciences and technology | Pattern recognition. Digital image processing. Computational geometry | Segmentation Algorithm | Segmentation Result | Text Line | Multimedia | High performance | Statistical analysis | Arabic | Segmentation | Contour line | Text | Curvature | Adaptive method
Summary:	A new segmentation algorithm for multifont Farsi/Arabic texts, based on conditional labeling of up and down contours, was presented in [1]. A preprocessing technique was used to adjust the local baseline for each subword. The adaptive baseline, the up and down contours, and their curvatures were used to improve the segmentation results. The algorithm correctly segments 97% of 22236 characters in 18 fonts. However, achieving high performance in the multifont case is challenging, because each font has different characteristics. Here we propose adding some extra classes in the recognition stage. The extra classes are parts of characters, or combinations of two or more characters, that cause most of the errors in the segmentation stage. These extra classes are determined statistically. We used a training document of 4820 characters for 4 fonts. The segmentation result improves from 96.7% to 99.64%.
ISBN:	3540694218 9783540694212
ISSN:	0302-9743 1611-3349
DOI:	10.1007/978-3-540-69423-6_65
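
The summary says the extra classes are "determined statistically" from the units that cause most segmentation errors, but gives no procedure. The sketch below is one plausible reading, not the authors' method: it counts how often each mis-segmented unit occurs in a training document and keeps those above a frequency cutoff. The function name `select_extra_classes`, the `min_share` threshold, and the unit labels are all hypothetical, and the error log is invented for illustration; only the training-document size of 4820 characters comes from the summary.

```python
from collections import Counter


def select_extra_classes(error_units, total_chars, min_share=0.002):
    """Pick frequently mis-segmented units to treat as extra recognition classes.

    error_units: labels of the units the segmenter produced in error
                 (character fragments or merged character groups).
    total_chars: number of characters in the training document.
    min_share:   hypothetical cutoff; a unit is kept when its error count
                 divided by total_chars reaches this value.
    """
    counts = Counter(error_units)
    # most_common() orders units by descending count, so the result is ranked.
    return [(unit, n) for unit, n in counts.most_common()
            if n / total_chars >= min_share]


# Invented error log for illustration only; labels and counts are not from the paper.
errors = ["lam+alef"] * 40 + ["sin:fragment"] * 25 + ["ba+ya"] * 3
extra = select_extra_classes(errors, total_chars=4820)
print(extra)
```

With this invented log, the two frequent units pass the cutoff and the rare one is dropped. For scale, the reported improvement from 96.7% to 99.64% corresponds to the segmentation error rate falling from 3.3% to 0.36%.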