Thinning for Segmentation-based and Segmentation-free for Arabic script adopting languages

G. NABI
N. A. SHAIKH
R. A. RAJPER
R. A. SHAIKH

Abstract

Optical Character Recognition (OCR) is the process of converting text that appears in an image, which unlike text in a word-processing file cannot be edited, into machine-readable text. Various preprocessing approaches are used to prepare a text image for recognition. One of them is thinning, in which the strokes of each character are reduced to a width of one pixel until the skeleton of the character is obtained. One-pixel skeletons of characters are extracted for various languages to make the segmentation stage of recognition easier. This paper presents a thinning algorithm for Arabic and for languages that have adopted the Arabic script, such as Urdu, Persian, and Uyghur. The algorithm works in steps until the skeleton of a character is found. Throughout, the thinning process tries to keep all connected elements and the pattern of the character(s) intact so that the subsequent phases of OCR can be carried out easily. The custom-built application is interactive: the thinning process can be stopped and the result inspected, and if the skeleton is acceptable the final image is produced. The interactive algorithm can be used with both types of OCR, namely segmentation-free and segmentation-based. The thinning algorithm has been tested in multiple experiments on Arabic script and on the languages that adopt it. Other languages may also be tested with our algorithm.
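
To make the idea of stepwise thinning concrete, the following is a minimal Python sketch. The abstract does not specify the authors' algorithm, so the sketch substitutes the well-known Zhang-Suen thinning scheme as a stand-in; it is not the method proposed in the paper. The function names (`thin`, `thinning_pass`) and the `max_iterations` parameter are hypothetical, and the parameter only loosely mimics the ability to stop the process early and inspect the intermediate result. The image is assumed to be a binary grid (1 = stroke, 0 = background) with a background border at least one pixel wide.

```python
def neighbours(img, r, c):
    """Return the 8 neighbours P2..P9 of (r, c), clockwise from north."""
    return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
            img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]


def transitions(n):
    """Count 0 -> 1 transitions in the circular sequence P2, P3, ..., P9, P2."""
    seq = n + n[:1]
    return sum(1 for a, b in zip(seq, seq[1:]) if a == 0 and b == 1)


def thinning_pass(img, step):
    """Run one Zhang-Suen sub-iteration in place; return the pixels removed."""
    to_clear = []
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            if img[r][c] != 1:
                continue
            n = neighbours(img, r, c)
            p2, _, p4, _, p6, _, p8, _ = n
            # Keep end points and interior pixels (2 <= neighbour count <= 6).
            if not 2 <= sum(n) <= 6:
                continue
            # Exactly one 0 -> 1 transition: removal will not split the stroke.
            if transitions(n) != 1:
                continue
            if step == 0:
                removable = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
            else:
                removable = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
            if removable:
                to_clear.append((r, c))
    # Pixels are cleared only after the full scan, so decisions are simultaneous.
    for r, c in to_clear:
        img[r][c] = 0
    return len(to_clear)


def thin(img, max_iterations=None):
    """Thin a copy of img until stable, or until max_iterations is reached.

    max_iterations is a hypothetical knob standing in for the interactive
    "stop and inspect" behaviour described in the abstract.
    """
    result = [row[:] for row in img]
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        removed = thinning_pass(result, 0) + thinning_pass(result, 1)
        iteration += 1
        if removed == 0:
            break
    return result


if __name__ == "__main__":
    # A 3-pixel-thick horizontal stroke surrounded by background.
    image = [[1 if 2 <= r <= 4 and 2 <= c <= 9 else 0 for c in range(12)]
             for r in range(7)]
    for row in thin(image):
        print("".join("#" if v else "." for v in row))
```

Calling `thin(image, max_iterations=1)` returns the partially thinned stroke after a single iteration, which can be inspected before deciding whether to continue. Deleting only pixels with exactly one background-to-foreground transition around them is what tends to keep connected strokes connected, the property the abstract stresses for the later OCR phases.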
