Srivardhini Mandipati, Gottumukkala Asisha, Preethi Raj S, and Chitrakala S
Department of Computer Science and Engineering, Easwari Engineering College, Chennai, India
Abstract. Optical Character Recognition (OCR) converts text in images into a form that a computer can manipulate. The need for faster OCR systems stems from the abundance of such text. This paper presents a Two-Stage Rejection Algorithm for reducing the search space of an OCR system, since a smaller search space speeds up recognition. Preprocessing operations are applied to the input image, and features are extracted from the result. The feature vectors are clustered, and the Two-Stage Rejection Algorithm is then applied for character recognition. With about the same character recognition rate as other OCR systems, an OCR system reinforced with the Two-Stage Rejection Algorithm is considerably faster.
Keywords: Optical Character Recognition, Feature Extraction, K-means.
1 Introduction
Optical character recognition has been an active area of research for many decades. The potential of OCR systems to simplify data entry adds value to research in this area. OCR systems use various pattern matching techniques for character recognition, and most rely on classifiers such as support vector machines (SVMs) or neural networks. The training process for these classifiers is time consuming. Moreover, as the number of classes increases, the number of comparisons increases, and consequently so does the time taken for character recognition. Hence, such systems cannot be easily extended to recognize characters from additional languages. The proposed system uses a structural approach, as opposed to a statistical approach, for feature extraction. The strength of the structural method over the statistical one is that it represents a pattern in a way similar to how humans perceive it. The structural features help