
Optimal affine image normalization approach for optical character recognition
I.A. Konovalenko 1,2, V.V. Kokhan 1,2, D.P. Nikolaev 1,2

1 Institute for Information Transmission Problems RAS, 127051, Moscow, Russia, Bolshoy Karetny per. 19, bld. 1
2 Smart Engines, 117312, Moscow, Russia, pr-t 60-letiya Oktyabrya, 9


DOI: 10.18287/2412-6179-CO-759

Pages: 90-100.

Full text of article: English language.

Optical character recognition (OCR) in images captured from arbitrary angles requires preliminary normalization, i.e. a geometric transformation that produces an image as if it were captured at an angle suitable for OCR. In most cases, the surface containing the characters can be considered flat, and a pinhole model can be adopted for the camera. Thus, in theory, the normalization should be projective. Usually, the camera's optical axis is approximately perpendicular to the document surface, so the projective normalization can be replaced with an affine one without a significant loss of accuracy. An affine image transformation is performed significantly faster than a projective one, which is important for OCR on mobile devices. In this work, we propose a fast approach to image normalization that uses an affine normalization instead of a projective one whenever this causes no significant loss of accuracy. The approach is based on a proposed criterion of normalization accuracy: the root mean square (RMS) coordinate discrepancy over the region of interest (ROI). We consider the problem of optimal affine normalization according to this criterion. We establish that this unconstrained optimization problem is quadratic and can be reduced to integrating fractional quadratic functions over the ROI. The latter problem is solved analytically for the OCR case, where the ROI consists of rectangles. The approach is generalized to cases in which special cases of the affine transformation are used instead: scaling, translation, shearing, and their superpositions, which allows the image normalization procedure to be accelerated further.
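The criterion described above can be illustrated numerically. The following is a minimal sketch, not the authors' method: the article solves the ROI integrals analytically, whereas this sketch approximates them with a midpoint-rule sum over sample points inside each rectangle and then solves the resulting quadratic problem by weighted linear least squares. All function names, the sample count `n`, and the tolerance `tol` are assumptions introduced for illustration only.

```python
import numpy as np

def apply_homography(H, pts):
    """Map an (N, 2) array of points through a 3x3 projective matrix H."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def sample_rectangles(rects, n=50):
    """Midpoint-rule samples for a ROI made of rectangles (x0, y0, x1, y1).

    Returns the sample points and per-point weights (cell areas), so that
    weighted sums approximate integrals over the ROI.
    """
    pts, weights = [], []
    for x0, y0, x1, y1 in rects:
        xs = x0 + (np.arange(n) + 0.5) * (x1 - x0) / n
        ys = y0 + (np.arange(n) + 0.5) * (y1 - y0) / n
        gx, gy = np.meshgrid(xs, ys)
        pts.append(np.column_stack([gx.ravel(), gy.ravel()]))
        weights.append(np.full(n * n, (x1 - x0) * (y1 - y0) / (n * n)))
    return np.vstack(pts), np.concatenate(weights)

def fit_affine(H, pts, w):
    """Weighted least-squares 2x3 affine matrix approximating H on the samples."""
    design = np.hstack([pts, np.ones((len(pts), 1))])
    target = apply_homography(H, pts)
    sw = np.sqrt(w)[:, None]
    coeffs, *_ = np.linalg.lstsq(design * sw, target * sw, rcond=None)
    return coeffs.T

def rms_discrepancy(A, H, pts, w):
    """Area-weighted RMS distance between affine and projective images of the ROI."""
    design = np.hstack([pts, np.ones((len(pts), 1))])
    diff = design @ A.T - apply_homography(H, pts)
    return float(np.sqrt(np.sum(w * np.sum(diff ** 2, axis=1)) / np.sum(w)))

def choose_normalization(H, rects, tol, n=50):
    """Use the affine fit if its RMS discrepancy is within tol, else keep H."""
    pts, w = sample_rectangles(rects, n)
    A = fit_affine(H, pts, w)
    rms = rms_discrepancy(A, H, pts, w)
    if rms <= tol:
        return "affine", A, rms
    return "projective", H, rms

# Hypothetical example: two text-line rectangles and a mild perspective distortion.
rects = [(0.0, 0.0, 100.0, 20.0), (0.0, 40.0, 60.0, 60.0)]
H = np.array([[1.0, 0.05, 2.0],
              [0.02, 1.0, -1.0],
              [1e-4, 2e-4, 1.0]])
kind, M, rms = choose_normalization(H, rects, tol=0.5)
print(kind, rms)
```

Because the fit minimizes the same weighted sum of squares that `rms_discrepancy` reports, its discrepancy on the samples cannot exceed that of any other affine matrix, such as the one obtained by simply dropping the perspective row of `H`. The tolerance would in practice be chosen relative to the character size that the OCR engine tolerates; the value 0.5 here is arbitrary.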

Keywords: optical character recognition, image registration, image normalization, coordinate discrepancy, projective transformation, affine transformation, approximation, optimization, symbolic computation.

Konovalenko IA, Kokhan VV, Nikolaev DP. Optimal affine image normalization approach for optical character recognition. Computer Optics 2021; 45(1): 90-100. DOI: 10.18287/2412-6179-CO-759.

This work was partially financially supported by the Russian Foundation for Basic Research, projects 18-29-26035 and 17-29-03370.


