A review of algorithms for text detection in images and videos
Yu. A. Bolotova, V.G. Spitsyn, P.M. Osina


Tomsk Polytechnic University, Tomsk, Russia

The full text of the article is in Russian.

This article reviews the history and the state of the art of optical character recognition systems such as ABBYY FineReader, Tesseract, and CuneiForm, with particular attention to their internal algorithms, including page layout analysis, page segmentation, and document skew angle estimation. The overview describes and compares methods proposed over the last 30 years in terms of speed and versatility. A critical analysis and discussion of the status of the field and of open problems are also given.

OCR, page layout analysis, text segmentation, skew detection.

Bolotova YuA, Spitsyn VG, Osina PM. A review of algorithms for text detection in images and videos. Computer Optics 2017; 41(3): 441-452. DOI: 10.18287/2412-6179-2017-41-3-441-452.


  1. Kuzmitskiy NN. Detection of text objects in images of real scenes based on convolutional neural network model [In Russian]. Informatics 2015; 2(46): 12-21.
  2. Kazanskiy NL, Popov SB. The distributed vision system of the registration of the railway train [In Russian]. Computer Optics 2012; 36(3): 419-428.
  3. Smith RW. Hybrid page layout analysis via tab-stop detection. Proc ICDAR'09 2009: 241-245. DOI: 10.1109/ICDAR.2009.257.
  4. Yin X-C, Pei W-Y, Zhang J, Hao H-W. Multi-orientation scene text detection with adaptive clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 2015; 37(9): 1930-1937. DOI: 10.1109/TPAMI.2014.2388210.
  5. Zuo Z-Y, Tian S, Yin X-C. Multi-strategy tracking based text detection in scene videos. ICDAR 2015: 66-70. DOI: 10.1109/ICDAR.2015.7333727.
  6. Koo HI, Kim DH. Scene text detection via connected component clustering and nontext filtering. IEEE Trans Image Process 2013; 22(6): 2296-2305. DOI: 10.1109/TIP.2013.2249082.
  7. Nagy G. Twenty years of document image analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 2000; 22(1): 38-62. DOI: 10.1109/34.824820.
  8. Bolotova YuA, Spitsyn VG, Rudometkina MN. License plate recognition algorithm on the basis of a connected components method and a hierarchical temporal memory model. Computer Optics 2015; 39(2): 275-280. DOI: 10.18287/0134-2452-2015-39-2-275-280.
  9. Jaderberg M, Simonyan K, Vedaldi A, Zisserman A. Reading text in the wild with convolutional neural networks. International Journal of Computer Vision 2016; 116(1): 1-20. DOI: 10.1007/s11263-015-0823-z.
  10. Novikova T, Barinova O, Kohli P, Lempitsky V. Large-lexicon attribute-consistent text recognition in natural images. ECCV 2012: 752-765. DOI: 10.1007/978-3-642-33783-3_54.
  11. Zapryagaev SA, Sorokin AI. Handwritten character recognition based on analysis of chord-length function descriptors. Proceedings of Voronezh State University; Series: System Analysis and Information Technologies 2009; 2: 49-58.
  12. Glumov NI, Mjasnikov EV, Kopenkov VN, Chicheva MA. The method of fast correlation using ternary templates for object recognition on images [In Russian]. Computer Optics 2008; 32(3): 277-282.
  13. Smith R. History of the Tesseract OCR engine: what worked and what didn’t. Proc SPIE 2013; 8658: 865802. DOI: 10.1117/12.2010051.
  14. Breuel TM. The OCRopus open source OCR system. Proc SPIE 2008; 6815: 68150F. DOI: 10.1117/12.783598.
  15. Senior AW. Off-line cursive handwriting recognition using recurrent neural networks. PhD thesis. Cambridge: Cambridge University; 1994.
  16. Graves A, Liwicki M, Fernández S, Bertolami R, Bunke H, Schmidhuber J. A novel connectionist system for unconstrained handwriting recognition. IEEE Trans Pattern Anal Mach Intell 2008; 31(5): 855-868. DOI: 10.1109/TPAMI.2008.137.
  17. Srihari SN, Zack GW. Document Image analysis. Proceedings of 8-th International Conference on Pattern Recognition 1986: 434-436.
  18. Gorohovatskyi OV. The detection of text regions on image of a document using merge method. Information Processing Systems 2014; 1(117): 75-81.
  19. Cattoni R, Coianiz T, Messelodi S, Modena CM. Geometric layout analysis techniques for document image understanding: a review. ITC-irst Technical Report TR#9703-09 1998: 68 p. Source: <http://www.academia.edu/18416548/Geometric_Layout_Analysis_Techniques_for_Document_Image_Understanding_a_Review._TR_9703-09>.
  21. Negi A, Shanker KN, Chereddi CK. Localization, Extraction and recognition of text in Telugu document images. Proc ICDAR 2003: 1193-1197. DOI: 10.1109/ICDAR.2003.1227846.
  22. Bukhari SS, Shafait F, Breuel TM. High performance layout analysis of Arabic and Urdu document images. Proc ICDAR 2011: 1275-1279. DOI: 10.1109/ICDAR.2011.257.
  23. Wong KY, Casey RG, Wahl FM. Document analysis system. IBM Journal of Research and Development 1982; 26(6): 647-656. DOI: 10.1147/rd.266.0647.
  24. Nagy G, Wagle S. Hierarchical representation of optically scanned documents. Proceedings of 7-th International conference on Pattern recognition 1984: 347-349.
  25. Baird HS, Jones SE, Fortune SJ. Image Segmentation by Shape-Directed Covers. Proc ICPR 1990: 820-825. DOI: 10.1109/ICPR.1990.118223.
  26. Oudjemia S, Ameur Z, Ouahabi A. Segmentation of complex document. Carpathian Journal of Electronic and Computer Engineering 2014; 7(1): 13-18.
  27. Breuel TM. An algorithm for finding maximal whitespace rectangles at arbitrary orientations for document layout analysis. Proc ICDAR 2003; 1: 66-70. DOI: 10.1109/ICDAR.2003.1227629.
  28. Winder A, Andersen T, Smith EHB. Extending page segmentation algorithms for mixed-layout document processing. Proc ICDAR 2011: 1245-1249. DOI: 10.1109/ICDAR.2011.251.
  29. Breuel TM. Two geometric algorithms for layout analysis. International Workshop on Document Analysis Systems 2002: 188-199. DOI: 10.1007/3-540-45869-7_23.
  30. Shafait F, Keysers D, Breuel TM. Performance comparison of six algorithms for page segmentation. International Workshop on Document Analysis Systems 2006: 368-379. DOI: 10.1007/11669487_33.
  31. Baird HS. Background structure in document images. International Journal of Pattern Recognition and Artificial Intelligence 1994; 8(05): 1013-1030. DOI: 10.1142/S0218001494000516.
  32. O'Gorman L. The document spectrum for page layout analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 1993; 15(11): 1162-1173. DOI: 10.1109/34.244677.
  33. Skvortsov AV. Delaunay triangulation and its application [In Russian]. Tomsk: Tomsk University Publisher; 2002. ISBN: 5-7511-1501-5.
  34. Kise K, Sato A, Iwata M. Segmentation of page images using the area Voronoi diagram. Computer Vision and Image Understanding 1998; 70(3): 370-382. DOI: 10.1006/cviu.1998.0684.
  35. Mao S, Kanungo T. Empirical performance evaluation methodology and its application to page segmentation algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence 2001; 23(3): 242-256. DOI: 10.1109/34.910877.
  36. Gather P, Singh A. Empirical performance evaluation methodology and its application to page segmentation algorithms: A review. International Journal of Advanced Research in Computer Engineering & Technology 2015; 4(4): 1277-1279.
  37. Esposito F, Malerba D, Semeraro G. A knowledge-based approach to the layout analysis. Proc ICDAR 1995; 1: 466-471. DOI: 10.1109/ICDAR.1995.599037.
  38. Li L, Yu S, Zhong L, Li X. Multilingual text detection with nonlinear neural network. Mathematical Problems in Engineering 2015; 2015: 431608. DOI: 10.1155/2015/431608.
  39. Shih FY, Chen SS. Adaptive document block segmentation and classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 1996; 26(5): 797-802. DOI: 10.1109/3477.537322.
  40. Wang D, Srihari SN. Classification of newspaper image blocks using texture analysis. Computer Vision, Graphics, and Image Processing 1989; 47(3): 327-352. DOI: 10.1016/0734-189X(89)90116-3.
  41. Vil’kin AM, Safonov IV, Egorova MA. Algorithm for segmentation of documents based on texture features. Pattern Recognition and Image Analysis 2013; 23(1): 153-159. DOI: 10.1134/S1054661813010136.
  42. Sauvola JJ, Pietikäinen M. Page segmentation and classification using fast feature extraction and connectivity analysis. Proc ICDAR '95 1995; 2: 1127-1131. DOI: 10.1109/ICDAR.1995.602118.
  43. Scherl W, Wahl F, Fuchsberger H. Automatic separation of text, graphic and picture segments in printed material. Pattern Recognition in Practice 1980: 213-221.
  44. Tsujimoto S, Asada H. Major components of a complete text reading system. Proceedings of the IEEE 1992; 80(7): 1133-1149. DOI: 10.1109/5.156475.
  45. Jain AK, Zhong Y. Page segmentation using texture analysis. Pattern Recognition 1996; 29(5): 743-770. DOI: 10.1016/0031-3203(95)00131-X.
  46. Cattoni R, Coianiz T, Messelodi S, Modena CM. Geometric layout analysis techniques for document image understanding: A review. ITC-irst Technical Report TR#9703-09 1998. Source: <http://www.academia.edu/18416548/Geometric_Layout_Analysis_Techniques_for_Document_Image_Understanding_a_Review._TR_9703-09>.
  47. Jain AK, Bhattacharjee S. Text segmentation using Gabor filters for automatic document processing. Machine Vision and Applications 1992; 5(3): 169-184. DOI: 10.1007/BF02626996.
  48. Smith R. A simple and efficient skew detection algorithm via text row accumulation. Proc ICDAR '95 1995; 2: 1145-1148. DOI: 10.1109/ICDAR.1995.602124.
  49. Hough PVC. Method and means for recognizing complex patterns. Patent US 3069654, filed March 26, 1960, published December 18, 1962.
  50. Hinds SC, Fisher JL, D'Amato DP. A document skew detection method using run-length encoding and the Hough transform. Proc ICPR 1990; 1: 464-468. DOI: 10.1109/ICPR.1990.118147.
  51. Rashid SF, Shafait F, Breuel TM. Scanning neural network for text line recognition. 10th IAPR International Workshop on Document Analysis Systems (DAS) 2012: 105-109. DOI: 10.1109/DAS.2012.77.
  52. Breuel TM, Ul-Hasan A, Al-Azawi MA. High-performance OCR for printed English and Fraktur using LSTM networks. Proc ICDAR 2013: 683-687. DOI: 10.1109/ICDAR.2013.140.
  53. Nagy G, Nartker TA, Rice SV. Optical character recognition: an illustrated guide to the frontier. Proceedings of the IS&T/SPIE Symposium on Electronic Imaging 1999: 58-69.
  54. Masalovich A, Mestetskiy L. Warped image restoration based on continuous skeletal-border representation [In Russian]. Proceedings of the International Conference "GraphiCon" (Novosibirsk) 2006: 4 p. Source: <http://graphicon.ru/html/2006/wr34_16_MestetskiyMasalovitch.pdf>.
  55. Wang T, Wu DJ, Coates A, Ng AY. End-to-end text recognition with convolutional neural networks. ICPR 2012: 3304-3308.
  56. Zhong Y, Zhang H, Jain AK. Automatic caption localization in compressed video. IEEE Transactions on Pattern Analysis and Machine Intelligence 2000; 22(4): 385-392. DOI: 10.1109/34.845381.
