
Development of software for the segmentation of text areas in real-scene images
V.A. Lobanova¹, Yu.A. Ivanova¹

¹ Tomsk Polytechnic University, 634050, Tomsk, Russia


DOI: 10.18287/2412-6179-CO-1047

Pages: 790-800.

Full text of article: Russian language.

This article discusses the design and development of a neural network algorithm for segmenting text areas in real-scene images. After a review of the available neural network models, the U-Net model was chosen as the basis. An algorithm for detecting text areas in real-scene images was then proposed and implemented. Experimental training runs were used to determine the network parameters, such as the input image size and the number and types of network layers. Bilateral and low-pass filtering were considered as a preprocessing stage. The KAIST Scene Text Database was enlarged by rotating, compressing, and splitting its images. The results were found to surpass those of competing methods in terms of F-measure.
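The abstract names two data-side steps without detailing them: enlarging the dataset by rotating and splitting images, and comparing methods by F-measure. The following is a minimal sketch of both under stated assumptions, not the authors' implementation. It assumes 90-degree rotations, non-overlapping square tiles, and a pixel-level F-measure computed on binary text masks (1 = text, 0 = background). Compression (for example, JPEG re-encoding) is omitted because it requires an image codec. The function names `rotate90`, `split_into_tiles`, `augment`, and `f_measure` are hypothetical.

```python
from typing import List

# Assumption: images and masks are 2-D grids; for masks, 1 = text pixel, 0 = background.
Grid = List[List[int]]


def rotate90(img: Grid) -> Grid:
    """Rotate a 2-D grid 90 degrees clockwise (works for non-square grids too)."""
    # Reverse the row order, then transpose: row i of the result is column i read bottom-up.
    return [list(row) for row in zip(*img[::-1])]


def split_into_tiles(img: Grid, tile: int) -> List[Grid]:
    """Cut a grid into non-overlapping tile x tile patches.

    Assumption: border strips narrower than a full tile are simply dropped.
    """
    h, w = len(img), len(img[0])
    tiles = []
    for top in range(0, h - tile + 1, tile):
        for left in range(0, w - tile + 1, tile):
            tiles.append([row[left:left + tile] for row in img[top:top + tile]])
    return tiles


def augment(img: Grid, tile: int) -> List[Grid]:
    """Hypothetical augmentation: all four 90-degree rotations, each split into tiles."""
    out = []
    current = img
    for _ in range(4):
        out.extend(split_into_tiles(current, tile))
        current = rotate90(current)
    return out


def f_measure(pred: Grid, truth: Grid) -> float:
    """Pixel-level F-measure: harmonic mean of precision and recall over text pixels."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1  # text predicted where there is text
            elif p:
                fp += 1  # text predicted on background
            elif t:
                fn += 1  # text pixel missed
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly detected
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(len(augment(image, 2)))  # 4 rotations x 4 tiles each
    print(f_measure([[1, 1], [0, 0]], [[1, 0], [1, 0]]))
```

For segmentation training, the same rotation and tiling must be applied to each image and its ground-truth mask so that pixels stay aligned. In the usage above, the 4 x 4 grid yields 16 patches, and the example masks give precision = recall = 0.5, hence F = 0.5.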

Keywords: deep learning, U-Net architecture, image processing, image segmentation, text areas, real-scene images.

Lobanova VA, Ivanova YA. Development of software for the segmentation of text areas in real-scene images. Computer Optics 2022; 46(5): 790-800. DOI: 10.18287/2412-6179-CO-1047.



© 2009, IPSI RAS
151, Molodogvardeiskaya str., Samara, 443001, Russia; E-mail: journal@computeroptics.ru ; Tel: +7 (846) 242-41-24 (Executive secretary), +7 (846) 332-56-22 (Issuing editor), Fax: +7 (846) 332-56-20