
Weighted combination of per-frame recognition results for text recognition in a video stream
O. Petrova 1,2, K. Bulatov 1,2,3, V.V. Arlazarov 1,2, V.L. Arlazarov 1,2,3

1 FRC CSC RAS, Moscow, Russia
2 Smart Engines Service LLC, Moscow, Russia
3 Moscow Institute of Physics and Technology (State University), Moscow, Russia


DOI: 10.18287/2412-6179-CO-795

Pages: 77-89.

Full text of article: English language.

The range of applications of automated document recognition has expanded, and as a result, recognition techniques that do not require specialized equipment have become more relevant. Among such techniques, document recognition using mobile devices is of particular interest. However, it is not always possible to ensure controlled capturing conditions and, consequently, high quality of input images. Unlike specialized scanners, mobile cameras allow a video stream to be used as input, yielding several images of the recognized object captured with varying characteristics. This raises the problem of combining the information from multiple input frames. In this paper, we propose a weighting model for the process of combining the per-frame recognition results, two approaches to the weighted combination of the text recognition results, and two weighting criteria. The effectiveness of the proposed approaches is tested on datasets of identity documents captured with a mobile device camera in different conditions, including perspective distortion of the document image and low lighting. The experimental results show that weighted combination can improve the quality of text recognition in a video stream, and that the per-character weighting method with input image focus estimation as the base criterion achieves the best results on the datasets analyzed.
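The idea of per-character weighted combination can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the combination formula, so the code below simply assumes that each frame yields a recognized string plus a scalar weight (for example, a focus score), that all strings are already aligned and of equal length, and that the combined character at each position is the one with the largest total weight. The function name `combine_per_character` and the sample weights are hypothetical.

```python
from collections import defaultdict


def combine_per_character(frame_results):
    """Combine per-frame strings by weighted voting at each character position.

    frame_results: list of (text, weight) pairs, one per frame.
    Assumption: strings are pre-aligned and equal in length; a real system
    would need an alignment step before voting.
    """
    if not frame_results:
        return ""
    length = len(frame_results[0][0])
    if any(len(text) != length for text, _ in frame_results):
        raise ValueError("this sketch assumes pre-aligned, equal-length strings")

    combined = []
    for pos in range(length):
        votes = defaultdict(float)
        for text, weight in frame_results:
            # Each frame votes for its character with its own weight.
            votes[text[pos]] += weight
        combined.append(max(votes, key=votes.get))
    return "".join(combined)


# Hypothetical focus scores: the sharp frame outweighs two blurry ones.
frames = [
    ("IVANOV", 0.9),
    ("1VANOV", 0.3),
    ("IVAN0V", 0.4),
]
print(combine_per_character(frames))
```

With uniform weights this reduces to plain majority voting; non-uniform weights let a single high-quality frame override several low-quality ones that agree on an error.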

Keywords: mobile OCR, video stream, anytime algorithms, weighted combination, ensemble methods.

Petrova O, Bulatov K, Arlazarov VV, Arlazarov VL. Weighted combination of per-frame recognition results for text recognition in a video stream. Computer Optics 2021, 45(1): 77-89. DOI: 10.18287/2412-6179-CO-795.

This work is partially supported by the Russian Foundation for Basic Research (projects 17-29-03236 and 18-07-01387).


