MIDV-500: a dataset for identity document analysis and recognition on mobile devices in video stream

V.V. Arlazarov (1,2,3), K. Bulatov (1,2,3), T. Chernov (3), V.L. Arlazarov (1,2,3)

(1) Moscow Institute of Physics and Technology (State University), Moscow, Russia
(2) Institute for Systems Analysis, FRC CSC RAS, Moscow, Russia
(3) LLC "Smart Engines Service", Moscow, Russia

DOI:10.18287/2412-6179-2019-43-5-818-824

Pages: 818-824.

Article language: English.

Abstract:
Much research has been devoted to identity document analysis and recognition on mobile devices. However, no publicly available datasets designed for this particular problem currently exist. A few datasets are useful for associated subtasks, but a more comprehensive scientific and technical approach to identity document recognition requires more specialized datasets. In this paper we present the Mobile Identity Document Video dataset (MIDV-500), which consists of 500 video clips for 50 different identity document types with ground truth, enabling research on a wide range of document analysis problems. The paper describes the characteristics of the dataset and reports evaluation results for existing methods of face detection, text line recognition, and document field data extraction. Because identity documents are sensitive, as they contain personal data, all source document images used in MIDV-500 are either in the public domain or distributed under public copyright licenses.
The main goal of this paper is to present the dataset; the evaluation results are included only as a baseline.

Keywords:
document analysis and recognition, dataset, identity documents, video stream recognition.

Citation:
Arlazarov, V.V. MIDV-500: a dataset for identity document analysis and recognition on mobile devices in video stream / V.V. Arlazarov, K. Bulatov, T. Chernov, V.L. Arlazarov // Computer Optics. – 2019. – Vol. 43(5). – P. 818-824. – DOI: 10.18287/2412-6179-2019-43-5-818-824.

Acknowledgements:
This work is partially supported by the Russian Foundation for Basic Research (projects 17-29-03170 and 17-29-03370). All source images for the MIDV-500 dataset were obtained from Wikimedia Commons. Author attributions for each source image are listed in the description table at ftp://smartengines.com/midv-500/documents.pdf.


© 2009, IPSI RAS
Russia, 443001, Samara, Molodogvardeyskaya St., 151; e-mail: ko@smr.ru; tel: +7 (846) 242-41-24 (executive secretary), +7 (846) 332-56-22 (technical editor), fax: +7 (846) 332-56-20