
Investigation of the applicability of natural language processing methods to problems of searching and matching of machinery drawing images
K.N. Figura

Bratsk State University,
665709, Bratsk, Russia, Makarenko 40


DOI: 10.18287/2412-6179-CO-1030

Pages: 590-595.

Full text of article: Russian language.

This work shows that applying the technique of local feature descriptors in its pure form to the search and matching of machinery drawings is ineffective. This is mainly due to the presence in drawings of a large number of identical elements (frames, title blocks, extension lines, font elements, etc.). It is proposed to solve this problem with the tf-idf (term frequency – inverse document frequency) weighting scheme, widely known in natural language processing. Instead of the word vectors used in the original tf-idf technique, the study uses descriptors of image keypoints computed with the ORB and BRISK algorithms. The study leads to the following conclusions: 1) the proposed approach is highly effective at finding a copy of the query image in the database: copies of all query images that have full analogs in the database are retrieved; 2) the identification rate of modified query images varies depending on the algorithm used to find keypoints and descriptors: the maximum share of identified modified analogs is 60% with ORB and 80% with BRISK, out of all image analogs in the database; 3) the proposed approach shows limited effectiveness at finding images that belong to the same class as the query image (for example, a drawing of an excavator, a bulldozer, or a truck crane): here, the maximum share of false identifications reaches 60%.
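The adaptation described above (descriptors playing the role of words in tf-idf) can be sketched as follows. This is a minimal NumPy illustration, not the author's implementation: it assumes the ORB/BRISK descriptors of each drawing have already been quantized into visual-word IDs over a vocabulary (in a real pipeline, e.g. via `cv2.ORB_create` and k-means clustering); it then builds tf-idf vectors per drawing and ranks the database by cosine similarity to the query.

```python
import numpy as np

def tfidf_rank(doc_words, query_words, vocab_size):
    """Rank database drawings by tf-idf cosine similarity to a query.

    doc_words   -- list of 1-D int arrays: visual-word IDs (quantized
                   keypoint descriptors) of each drawing in the database
    query_words -- 1-D int array: visual-word IDs of the query drawing
    vocab_size  -- number of visual words in the vocabulary
    Returns indices of database drawings, most similar first.
    """
    n_docs = len(doc_words)
    # Term frequency: normalized histogram of visual words per drawing.
    tf = np.zeros((n_docs, vocab_size))
    for i, words in enumerate(doc_words):
        tf[i] = np.bincount(words, minlength=vocab_size) / max(len(words), 1)
    # Inverse document frequency: rare visual words (informative drawing
    # elements) get high weight; ubiquitous ones (frames, title blocks,
    # extension lines) get weight near zero.
    df = (tf > 0).sum(axis=0)
    idf = np.log(n_docs / np.maximum(df, 1))
    docs = tf * idf
    q = np.bincount(query_words, minlength=vocab_size) / max(len(query_words), 1) * idf
    # Cosine similarity between the query vector and each database vector.
    sims = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)
```

Down-weighting of ubiquitous visual words is exactly what suppresses the identical elements (frames, title blocks) that defeat plain descriptor matching: a word appearing in every drawing gets idf = log(1) = 0 and drops out of the comparison.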

Keywords: natural language processing, tf-idf method, image retrieval, image analysis, pattern recognition, digital image processing.

Figura KN. Investigation of the applicability of natural language processing methods to problems of searching and matching of machinery drawing images. Computer Optics 2022; 46(4): 590-595. DOI: 10.18287/2412-6179-CO-1030.


© 2009, IPSI RAS
151, Molodogvardeiskaya str., Samara, 443001, Russia; E-mail: journal@computeroptics.ru; Tel: +7 (846) 242-41-24 (Executive secretary), +7 (846) 332-56-22 (Issuing editor), Fax: +7 (846) 332-56-20