A Framework to Estimate the Key Point Within an Object Based on a Deep Learning Object Detection

W. Kurdthongmee, K. Suwannarat, C. Wattanapanich


Automatic identification of key points within objects is important in many application domains. This paper presents a novel framework that estimates the key point within an object using deep neural network-based object detection. The training dataset is annotated with four non-overlapping bounding boxes, one of which shares a coordinate with the key point. Together, these bounding boxes cover the entire object, so the annotation can be generated automatically when region annotations around the key point already exist. The trained object detector produces detection results, which are then post-processed to estimate the key point. The framework was validated on two distinct datasets: cross-sectional images of parawood logs and pupil images. The experimental results show that the proposed framework outperforms previously proposed approaches in precision, recall, F1-score, and other domain-specific metrics. The improvement is attributed to the annotation strategy and to combining object detection and key point estimation within a single deep learning framework. The contribution of this study is a framework for closely estimating key points within objects by integrating object detection with key point estimation, a combination that has received limited attention in previous research, with potential applications in precise object analysis and understanding.
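The annotation and post-processing steps described in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: that the four boxes are the quadrants obtained by splitting the object's bounding box at the key point, and that the key point is recovered by averaging the corners of the detected boxes that face it. The function names, labels, and box format (x_min, y_min, x_max, y_max) are hypothetical and are not taken from the paper.

```python
from statistics import mean

# Hypothetical labels for the four boxes, each mapped to the indices of the
# corner (x index, y index) that faces the key point in a box given as
# (x_min, y_min, x_max, y_max). This layout is an assumption.
INNER_CORNER = {
    "top_left": (2, 3),
    "top_right": (0, 3),
    "bottom_left": (2, 1),
    "bottom_right": (0, 1),
}


def quadrant_boxes(obj_box, key_point):
    """Split an object box into four non-overlapping boxes meeting at key_point."""
    x0, y0, x1, y1 = obj_box
    kx, ky = key_point
    if not (x0 < kx < x1 and y0 < ky < y1):
        raise ValueError("key point must lie strictly inside the object box")
    return {
        "top_left": (x0, y0, kx, ky),
        "top_right": (kx, y0, x1, ky),
        "bottom_left": (x0, ky, kx, y1),
        "bottom_right": (kx, ky, x1, y1),
    }


def estimate_key_point(detections):
    """Average the key-point-facing corners of the boxes that were detected.

    `detections` maps a label from INNER_CORNER to a detected box. This
    averaging rule is an assumed stand-in for the paper's post-processing.
    """
    if not detections:
        raise ValueError("at least one detected box is required")
    xs = [box[INNER_CORNER[label][0]] for label, box in detections.items()]
    ys = [box[INNER_CORNER[label][1]] for label, box in detections.items()]
    return (mean(xs), mean(ys))


if __name__ == "__main__":
    boxes = quadrant_boxes((0, 0, 100, 80), (40, 30))
    print(boxes)
    print(estimate_key_point(boxes))
```

Under these assumptions, the estimate still works when only some of the four boxes are detected, since each detected box contributes one candidate location for the key point.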


DOI: 10.28991/HIJ-2023-04-01-08



Keywords: Key Point Estimation; Object Detection; Pupil Estimation; Wood Pith Detection; Computer Vision.




Copyright (c) 2023 W. Kurdthongmee, K. Suwannarat, C. Wattanapanich