Fast and Accurate Pupil Estimation Through Semantic Segmentation Fine-Tuning on a Shallow Convolutional Backbone

Wattanapong Kurdthongmee, Piyadhida Kurdthongmee


Accurate estimation of pupil size and position is important in computer vision, psychology, biometrics, medicine, and robotics, with applications such as eye tracking, medical diagnostics, and facial recognition. Traditional pupil estimation techniques are often too slow or too error-prone for real-world use. This study introduces an approach that improves both the speed and the accuracy of pupil estimation by fine-tuning a pre-trained semantic segmentation model built on a shallow convolutional neural network (CNN) backbone. The method has two phases: it starts from a robust pre-trained semantic segmentation model, then fine-tunes that model on a diverse collection of eye images so that it learns pupil characteristics and detects the pupil more precisely. The shallow CNN backbone keeps the model small, so processing is fast enough for real-time applications. The novelty of the approach lies in its handling of varying lighting and camera conditions; according to our experimental findings, it sets new benchmarks in both speed and accuracy. The result is a practical, efficient pupil estimation method with implications for several key technological domains.
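As an illustration of how a segmentation output can be turned into the pupil position and size that the abstract refers to, the following minimal sketch takes a binary pupil mask and derives a center and a diameter from it. This is an assumption for illustration only, not the authors' pipeline: the function name `pupil_from_mask`, the centroid-based center, and the equivalent-circle diameter are all hypothetical choices, and the mask is assumed to be a list of rows holding 0 (background) or 1 (pupil).

```python
import math


def pupil_from_mask(mask):
    """Estimate pupil center and diameter from a binary mask.

    `mask` is a list of rows containing 0 (background) or 1 (pupil).
    Hypothetical post-processing step; the paper's own procedure may differ.
    Returns ((center_x, center_y), diameter) in pixels, or None if the
    mask contains no pupil pixels.
    """
    sum_x = sum_y = area = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                sum_x += x
                sum_y += y
                area += 1
    if area == 0:
        return None
    # Center: centroid (mean coordinates) of the pupil pixels.
    center = (sum_x / area, sum_y / area)
    # Size: diameter of the circle whose area equals the pixel count.
    diameter = 2.0 * math.sqrt(area / math.pi)
    return center, diameter


# Example: a 5x5 mask with a 3x3 block of pupil pixels in the middle.
example_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
example_result = pupil_from_mask(example_mask)
```

The centroid is cheap to compute and works on any mask shape; an ellipse fit would be an alternative when the pupil is partly occluded, at the cost of extra computation.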


DOI: 10.28991/HIJ-2024-05-02-016



Keywords: Pupil Estimation; Semantic Segmentation; Shallow Convolutional Neural Network; Fine-Tuning; Deep Learning.







Copyright (c) 2024 Wattanapong Kurdthongmee, Uhamard Madardam