Deep Learning-Based Appearance-Based Gaze Estimation
DOI: https://doi.org/10.54097/9q1c9767

Keywords: Gaze estimation, Gaze point, Convolutional neural network, Gaze estimation dataset, Deep learning, CNN, Transformer, LLM

Abstract
Gaze estimation is a technology that predicts a person's gaze direction or point of regard from eye images, face images, or video. It is widely used in fields such as gaming, healthcare, intelligent driving, and offline retail. In recent years, deep learning, with its end-to-end learning capability and robustness, has transformed many computer vision tasks and has accordingly been applied to gaze estimation. Focusing on recent deep learning-based methods, this article first introduces the fundamentals of gaze estimation, then surveys approaches based on CNNs, Transformers, hybrid CNN-Transformer designs, and large models. It also presents the mainstream gaze estimation datasets and typical applications, and concludes with an outlook on future trends and challenges in the field.
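To make the task concrete: appearance-based methods typically regress gaze as a (pitch, yaw) angle pair, and accuracy is reported as the angular error between the predicted and ground-truth 3D gaze directions. The sketch below illustrates this standard evaluation metric; the axis convention and the sample angles are illustrative assumptions, not values drawn from this article.

```python
import math

def pitchyaw_to_vector(pitch, yaw):
    # Convert (pitch, yaw) in radians to a 3D unit gaze vector.
    # Assumed convention: x right, y up, z pointing toward the camera,
    # so (0, 0) looks straight at the camera along -z.
    return (
        -math.cos(pitch) * math.sin(yaw),
        -math.sin(pitch),
        -math.cos(pitch) * math.cos(yaw),
    )

def angular_error_deg(gaze_a, gaze_b):
    # Angle in degrees between two gaze direction vectors.
    dot = sum(a * b for a, b in zip(gaze_a, gaze_b))
    norm_a = math.sqrt(sum(a * a for a in gaze_a))
    norm_b = math.sqrt(sum(b * b for b in gaze_b))
    cos_sim = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_sim))

# Hypothetical model output vs. ground truth (radians).
pred = pitchyaw_to_vector(0.10, 0.20)
true = pitchyaw_to_vector(0.12, 0.18)
err = angular_error_deg(pred, true)
```

A model predicting (0.10, 0.20) against a ground truth of (0.12, 0.18) is off by roughly a degree or two; state-of-the-art methods on datasets such as MPIIGaze report mean errors of a few degrees on this metric.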
Copyright (c) 2025 Journal of Computing and Electronic Information Management

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.