DiffGaze: A Diffusion Model for Continuous Gaze Sequence Generation on 360° Images (2403.17477v1)

Published 26 Mar 2024 in cs.CV and cs.HC

Abstract: We present DiffGaze, a novel method for generating realistic and diverse continuous human gaze sequences on 360° images based on a conditional score-based denoising diffusion model. Generating human gaze on 360° images is important for various human-computer interaction and computer graphics applications, e.g. for creating large-scale eye tracking datasets or for realistic animation of virtual humans. However, existing methods are limited to predicting discrete fixation sequences or aggregated saliency maps, thereby neglecting crucial parts of natural gaze behaviour. Our method uses features extracted from 360° images as condition and uses two transformers to model the temporal and spatial dependencies of continuous human gaze. We evaluate DiffGaze on two 360° image benchmarks for gaze sequence generation as well as scanpath prediction and saliency prediction. Our evaluations show that DiffGaze outperforms state-of-the-art methods on all tasks on both benchmarks. We also report a 21-participant user study showing that our method generates gaze sequences that are indistinguishable from real human sequences.
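The abstract describes the architecture only at a high level: a conditional noise-prediction network that takes 360° image features as the condition and uses one transformer over the temporal axis and another over the feature axis. The sketch below illustrates that general pattern with a DDPM-style training objective; all module names, shapes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumptions, not the paper's code) of a conditional
# noise-prediction network in the spirit of DiffGaze: image features
# condition the denoiser, one transformer attends over time and another
# over the channel/feature axis.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GazeDenoiser(nn.Module):
    def __init__(self, seq_len=600, gaze_dim=2, cond_dim=64, d_model=128,
                 num_steps=50, nhead=4, num_layers=2):
        super().__init__()
        self.in_proj = nn.Linear(gaze_dim + cond_dim, d_model)
        self.step_emb = nn.Embedding(num_steps, d_model)  # diffusion-step embedding
        # Transformer over the temporal axis (tokens = time steps).
        self.temporal = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers)
        # Transformer over the channel axis (tokens = feature channels);
        # requires the sequence length to equal seq_len.
        self.spatial = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(seq_len, nhead=2, batch_first=True), num_layers)
        self.out_proj = nn.Linear(d_model, gaze_dim)

    def forward(self, x_t, cond, t):
        # x_t: (B, T, 2) noisy gaze sequence, cond: (B, T, cond_dim), t: (B,) step indices
        h = self.in_proj(torch.cat([x_t, cond], dim=-1))     # (B, T, d_model)
        h = h + self.step_emb(t)[:, None, :]                 # add diffusion-step embedding
        h = self.temporal(h)                                 # attend over time
        h = self.spatial(h.transpose(1, 2)).transpose(1, 2)  # attend over channels
        return self.out_proj(h)                              # predicted noise, (B, T, 2)


def training_step(model, x0, cond, alphas_bar):
    """One DDPM-style step: noise a clean gaze sequence, predict the noise."""
    B = x0.size(0)
    t = torch.randint(0, alphas_bar.size(0), (B,))
    eps = torch.randn_like(x0)
    a = alphas_bar[t].view(B, 1, 1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps               # forward diffusion
    return F.mse_loss(model(x_t, cond, t), eps)


# Illustrative shapes: 600 gaze samples with 64-dim image features per sample.
model = GazeDenoiser()
loss = training_step(model, torch.randn(4, 600, 2), torch.randn(4, 600, 64),
                     torch.linspace(0.999, 0.01, 50))
```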

Authors (6)
  1. Chuhan Jiao (3 papers)
  2. Yao Wang (331 papers)
  3. Guanhua Zhang (24 papers)
  4. Mihai Bâce (13 papers)
  5. Zhiming Hu (15 papers)
  6. Andreas Bulling (81 papers)
Citations (1)

Summary

We haven't generated a summary for this paper yet.
