NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices (2403.10425v1)

Published 15 Mar 2024 in cs.CV, cs.AI, and cs.RO

Abstract: Real-time high-accuracy optical flow estimation is a crucial component in various applications, including localization and mapping in robotics, object tracking, and activity recognition in computer vision. While recent learning-based optical flow methods have achieved high accuracy, they often come with heavy computation costs. In this paper, we propose a highly efficient optical flow architecture, called NeuFlow, that addresses both high accuracy and computational cost concerns. The architecture follows a global-to-local scheme. Given the features of the input images extracted at different spatial resolutions, global matching is employed to estimate an initial optical flow on the 1/16 resolution, capturing large displacement, which is then refined on the 1/8 resolution with lightweight CNN layers for better accuracy. We evaluate our approach on Jetson Orin Nano and RTX 2080 to demonstrate efficiency improvements across different computing platforms. We achieve a notable 10x-80x speedup compared to several state-of-the-art methods, while maintaining comparable accuracy. Our approach achieves around 30 FPS on edge computing platforms, which represents a significant breakthrough in deploying complex computer vision tasks such as SLAM on small robots like drones. The full training and evaluation code is available at https://github.com/neufieldrobotics/NeuFlow.

References (51)
  1. D. Fortun, P. Bouthemy, and C. Kervrann, “Optical flow modeling and computation: A survey,” Computer Vision and Image Understanding, vol. 134, pp. 1–21, 2015.
  2. J. Shin, S. Kim, S. Kang, S.-W. Lee, J. Paik, B. Abidi, and M. Abidi, “Optical flow-based real-time object tracking using non-prior training active feature model,” Real-time imaging, vol. 11, no. 3, pp. 204–218, 2005.
  3. K. Kale, S. Pawar, and P. Dhulekar, “Moving object tracking using optical flow and motion vector estimation,” in 2015 4th international conference on reliability, infocom technologies and optimization (ICRITO) (trends and future directions). IEEE, 2015, pp. 1–6.
  4. J. K. Aggarwal and Q. Cai, “Human motion analysis: A review,” Computer vision and image understanding, vol. 73, no. 3, pp. 428–440, 1999.
  5. I. Saleemi, L. Hartung, and M. Shah, “Scene understanding by statistical modeling of motion patterns,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 2010, pp. 2069–2076.
  6. Z. Teed and J. Deng, “Droid-slam: Deep visual slam for monocular, stereo, and rgb-d cameras,” Advances in neural information processing systems, vol. 34, pp. 16558–16569, 2021.
  7. P. Muller and A. Savakis, “Flowdometry: An optical flow and deep learning based approach to visual odometry,” in 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2017, pp. 624–631.
  8. B. K. Horn and B. G. Schunck, “Determining optical flow,” Artificial intelligence, vol. 17, no. 1-3, pp. 185–203, 1981.
  9. J. J. Gibson, “The perception of the visual world.” 1950.
  10. ——, “The senses considered as perceptual systems.” 1966.
  11. Y. Liu and J. Miura, “Rdmo-slam: Real-time visual slam for dynamic environments using semantic label prediction with optical flow,” IEEE Access, vol. 9, pp. 106981–106997, 2021.
  12. T. Whelan, R. F. Salas-Moreno, B. Glocker, A. J. Davison, and S. Leutenegger, “Elasticfusion: Real-time dense slam and light source estimation,” The International Journal of Robotics Research, vol. 35, no. 14, pp. 1697–1716, 2016.
  13. A. Behl, O. Hosseini Jafari, S. Karthik Mustikovela, H. Abu Alhaija, C. Rother, and A. Geiger, “Bounding boxes, segmentations and object coordinates: How important is recognition for 3d scene flow estimation in autonomous driving scenarios?” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2574–2583.
  14. P. Jain, J. Manweiler, and R. Roy Choudhury, “Overlay: Practical mobile augmented reality,” in Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services, 2015, pp. 331–344.
  15. H. Chao, Y. Gu, and M. Napolitano, “A survey of optical flow techniques for robotics navigation applications,” Journal of Intelligent & Robotic Systems, vol. 73, pp. 361–372, 2014.
  16. A. Dosovitskiy, P. Fischer, E. Ilg, P. Hausser, C. Hazirbas, V. Golkov, P. Van Der Smagt, D. Cremers, and T. Brox, “Flownet: Learning optical flow with convolutional networks,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 2758–2766.
  17. E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox, “Flownet 2.0: Evolution of optical flow estimation with deep networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2462–2470.
  18. Z. Teed and J. Deng, “Raft: Recurrent all-pairs field transforms for optical flow,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II 16. Springer, 2020, pp. 402–419.
  19. H. Xu, J. Zhang, J. Cai, H. Rezatofighi, and D. Tao, “Gmflow: Learning optical flow via global matching,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 8121–8130.
  20. Z. Huang, X. Shi, C. Zhang, Q. Wang, K. C. Cheung, H. Qin, J. Dai, and H. Li, “Flowformer: A transformer architecture for optical flow,” in European conference on computer vision. Springer, 2022, pp. 668–685.
  21. B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” in IJCAI’81: 7th international joint conference on Artificial intelligence, vol. 2, 1981, pp. 674–679.
  22. D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International journal of computer vision, vol. 60, pp. 91–110, 2004.
  23. C. Liu, J. Yuen, A. Torralba, J. Sivic, and W. T. Freeman, “Sift flow: Dense correspondence across different scenes,” in Computer Vision–ECCV 2008: 10th European Conference on Computer Vision, Marseille, France, October 12-18, 2008, Proceedings, Part III 10. Springer, 2008, pp. 28–42.
  24. S. Jiang, D. Campbell, Y. Lu, H. Li, and R. Hartley, “Learning to estimate hidden motions with global motion aggregation,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 9772–9781.
  25. X. Sui, S. Li, X. Geng, Y. Wu, X. Xu, Y. Liu, R. Goh, and H. Zhu, “Craft: Cross-attentional flow transformer for robust optical flow,” in Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, 2022, pp. 17602–17611.
  26. X. Shi, Z. Huang, D. Li, M. Zhang, K. C. Cheung, S. See, H. Qin, J. Dai, and H. Li, “Flowformer++: Masked cost volume autoencoding for pretraining optical flow estimation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1599–1610.
  27. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
  28. Z. Tu, W. Xie, D. Zhang, R. Poppe, R. C. Veltkamp, B. Li, and J. Yuan, “A survey of variational and cnn-based optical flow techniques,” Signal Processing: Image Communication, vol. 72, pp. 9–24, 2019.
  29. A. Ranjan and M. J. Black, “Optical flow estimation using a spatial pyramid network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4161–4170.
  30. D. Sun, X. Yang, M.-Y. Liu, and J. Kautz, “Pwc-net: Cnns for optical flow using pyramid, warping, and cost volume,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 8934–8943.
  31. T.-W. Hui, X. Tang, and C. C. Loy, “Liteflownet: A lightweight convolutional neural network for optical flow estimation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 8981–8989.
  32. ——, “A lightweight optical flow cnn—revisiting data fidelity and regularization,” IEEE transactions on pattern analysis and machine intelligence, vol. 43, no. 8, pp. 2555–2569, 2020.
  33. T.-W. Hui and C. C. Loy, “Liteflownet3: Resolving correspondence ambiguity for more accurate optical flow estimation,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XX 16. Springer, 2020, pp. 169–184.
  34. G. Yang and D. Ramanan, “Volumetric correspondence networks for optical flow,” Advances in neural information processing systems, vol. 32, 2019.
  35. J. Xu, R. Ranftl, and V. Koltun, “Accurate optical flow via direct cost volume processing,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1289–1297.
  36. H. Jiang and E. Learned-Miller, “Dcvnet: Dilated cost volume networks for fast optical flow,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023, pp. 5150–5157.
  37. H. Xu, J. Yang, J. Cai, J. Zhang, and X. Tong, “High-resolution optical flow from 1d attention and correlation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10498–10507.
  38. S. Jiang, Y. Lu, H. Li, and R. Hartley, “Learning optical flow from a few matches,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 16592–16600.
  39. F. Zhang, O. J. Woodford, V. A. Prisacariu, and P. H. Torr, “Separable flow: Learning motion cost volumes for optical flow estimation,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 10807–10817.
  40. M. Hofinger, S. R. Bulò, L. Porzi, A. Knapitsch, T. Pock, and P. Kontschieder, “Improving optical flow on a pyramid level,” in European Conference on Computer Vision. Springer, 2020, pp. 770–786.
  41. J. Hur and S. Roth, “Iterative residual refinement for joint optical flow and occlusion estimation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 5754–5763.
  42. J. Wang, Y. Zhong, Y. Dai, K. Zhang, P. Ji, and H. Li, “Displacement-invariant matching cost learning for accurate optical flow estimation,” Advances in Neural Information Processing Systems, vol. 33, pp. 15220–15231, 2020.
  43. P. Truong, M. Danelljan, and R. Timofte, “Glu-net: Global-local universal network for dense flow and correspondences,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 6258–6268.
  44. N. Mayer, E. Ilg, P. Hausser, P. Fischer, D. Cremers, A. Dosovitskiy, and T. Brox, “A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 4040–4048.
  45. T. Dao, D. Fu, S. Ermon, A. Rudra, and C. Ré, “Flashattention: Fast and memory-efficient exact attention with io-awareness,” Advances in Neural Information Processing Systems, vol. 35, pp. 16344–16359, 2022.
  46. D. J. Butler, J. Wulff, G. B. Stanley, and M. J. Black, “A naturalistic open source movie for optical flow evaluation,” in Computer Vision–ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part VI 12. Springer, 2012, pp. 611–625.
  47. M. Menze and A. Geiger, “Object scene flow for autonomous vehicles,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3061–3070.
  48. A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
  49. M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 4510–4520.
  50. A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan et al., “Searching for mobilenetv3,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 1314–1324.
  51. X. Zhang, X. Zhou, M. Lin, and J. Sun, “Shufflenet: An extremely efficient convolutional neural network for mobile devices,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 6848–6856.

Summary

  • The paper introduces NeuFlow, a global-to-local optical flow framework that balances real-time performance and high accuracy.
  • NeuFlow combines hierarchical feature extraction, global cross-attention matching, and local refinement, achieving a 10×–80× speedup over leading methods.
  • The method supports critical robotics applications like SLAM, object tracking, and visual odometry on resource-constrained edge devices.

NeuFlow: Achieving Efficient Real-Time Optical Flow on Edge Devices

The paper "NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices" introduces an optical flow architecture designed to run efficiently on edge computing platforms. The authors focus on balancing computational load against accuracy to support applications such as SLAM, object tracking, and visual odometry in robotics. What follows is a structured overview of the methods, results, and implications presented in this research.

Architectural Design

NeuFlow is built on a global-to-local optical flow estimation framework. A shallow CNN backbone extracts image features at multiple spatial resolutions. NeuFlow first performs global cross-attention at a coarse resolution (1/16) to capture large displacements effectively, applies self-attention to resolve matching ambiguities, and then refines the flow at a finer resolution (1/8) with lightweight convolutional layers. A final convex upsampling step produces optical flow at full resolution.
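
The data flow can be made concrete with a short sketch. The following is a minimal PyTorch rendering of the global-to-local scheme described above, not the authors' implementation: module names, channel widths, the omission of the self-attention stage, and the bilinear stand-in for convex upsampling are all simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShallowBackbone(nn.Module):
    """Shallow CNN producing feature maps at 1/8 and 1/16 resolution."""
    def __init__(self, dim=64):
        super().__init__()
        self.to_8 = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=7, stride=8, padding=3),
            nn.ReLU(inplace=True))
        self.to_16 = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True))

    def forward(self, img):
        f8 = self.to_8(img)    # 1/8-resolution features
        f16 = self.to_16(f8)   # 1/16-resolution features
        return f8, f16

def global_match(f1, f2):
    """Global matching at 1/16: correlate every source pixel against all
    target pixels and take the softmax-weighted average displacement."""
    b, c, h, w = f1.shape
    corr = torch.bmm(f1.flatten(2).transpose(1, 2),      # (B, HW, C)
                     f2.flatten(2)) / c ** 0.5           # x (B, C, HW)
    prob = corr.softmax(dim=-1)                          # (B, HW, HW)
    ys, xs = torch.meshgrid(
        torch.arange(h, device=f1.device, dtype=f1.dtype),
        torch.arange(w, device=f1.device, dtype=f1.dtype), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).view(1, h * w, 2)
    matched = torch.bmm(prob, coords.expand(b, -1, -1))  # expected target coords
    return (matched - coords).transpose(1, 2).reshape(b, 2, h, w)

class LocalRefiner(nn.Module):
    """Lightweight CNN refinement at 1/8: predict a residual on top of
    the upsampled coarse flow."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * dim + 2, 96, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(96, 2, 3, padding=1))

    def forward(self, f1, f2, flow):
        return flow + self.net(torch.cat([f1, f2, flow], dim=1))

def estimate_flow(img1, img2, backbone, refiner):
    f1_8, f1_16 = backbone(img1)
    f2_8, f2_16 = backbone(img2)
    flow16 = global_match(f1_16, f2_16)          # coarse flow, large motion
    flow8 = 2.0 * F.interpolate(flow16, scale_factor=2, mode="bilinear",
                                align_corners=False)
    flow8 = refiner(f1_8, f2_8, flow8)           # local refinement at 1/8
    # NeuFlow uses convex upsampling here; bilinear is a simple stand-in.
    return 8.0 * F.interpolate(flow8, scale_factor=8, mode="bilinear",
                               align_corners=False)
```

A side benefit of matching at 1/16 resolution is that the all-pairs correlation stays small: for a 512×384 input, the coarse grid has 32×24 = 768 locations, so the correlation matrix holds only 768² ≈ 590K entries.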

Performance Analysis

The evaluation of NeuFlow shows a significant computational advantage over prominent optical flow methods including RAFT, GMA, GMFlow, and FlowFormer. Empirical results demonstrate that NeuFlow achieves a speedup ranging from 10× to 80× compared to these state-of-the-art methods, while maintaining comparable accuracy. For instance, NeuFlow achieves around 30 FPS on a Jetson Orin Nano for typical image sizes (e.g., 512×384), underscoring its practicality for real-time robotic applications.
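
At 30 FPS, the budget is roughly 33 ms per frame pair. Throughput figures like these are typically measured along the lines of the sketch below; the helper is hypothetical, assuming PyTorch on a CUDA device and a model that is any callable taking a frame pair.

```python
import time
import torch

def measure_fps(model, height=384, width=512, iters=100, device="cuda"):
    """Time repeated forward passes on random frame pairs and report FPS."""
    model = model.to(device).eval()
    img1 = torch.randn(1, 3, height, width, device=device)
    img2 = torch.randn(1, 3, height, width, device=device)
    with torch.no_grad():
        for _ in range(10):              # warm-up: allocator and cuDNN autotuning
            model(img1, img2)
        torch.cuda.synchronize()         # drain queued kernels before timing
        start = time.perf_counter()
        for _ in range(iters):
            model(img1, img2)
        torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)
```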

On the FlyingThings and Sintel datasets, NeuFlow delivers accuracy comparable to the latest techniques at a fraction of their computation time, with particular strength in large-displacement scenarios. These comparisons favor NeuFlow whenever both speed and accuracy are critical, especially on resource-constrained platforms such as the Jetson Orin Nano.
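
Accuracy in such benchmark comparisons is conventionally reported as average end-point error (EPE): the mean Euclidean distance between predicted and ground-truth flow vectors. A minimal implementation:

```python
import torch

def average_epe(flow_pred, flow_gt):
    """Mean Euclidean distance between predicted and ground-truth flow,
    both shaped (B, 2, H, W)."""
    return torch.norm(flow_pred - flow_gt, p=2, dim=1).mean()
```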

Implications and Future Directions

The contributions of NeuFlow mark a key advance toward deploying optical flow algorithms on edge devices. This research opens the door for small robotic systems, such as drones, to run sophisticated visual processing previously confined to more powerful hardware. The open-source release of NeuFlow further encourages community involvement in extending its use and exploring new applications.

Future work could improve accuracy through iterative refinement or by expanding the architecture's capacity (e.g., deeper feature extraction networks or additional cross-attention layers). On the efficiency side, lightweight backbones in the MobileNet family, along with pruning and quantization, offer further optimization potential; a pruning sketch follows. Such developments would broaden NeuFlow's applicability to real-time perception tasks on edge devices.
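
As one concrete illustration of the efficiency direction, magnitude-based weight pruning can be applied after training with PyTorch's built-in utilities; the helper below is a sketch, and the 30% sparsity target is an arbitrary placeholder rather than a figure from the paper.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_conv_weights(model: nn.Module, amount: float = 0.3) -> nn.Module:
    """Zero out the smallest-magnitude weights in every Conv2d layer."""
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")   # bake the mask into the weights
    return model
```

Note that unstructured zeroing alone does not accelerate dense GPU kernels; structured pruning or integer quantization would be needed for real latency gains, consistent with the paper framing these as future directions.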

In conclusion, NeuFlow directly addresses the trade-off between computational efficiency and optical flow accuracy. The work both advances optical flow estimation and enables practical deployments across robotics and computer vision, with clear relevance to academic research and industrial practice alike.
