ZeroFlow: Scalable Scene Flow via Distillation (2305.10424v8)
Abstract: Scene flow estimation is the task of describing the 3D motion field between temporally successive point clouds. State-of-the-art methods use strong priors and test-time optimization techniques, but require on the order of tens of seconds to process full-size point clouds, making them unusable as computer vision primitives for real-time applications such as open-world object detection. Feedforward methods are considerably faster, running on the order of tens to hundreds of milliseconds for full-size point clouds, but require expensive human supervision. To address both limitations, we propose Scene Flow via Distillation, a simple, scalable distillation framework that uses a label-free optimization method to produce pseudo-labels to supervise a feedforward model. Our instantiation of this framework, ZeroFlow, achieves state-of-the-art performance on the Argoverse 2 Self-Supervised Scene Flow Challenge while using zero human labels by simply training on large-scale, diverse unlabeled data. At test time, ZeroFlow is over 1000x faster than label-free state-of-the-art optimization-based methods on full-size point clouds (34 FPS vs 0.028 FPS) and over 1000x cheaper to train on unlabeled data than human annotation would cost ($394 vs ~$750,000). To facilitate further research, we release our code, trained model weights, and high-quality pseudo-labels for the Argoverse 2 and Waymo Open datasets at https://vedder.io/zeroflow.html
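The two "over 1000x" claims can be checked directly from the figures quoted in the abstract. The short sketch below does only that arithmetic; the variable names are illustrative, and the human annotation cost is treated as the approximate value the abstract gives.

```python
# Figures quoted in the abstract; variable names are illustrative only.
zeroflow_fps = 34.0                # ZeroFlow throughput, full-size point clouds
optimizer_fps = 0.028              # label-free optimization-based baseline
pseudo_label_cost_usd = 394.0      # cost of training on unlabeled data
human_label_cost_usd = 750_000.0   # approximate ("~") human annotation cost

# Ratio of throughputs: how many times faster the feedforward model runs.
speedup = zeroflow_fps / optimizer_fps

# Ratio of costs: how many times cheaper pseudo-labeling is.
cost_reduction = human_label_cost_usd / pseudo_label_cost_usd

# Per-frame latencies implied by the two throughputs.
optimizer_seconds_per_frame = 1.0 / optimizer_fps
zeroflow_ms_per_frame = 1000.0 / zeroflow_fps

print(f"speedup: {speedup:.0f}x")
print(f"cost reduction: {cost_reduction:.0f}x")
print(f"optimizer latency: {optimizer_seconds_per_frame:.1f} s/frame")
print(f"ZeroFlow latency: {zeroflow_ms_per_frame:.1f} ms/frame")
```

Both ratios come out above 1000, and the implied per-frame latencies fall in the "tens of seconds" and "tens to hundreds of milliseconds" ranges the abstract describes for optimization-based and feedforward methods respectively.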