On the Efficacy of 3D Point Cloud Reinforcement Learning (2306.06799v1)

Published 11 Jun 2023 in cs.RO, cs.AI, and cs.LG

Abstract: Recent studies on visual reinforcement learning (visual RL) have explored the use of 3D visual representations. However, none of these works has systematically compared the efficacy of 3D representations with 2D representations across different tasks, nor analyzed 3D representations from the perspective of agent-object / object-object relationship reasoning. In this work, we seek answers to the question of when and how 3D neural networks that learn features in the 3D-native space provide a beneficial inductive bias for visual RL. We focus specifically on 3D point clouds, one of the most common forms of 3D representation. We systematically investigate design choices for 3D point cloud RL, leading to a robust algorithm for a variety of robotic manipulation and control tasks. Furthermore, through comparisons between 2D image and 3D point cloud RL methods on both minimalist synthetic tasks and complex robotic manipulation tasks, we find that 3D point cloud RL can significantly outperform its 2D counterpart when agent-object / object-object relationship encoding is a key factor.
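The abstract contrasts 2D image encoders with networks that learn features natively on 3D point clouds. The core building block of such encoders is a shared per-point MLP followed by a symmetric pooling operation (as in PointNet), which makes the extracted feature invariant to the ordering of points. The following is a minimal, illustrative numpy sketch of that idea, not the paper's implementation; the function name `pointnet_encode` and the toy weight shapes are assumptions for demonstration.

```python
import numpy as np

def pointnet_encode(points, w1, b1, w2, b2):
    """PointNet-style encoder (illustrative): a shared per-point MLP
    followed by a symmetric max-pool over points, which yields a
    permutation-invariant global feature for an RL policy/critic."""
    h = np.maximum(points @ w1 + b1, 0.0)  # first shared layer + ReLU, applied per point
    h = np.maximum(h @ w2 + b2, 0.0)       # second shared layer + ReLU
    return h.max(axis=0)                   # max-pool over the point dimension

rng = np.random.default_rng(0)
cloud = rng.normal(size=(128, 3))          # toy observation: 128 points with xyz coordinates
w1, b1 = rng.normal(size=(3, 64)), np.zeros(64)
w2, b2 = rng.normal(size=(64, 256)), np.zeros(256)

feat = pointnet_encode(cloud, w1, b1, w2, b2)

# Permutation invariance: shuffling the points leaves the feature unchanged,
# which is the inductive bias an unordered point set calls for.
shuffled = cloud[rng.permutation(len(cloud))]
assert np.allclose(feat, pointnet_encode(shuffled, w1, b1, w2, b2))
```

Because max-pooling is order-agnostic, the encoder treats the point cloud as a set rather than an image grid; this is one of the design dimensions (alongside frame choice and sampling) that work like this paper's explores for point cloud RL.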

Authors (4)
  1. Zhan Ling (16 papers)
  2. Yunchao Yao (10 papers)
  3. Xuanlin Li (18 papers)
  4. Hao Su (218 papers)
Citations (11)
