
Equivariant Descriptor Fields: SE(3)-Equivariant Energy-Based Models for End-to-End Visual Robotic Manipulation Learning (2206.08321v3)

Published 16 Jun 2022 in cs.RO and cs.AI

Abstract: End-to-end learning for visual robotic manipulation is known to suffer from sample inefficiency, requiring large numbers of demonstrations. Spatial roto-translation equivariance, or SE(3)-equivariance, can be exploited to improve the sample efficiency of learning robotic manipulation. In this paper, we present SE(3)-equivariant models for visual robotic manipulation from point clouds that can be trained fully end-to-end. Using the representation theory of Lie groups, we construct novel SE(3)-equivariant energy-based models that enable highly sample-efficient end-to-end learning. We show that our models can learn from scratch without prior knowledge, yet are highly sample efficient (5–10 demonstrations suffice). Furthermore, we show that our models generalize to tasks with (i) previously unseen target object poses, (ii) previously unseen target object instances of the category, and (iii) previously unseen visual distractors. We experiment with 6-DoF robotic manipulation tasks to validate our models' sample efficiency and generalizability. Code is available at: https://github.com/tomato1mule/edf
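The core property the abstract appeals to can be illustrated with a toy sketch: an energy defined over an observed point cloud and a candidate end-effector position is SE(3)-equivariant (in the sense used here) when jointly transforming both inputs by the same rigid motion leaves the energy unchanged. The snippet below is a minimal, hypothetical illustration, not the paper's model: `toy_energy` and `random_se3` are made-up names, and the distance-based energy merely stands in for the learned equivariant descriptor fields.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_se3(rng):
    """Sample a random rigid transform (R, t) in SE(3)."""
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1.0  # flip one axis to get a proper rotation (det = +1)
    return Q, rng.standard_normal(3)

def toy_energy(points, gripper_pos):
    """Toy energy depending only on point-to-gripper distances,
    hence invariant under a joint rigid transform of both inputs."""
    d = np.linalg.norm(points - gripper_pos, axis=1)
    return float(np.sum(np.exp(-d)))

points = rng.standard_normal((50, 3))   # stand-in for an observed point cloud
gripper = rng.standard_normal(3)        # candidate end-effector position

R, t = random_se3(rng)
e1 = toy_energy(points, gripper)
e2 = toy_energy(points @ R.T + t, R @ gripper + t)
assert np.isclose(e1, e2)  # E(g*x, g*y) == E(x, y) for any g in SE(3)
```

An energy with this joint-invariance property induces a pose distribution that transforms equivariantly with the scene, which is what lets such models generalize to previously unseen object poses from few demonstrations.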
