Advancing 6-DoF Instrument Pose Estimation in Variable X-Ray Imaging Geometries (2405.11677v1)
Abstract: Accurate 6-DoF pose estimation of surgical instruments during minimally invasive surgery can substantially improve treatment strategies and the eventual surgical outcome. Existing deep learning methods achieve accurate results, but they require custom approaches for each object as well as laborious setup and training environments, often extending to extensive simulations, while lacking real-time computation. We propose a general-purpose data acquisition approach for 6-DoF pose estimation tasks in X-ray systems, a novel general-purpose YOLOv5-6D pose architecture for accurate and fast object pose estimation, and a complete method for surgical screw pose estimation from a monocular cone-beam X-ray image that accounts for the acquisition geometry. The proposed YOLOv5-6D pose model achieves competitive results on public benchmarks while being considerably faster at 42 FPS on GPU. In addition, the method generalizes across varying X-ray acquisition geometries and semantic image complexity, enabling accurate pose estimation across different domains. Finally, the proposed approach is tested on bone-screw pose estimation for computer-aided guidance during spine surgery. The model achieves 92.41% on the 0.1 ADD-S metric, demonstrating a promising approach for enhancing surgical precision and patient outcomes. The code for YOLOv5-6D is publicly available at https://github.com/cviviers/YOLOv5-6D-Pose
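The 0.1 ADD-S score quoted above follows the standard pose-accuracy convention: a predicted pose counts as correct when the symmetric average point distance (ADD-S) between the model transformed by the ground-truth pose and by the predicted pose falls below 10% of the object diameter. A minimal NumPy sketch of that metric (function names and the brute-force nearest-neighbor search are illustrative, not the paper's implementation):

```python
import numpy as np

def add_s(pts, R_gt, t_gt, R_pr, t_pr):
    """Symmetric average distance (ADD-S) between two poses.
    pts: (N, 3) model points; R_*: (3, 3) rotations; t_*: (3,) translations."""
    gt = pts @ R_gt.T + t_gt   # model points under the ground-truth pose
    pr = pts @ R_pr.T + t_pr   # model points under the predicted pose
    # For each ground-truth point, distance to its closest predicted point
    # (the symmetric variant; plain ADD would pair points by index instead).
    d = np.linalg.norm(gt[:, None, :] - pr[None, :, :], axis=-1)
    return d.min(axis=1).mean()

def pose_correct(pts, R_gt, t_gt, R_pr, t_pr, diameter, thresh=0.1):
    """The 0.1 ADD-S criterion: correct if ADD-S < thresh * object diameter."""
    return add_s(pts, R_gt, t_gt, R_pr, t_pr) < thresh * diameter
```

The reported 92.41% is then the fraction of test images for which `pose_correct` holds; the closest-point matching makes the metric insensitive to symmetries such as a screw's rotation about its own axis.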