MOT-DETR: 3D Single Shot Detection and Tracking with Transformers to build 3D representations for Agro-Food Robots (2311.15674v3)
Abstract: In the current demand for automation in the agro-food industry, accurately detecting and localizing relevant objects in 3D is essential for successful robotic operations. However, this is a challenge due to the presence of occlusions. Multi-view perception approaches allow robots to overcome occlusions, but a tracking component is needed to associate the objects detected by the robot over multiple viewpoints. Most multi-object tracking (MOT) algorithms are designed for high frame rate sequences and struggle with the occlusions generated by robots' motions and 3D environments. In this paper, we introduce MOT-DETR, a novel approach to detect and track objects in 3D over time using a combination of convolutional networks and transformers. Our method processes 2D and 3D data, and employs a transformer architecture to perform data fusion. We show that MOT-DETR outperforms state-of-the-art multi-object tracking methods. Furthermore, we demonstrate that MOT-DETR can leverage 3D data to deal with long-term occlusions and large frame-to-frame distances better than state-of-the-art methods. Finally, we show that our method is resilient to camera pose noise that can affect the accuracy of point clouds. The implementation of MOT-DETR can be found here: https://github.com/drapado/mot-detr
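The abstract describes MOT-DETR as a single-shot detector-tracker that fuses 2D image features and 3D point-cloud features with a transformer. The sketch below illustrates one plausible DETR-style realization of that idea; the module choices (a ResNet-50 image backbone, a PointNet-like point encoder, object queries with class, 3D position, and re-identification heads) and all dimensions are illustrative assumptions, not the authors' implementation.

```python
# Minimal PyTorch sketch of a DETR-style 2D/3D fusion detector-tracker.
# Module names, dimensions, and the PointNet-like point encoder are
# illustrative assumptions; this is not the authors' implementation.
import torch
import torch.nn as nn
import torchvision

class FusionDetectorTracker(nn.Module):
    def __init__(self, d_model=256, num_queries=50, num_classes=2):
        super().__init__()
        # 2D branch: CNN backbone producing an image feature map
        backbone = torchvision.models.resnet50(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        self.proj_2d = nn.Conv2d(2048, d_model, kernel_size=1)
        # 3D branch: shared MLP over points (assumed xyz + rgb per point)
        self.point_mlp = nn.Sequential(
            nn.Linear(6, 128), nn.ReLU(), nn.Linear(128, d_model))
        # Transformer fuses image and point tokens; object queries decode detections
        self.transformer = nn.Transformer(d_model, nhead=8, batch_first=True)
        self.queries = nn.Embedding(num_queries, d_model)
        # Heads: class logits (+1 "no object"), 3D position, re-ID embedding for association
        self.cls_head = nn.Linear(d_model, num_classes + 1)
        self.pos_head = nn.Linear(d_model, 3)
        self.reid_head = nn.Linear(d_model, 128)

    def forward(self, image, points):
        # image: (B, 3, H, W); points: (B, N, 6)
        img_feat = self.proj_2d(self.cnn(image))             # (B, d, h, w)
        img_tokens = img_feat.flatten(2).transpose(1, 2)     # (B, h*w, d)
        pts_tokens = self.point_mlp(points)                  # (B, N, d)
        memory = torch.cat([img_tokens, pts_tokens], dim=1)  # fused token set
        tgt = self.queries.weight.unsqueeze(0).expand(image.size(0), -1, -1)
        hs = self.transformer(memory, tgt)                   # (B, num_queries, d)
        return self.cls_head(hs), self.pos_head(hs), self.reid_head(hs)

model = FusionDetectorTracker()
logits, xyz, reid = model(torch.rand(1, 3, 480, 640), torch.rand(1, 1024, 6))
```

In a design like this, the per-query re-ID embeddings could be matched across viewpoints to associate detections into tracks, which is how DETR-style trackers typically handle data association over long occlusions and large frame-to-frame distances.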