MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection (2205.05979v2)

Published 12 May 2022 in cs.CV

Abstract: Accurate and reliable 3D detection is vital for many applications including autonomous driving vehicles and service robots. In this paper, we present a flexible and high-performance 3D detection framework, named MPPNet, for 3D temporal object detection with point cloud sequences. We propose a novel three-hierarchy framework with proxy points for multi-frame feature encoding and interactions to achieve better detection. The three hierarchies conduct per-frame feature encoding, short-clip feature fusion, and whole-sequence feature aggregation, respectively. To enable processing long-sequence point clouds with reasonable computational resources, intra-group feature mixing and inter-group feature attention are proposed to form the second and third feature encoding hierarchies, which are recurrently applied for aggregating multi-frame trajectory features. The proxy points not only act as consistent object representations for each frame, but also serve as the courier to facilitate feature interaction between frames. The experiments on large Waymo Open dataset show that our approach outperforms state-of-the-art methods with large margins when applied to both short (e.g., 4-frame) and long (e.g., 16-frame) point cloud sequences. Code is available at https://github.com/open-mmlab/OpenPCDet.

Authors (6)

Xuesong Chen (13 papers)
Shaoshuai Shi (39 papers)
Benjin Zhu (6 papers)
Ka Chun Cheung (32 papers)
Hang Xu (205 papers)
Hongsheng Li (340 papers)

Summary

The paper introduces a novel two-stage framework that uses proxy points for effective multi-frame feature blending in 3D temporal object detection.
The framework employs a hierarchical approach combining per-frame encoding, intra-group mixing, and inter-group attention to improve detection accuracy.
Experimental results on the Waymo Open Dataset show mean Average Precision gains over prior state-of-the-art methods on both short (e.g., 4-frame) and long (e.g., 16-frame) point cloud sequences.

MPPNet: Multi-Frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection

The paper introduces MPPNet, a framework for 3D temporal object detection on point cloud sequences. It targets the difficulty of integrating features across many frames, especially in long sequences, a capability that matters for applications such as autonomous driving.

Methodology Overview

MPPNet is a two-stage detection framework. The first stage generates 3D proposal trajectories from point cloud sequences using an existing single-stage detector such as CenterPoint. The second stage, which is the focus of MPPNet, aggregates multi-frame object features along each trajectory.

A core innovation is the introduction of proxy points, uniformly distributed within the 3D proposal boxes and consistently aligned across frames. These proxy points facilitate consistent per-frame representation and efficient multi-frame feature interaction.
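The placement of proxy points can be illustrated with a minimal sketch. It assumes each proposal box is given as a centre, a size, and a heading angle about the z axis, and that the points sit at the cell centres of an n × n × n grid; the function names `proxy_points` and `trajectory_proxy_points`, the box layout, and the cell-centre convention are illustrative assumptions, not taken from the paper or from OpenPCDet.

```python
import math
from itertools import product


def proxy_points(center, size, heading, n=4):
    """Return n**3 points on a uniform grid inside an oriented 3D box.

    center = (cx, cy, cz), size = (length, width, height), and heading is
    the yaw angle in radians about the z axis. Placing points at grid cell
    centres is an assumption made for this sketch.
    """
    cx, cy, cz = center
    length, width, height = size
    cos_h, sin_h = math.cos(heading), math.sin(heading)
    # Normalised offsets in (-0.5, 0.5), one per grid cell along each axis.
    offsets = [(i + 0.5) / n - 0.5 for i in range(n)]
    points = []
    for ox, oy, oz in product(offsets, repeat=3):
        # Scale the offset to the box dimensions in the box's local frame.
        lx, ly, lz = ox * length, oy * width, oz * height
        # Rotate about z by the heading, then translate to the box centre.
        points.append((cx + lx * cos_h - ly * sin_h,
                       cy + lx * sin_h + ly * cos_h,
                       cz + lz))
    return points


def trajectory_proxy_points(boxes, n=4):
    """Generate proxy points for every box of one proposal trajectory.

    Index k refers to the same relative location inside the box in every
    frame, which is what keeps the representation aligned across frames.
    """
    return [proxy_points(c, s, h, n) for (c, s, h) in boxes]


# Hypothetical two-frame trajectory: the object moves and turns slightly.
boxes = [((0.0, 0.0, 0.0), (4.0, 2.0, 1.5), 0.0),
         ((1.0, 0.2, 0.0), (4.0, 2.0, 1.5), 0.1)]
trajectory = trajectory_proxy_points(boxes, n=2)
```

Because the grid is defined in each box's local frame, the k-th point of one frame and the k-th point of another describe the same part of the object even when the raw LiDAR points in the two frames differ in number and position.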

The framework employs a three-hierarchy model for robust feature aggregation:

Per-Frame Feature Encoding: This hierarchy encodes geometry and motion features separately. Geometry features are derived using set abstraction across proxy points, while motion features capture trajectories relative to the latest proposal box, aiding precise object state estimation over time.
Intra-Group Feature Mixing: Proxy points from short temporal clips undergo feature mixing using a 3D MLP Mixer, which processes data along spatial and channel dimensions so that information is exchanged among the proxy points within each group.
Inter-Group Feature Attention: This hierarchy uses cross-attention to propagate and integrate features across groups, enriching the object's contextual representation and facilitating accurate 3D bounding box predictions.
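The data flow through the second and third hierarchies can be sketched in plain Python. This is a heavily simplified stand-in under stated assumptions, not the authors' implementation: frames within a group are fused by averaging, the mixer uses a single linear layer with ReLU and a residual connection per step and mixes over a flattened point axis rather than separate spatial axes, and the cross-attention is single-head with no learned projections. All function names and the weight arguments `w_token` and `w_channel` are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def split_into_groups(frames, group_size):
    """Split per-frame features into consecutive short clips."""
    return [frames[i:i + group_size] for i in range(0, len(frames), group_size)]


def average_frames(group):
    """Element-wise mean over a group's frames -> (points, channels).
    Averaging is an assumption of this sketch."""
    n = len(group)
    return [[sum(f[p][c] for f in group) / n for c in range(len(group[0][0]))]
            for p in range(len(group[0]))]


def mixer_block(x, w_token, w_channel):
    """Simplified Mixer step on x of shape (points, channels)."""
    # Mix across proxy points: w_token has shape (points, points).
    mixed = matmul(w_token, x)
    y = [[xi + max(0.0, mi) for xi, mi in zip(xr, mr)]
         for xr, mr in zip(x, mixed)]
    # Mix across channels: w_channel has shape (channels, channels).
    mixed = matmul(y, w_channel)
    return [[yi + max(0.0, mi) for yi, mi in zip(yr, mr)]
            for yr, mr in zip(y, mixed)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention without learned projections."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale
                           for k in keys])
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def aggregate(frames, group_size, w_token, w_channel):
    """Group frames, mix within each group, then attend across all groups."""
    groups = split_into_groups(frames, group_size)
    mixed = [mixer_block(average_frames(g), w_token, w_channel) for g in groups]
    # Every group queries the proxy-point features of all groups.
    all_points = [row for g in mixed for row in g]
    return [cross_attention(g, all_points, all_points) for g in mixed]
```

With 16 frames and a group size of 4, `aggregate` returns four per-group feature sets. Mixing touches only the features inside one group, and the attention step then operates on one feature set per group rather than one per frame, which is the kind of saving that makes long sequences tractable.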

Experimental Evaluation and Results

Experiments on the Waymo Open Dataset support this design. MPPNet improves mean Average Precision over existing methods on both short and long point cloud sequences, and it outperforms earlier work such as 3D-MAN and CenterPoint, which indicates that it makes better use of temporal information.

Implications and Future Directions

MPPNet’s introduction of proxy points and its hierarchical feature aggregation strategy mark significant advancements in temporal 3D object detection. The alignment and interaction facilitated by proxy points could steer future research towards even more resource-efficient and precise models, improving object detection in increasingly complex and dynamic real-world environments.

Future work may refine these techniques to improve processing efficiency further and adapt them to other kinds of temporal data. Adding real-time processing capability could also broaden MPPNet's use in autonomous systems that operate in dynamic environments.

Overall, MPPNet contributes a robust framework that not only advances current methodologies but also lays groundwork for further exploration in 3D computer vision challenges.
