A Sparse Point Cloud 3D Object Detection Method Based on Size-Aware Attention Mechanism and BEV Backbone Network
DOI: https://doi.org/10.54097/c537nh48
Keywords: 3D object detection, PV-RCNN, Size-aware attention, BEV backbone network
Abstract
To address the difficulty of detecting distant and small objects in sparse point clouds, this paper proposes an improved 3D object detection method built on the PV-RCNN framework that combines a size-aware attention mechanism with a new BEV backbone network. First, a Size-Aware Dual-Branch Attention mechanism is introduced at the keypoint feature fusion stage. It estimates the voxel density in each keypoint's neighborhood to infer the target scale dynamically, and then applies channel attention and spatial attention differentially to enhance the keypoint features, improving the model's ability to represent multi-scale features. Second, a BaseBEV backbone network is designed for the BEV feature extraction stage, replacing the original SECOND and SECONDFPN structures to strengthen multi-scale feature modeling in BEV space. Experiments on the KITTI 3D object detection dataset show consistent improvements on both the BEV detection and 3D detection tasks. Ablation studies confirm the contribution of each improved module, indicating that the method enhances 3D object detection performance in sparse point cloud environments.
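The abstract does not give the exact formulation of the Size-Aware Dual-Branch Attention, so the NumPy sketch below illustrates only one plausible reading of it, under loudly stated assumptions. Neighborhood density is taken as the number of non-empty voxel centers within a fixed radius of each keypoint. A sigmoid of that count (hypothetical parameters `tau` and `temperature`) produces a per-keypoint scale gate. The channel branch follows the Squeeze-and-Excitation pattern, and the spatial branch is a simplified CBAM-style per-keypoint weight. The gate blends the two branches, with dense neighborhoods (assumed to be larger or nearer objects) leaning on channel attention and sparse ones on spatial attention. All function names, parameters, and the blending direction are hypothetical and are not taken from the paper.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def neighborhood_density(keypoints, voxel_centers, radius):
    """Count non-empty voxel centers within `radius` of each keypoint.

    keypoints: (K, 3), voxel_centers: (V, 3) -> (K,) integer counts.
    """
    diff = keypoints[:, None, :] - voxel_centers[None, :, :]  # (K, V, 3)
    dist2 = (diff ** 2).sum(axis=-1)                          # (K, V)
    return (dist2 <= radius ** 2).sum(axis=1)


def channel_attention(feats, w1, w2):
    """Squeeze-and-Excitation style channel weights.

    feats: (K, C); w1: (C, C_r); w2: (C_r, C) -> (C,) weights in (0, 1).
    """
    pooled = feats.mean(axis=0)            # squeeze over keypoints: (C,)
    hidden = np.maximum(pooled @ w1, 0.0)  # ReLU bottleneck: (C_r,)
    return sigmoid(hidden @ w2)            # excitation: (C,)


def spatial_attention(feats):
    """Simplified CBAM-style per-keypoint weights (assumption).

    Uses the channel-wise mean and max of each keypoint feature. CBAM mixes
    these with a learned convolution; here they are simply summed.
    feats: (K, C) -> (K,) weights in (0, 1).
    """
    return sigmoid(feats.mean(axis=1) + feats.max(axis=1))


def size_aware_dual_branch(feats, keypoints, voxel_centers, w1, w2,
                           radius=1.0, tau=8.0, temperature=2.0):
    """Hypothetical density-gated blend of channel and spatial attention.

    Returns the reweighted features (K, C) and the per-keypoint density (K,).
    """
    density = neighborhood_density(keypoints, voxel_centers, radius)
    # Scale gate: tends to 1 for dense neighborhoods (assumed large/near objects)
    # and to 0 for sparse ones (assumed small/distant objects).
    alpha = sigmoid((density - tau) / temperature)[:, None]  # (K, 1)
    ch = channel_attention(feats, w1, w2)[None, :]            # (1, C)
    sp = spatial_attention(feats)[:, None]                    # (K, 1)
    gate = alpha * ch + (1.0 - alpha) * sp                    # (K, C), in (0, 1)
    return feats * gate, density


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, C, V = 4, 16, 200
    keypoints = rng.uniform(0.0, 10.0, size=(K, 3))
    voxels = rng.uniform(0.0, 10.0, size=(V, 3))
    feats = rng.standard_normal((K, C))
    w1 = 0.1 * rng.standard_normal((C, C // 4))
    w2 = 0.1 * rng.standard_normal((C // 4, C))
    out, density = size_aware_dual_branch(feats, keypoints, voxels, w1, w2)
    print(out.shape, density)
```

Because every gate value lies strictly between 0 and 1, this sketch only rescales features. A trained module would learn the bottleneck weights `w1` and `w2` end to end and might add a residual connection so that weakly attended keypoints are not suppressed entirely.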
License
Copyright (c) 2026 Journal of Computing and Electronic Information Management

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.