DV-PointPillars 3D object detection model based on dual pooling attention mechanism and vertical feature fusion

doi:10.3969/j.issn.1674-8484.2025.05.014

Abstract:

A DV-PointPillars 3D object detection model based on a dual pooling attention mechanism and vertical feature fusion was proposed to address the insufficient representation ability of pillar features and the false and missed detections of pillar-based 3D object detection methods for point clouds. A dual max- and average-pooling attention mechanism was introduced into the encoding network; by combining max-pooling attention with average-pooling attention, it makes fuller use of the point cloud information within each pillar and thereby improves the representation ability of pillar features. A vertical region feature extraction network was designed to obtain the features of each pillar along the vertical direction, and these features were fused in the backbone network to mitigate the information compression caused by the pillar encoding, reduce misjudgments, and improve recognition under occlusion. Experiments were conducted on the KITTI dataset for three categories (car, pedestrian, and cyclist) at three difficulty levels (easy, moderate, and hard). The results show that, compared with the PointPillars model, DV-PointPillars with all three added modules (max-pooling attention, average-pooling attention, and vertical region feature fusion) improves the 3D detection average precision (AP), averaged over the three difficulty levels, by 4.02, 5.17, and 5.09 percentage points for cars, pedestrians, and cyclists, respectively, which verifies the effectiveness of the proposed method.
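The abstract does not specify how the max-pooled and average-pooled branches are combined. The sketch below is one plausible reading, assuming a CBAM-style channel attention (a swapped-in technique, not confirmed as the authors' design): both pooled descriptors of a pillar pass through a shared weight matrix `w` (a hypothetical stand-in for a learned layer), their sum is squashed by a sigmoid into per-channel weights, and the reweighted point features are max-pooled into one pillar feature as in the original PointPillars encoder. Pure Python is used so the example runs without a deep learning framework.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a C x C weight matrix by a length-C vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dual_pooling_attention(points: List[Vector], w: Matrix) -> Vector:
    """Encode one pillar (N points x C features) as a single C-dim feature.

    Assumed CBAM-style design: max- and average-pooled descriptors share the
    hypothetical weight matrix w; their sum passes through a sigmoid to give
    per-channel weights that rescale every point before max pooling.
    """
    if not points:
        raise ValueError("a pillar must contain at least one point")
    n = len(points)
    channels = range(len(points[0]))
    max_desc = [max(p[c] for p in points) for c in channels]      # max pooling over points
    avg_desc = [sum(p[c] for p in points) / n for c in channels]   # average pooling over points
    scores = [a + b for a, b in zip(matvec(w, max_desc), matvec(w, avg_desc))]
    weights = [sigmoid(s) for s in scores]                         # channel weights in (0, 1)
    reweighted = [[f * wc for f, wc in zip(p, weights)] for p in points]
    return [max(p[c] for p in reweighted) for c in channels]       # pillar feature


if __name__ == "__main__":
    pillar = [[1.0, 2.0], [3.0, 0.0]]   # two points, two feature channels
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(dual_pooling_attention(pillar, identity))
```

With the identity matrix as `w`, the example pillar gets channel weights sigmoid(5) and sigmoid(3), so the output is roughly [2.98, 1.91]. In a real model `w` would be learned and the computation batched over all non-empty pillars.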

Key words: autonomous driving, environmental perception, 3D object detection, point cloud, attention pooling

CLC number: TP391.4

PAN Yuheng, REN Chen, LU Weijia, LI Yang. DV-PointPillars 3D object detection model based on dual pooling attention mechanism and vertical feature fusion[J]. Journal of Automotive Safety and Energy, 2025, 16(5): 793-801.

References

[1]	Editorial Department of China Journal of Highway and Transport. Review on China's automotive engineering research progress: 2023[J]. China J Highw Transport, 2023, 36(11): 1-192. doi: 10.19721/j.cnki.1001-7372.2023.11.001 (in Chinese)
[2]	QIAN Rui, LAI Xin, LI Xirong. 3D object detection for autonomous driving: A survey[J]. Patt Recogn, 2022, 130: No 108796.
[3]	ZHOU Yunsong, HE Yuan, ZHU Hongzi, et al. Monocular 3D object detection: An extrinsic parameter free approach[C]// Proc IEEE/CVF Conf Comput Visi Patt Recogn. Nashville, TN, USA, 2021: 7556-7566.
[4]	LI Peiliang, CHEN Xiaozhi, SHEN Shaojie. Stereo R-CNN based 3D object detection for autonomous driving[C]// Proc IEEE/CVF Conf Comput Visi Patt Recogn. Long Beach, CA, USA, 2019: 7636-7644.
[5]	CHEN Xiaozhi, MA Huimin, WAN Ji, et al. Multi-view 3D object detection network for autonomous driving[C]// Proc IEEE Conf Comput Visi Patt Recogn. Honolulu, HI, USA, 2017: 6526-6534.
[6]	QI C R, LIU W, WU Chenxia, et al. Frustum PointNets for 3D object detection from RGB-D data[C]// Proc IEEE/CVF Conf Comput Visi Patt Recogn. Salt Lake City, UT, USA, 2018: 918-927.
[7]	Ku J, Mozifian M, Lee J, et al. Joint 3D proposal generation and object detection from view aggregation[C]// Proc IEEE/RSJ Int'l Conf Intel Robot Syst (IROS). Madrid, Spain, 2018: 1-8.
[8]	Sindagi V A, ZHOU Yin, Tuzel O. MVX-Net: Multimodal VoxelNet for 3D object detection[C]// Proc Int’l Conf Robot Auto (ICRA). Montreal, Canada, 2019: 7276-7282.
[9]	Vora S, Lang A H, Helou B, et al. PointPainting: sequential fusion for 3D object detection[C]// Proc IEEE/CVF Conf Comput Visi Patt Recogn. Seattle, WA, USA, 2020: 4603-4611.
[10]	LI Changcai, CHEN Gang, HOU Zuoxun, et al. Survey of 3D object detection algorithms for autonomous driving[J]. J Image Graph, 2024, 29(11): 3238-3264. (in Chinese)
[11]	QI C R, SU Hao, MO Kaichun, et al. PointNet: Deep learning on point sets for 3D classification and segmentation[C]// Proc IEEE Conf Comput Visi Patt Recogn. Honolulu, HI, USA, 2017: 77-85.
[12]	QI C R, YI Li, SU Hao, et al. PointNet++: Deep hierarchical feature learning on point sets in a metric space[C]// Proc 31st Int'l Conf Neur Info Process Syst. Long Beach, CA, USA, 2017: 5105-5114.
[13]	SHI Shaoshuai, WANG Xiaogang, LI Hongsheng. PointRCNN: 3D object proposal generation and detection from point cloud[C]// Proc IEEE/CVF Conf Comput Visi Patt Recogn. Long Beach, CA, USA, 2019: 770-779.
[14]	YANG Zetong, SUN Yanan, LIU Shu, et al. 3DSSD: Point-based 3D single stage object detector[C]// Proc IEEE/CVF Conf Comput Visi Patt Recogn. Seattle, WA, USA, 2020: 11037-11045.
[15]	ZHOU Yin, Tuzel O. VoxelNet: End-to-end learning for point cloud based 3D object detection[C]// Proc IEEE/CVF Conf Comput Visi Patt Recogn. Salt Lake City, UT, USA, 2018: 4490-4499.
[16]	YAN Yan, MAO Yuxing, LI Bo. SECOND: Sparsely embedded convolutional detection[J]. Sensors, 2018, 18(10): 3337. doi: 10.3390/s18103337
[17]	Graham B, Engelcke M, van der Maaten L. 3D semantic segmentation with submanifold sparse convolutional networks[C]// Proc IEEE/CVF Conf Comput Visi Patt Recogn. Salt Lake City, UT, USA, 2018: 9224-9232.
[18]	Lang A H, Vora S, Caesar H, et al. PointPillars: Fast encoders for object detection from point clouds[C]// Proc IEEE/CVF Conf Comput Visi Patt Recogn. Long Beach, CA, USA, 2019: 770-779.
[19]	WU Xinyue, HUI Fei, JIN Xin. 3D object detection in point cloud based on GCR-PointPillars[J]. Modern Elect Tech, 2024, 47(11): 168-174. (in Chinese)
[20]	ZHOU Hao, QI Honggang, DENG Yongqiang, et al. 3D object detection and classification combined with point cloud depth information[J]. J Image Graph, 2024, 29(8): 2399-2412. (in Chinese)
[21]	TANG Xinhua, DAI Daowen, CHEN Xiyuan, et al. Improved three-dimensional object detection algorithm based on PointPillars[J]. Chin J Sci Inst, 2024, 45(9): 260-269. (in Chinese)
[22]	WANG Liangzi, HUANG Miaohua, LIU Ruoying, et al. Improved two-stage 3D object detection algorithm for roadside scenes with enhanced PointPillars and Transformer[J]. Lase Optoelect Prog, 2024, 61(18): 413-422. (in Chinese)
[23]	SHI Guangsheng, LI Ruifeng, MA Chao. PillarNet: Real-time and high-performance pillar-based 3D object detection[C]// Proc 2022 Europ Conf Comput Visi. Tel Aviv, Israel, 2022: 35-52.
[24]	ZHOU Sifan, TIAN Zhi, CHU Xiangxiang, et al. FastPillars: A deployment-friendly pillar-based 3D detector[J]. arXiv: 2302.02367, 2023.
[25]	YANG Qingxin, KONG Deming, CHEN Jing, et al. Enhancing PointPillars three-dimensional object detection with density clustering and dual attention mechanisms[J]. Lase Optoelect Prog, 2025, 62(10): 114-122. (in Chinese)
[26]	Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]// Proc IEEE Conf Comput Visi Patt Recogn. Providence, RI, USA, 2012: 3354-3361.

Number of vertical regions per pillar, K	Car AP / %	Pedestrian AP / %	Cyclist AP / %	Detection speed / fps
16	78.78	42.03	63.61	27.58
24	77.80	46.23	64.67	27.83
32	78.49	42.47	61.89	28.06
48	78.01	44.64	64.22	27.17
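The table above varies K, the number of vertical regions into which each pillar is divided. How each region is encoded is not stated here; the sketch below assumes the simplest option, a normalized point-count histogram over K equal-height regions, and assumes the height range [-3, 1] m that is commonly used in PointPillars configurations for KITTI. `vertical_histogram` is a hypothetical helper, not the authors' code.

```python
from typing import List, Sequence


def vertical_histogram(z_values: Sequence[float], k: int,
                       z_min: float = -3.0, z_max: float = 1.0) -> List[float]:
    """Fraction of a pillar's points in each of k equal-height vertical regions.

    Hypothetical helper: z_min/z_max default to an assumed KITTI height range
    of [-3, 1] m; points outside that range are ignored.
    """
    if k <= 0 or z_max <= z_min:
        raise ValueError("need k > 0 and z_max > z_min")
    counts = [0] * k
    region_height = (z_max - z_min) / k
    for z in z_values:
        if z_min <= z <= z_max:
            # A point exactly at z_max would index region k, so clamp it to k - 1.
            idx = min(int((z - z_min) / region_height), k - 1)
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * k  # empty (or fully out-of-range) pillar
    return [c / total for c in counts]


if __name__ == "__main__":
    # Heights (m) of the points in one hypothetical pillar, split into K = 4 regions.
    print(vertical_histogram([-2.5, -2.2, 0.5, 1.0, 5.0], k=4))
```

Under the assumed 4 m range, K = 24 corresponds to regions about 0.17 m tall. One way to fuse such a K-dimensional vector in the backbone (again an assumption) is to scatter it onto the bird's-eye-view grid as K extra channels next to the pillar pseudo-image.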

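Model	Car AP / % (easy)	Car AP / % (moderate)	Car AP / % (hard)	Pedestrian AP / % (easy)	Pedestrian AP / % (moderate)	Pedestrian AP / % (hard)	Cyclist AP / % (easy)	Cyclist AP / % (moderate)	Cyclist AP / % (hard)	Detection speed / fps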
PointPillars	83.80	72.56	69.49	48.19	42.53	38.66	71.96	59.64	55.37	37.32
PointRCNN	85.59	76.39	72.45	54.50	47.92	40.73	77.08	57.06	53.04	19.71
SECOND	86.93	76.88	72.75	49.59	44.58	40.28	72.40	60.17	56.46	27.87
PillarNet	84.95	75.37	72.26	46.18	40.87	36.06	76.19	56.75	53.08	25.95
GCR-PointPillars	85.26	75.08	72.29	49.42	43.00	38.72	76.93	59.12	54.73	18.64
DCG-PointPillars	87.06	75.86	72.71	50.41	43.62	39.34	75.30	57.76	53.77	23.81
DV-PointPillars (ours)	87.41	77.31	73.17	54.71	47.68	42.51	81.79	62.16	58.32	28.78
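The comparison table reports AP at each difficulty level. The short script below (values copied from the table; helper names are illustrative) averages the three levels per class. This reproduces the abstract's mean-AP gains over PointPillars to within about 0.01 point: the cyclist gain comes out as 5.10 rather than 5.09, presumably because the tabulated per-level values are rounded (the ablation table lists the PointPillars cyclist mean as 62.33, while the three tabulated values average 62.32). It also shows that DV-PointPillars has the highest mean AP for all three classes, although PointRCNN stays slightly ahead on moderate pedestrians (47.92 vs 47.68).

```python
from typing import Dict, List, Tuple

# AP (%) copied from the comparison table above, in column order:
# car easy/moderate/hard, pedestrian easy/moderate/hard, cyclist easy/moderate/hard.
TABLE: Dict[str, List[float]] = {
    "PointPillars": [83.80, 72.56, 69.49, 48.19, 42.53, 38.66, 71.96, 59.64, 55.37],
    "PointRCNN": [85.59, 76.39, 72.45, 54.50, 47.92, 40.73, 77.08, 57.06, 53.04],
    "SECOND": [86.93, 76.88, 72.75, 49.59, 44.58, 40.28, 72.40, 60.17, 56.46],
    "PillarNet": [84.95, 75.37, 72.26, 46.18, 40.87, 36.06, 76.19, 56.75, 53.08],
    "GCR-PointPillars": [85.26, 75.08, 72.29, 49.42, 43.00, 38.72, 76.93, 59.12, 54.73],
    "DCG-PointPillars": [87.06, 75.86, 72.71, 50.41, 43.62, 39.34, 75.30, 57.76, 53.77],
    "DV-PointPillars": [87.41, 77.31, 73.17, 54.71, 47.68, 42.51, 81.79, 62.16, 58.32],
}
CLASSES = ("car", "pedestrian", "cyclist")


def mean_ap(model: str, cls: int) -> float:
    """Mean AP over easy/moderate/hard for class index cls (0 car, 1 ped, 2 cyc)."""
    levels = TABLE[model][3 * cls: 3 * cls + 3]
    return round(sum(levels) / 3, 2)


def mean_gain(model: str, baseline: str = "PointPillars") -> Tuple[float, ...]:
    """Mean-AP gain in percentage points over the baseline, per class."""
    return tuple(round(mean_ap(model, c) - mean_ap(baseline, c), 2) for c in range(3))


def best_in_column(col: int) -> str:
    """Model with the highest AP in one of the nine class/difficulty columns."""
    return max(TABLE, key=lambda m: TABLE[m][col])


def best_mean(cls: int) -> str:
    """Model with the highest mean AP for class index cls."""
    return max(TABLE, key=lambda m: mean_ap(m, cls))


if __name__ == "__main__":
    print("gain over PointPillars:", dict(zip(CLASSES, mean_gain("DV-PointPillars"))))
    for c, name in enumerate(CLASSES):
        print(name, "best mean AP:", best_mean(c))
```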

Max pooling attention module	Average pooling attention module	Vertical region feature fusion	Car AP / %	Pedestrian AP / %	Cyclist AP / %	Detection speed / fps
			75.28	43.13	62.33	37.32
√			75.43	42.21	64.87	36.51
	√		76.26	43.89	63.22	36.42
√	√		76.62	44.53	65.07	36.09
		√	77.80	46.23	64.67	29.92
√	√	√	79.30	48.30	67.42	28.78
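The ablation rows can be read as gains over the first row, whose AP and speed match PointPillars in the comparison table. The sketch below hard-codes the table values (the configuration labels are illustrative) and computes each configuration's AP change and relative speed cost.

```python
from typing import Dict, Tuple

# (car AP %, pedestrian AP %, cyclist AP %, detection speed fps) per row of the
# ablation table; keys name the enabled modules (illustrative labels):
# M = max pooling attention, A = average pooling attention,
# V = vertical region feature fusion.
Row = Tuple[float, float, float, float]
ABLATION: Dict[str, Row] = {
    "baseline": (75.28, 43.13, 62.33, 37.32),
    "M": (75.43, 42.21, 64.87, 36.51),
    "A": (76.26, 43.89, 63.22, 36.42),
    "M+A": (76.62, 44.53, 65.07, 36.09),
    "V": (77.80, 46.23, 64.67, 29.92),
    "M+A+V": (79.30, 48.30, 67.42, 28.78),
}


def gains_over_baseline(table: Dict[str, Row]) -> Dict[str, Tuple[float, ...]]:
    """AP change (percentage points) of each configuration vs the baseline row."""
    base = table["baseline"]
    return {
        name: tuple(round(ap - base_ap, 2) for ap, base_ap in zip(row[:3], base[:3]))
        for name, row in table.items()
        if name != "baseline"
    }


def speed_drop_percent(table: Dict[str, Row], name: str) -> float:
    """Relative fps loss of a configuration vs the baseline, in percent."""
    base_fps = table["baseline"][3]
    return round(100.0 * (base_fps - table[name][3]) / base_fps, 1)


if __name__ == "__main__":
    for name, (car, ped, cyc) in gains_over_baseline(ABLATION).items():
        drop = speed_drop_percent(ABLATION, name)
        print(f"{name:6s} car {car:+.2f}  ped {ped:+.2f}  cyc {cyc:+.2f}  speed -{drop}%")
```

The full model reproduces the 4.02, 5.17, and 5.09 point gains reported in the abstract at a 22.9% drop in detection speed (37.32 to 28.78 fps). Among single modules, vertical region feature fusion gives the largest car and pedestrian gains (+2.52 and +3.10) and on its own accounts for most of the slowdown (19.8%), while max pooling attention alone gives the largest single-module cyclist gain (+2.54) but lowers pedestrian AP by 0.92 points.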