Journal of Automotive Safety and Energy, 2025, Vol. 16, Issue (5): 793-801. DOI: 10.3969/j.issn.1674-8484.2025.05.014

• Intelligent Driving and Smart Transportation •

DV-PointPillars 3D object detection model based on dual pooling attention mechanism and vertical feature fusion

PAN Yuheng, REN Chen, LU Weijia, LI Yang

  1. School of Computer and Information Engineering, Tianjin Chengjian University, Tianjin 300384, China
  • Received: 2025-07-11 Revised: 2025-08-24 Online: 2025-10-31 Published: 2025-11-10
  • Corresponding author: *LU Weijia, associate professor. E-mail: luweijia@tcu.edu.cn
  • About the author: PAN Yuheng (1978-), female (Han), Shandong, associate professor. E-mail: panyuheng@tju.edu.cn
  • Funding:
    National Natural Science Foundation of China (62204168); Major Social Science Project of the Tianjin Municipal Education Commission (2024JWZD37); Tianjin Science and Technology Plan Project (21YDTPJC00780); Tianjin Science and Technology Plan Project (23YDTPJC00450)

Abstract:

To address insufficient pillar feature representation and the resulting false and missed detections in pillar-based 3D object detection models for point clouds, a DV-PointPillars model based on a dual pooling attention mechanism and vertical feature fusion is proposed. A max and average dual pooling attention mechanism is introduced into the encoding network to make full use of the point cloud information within each pillar and improve the representation ability of pillar features. A vertical region feature extraction network is designed to obtain feature information of the pillars in the vertical direction, and these features are fused in the backbone network to mitigate the information compression caused by the encoding method, reduce misjudgments, and improve recognition under occlusion. Experiments were conducted on the KITTI dataset for three categories (cars, pedestrians, and cyclists) at three difficulty levels (easy, moderate, and hard). The results show that, compared with the PointPillars model, the DV-PointPillars model with the three added modules improves 3D average precision for cars, pedestrians, and cyclists by 4.02%, 5.17%, and 5.09%, respectively, which verifies the effectiveness of the proposed method.
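
As a rough illustration of the dual pooling idea in the abstract, the following minimal sketch pools the points of a single pillar with both max and average pooling and uses the two descriptors to gate the channels. It is not the authors' implementation: the abstract does not specify how the two pooled descriptors are combined, so the sigmoid gate, the scalar weights `w_max` and `w_avg` (stand-ins for learned layers), and the function name `dual_pooling_attention` are all assumptions made here for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dual_pooling_attention(points, w_max=1.0, w_avg=1.0):
    """Toy channel attention over one pillar (hypothetical, not the paper's design).

    points: list of N point feature vectors, each of length C (N >= 1).
    w_max, w_avg: hypothetical scalar weights standing in for learned layers.
    Returns a length-C pillar feature.
    """
    n_channels = len(points[0])
    # Channel-wise max and average pooling across the points in the pillar.
    max_pool = [max(p[c] for p in points) for c in range(n_channels)]
    avg_pool = [sum(p[c] for p in points) / len(points) for c in range(n_channels)]
    # Combine both descriptors into one gate per channel, each in (0, 1).
    gate = [sigmoid(w_max * m + w_avg * a) for m, a in zip(max_pool, avg_pool)]
    # Reweight every point by the gate, then max-pool to one pillar feature.
    weighted = [[g * v for g, v in zip(gate, p)] for p in points]
    return [max(p[c] for p in weighted) for c in range(n_channels)]


# A pillar with three points and two feature channels (made-up values).
pillar = [[0.0, 1.0], [2.0, -1.0], [4.0, 0.0]]
feature = dual_pooling_attention(pillar)
```

The max descriptor keeps the strongest response in each channel, while the average descriptor summarizes all points in the pillar; using both is one way to draw on more of the in-pillar information than max pooling alone.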

Key words: autonomous driving, environmental perception, 3D object detection, point cloud, attention pooling
