End-to-end decision-making model for multi-task autonomous driving

doi:10.3969/j.issn.1674-8484.2025.04.011

Journal of Automotive Safety and Energy, 2025, Vol. 16, Issue (4): 610-619.

OUYANG Delin¹, QIU Yifan², WANG Yingchen¹, YANG Liang², MIN Haigen³, WANG Wenjun⁴, LI Guofa¹,*

¹ College of Mechanical and Vehicle Engineering, Chongqing University, Chongqing 400044, China
² Institute of Human Factors and Ergonomics, College of Mechatronics and Control Engineering, Shenzhen University, Shenzhen 518060, China
³ School of Information Engineering, Chang’an University, Xi’an 710021, China
⁴ School of Vehicle and Mobility, Tsinghua University, Beijing 100084, China

Received: 2024-12-18 Revised: 2025-02-04 Online: 2025-08-30 Published: 2025-08-27
Corresponding author: *LI Guofa, Professor. E-mail: liguofa@cqu.edu.cn.
First author: OUYANG Delin (born 1998), male (Han), from Guangdong, PhD candidate. E-mail: delin.ouyang@stu.cqu.edu.cn.
Funding:
Open Fund of the State Key Laboratory of Intelligent Green Vehicle and Mobility (KFZ2409); National Natural Science Foundation of China (52272421)

Abstract

To address the challenges of spatiotemporal feature processing and inter-task dependencies in autonomous driving decision-making, this paper proposes an end-to-end driving decision model based on a 3D window self-attention mechanism. The model applies window self-attention to compute the spatiotemporal features of the input sequence and combines multi-task learning with loss weight allocation, so that it extracts features from driving videos and predicts both vehicle speed and steering angle. The results show that the proposed model reaches prediction accuracies of 86.32% for steering angle and 85.36% for vehicle speed, outperforming FMNet, Swin-Transformer, and MobileT-DSM. It requires only 57.48 GFLOPs of computation, which indicates stronger spatiotemporal feature extraction and a better trade-off between performance and computational cost.
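The 3D window self-attention named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the clip size, the window size `(2, 4, 4)`, the single attention head, and the random projection matrices `w_q`, `w_k`, `w_v` are all assumptions chosen for illustration. The sketch only shows the general idea of splitting a video feature map into non-overlapping time x height x width windows and running scaled dot-product attention inside each window.

```python
import numpy as np

def window_partition_3d(x, window):
    """Split a (T, H, W, C) feature map into non-overlapping 3D windows.

    Returns an array of shape (num_windows, tokens_per_window, C).
    """
    T, H, W, C = x.shape
    wt, wh, ww = window
    assert T % wt == 0 and H % wh == 0 and W % ww == 0, "dims must divide evenly"
    x = x.reshape(T // wt, wt, H // wh, wh, W // ww, ww, C)
    # Move the three window-index axes to the front, in-window axes after them.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, wt * wh * ww, C)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(windows, w_q, w_k, w_v):
    """Single-head scaled dot-product attention applied inside each window."""
    q, k, v = windows @ w_q, windows @ w_k, windows @ w_v
    scale = 1.0 / np.sqrt(q.shape[-1])
    attn = softmax((q @ k.transpose(0, 2, 1)) * scale, axis=-1)
    return attn @ v, attn

rng = np.random.default_rng(0)
T, H, W, C = 4, 8, 8, 16   # hypothetical clip: 4 frames of 8x8 tokens, 16 channels
window = (2, 4, 4)         # hypothetical window: 2 frames x 4 x 4 tokens
features = rng.standard_normal((T, H, W, C))
w_q, w_k, w_v = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))

windows = window_partition_3d(features, window)
out, attn = window_self_attention(windows, w_q, w_k, w_v)
print(windows.shape, out.shape, attn.shape)
```

Because attention is computed only among the tokens of one window, the cost grows with the window volume rather than with the full clip volume, which is the usual motivation for windowed attention on video.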

Key words: autonomous driving, decision-making and control, deep learning, multi-task, attention mechanism
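The abstract combines multi-task learning with loss weight allocation but this page does not give the weighting scheme. The sketch below therefore uses a plain weighted sum of two mean-squared-error terms as a stand-in; the function names, the sample values, and the weights `w_steer` and `w_speed` are hypothetical and are not taken from the paper.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error for one regression task."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))

def multitask_loss(steer_pred, steer_true, speed_pred, speed_true,
                   w_steer=0.5, w_speed=0.5):
    """Weighted sum of the two per-task losses (weights are hypothetical)."""
    loss_steer = mse(steer_pred, steer_true)
    loss_speed = mse(speed_pred, speed_true)
    total = w_steer * loss_steer + w_speed * loss_speed
    return total, loss_steer, loss_speed

# Made-up sample batch: steering in rad, speed in km/h.
steer_true = [0.10, -0.05, 0.00]
steer_pred = [0.12, -0.01, 0.02]
speed_true = [30.0, 32.0, 31.0]
speed_pred = [31.0, 32.0, 29.0]

total, l_steer, l_speed = multitask_loss(
    steer_pred, steer_true, speed_pred, speed_true, w_steer=0.9, w_speed=0.1
)
print(total, l_steer, l_speed)
```

In this toy batch the speed loss is several orders of magnitude larger than the steering loss simply because the two targets use different units, which is one reason a multi-task model needs some form of loss weighting.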

CLC number: U461.6

OUYANG Delin, QIU Yifan, WANG Yingchen, YANG Liang, MIN Haigen, WANG Wenjun, LI Guofa. End-to-end decision-making model for multi-task autonomous driving[J]. Journal of Automotive Safety and Energy, 2025, 16(4): 610-619.

Figures/Tables 11

Operator	t	c	n	s
Conv2d 3×3	-	16	1	2
Bottlenecks	6	8	1	2
Bottlenecks	6	8	2	2
Bottlenecks	6	16	3	2
Bottlenecks	6	24	4	2
Bottlenecks	6	31	3	1
Bottlenecks	6	56	3	1
Conv2d 1×1	-	1280	1	1
Global Average Pooling	-	-	1	-
Linear	-	512	1	1
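The table above can be read with a short sketch, assuming its columns follow the MobileNetV2 notation (t: expansion factor, c: output channels, n: number of repeated blocks, s: stride of the first block in the group). That reading, the 224-pixel input size, and the helper name `summarize` are assumptions; the page itself does not define the columns.

```python
# Rows copied from the table as (operator, t, c, n, s); None marks "-" entries.
# The pooling and linear rows are omitted because they have no stride.
LAYERS = [
    ("Conv2d 3x3", None, 16, 1, 2),
    ("Bottlenecks", 6, 8, 1, 2),
    ("Bottlenecks", 6, 8, 2, 2),
    ("Bottlenecks", 6, 16, 3, 2),
    ("Bottlenecks", 6, 24, 4, 2),
    ("Bottlenecks", 6, 31, 3, 1),
    ("Bottlenecks", 6, 56, 3, 1),
    ("Conv2d 1x1", None, 1280, 1, 1),
]

def summarize(layers, input_size):
    """Return (total stride, bottleneck block count, output feature-map size)."""
    total_stride = 1
    blocks = 0
    for op, t, c, n, s in layers:
        # Assumption: only the first block of each group applies stride s,
        # so each row contributes its stride exactly once.
        total_stride *= s
        if op == "Bottlenecks":
            blocks += n
    return total_stride, blocks, input_size // total_stride

total_stride, blocks, out_size = summarize(LAYERS, input_size=224)
print(total_stride, blocks, out_size)
```

Under these assumptions the spatial resolution is reduced once by the stem convolution and once by each of the first four bottleneck groups before global average pooling and the 512-unit linear layer.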

Model	Steering MAE / rad	Steering RMSE / rad	Steering AUC	Steering Acc / %	Speed MAE / (km·h^-1)	Speed RMSE / (km·h^-1)	Speed AUC	Speed Acc / %
FMNet	0.129	0.214	0.893	83.78	-	-	-	-
Swin-Transformer	0.245	0.292	0.862	78.91	2.31	2.69	0.845	80.17
MobileT-DSM	0.163	0.212	0.878	81.69	1.91	2.47	0.857	83.17
3D ST-DSM	0.119	0.148	0.897	86.32	1.77	2.38	0.868	85.36
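The MAE and RMSE columns use the standard definitions of mean absolute error and root mean squared error. A minimal sketch of both follows; the sample speeds are made up for illustration, and the page does not say how the AUC and accuracy columns are computed, so they are not reproduced here.

```python
import numpy as np

def mae(pred, target):
    """Mean absolute error."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean(np.abs(pred - target)))

def rmse(pred, target):
    """Root mean squared error."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Made-up speeds in km/h, not data from the paper.
sample_true = np.array([40.0, 42.0, 45.0, 43.0])
sample_pred = np.array([41.0, 40.0, 45.0, 46.0])

sample_mae = mae(sample_pred, sample_true)
sample_rmse = rmse(sample_pred, sample_true)
print(sample_mae, sample_rmse)
```

RMSE is never smaller than MAE on the same errors and penalizes large individual errors more strongly, so reporting both gives a rough view of how the errors are spread.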

Model	Parameters	Computation (GFLOPs)
FMNet	99 393 775	424.91
Swin-Transformer	27 521 661	4.49
MobileT-DSM	45 029 621	16.66
3D ST-DSM	27 485 088	57.48
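The cost figures in this table can be compared directly with a little arithmetic. The sketch below copies the table values and expresses one model's parameter count and computation as a fraction of another's; the dictionary layout and the helper name `relative_cost` are illustrative choices only.

```python
# (parameter count, GFLOPs) copied from the table above.
MODELS = {
    "FMNet": (99_393_775, 424.91),
    "Swin-Transformer": (27_521_661, 4.49),
    "MobileT-DSM": (45_029_621, 16.66),
    "3D ST-DSM": (27_485_088, 57.48),
}

def relative_cost(name, baseline):
    """Return (parameter ratio, GFLOPs ratio) of `name` relative to `baseline`."""
    params, gflops = MODELS[name]
    base_params, base_gflops = MODELS[baseline]
    return params / base_params, gflops / base_gflops

for baseline in ("FMNet", "Swin-Transformer", "MobileT-DSM"):
    p_ratio, f_ratio = relative_cost("3D ST-DSM", baseline)
    print(f"vs {baseline}: params x{p_ratio:.3f}, GFLOPs x{f_ratio:.3f}")

smallest = min(MODELS, key=lambda m: MODELS[m][0])
```

Read this way, 3D ST-DSM has the fewest parameters of the four models and needs roughly 13.5% of FMNet's computation, while its GFLOPs figure is higher than those listed for Swin-Transformer and MobileT-DSM.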
