
汽车安全与节能学报 ›› 2025, Vol. 16 ›› Issue (2): 326-333. DOI: 10.3969/j.issn.1674-8484.2025.02.016

• 智能驾驶与智慧交通 •

基于风险敏感的自动驾驶汽车分层强化学习决策

胡志龙1, 裴晓飞1,2,*, 周洪龙1, 魏炜冉2

  1. 武汉理工大学 现代汽车零部件技术湖北省重点实验室,武汉 430070,中国
    2. 武汉理工大学 汽车零部件技术湖北省协同创新中心,武汉 430070,中国
  • 收稿日期:2024-07-08 修回日期:2024-09-24 出版日期:2025-04-30 发布日期:2025-04-22
  • Corresponding author: * PEI Xiaofei, Associate Professor. E-mail: Peixiaofei7@163.com
  • First author: HU Zhilong (2000—), male (Han), from Sichuan; master's degree candidate. E-mail: 286326@whut.edu.cn
  • Supported by: National Natural Science Foundation of China (52272426)

Risk-sensitive hierarchical reinforcement learning decision-making for autonomous vehicles

HU Zhilong1, PEI Xiaofei1,2,*, ZHOU Honglong1, WEI Weiran2

  1. Hubei Key Laboratory of Advanced Technology of Automotive Components, Wuhan University of Technology, Wuhan 430070, China
    2. Hubei Collaborative Innovation Center of Automotive Components Technology, Wuhan University of Technology, Wuhan 430070, China
  • Received: 2024-07-08 Revised: 2024-09-24 Online: 2025-04-30 Published: 2025-04-22

摘要:

为了使自动驾驶汽车的行为决策能充分考虑交通环境中固有的不确定性,该文在传统的RainbowDQN算法基础上,引入分位数回归和条件风险价值(CVaR),将低概率风险纳入考虑,适当平衡风险与收益,使其能做出更为安全拟人的驾驶决策。基于Markov框架建立行为决策模型,综合考虑安全性、效率和舒适性设计奖励函数和动作空间,搭建规划控制模型;利用公开自然驾驶智能汽车仿真测试环境(OnSite)平台搭建高速路汇入汇出和交叉口2种场景,采用OnSite评价工具,对RainbowDQN-CVaR、RainbowDQN-QR、RainbowDQN和DSAC-T共4种算法进行仿真比较。结果表明:在复杂的高速路汇入汇出场景和交叉口场景,提出的RainbowDQN-CVaR算法得分比传统的RainbowDQN算法分别高55.3%和47%,比RainbowDQN-QR算法高17.7%和34.3%,比DSAC-T算法高2.8%和62.7%。验证了基于RainbowDQN-CVaR的行为决策模型的有效性:在较复杂的交通环境下,该模型能做出更加安全、合理的决策,使自动驾驶车辆具有更高的行车安全性和效率。

关键词: 自动驾驶, 强化学习, 行为决策, 分位数回归, 条件风险价值(CVaR)

Abstract:

To enable the behavioral decision-making of autonomous vehicles to fully account for the uncertainty inherent in the traffic environment, this paper introduces quantile regression and Conditional Value at Risk (CVaR) into the traditional RainbowDQN algorithm, so that low-probability risks are taken into account and risk and benefit are properly balanced, allowing safer and more human-like driving decisions. A behavioral decision-making model was established within a Markov framework, with the reward function and action space designed to jointly consider safety, efficiency, and comfort, and a planning and control model was built. Two scenarios, highway merging/exiting and an intersection, were constructed on the OnSite platform (an open simulation test environment for intelligent vehicles based on naturalistic driving data), and the OnSite evaluation tool was used to compare four algorithms in simulation: RainbowDQN-CVaR, RainbowDQN-QR, RainbowDQN, and DSAC-T. The results show that, in the complex highway merging/exiting and intersection scenarios, the proposed RainbowDQN-CVaR algorithm scores 55.3% and 47% higher than the traditional RainbowDQN algorithm, 17.7% and 34.3% higher than RainbowDQN-QR, and 2.8% and 62.7% higher than DSAC-T, respectively. This verifies the effectiveness of the RainbowDQN-CVaR behavioral decision-making model: in relatively complex traffic environments it makes safer and more reasonable decisions, giving the autonomous vehicle higher driving safety and efficiency.
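The core mechanism the abstract describes, replacing the risk-neutral expected Q-value with a CVaR over learned return quantiles, can be sketched as follows. This is a minimal illustration rather than the paper's implementation; the function name, the tail fraction `alpha`, and the plain-list representation of the quantile head's output are assumptions for the example.

```python
import math

def cvar_action(quantiles, alpha=0.25):
    """Pick the action that maximizes CVaR_alpha of its return distribution.

    quantiles: one list of quantile estimates per action, as a quantile-
        regression head (e.g. RainbowDQN-QR) would produce.
    alpha: fraction of the worst-case (left) tail to average over;
        alpha = 1.0 recovers the ordinary risk-neutral mean, and smaller
        alpha makes the policy more risk-averse.
    """
    best_action, best_cvar = None, float("-inf")
    for action, qs in enumerate(quantiles):
        k = max(1, math.ceil(alpha * len(qs)))  # size of the left tail
        tail = sorted(qs)[:k]                   # worst k quantile estimates
        cvar = sum(tail) / k                    # CVaR_alpha for this action
        if cvar > best_cvar:
            best_action, best_cvar = action, cvar
    return best_action
```

With a small `alpha`, an action whose return distribution has a fat left tail (e.g. a rare collision outcome) is penalized even if its mean return is higher, which is how low-probability risks enter the decision.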

Key words: autonomous driving, reinforcement learning, behavioral decision-making, quantile regression, conditional value at risk (CVaR)

中图分类号 (CLC number):