
汽车安全与节能学报 ›› 2025, Vol. 16 ›› Issue (2): 326-333. DOI: 10.3969/j.issn.1674-8484.2025.02.016

• 智能驾驶与智慧交通 •

基于风险敏感的自动驾驶汽车分层强化学习决策

胡志龙1, 裴晓飞1,2,*, 周洪龙1, 魏炜冉2

  1. 武汉理工大学 现代汽车零部件技术湖北省重点实验室,武汉 430070,中国
    2. 武汉理工大学 汽车零部件技术湖北省协同创新中心,武汉 430070,中国
  • 收稿日期:2024-07-08 修回日期:2024-09-24 出版日期:2025-04-30 发布日期:2025-04-22
  • Corresponding author: * PEI Xiaofei, Associate Professor. E-mail: Peixiaofei7@163.com
  • First author: HU Zhilong (2000—), male (Han), from Sichuan; master's degree candidate. E-mail: 286326@whut.edu.cn
  • Supported by: National Natural Science Foundation of China (52272426)

Risk-sensitive hierarchical reinforcement learning decision-making for autonomous vehicles

HU Zhilong1, PEI Xiaofei1,2,*, ZHOU Honglong1, WEI Weiran2

  1. Hubei Key Laboratory of Advanced Technology of Automotive Components, Wuhan University of Technology, Wuhan 430070, China
    2. Hubei Collaborative Innovation Center of Automotive Components Technology, Wuhan University of Technology, Wuhan 430070, China
  • Received: 2024-07-08 Revised: 2024-09-24 Online: 2025-04-30 Published: 2025-04-22

摘要:

为了使自动驾驶汽车的行为决策能充分考虑交通环境中固有的不确定性,该文在传统的RainbowDQN算法基础上,引入分位数回归和条件风险价值(CVaR),将低概率风险纳入考虑,适当平衡风险与收益,使其能做出更为安全拟人的驾驶决策。基于Markov框架建立行为决策模型,综合考虑安全性、效率和舒适性设计奖励函数和动作空间,搭建规划控制模型;利用公开自然驾驶智能汽车仿真测试环境(OnSite)平台搭建高速路汇入汇出和交叉口2种场景,采用OnSite评价工具,对RainbowDQN-CVaR、RainbowDQN-QR、RainbowDQN和DSAC-T共4种算法进行仿真比较。结果表明:在复杂的高速路汇入汇出场景和交叉口场景,提出的RainbowDQN-CVaR算法得分比传统的RainbowDQN算法分别高55.3%和47%,比RainbowDQN-QR算法高17.7%和34.3%,比DSAC-T算法高2.8%和62.7%。验证了基于RainbowDQN-CVaR的行为决策模型的有效性:在较复杂的交通环境下,该模型能做出更加安全、合理的决策,使自动驾驶车辆具有更高的行车安全性和效率。

关键词: 自动驾驶, 强化学习, 行为决策, 分位数回归, 条件风险价值(CVaR)

Abstract:

To enable the behavioral decision-making of autonomous vehicles to fully account for the uncertainty inherent in the traffic environment, this paper introduces quantile regression and Conditional Value at Risk (CVaR) into the traditional RainbowDQN algorithm, so that low-probability risks are taken into account and risk and benefit are properly balanced, allowing safer and more human-like driving decisions. A behavioral decision-making model was established within a Markov framework, with the reward function and action space designed to jointly consider safety, efficiency, and comfort, and a planning and control model was built. Two scenarios, highway merging/exiting and an intersection, were constructed on the OnSite platform (an open simulation test environment for intelligent vehicles based on naturalistic driving data), and the OnSite evaluation tool was used to compare four algorithms in simulation: RainbowDQN-CVaR, RainbowDQN-QR, RainbowDQN, and DSAC-T. The results show that, in the complex highway merging/exiting and intersection scenarios, the proposed RainbowDQN-CVaR algorithm scores 55.3% and 47% higher than the traditional RainbowDQN algorithm, 17.7% and 34.3% higher than RainbowDQN-QR, and 2.8% and 62.7% higher than DSAC-T, respectively. This verifies the effectiveness of the RainbowDQN-CVaR behavioral decision-making model: in relatively complex traffic environments it makes safer and more reasonable decisions, giving the autonomous vehicle higher driving safety and efficiency.
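The core mechanism the abstract describes, replacing the risk-neutral expected Q-value with a CVaR over learned return quantiles, can be sketched as follows. This is a minimal illustration rather than the paper's implementation; the function name, the tail fraction `alpha`, and the plain-list representation of the quantile head's output are assumptions for the example.

```python
import math

def cvar_action(quantiles, alpha=0.25):
    """Pick the action that maximizes CVaR_alpha of its return distribution.

    quantiles: one list of quantile estimates per action, as a quantile-
        regression head (e.g. RainbowDQN-QR) would produce.
    alpha: fraction of the worst-case (left) tail to average over;
        alpha = 1.0 recovers the ordinary risk-neutral mean, and smaller
        alpha makes the policy more risk-averse.
    """
    best_action, best_cvar = None, float("-inf")
    for action, qs in enumerate(quantiles):
        k = max(1, math.ceil(alpha * len(qs)))  # size of the left tail
        tail = sorted(qs)[:k]                   # worst k quantile estimates
        cvar = sum(tail) / k                    # CVaR_alpha for this action
        if cvar > best_cvar:
            best_action, best_cvar = action, cvar
    return best_action
```

With a small `alpha`, an action whose return distribution has a fat left tail (e.g. a rare collision outcome) is penalized even if its mean return is higher, which is how low-probability risks enter the decision.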

Key words: autonomous driving, reinforcement learning, behavioral decision-making, quantile regression, conditional value at risk (CVaR)

中图分类号 (CLC number):