
Journal of Automotive Safety and Energy ›› 2022, Vol. 13 ›› Issue (4): 705-717. DOI: 10.3969/j.issn.1674-8484.2022.04.012

• Intelligent Driving and Smart Transportation •

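Deep reinforcement learning-based lane-changing trajectory planning method for intelligent and connected vehicles

FENG Yao1, JING Shoucai1,3,*, HUI Fei1, ZHAO Xiangmo1, LIU Jianbei2,3

  1. School of Information Engineering, Chang'an University, Shaanxi 710064, China
  2. Research Center of Traffic Safety and Emergency Security Technology, Ministry of Transport, Shaanxi 710075, China
  3. CCCC First Highway Consultants Co., Ltd, Shaanxi 710075, China
  • Received: 2021-11-27 Revised: 2022-07-18 Online: 2022-12-31 Published: 2023-01-01
  • Corresponding author: JING Shoucai
  • About the authors: *JING Shoucai (born 1991), male (Han), from Gansu, lecturer. E-mail: scjing@che.edu.cn
    FENG Yao (born 1998), male (Han), from Shanxi, master's student. E-mail: yaofeng@chd.eud.cn
  • Funding:
    National Key R&D Program of China (2021YFB250120002); Key Industry Innovation Chain of Shaanxi Province (2021ZDLGY04-06)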

Abstract:

A deep reinforcement learning-based lane-changing trajectory planning method for intelligent and connected vehicles (ICVs) was proposed to improve lane-changing safety and efficiency and to reduce fuel consumption. A hierarchical ICV lane-changing trajectory planning architecture was designed from the functional requirements of lane changing in complex traffic scenarios. To balance vehicle safety and lane-changing efficiency, a lane-changing behavior decision model was constructed based on a complete-information pure-strategy game. The longitudinal and lateral motion states of the vehicle were decoupled, and a joint optimization function targeting fuel consumption and passenger comfort was constructed. Based on the twin delayed deep deterministic policy gradient (TD3) algorithm, a longitudinal and lateral lane-changing trajectory planning method for ICVs was proposed to obtain optimized longitudinal and lateral lane-changing trajectories. The effectiveness of the algorithm was verified in 3 typical lane-changing simulation scenarios built for this purpose. The results show that, compared with the deep deterministic policy gradient (DDPG) algorithm, the proposed method improves training efficiency by about 10.5% on average in the left and right lane-changing experiments and reduces average fuel consumption by 65% and 44%, respectively. The single-step trajectory planning time is within 10 ms, so safe, energy-saving, and comfortable lane-changing trajectories can be obtained in real time.

Key words: intelligent and connected vehicles (ICVs), deep reinforcement learning, lane-changing, trajectory planning
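
The abstract compares TD3 with DDPG without stating how the two differ. As a rough illustration only, the sketch below computes the standard TD3 critic target for one transition with a scalar action: the target action is smoothed with clipped noise, and the smaller of two target-critic estimates is used. The function name `td3_target`, the callables passed in, and all hyperparameter values are assumptions chosen for illustration; none of them come from the paper, whose networks, state variables, and reward are not described on this page.

```python
import random


def td3_target(reward, done, next_state,
               actor_target, critic1_target, critic2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=random):
    """Return the TD3 critic target for a single transition (scalar action).

    All arguments are hypothetical stand-ins: the three *_target callables
    play the role of target networks, and the defaults are illustrative.
    """
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = rng.gauss(0.0, noise_std)
    noise = max(-noise_clip, min(noise_clip, noise))
    next_action = actor_target(next_state) + noise
    # Keep the smoothed action inside the valid action range.
    next_action = max(action_low, min(action_high, next_action))

    # Clipped double-Q learning: use the smaller of the two critic estimates
    # to limit the overestimation that a single critic (as in DDPG) can show.
    q_next = min(critic1_target(next_state, next_action),
                 critic2_target(next_state, next_action))

    # Bootstrapped target; no bootstrap term when the episode has ended.
    return reward + gamma * (1.0 - float(done)) * q_next


if __name__ == "__main__":
    # Toy stand-ins for the target networks (assumptions, not the paper's models).
    actor = lambda s: 0.5 * s
    q1 = lambda s, a: s + a
    q2 = lambda s, a: s - a
    y = td3_target(reward=1.0, done=False, next_state=2.0,
                   actor_target=actor, critic1_target=q1, critic2_target=q2,
                   gamma=0.9, noise_std=0.0)
    print(y)
```

In a full TD3 implementation the actor and target networks are also updated less often than the critics (the "delayed" part of the name); that scheduling is omitted here to keep the sketch short.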

CLC number: