
Journal of Automotive Safety and Energy ›› 2024, Vol. 15 ›› Issue (5): 702-710. DOI: 10.3969/j.issn.1674-8484.2024.05.008

• Intelligent Driving and Intelligent Transportation •

Vehicle longitudinal speed planning based on deep reinforcement learning CLPER-DDPG

LIU Peng1, ZHAO Kegang1,*, LIANG Zhihao1, YE Jie2

  1. School of Mechanical & Automotive Engineering, South China University of Technology, Guangzhou 510641, China
    2. School of Mechatronic Engineering and Automation, Foshan University, Foshan 528225, China
  • Received: 2024-05-15  Revised: 2024-10-09  Online: 2024-10-31  Published: 2024-11-07
  • Corresponding author: ZHAO Kegang, Associate Professor. E-mail: kgzhao@scut.edu.cn
  • First author: LIU Peng (2001—), male (Han), Jiangxi, master's student. E-mail: 202320100998@mail.scut.edu.cn
  • Funding:
    Group Cooperation and Optimization Control of Intelligent Connected Vehicles in the Internet-of-Vehicles Environment (2019A1515110562)


Abstract:

To address the problems that the planner converges slowly in vehicle longitudinal speed planning tasks and performs unstably when switching between multiple scenarios, a vehicle longitudinal speed planner was designed based on a multilayer perceptron, and a deep deterministic policy gradient algorithm combining a prioritized experience replay mechanism and a curriculum learning mechanism was constructed. Simulation scenarios were designed for model training and testing, and comparative experiments were conducted on three algorithms: the deep deterministic policy gradient (DDPG), DDPG with prioritized experience replay (PER-DDPG), and DDPG with both prioritized experience replay and curriculum learning (CLPER-DDPG). Real-vehicle experiments were also carried out on real roads inside a campus. The results show that, compared with the DDPG algorithm, the CLPER-DDPG algorithm improves the planner's convergence speed by 56.45%, and reduces the mean distance error by 16.61%, the mean speed error by 15.25%, and the mean jerk by 18.96%. Furthermore, when parameters of the experimental scenario such as the environmental climate and the sensor hardware change, the model can still complete the longitudinal speed planning task safely.

Key words: autonomous driving, longitudinal speed planning, deep deterministic policy gradient (DDPG) algorithm, curriculum learning mechanism, prioritized experience replay mechanism
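The prioritized experience replay mechanism named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; the class name, the proportional-priority scheme, and the `alpha`/`beta` parameters are illustrative assumptions following the common PER formulation (sampling probability proportional to priority^alpha, with importance-sampling weights for bias correction).

```python
import random

class PrioritizedReplayBuffer:
    """Minimal proportional PER buffer (illustrative sketch, not the
    paper's code). New transitions receive the current maximum priority
    so they are replayed at least once; transitions with larger TD error
    are sampled more often thereafter."""

    def __init__(self, capacity, alpha=0.6, beta=0.4):
        self.capacity = capacity
        self.alpha = alpha        # how strongly priorities shape sampling
        self.beta = beta          # strength of importance-sampling correction
        self.buffer = []
        self.priorities = []
        self.pos = 0              # next write position (ring buffer)

    def push(self, transition):
        max_prio = max(self.priorities, default=1.0)
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(max_prio)
        else:
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # probability of each transition ~ priority ** alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.buffer)),
                              weights=probs, k=batch_size)
        # importance-sampling weights, normalized by the largest weight
        n = len(self.buffer)
        weights = [(n * probs[i]) ** (-self.beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.buffer[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, td_errors, eps=1e-6):
        # larger TD error -> higher replay priority; eps avoids zero priority
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + eps
```

In a DDPG training loop, the critic's TD errors for the sampled batch would be fed back through `update_priorities`, and the importance weights would scale the critic loss.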

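The curriculum learning mechanism can likewise be sketched as a staged training loop that moves from easy to hard scenarios. The scenario names, the promotion rule, and all thresholds below are hypothetical placeholders, not the paper's actual curriculum.

```python
# Hypothetical curriculum: longitudinal-driving scenarios ordered from
# easy to hard. Names and ordering are illustrative only.
CURRICULUM = [
    {"name": "constant-speed leader", "difficulty": 0},
    {"name": "mild acceleration",     "difficulty": 1},
    {"name": "stop-and-go traffic",   "difficulty": 2},
]

def train_with_curriculum(train_one_episode, episodes_per_stage=3,
                          promote_reward=0.0):
    """Train stage by stage, promoting to the next (harder) scenario once
    the mean episode reward of the current stage clears a threshold.
    `train_one_episode(scenario) -> float` is supplied by the caller."""
    history = []
    stage = 0
    while stage < len(CURRICULUM):
        rewards = [train_one_episode(CURRICULUM[stage])
                   for _ in range(episodes_per_stage)]
        history.append((CURRICULUM[stage]["name"], rewards))
        if sum(rewards) / len(rewards) >= promote_reward:
            stage += 1   # current scenario mastered: advance the curriculum
        # otherwise repeat the same stage with more episodes
    return history
```

The design intent matches the abstract's claim: by mastering simple scenarios before hard ones, the planner's early training avoids the sparse, noisy feedback of the hardest scenario, which is what speeds up convergence.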

CLC number: