
Journal of Automotive Safety and Energy ›› 2024, Vol. 15 ›› Issue (5): 702-710. DOI: 10.3969/j.issn.1674-8484.2024.05.008

• Intelligent Driving and Intelligent Transportation •

Vehicle longitudinal speed planning based on deep reinforcement learning CLPER-DDPG

LIU Peng1, ZHAO Kegang1,*, LIANG Zhihao1, YE Jie2

  1. School of Mechanical & Automotive Engineering, South China University of Technology, Guangzhou 510641, China
    2. School of Mechatronic Engineering and Automation, Foshan University, Foshan 528225, China
  • Received: 2024-05-15  Revised: 2024-10-09  Online: 2024-10-31  Published: 2024-11-07
  • Corresponding author: ZHAO Kegang, Associate Professor. E-mail: kgzhao@scut.edu.cn
  • First author: LIU Peng (2001—), male (Han), Jiangxi, master's student. E-mail: 202320100998@mail.scut.edu.cn
  • Funding:
    Group Cooperation and Optimization Control of Intelligent Connected Vehicles in the Internet-of-Vehicles Environment (2019A1515110562)


Abstract:

To address the problems that the planner converges slowly in vehicle longitudinal speed planning tasks and performs unstably when switching between multiple scenarios, a vehicle longitudinal speed planner was designed based on a multilayer perceptron, and a deep deterministic policy gradient algorithm combining a prioritized experience replay mechanism and a curriculum learning mechanism was constructed. Simulation scenarios were designed for model training and testing, and comparative experiments were conducted on three algorithms: the deep deterministic policy gradient (DDPG), DDPG with prioritized experience replay (PER-DDPG), and DDPG with both prioritized experience replay and curriculum learning (CLPER-DDPG). Real-vehicle experiments were also carried out on real roads inside a campus. The results show that, compared with the DDPG algorithm, the CLPER-DDPG algorithm improves the planner's convergence speed by 56.45%, and reduces the mean distance error by 16.61%, the mean speed error by 15.25%, and the mean jerk by 18.96%. Furthermore, when parameters of the experimental scenario such as the environmental climate and the sensor hardware change, the model can still complete the longitudinal speed planning task safely.

Key words: autonomous driving, longitudinal speed planning, deep deterministic policy gradient (DDPG) algorithm, curriculum learning mechanism, prioritized experience replay mechanism
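The prioritized experience replay mechanism named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; the class name, the proportional-priority scheme, and the `alpha`/`beta` parameters are illustrative assumptions following the common PER formulation (sampling probability proportional to priority^alpha, with importance-sampling weights for bias correction).

```python
import random

class PrioritizedReplayBuffer:
    """Minimal proportional PER buffer (illustrative sketch, not the
    paper's code). New transitions receive the current maximum priority
    so they are replayed at least once; transitions with larger TD error
    are sampled more often thereafter."""

    def __init__(self, capacity, alpha=0.6, beta=0.4):
        self.capacity = capacity
        self.alpha = alpha        # how strongly priorities shape sampling
        self.beta = beta          # strength of importance-sampling correction
        self.buffer = []
        self.priorities = []
        self.pos = 0              # next write position (ring buffer)

    def push(self, transition):
        max_prio = max(self.priorities, default=1.0)
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(max_prio)
        else:
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # probability of each transition ~ priority ** alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.buffer)),
                              weights=probs, k=batch_size)
        # importance-sampling weights, normalized by the largest weight
        n = len(self.buffer)
        weights = [(n * probs[i]) ** (-self.beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.buffer[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, td_errors, eps=1e-6):
        # larger TD error -> higher replay priority; eps avoids zero priority
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + eps
```

In a DDPG training loop, the critic's TD errors for the sampled batch would be fed back through `update_priorities`, and the importance weights would scale the critic loss.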

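The curriculum learning mechanism can likewise be sketched as a staged training loop that moves from easy to hard scenarios. The scenario names, the promotion rule, and all thresholds below are hypothetical placeholders, not the paper's actual curriculum.

```python
# Hypothetical curriculum: longitudinal-driving scenarios ordered from
# easy to hard. Names and ordering are illustrative only.
CURRICULUM = [
    {"name": "constant-speed leader", "difficulty": 0},
    {"name": "mild acceleration",     "difficulty": 1},
    {"name": "stop-and-go traffic",   "difficulty": 2},
]

def train_with_curriculum(train_one_episode, episodes_per_stage=3,
                          promote_reward=0.0):
    """Train stage by stage, promoting to the next (harder) scenario once
    the mean episode reward of the current stage clears a threshold.
    `train_one_episode(scenario) -> float` is supplied by the caller."""
    history = []
    stage = 0
    while stage < len(CURRICULUM):
        rewards = [train_one_episode(CURRICULUM[stage])
                   for _ in range(episodes_per_stage)]
        history.append((CURRICULUM[stage]["name"], rewards))
        if sum(rewards) / len(rewards) >= promote_reward:
            stage += 1   # current scenario mastered: advance the curriculum
        # otherwise repeat the same stage with more episodes
    return history
```

The design intent matches the abstract's claim: by mastering simple scenarios before hard ones, the planner's early training avoids the sparse, noisy feedback of the hardest scenario, which is what speeds up convergence.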

CLC number: