Journal of Automotive Safety and Energy ›› 2022, Vol. 13 ›› Issue (4): 705-717. DOI: 10.3969/j.issn.1674-8484.2022.04.012
*Corresponding author: JING Shoucai (b. 1991), male (Han), from Gansu; lecturer. E-mail: scjing@che.edu.cn
FENG Yao1, JING Shoucai1,3,*, HUI Fei1, ZHAO Xiangmo1, LIU Jianbei2,3
Received:2021-11-27
Revised:2022-07-18
Online:2022-12-31
Published:2023-01-01
Contact: JING Shoucai
Abstract: To improve the lane-changing safety and efficiency of intelligent and connected vehicles (ICVs) and to reduce fuel consumption, this paper proposes a deep reinforcement learning-based lane-changing trajectory planning method for ICVs. The lane-changing functional requirements of ICVs in complex traffic scenarios were analyzed, and a hierarchical lane-changing trajectory planning architecture was designed. To balance vehicle safety and lane-changing efficiency, a lane-changing decision model based on a complete-information pure-strategy game was designed. After decoupling the longitudinal and lateral motion states of the vehicle, a joint optimization function targeting fuel consumption and passenger comfort was constructed, and a longitudinal-lateral lane-changing trajectory planning method based on the twin delayed deep deterministic policy gradient (TD3) algorithm was proposed to generate optimized longitudinal and lateral lane-changing trajectories. Three typical lane-changing simulation scenarios were built to verify the effectiveness of the algorithm. The results show that, compared with the deep deterministic policy gradient (DDPG) algorithm, the proposed method improves training efficiency by about 10.5% on average in the left and right lane-changing experiments and reduces average fuel consumption by 65% and 44%, respectively; single-step trajectory planning takes less than 10 ms, so safe, fuel-efficient, and comfortable lane-changing trajectories can be obtained in real time.
FENG Yao, JING Shoucai, HUI Fei, ZHAO Xiangmo, LIU Jianbei. Deep reinforcement learning-based lane-changing trajectory planning method of intelligent and connected vehicles[J]. Journal of Automotive Safety and Energy, 2022, 13(4): 705-717.
| Target-lane vehicle D \ Lane-changing vehicle M | m1 (change lane) | m2 (keep lane) |
|---|---|---|
| di1 (allow) | [RM(di1, m1), RD(di1, m1)] | [RM(di1, m2), RD(di1, m2)] |
| di2 (refuse) | [RM(di2, m1), RD(di2, m1)] | [RM(di2, m2), RD(di2, m2)] |
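The payoff matrix above defines a 2×2 complete-information pure-strategy game between the lane-changing vehicle M and the target-lane vehicle D, whose equilibrium can be found by simple enumeration. The numeric payoffs below are illustrative placeholders, not values from the paper:

```python
# Hypothetical payoff entries: payoffs[(d, m)] = (R_M, R_D), i.e. the payoff
# of lane-changing vehicle M and of target-lane vehicle D. Values are made up.
payoffs = {
    ("d1_allow",  "m1_change"):    (5, 2),
    ("d1_allow",  "m2_no_change"): (1, 4),
    ("d2_refuse", "m1_change"):    (-3, -3),
    ("d2_refuse", "m2_no_change"): (0, 3),
}

D_ACTIONS = ("d1_allow", "d2_refuse")
M_ACTIONS = ("m1_change", "m2_no_change")

def pure_nash_equilibria(payoffs):
    """Return all (d, m) pairs where neither player can improve unilaterally."""
    equilibria = []
    for d in D_ACTIONS:
        for m in M_ACTIONS:
            r_m, r_d = payoffs[(d, m)]
            # M cannot do better by deviating while D's action is fixed
            m_best = all(payoffs[(d, m2)][0] <= r_m for m2 in M_ACTIONS)
            # D cannot do better by deviating while M's action is fixed
            d_best = all(payoffs[(d2, m)][1] <= r_d for d2 in D_ACTIONS)
            if m_best and d_best:
                equilibria.append((d, m))
    return equilibria

print(pure_nash_equilibria(payoffs))
```

With these placeholder numbers the unique pure-strategy equilibrium is (allow, change lane); the decision layer would then commit M to the lane change.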
```
Initialize parameters: μφ, Qθ1, Qθ2, μφ′, Qθ1′, Qθ2′, ReplayBuffer, T
for episode = 1 to T do:
    st = env.reset()
    while not done:
        at = μφ(st) + ε,  ε ~ N(0, σ)
        st+1, rt, done = env.step(at)
        ReplayBuffer.append(st, at, rt, st+1, done)
        sample a mini-batch of N transitions
        at+1 = μφ′(st+1) + ε,  ε ~ clip(N(0, σ), −c, c)
        Q_targ = rt + γ · min_{i=1,2} Qθi′(st+1, at+1)
        update Critics:  Loss = 1/N Σ [Q_targ − Qθi(st, at)]²,  i = 1, 2
        every 2 steps, update Actor:  ∇φ J(φ) = 1/N Σ ∇φ Qθ1[st, μφ(st)]
        every 2 steps, update target networks, τ ∈ (0, 1):
            φ′ = τφ + (1 − τ)φ′
            θi′ = τθi + (1 − τ)θi′,  i = 1, 2
    end while
end for
```
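The core of the TD3 update, computing the smoothed, twin-critic target Q_targ = r + γ · min(Q1′, Q2′), can be sketched in plain Python. The linear μ, Q1, Q2 below are toy stand-ins for the target networks, not the paper's models:

```python
import random

def clipped_noise(sigma, c, rng):
    """Target-policy smoothing: Gaussian noise clipped to [-c, c]."""
    return max(-c, min(c, rng.gauss(0.0, sigma)))

def td3_target(r, s_next, done, mu_target, q1_target, q2_target,
               gamma=0.99, sigma=0.5, c=0.5, rng=None):
    """Q_targ = r + gamma * min(Q1', Q2')(s', a'), a' = mu'(s') + clipped noise."""
    if done:
        return r  # no bootstrapping past a terminal state
    rng = rng or random.Random(0)
    a_next = mu_target(s_next) + clipped_noise(sigma, c, rng)
    return r + gamma * min(q1_target(s_next, a_next),
                           q2_target(s_next, a_next))

# Toy linear stand-ins for the target networks (illustrative only):
mu = lambda s: 0.5 * s
q1 = lambda s, a: s + a
q2 = lambda s, a: s + a + 0.3   # systematically higher; min() discards it

target = td3_target(r=1.0, s_next=2.0, done=False,
                    mu_target=mu, q1_target=q1, q2_target=q2)
```

Taking the minimum of the two critics is exactly what suppresses the overestimation bias that DDPG suffers from, which is consistent with the faster training reported in the abstract.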
| Network | Layer | Dimensions | Activation |
|---|---|---|---|
| Actor | L1 | (s, 30) | ReLU |
| | L2 | (30, 30) | ReLU |
| | L3 | (30, a) | Tanh |
| Critic 1 | L1 | (s + a, 30) | ReLU |
| | L2 | (30, 30) | ReLU |
| | L3 | (30, 1) | None |
| Critic 2 | L1 | (s + a, 30) | ReLU |
| | L2 | (30, 30) | ReLU |
| | L3 | (30, 1) | None |
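The Actor rows of the table (L1 → ReLU → L2 → ReLU → L3 → Tanh) can be exercised with a minimal pure-Python forward pass. The state/action dimensions and the uniform weight initialization here are assumptions for illustration, not the paper's settings:

```python
import math
import random

def linear(x, layer):
    """y = Wx + b for one input vector x; W is a list of weight rows."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj
            for row, bj in zip(w, b)]

def relu(v): return [max(0.0, x) for x in v]
def tanh(v): return [math.tanh(x) for x in v]

def make_layer(n_in, n_out, rng):
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def actor_forward(s, layers):
    """L1 -> ReLU -> L2 -> ReLU -> L3 -> Tanh, as in the Actor rows above."""
    h = relu(linear(s, layers[0]))
    h = relu(linear(h, layers[1]))
    return tanh(linear(h, layers[2]))

rng = random.Random(0)
s_dim, a_dim = 6, 2   # assumed sizes; the paper's state/action dims may differ
layers = [make_layer(s_dim, 30, rng),
          make_layer(30, 30, rng),
          make_layer(30, a_dim, rng)]
action = actor_forward([0.1] * s_dim, layers)
```

The Tanh output layer bounds each action component to (−1, 1), which is why the Actor ends in Tanh while the Critics, which output an unbounded Q-value, have no final activation.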
| Parameter | Value |
|---|---|
| Training episodes | 500 |
| Max steps per episode | 200 |
| Batch size | 64 |
| Replay buffer capacity | 10⁴ |
| Discount factor | 0.99 |
| Actor learning rate | 3 × 10⁻⁴ |
| Critic learning rate | 3 × 10⁻⁴ |
| Soft-update rate | 1 × 10⁻³ |
| Action exploration noise | 0.1 |
| Policy noise | 0.5 |
| Delayed update frequency | 2 |
| Optimizer | Adam |
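The soft-update rate τ = 1 × 10⁻³ in the table drives the Polyak averaging step θ′ ← τθ + (1 − τ)θ′ that slowly tracks the online networks. A minimal sketch over flat weight lists:

```python
def soft_update(target_params, online_params, tau=1e-3):
    """Polyak averaging: theta' <- tau*theta + (1 - tau)*theta', element-wise."""
    return [tau * w + (1.0 - tau) * w_t
            for w_t, w in zip(target_params, online_params)]

# With tau = 0.5 the target weights move halfway toward the online weights:
updated = soft_update([0.0, 0.0], [1.0, 1.0], tau=0.5)
```

With the table's small τ, the target networks change by only 0.1% per update, which keeps the bootstrap target Q_targ stable during training.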
| Scenario | Algorithm | Total training time / s | Avg. time per episode / s | Model loading + planning time / s | Single-step planning time / ms |
|---|---|---|---|---|---|
| Left lane change | TD3 | 935.27 | 1.87 | 1.23 (144 steps) | 8.48 |
| | DDPG | 1065.12 | 2.13 | 1.11 (129 steps) | 8.54 |
| Right lane change | TD3 | 918.40 | 1.84 | 1.26 (132 steps) | 9.47 |
| | DDPG | 1012.24 | 2.02 | 1.17 (122 steps) | 9.51 |