Learning-based automatic driving decision-making integrated with vehicle trajectory prediction

doi:10.3969/j.issn.1674-8484.2022.02.012

Abstract

Reinforcement learning was used to make driving decisions in complex scenarios while taking the future trajectories of surrounding vehicles into account. A long-term interactive trajectory prediction model for surrounding vehicles was built on a graph structure and long short-term memory (LSTM) networks, and the Rainbow DQN algorithm was used to build the behavioral decision model. The state space of this model covers not only the current information of the vehicles but also their predicted future trajectories. The reward function was designed in terms of safety, comfort, driving efficiency, and related factors, and safety rules were set to improve the safety of the selected actions. The results show that, at a prediction horizon of 5 s, the method with vehicle trajectory prediction has a longitudinal position error of 1.54 m and a lateral position error of 0.32 m, which is relatively accurate. The method therefore improves the safety and efficiency of decision-making for autonomous vehicles.
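
The state design and safety rules described above can be sketched in Python as follows. This is only an illustrative sketch: the names `VehicleState`, `build_state`, and `safe_action`, the three-action set, and the use of ego-relative coordinates are all assumptions, since the abstract does not specify the actual state layout, action space, or rule set.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple

# Hypothetical action set; the abstract does not list the actual actions.
ACTIONS = ("keep_lane", "change_left", "change_right")


@dataclass
class VehicleState:
    x: float   # longitudinal position, m
    y: float   # lateral position, m
    vx: float  # longitudinal speed, m/s


def build_state(ego: VehicleState,
                neighbors: Sequence[VehicleState],
                predictions: Sequence[Sequence[Tuple[float, float]]]) -> List[float]:
    """Concatenate current vehicle information with predicted future positions.

    predictions[i] holds the predicted (x, y) points of neighbors[i].
    Expressing everything relative to the ego vehicle is an assumption.
    """
    state = [ego.x, ego.y, ego.vx]
    for veh, future in zip(neighbors, predictions):
        # Current-time information of the surrounding vehicle.
        state += [veh.x - ego.x, veh.y - ego.y, veh.vx - ego.vx]
        # Predicted future trajectory of the same vehicle.
        for fx, fy in future:
            state += [fx - ego.x, fy - ego.y]
    return state


def safe_action(q_values: Sequence[float], unsafe: Set[int]) -> int:
    """Return the highest-valued action that the safety rules do not exclude."""
    allowed = [a for a in range(len(q_values)) if a not in unsafe]
    if not allowed:
        return 0  # assumed fallback: keep the current lane
    return max(allowed, key=lambda a: q_values[a])


ego = VehicleState(0.0, 0.0, 20.0)
lead = VehicleState(30.0, 0.0, 18.0)
state = build_state(ego, [lead], [[(48.0, 0.0), (66.0, 0.0)]])
print(len(state), ACTIONS[safe_action([0.2, 0.9, 0.5], unsafe={1})])
```

In this sketch the safety rules act as a filter on the learned values rather than as part of the reward, so an action marked unsafe is never selected even when its estimated value is the highest.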

Key words: vehicle engineering, autonomous driving, reinforcement learning, decision-making model, vehicle trajectory prediction

CLC Number: U491.2

XU Jie, PEI Xiaofei, YANG Bo, FANG Zhigang. Learning-based automatic driving decision-making integrated with vehicle trajectory prediction[J]. Journal of Automotive Safety and Energy, 2022, 13(2): 317-324.

Figures/Tables 9

Model	Longitudinal position error / m					Lateral position error / m
Model	t / s = 1	2	3	4	5	t / s = 1	2	3	4	5
CTRA	0.33	0.86	1.93	3.59	5.85	0.14	0.35	0.57	0.78	0.99
Structural-LSTM	0.39	0.73	1.07	1.45	1.89	0.10	0.17	0.23	0.29	0.33
Graph structure (without attention mechanism)	0.39	0.87	1.41	2.05	2.85	0.09	0.17	0.23	0.29	0.33
Proposed model	0.31	0.35	0.57	0.78	1.54	0.09	0.16	0.22	0.27	0.32
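
To put the 5 s errors in the table above in perspective, the relative error reduction of the proposed model against each baseline can be computed directly from the tabulated values. The short Python sketch below does this arithmetic; the helper name `reduction` is hypothetical, and the numbers are copied from the table, not newly measured.

```python
# Position errors at t = 5 s, copied from the table: (longitudinal, lateral) in m.
baselines_5s = {
    "CTRA": (5.85, 0.99),
    "Structural-LSTM": (1.89, 0.33),
    "Graph structure (without attention mechanism)": (2.85, 0.33),
}
proposed_5s = (1.54, 0.32)


def reduction(baseline: float, ours: float) -> float:
    """Relative error reduction in percent, measured against the baseline."""
    return 100.0 * (baseline - ours) / baseline


for name, (lon, lat) in baselines_5s.items():
    lon_red = reduction(lon, proposed_5s[0])
    lat_red = reduction(lat, proposed_5s[1])
    print(f"{name}: longitudinal {lon_red:.1f} % lower, lateral {lat_red:.1f} % lower")
```

The gain is concentrated in the longitudinal direction; the lateral errors of the three learning-based models differ by at most 0.01 m at 5 s.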

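Parameter	Description	Value
Hidden layers	Number of neurons in the fully connected layers	(256, 128)
Discount factor	Used to compute the long-term discounted reward	0.99
Network learning rate	Step size of the policy gradient update	5×10⁻⁴
Learning rate decay rate	Factor by which the learning rate decays over time	0.5
Learning rate decay step	Number of episodes between learning rate reductions	500
Batch size	Number of samples per batch in batch gradient descent	32
Soft update rate	Polyak moving-average coefficient	0.01
Replay buffer size	Number of stored samples	1×10⁵
Return steps	Number of steps whose rewards are accumulated in the return	3
Distribution minimum and maximum	Interval over which the Q-value distribution is defined	[-100, 100]
Number of distribution atoms	Divides the Q-value interval into equal parts	51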
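
Several of the settings above translate directly into small formulas. The Python sketch below shows one plausible reading of them: an evenly spaced value support, a staircase learning rate schedule, a multi-step return, and a Polyak soft update. The function names are hypothetical, the staircase form of the decay is an assumption, and the scalar bootstrap value is a simplification of the distributional update that Rainbow DQN actually uses.

```python
from typing import List, Sequence

GAMMA = 0.99                  # discount factor
LR0 = 5e-4                    # initial network learning rate
LR_DECAY = 0.5                # learning rate decay rate
LR_STEP = 500                 # episodes between learning rate reductions
TAU = 0.01                    # soft update rate
N_STEP = 3                    # return steps
V_MIN, V_MAX = -100.0, 100.0  # bounds of the Q-value distribution
N_ATOMS = 51                  # number of distribution atoms


def support() -> List[float]:
    """Evenly spaced atoms spanning [V_MIN, V_MAX]."""
    dz = (V_MAX - V_MIN) / (N_ATOMS - 1)
    return [V_MIN + i * dz for i in range(N_ATOMS)]


def learning_rate(episode: int) -> float:
    """Staircase decay: multiply by LR_DECAY every LR_STEP episodes (assumed form)."""
    return LR0 * LR_DECAY ** (episode // LR_STEP)


def n_step_return(rewards: Sequence[float], bootstrap_value: float) -> float:
    """Discounted sum of up to N_STEP rewards plus a discounted bootstrap value."""
    n = min(len(rewards), N_STEP)
    g = sum(GAMMA ** k * rewards[k] for k in range(n))
    return g + GAMMA ** n * bootstrap_value


def soft_update(target: Sequence[float], online: Sequence[float]) -> List[float]:
    """Polyak averaging of target-network weights toward the online network."""
    return [(1.0 - TAU) * t + TAU * o for t, o in zip(target, online)]


atoms = support()
print(len(atoms), atoms[1] - atoms[0])
print(learning_rate(0), learning_rate(500), learning_rate(1000))
```

With 51 atoms on [-100, 100], neighboring atoms are 4 apart, so the interval is split into 50 equal segments.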

Model	η / %	v_av / (m·s⁻¹)	Variance of v_av / (m²·s⁻²)
DDQN	91.0	18.82	0.025
DDQN + prediction	96.4	19.95	0.678
Rainbow DQN	98.8	20.71	0.001
Rainbow DQN + prediction	99.2	20.91	0.004