Behavior Decision-Making Model for Autonomous Vehicles Based on Ensemble Deep Reinforcement Learning

doi:10.3969/j.issn.1674-8484.2023.04.009

Abstract:

A behavior decision-making model for autonomous vehicles is proposed based on ensemble deep reinforcement learning. The model is built on Markov decision process (MDP) theory and uses the standard voting method to integrate three base network models: the deep Q-learning network (DQN), the double DQN (DDQN), and the dueling double DQN (Dueling DDQN). Testing and generalization validation were carried out in highway simulation environments with three, four, and five lanes in one direction, covering five driving behaviors: changing lanes to the left, lane keeping, changing lanes to the right, accelerating in the same lane, and decelerating in the same lane. The results show that, compared with the three base network models, the decision success rate of the proposed model increased by 6%, 3%, and 6%, respectively, and the average vehicle speed also improved. The 100-round test took less than 1 ms, which meets the real-time requirement for decision-making. The decision model therefore improves driving safety and decision-making efficiency.

Key words: autonomous driving, deep reinforcement learning, ensemble learning, deep Q-network (DQN), standard voting method
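
The abstract names the standard voting method but does not spell out the voting rule. The following is a minimal sketch, assuming it means a plurality vote over the greedy action of each base model. The action encoding, the tie-breaking rule (highest mean Q-value among tied actions), and the Q-values are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Sequence

# Assumed encoding of the five driving behaviors listed in the abstract.
ACTIONS = ["lane_left", "keep_lane", "lane_right", "accelerate", "decelerate"]


def greedy_action(q_values: Sequence[float]) -> int:
    """Return the index of the largest Q-value (first index on ties)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def vote(q_per_model: Sequence[Sequence[float]]) -> int:
    """Plurality vote over each model's greedy action.

    Tie-breaking is an assumption: among the tied actions, pick the one
    with the highest mean Q-value across all models.
    """
    choices = [greedy_action(q) for q in q_per_model]
    counts = Counter(choices)
    top = max(counts.values())
    tied = [a for a, c in counts.items() if c == top]
    if len(tied) == 1:
        return tied[0]

    def mean_q(action: int) -> float:
        return sum(q[action] for q in q_per_model) / len(q_per_model)

    return max(tied, key=mean_q)


# Hypothetical Q-values from DQN, DDQN, and Dueling DDQN for one state.
q_dqn = [0.1, 0.7, 0.2, 0.5, 0.0]      # greedy action: keep_lane
q_ddqn = [0.2, 0.4, 0.1, 0.6, 0.0]     # greedy action: accelerate
q_dueling = [0.0, 0.8, 0.3, 0.4, 0.1]  # greedy action: keep_lane

print(ACTIONS[vote([q_dqn, q_ddqn, q_dueling])])  # prints keep_lane
```

With three voters and five actions, a three-way disagreement is possible, so some tie-breaking rule is needed. The one used here is only one reasonable option.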

CLC number: U461

ZHANG Xinfeng, WU Lin. Behavior decision-making model for autonomous vehicles based on an ensemble deep reinforcement learning[J]. Journal of Automotive Safety and Energy, 2023, 14(4): 472-479. (in Chinese)

Figures/Tables: 14

Maximum acceleration, a_max	6 m/s²
Acceleration exponent, k	4
Desired safe time headway, T	1.5 s
Desired deceleration, b	-5 m/s²
Minimum safe distance, d₀	10 m
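
These five parameters match the usual parameter set of the Intelligent Driver Model (IDM). The following is a minimal sketch, assuming the surrounding traffic follows the standard IDM car-following law; the paper may use a modified form. The desired speed `v_desired` is a hypothetical input that does not appear in the table.

```python
import math

A_MAX = 6.0  # maximum acceleration, m/s^2
K = 4.0      # acceleration exponent
T = 1.5      # desired safe time headway, s
B = -5.0     # desired deceleration, m/s^2 (negative sign as listed)
D0 = 10.0    # minimum safe distance, m


def idm_acceleration(v, v_desired, gap=None, v_lead=None):
    """Standard IDM acceleration in m/s^2.

    v         -- ego speed, m/s
    v_desired -- free-flow target speed, m/s (assumed, not in the table)
    gap       -- distance to the lead vehicle in m (must be > 0), or None
    v_lead    -- lead vehicle speed in m/s, or None
    """
    # Free-road term: accelerate until the desired speed is reached.
    accel = A_MAX * (1.0 - (v / v_desired) ** K)
    if gap is not None and v_lead is not None:
        dv = v - v_lead  # closing speed, positive when approaching
        # Desired dynamic gap; abs(B) is used because B is listed as negative.
        d_star = D0 + T * v + v * dv / (2.0 * math.sqrt(A_MAX * abs(B)))
        # Interaction term: brake as the actual gap shrinks below d_star.
        accel -= A_MAX * (d_star / gap) ** 2
    return accel


# Ego at 20 m/s following a leader at the same speed, 40 m ahead.
print(idm_acceleration(20.0, 30.0, gap=40.0, v_lead=20.0))
```

At 20 m/s with no closing speed, the desired gap is d₀ + T·v = 10 + 30 = 40 m, so a 40 m gap makes the interaction term cancel the full a_max and leaves a mild net deceleration.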

Maximum safe deceleration, b_safe	2 m/s²
Politeness factor, η	0
Lane-change decision threshold, Δa_th	0.2
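
A safe deceleration limit, a politeness factor, and an acceleration-gain threshold are the parameters of the MOBIL lane-change rule. The following is a minimal sketch, assuming the standard MOBIL safety and incentive criteria apply; the function name and its arguments are hypothetical, and the accelerations would come from a car-following model evaluated before and after the candidate lane change.

```python
B_SAFE = 2.0      # maximum safe deceleration, m/s^2
POLITENESS = 0.0  # politeness factor (eta)
DELTA_A_TH = 0.2  # lane-change decision threshold (unit not given in the table)


def mobil_should_change(a_ego_now, a_ego_new,
                        a_new_follower_now, a_new_follower_new,
                        a_old_follower_now, a_old_follower_new):
    """Return True if a lane change passes both MOBIL criteria.

    *_now -- acceleration if the ego vehicle stays in its lane
    *_new -- acceleration if the ego vehicle changes lane
    """
    # Safety criterion: the new follower must not brake harder than b_safe.
    if a_new_follower_new < -B_SAFE:
        return False
    # Incentive criterion: own gain plus weighted gain of both followers.
    ego_gain = a_ego_new - a_ego_now
    others_gain = ((a_new_follower_new - a_new_follower_now)
                   + (a_old_follower_new - a_old_follower_now))
    return ego_gain + POLITENESS * others_gain > DELTA_A_TH


# Ego gains 0.5 m/s^2; the new follower has to brake at 1.0 m/s^2.
print(mobil_should_change(0.0, 0.5, 0.0, -1.0, 0.0, 0.3))  # prints True
```

With a politeness factor of 0, the effect on the two followers drops out of the incentive term, so the decision depends only on the ego vehicle's own acceleration gain and the safety check.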

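Batch size	32
Learning rate	0.001
Discount factor	0.9
Replay memory capacity	2000
Target network update interval (steps)	1000
Number of hidden layers	2
Neurons per hidden layer	256
r_fast	0.4
r_coll	-1.0
r_out	-1.0
r_chg	-0.1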
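
The three base networks differ mainly in how they form the learning target and the Q-values. The following is a minimal sketch, assuming the textbook definitions of the DQN target, the Double DQN target, and the dueling aggregation, with the discount factor from the table. The class and function names are hypothetical, and using r_fast as the step reward in the example is an assumption, since the table does not state how the reward terms are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparams:
    """Training settings copied from the table above."""
    batch_size: int = 32
    learning_rate: float = 0.001
    discount: float = 0.9
    memory_capacity: int = 2000
    target_update_steps: int = 1000
    hidden_layers: int = 2
    hidden_units: int = 256


HP = Hyperparams()


def argmax(values):
    """Index of the largest value (first index on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def dqn_target(reward, q_target_next, done, gamma=HP.discount):
    """DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, q_online_next, q_target_next, done,
                      gamma=HP.discount):
    """Double DQN: the online network selects, the target network evaluates."""
    if done:
        return reward
    return reward + gamma * q_target_next[argmax(q_online_next)]


def dueling_q(value, advantages):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


# Hypothetical next-state Q-values for three actions, step reward r_fast = 0.4.
q_online = [3.0, 1.0, 0.0]
q_target = [1.0, 2.0, 0.5]
print(dqn_target(0.4, q_target, done=False))
print(double_dqn_target(0.4, q_online, q_target, done=False))
print(dueling_q(1.0, [1.0, 2.0, 3.0]))
```

In this example the Double DQN target is lower than the DQN target because the action preferred by the online network is not the one the target network rates highest, which illustrates how decoupling selection from evaluation tempers overestimation.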