基于改进深度强化学习的全局路径规划策略

doi:10.3969/j.issn.1674-8484.2023.02.007

摘要/Abstract

摘要：

为了解决模型过度依赖与过度估计的问题，提出一种基于传统深度强化学习（DRL）的抑制过度估计深度Q网络（SQDQN）算法，来建立全局路径规划策略。该SQDQN算法，结合深度Q网络(DQN)算法与信息熵，来抑制过度估计；借助信息熵，实时评估更新过程，来抑制DQN策略算法过度地估计损害性能；借助SQDQN算法与环境模型的交互作用，建立了获取全局路径规划策略的环境模型。结果表明：与DQN算法相比，SQDQN算法在20次实验中3次选择为更优策略；与Dijkstra传统路径规划方法相比，SQDQN算法所规划路程通行时间减少11.32%；本文的全局路径规划策略，减少了由于DQN对动作预期过高所导致的输出错误动作。

关键词: 智能交通, 路径规划, 深度强化学习（DRL）, 信息熵, 抑制过度估计

Abstract:

A Suppresses Q Deep Q Network (SQDQN) algorithm was proposed based on traditional deep reinforcement learning (DRL), with being established a global path planning strategy, to solve the problem of model over-dependence and overestimation. The SQDQN algorithm combined the Deep Q Network (DQN) algorithm with information entropy to suppress overestimation; Evaluated the update process in real time, with the help of information entropy, to suppress the over-estimation of the damage performances of the DQN strategy. An environmental model to obtain the global path planning strategy was established with the help of the interaction between the SQDQN algorithm and the environment model. The results show that the SQDQN algorithm selects three better strategies from 20 experiments compared with the DQN strategy. And reduces the route planning travel time by 11.32% than that by the Dijkstra's traditional route planning method. The global path planning strategy of this paper reduces the output error caused by DQN's over expectation of actions.

Key words: intelligent transportation, path planning, deep reinforcement learning (DRL), information entropy, suppress overestimation

中图分类号:

U463.6

韩玲, 张晖, 方若愚, 刘国鹏, 朱长盛, 迟瑞丰. 基于改进深度强化学习的全局路径规划策略[J]. 汽车安全与节能学报, 2023, 14(2): 202-211.

HAN Ling, ZHANG Hui, FANG Ruoyu, LIU Guopeng, ZHU Changsheng, CHI Ruifeng. Global path planning strategy based on an improved deep reinforcement learning[J]. Journal of Automotive Safety and Energy, 2023, 14(2): 202-211.

图/表 14

参考文献 26

[1]	Ibarra-Rojas O J, Delgado F. Planning, operation, and control of bus transport systems: a literature review[J]. Transp Res Part B:Methodolog, 2015, 77: 38-75.
[2]	KE Li, RAO Xuan, PANG Xiaobing, et al. Route Search and Planning: a survey[J]. Big Data Res, 2021, 26: 1-11.
[3]	Dudeja C, Kumar P. An improved weighted sum-fuzzy Dijkstra’s algorithm for shortest path problem[J]. Soft Comput, 2022, 26(7): 3217-3226. doi: 10.1007/s00500-022-06871-w
[4]	Senbiswas R, Pal A, Werho T, et al. A graph theoretic approach to power system vulnerability identification[J]. IEEE Trans Power Syst, 2021, 36(2): 923-935. doi: 10.1109/TPWRS.59 URL
[5]	CHEN Yijing. Application of Improved dijkstra algorithm in coastal tourism route planning[J]. J Coastal Res, 2020, 106: 251-254. doi: 10.2112/SI106-059.1 URL
[6]	XU Minghua, Liu Yuqing, HUANG Qilin, et al. An improved dijkstra’s shortest path algorithm for sparse network[J]. Appl Math Compu, 2007, 185(1): 247-254.
[7]	ZHANG Yan, LI Lingling, LIN Xiongzheng, et al. Development of path planning approach using improved a-star algorithm in agv system[J]. J Internet Tech, 2019, 20(3): 915-924.
[8]	杜茂, 杨林. 基于交通时空特征的车辆全局路径规划算法[J]. 汽车安全与节能学报, 2021, 12(1): 52-61.
	DU Mao, YANG Lin. Vehicle global path planning algorithm based on traffic space-time characteristics[J]. J Auto Safe Energ, 2021, 12(1): 52-61. (in Chinese)
[9]	SONG Rui, LIU Yuanchang, Bucknall R. Smoothed A* algorithm for practical unmanned surface vehicle path planning[J]. Appl Ocean Res, 2019, 83: 9-20. doi: 10.1016/j.apor.2018.12.001 URL
[10]	Pereira F, Brasil P, Cuadros M, et al. Analysis of local trajectory planners for mobile robot with robot operating system[J]. IEEE Latin Ame Trans, 2022, 20(1): 92-99.
[11]	QIAN Xiaohui, ZHONG Xiaopeng. Optimal individualized multimedia tourism route planning based on ant colony algorithms and large data hidden mining[J]. Multimed Tools, 2019, 78(15): 22099-22108.
[12]	LI Changgeng, HUANG Xia, DING Jun, et al. Global path planning based on a bidirectional alternating search A* algorithm for mobile robots[J]. Compu Indu Eng, 2022, 168: 1-17.
[13]	张瑞鑫, 王伟, 田泽, 等. 基于模型约束A*算法的无人机三维航迹规划[J]. 国外电子测量技术, 2022, 41(9): 163-169.
	ZHANG Ruixin, WANG Wei, TIAN Ze, et al. Three dimensional path planning of uav based on model constrained A * algorithm[J]. Fore Electro Meas Tech, 2022, 41(9): 163-169. (in Chinese)
[14]	JIANG Chunyan, Fu Jingfang, LIU Weiyan. Research on vehicle routing planning based on adaptive ant colony and particle swarm optimization algorithm[J]. Int’l J Intell Transp Syst Res, 2021, 19(1): 83-91.
[15]	肖金壮, 余雪乐, 周刚, 等. 一种面向室内AGV路径规划的改进蚁群算法[J]. 仪器仪表学报, 2022, 43(3): 277-285.
	XIAO Jinzhuang, YU Xuele, ZHOU Gang, et al. An improved ant colony algorithm for indoor agv path planning[J]. J Instru, 2022, 43(3): 277-285. (in Chinese)
[16]	杨立炜, 付丽霞, 王倩, 等. 多层优化蚁群算法的移动机器人路径规划研究[J]. 电子测量与仪器学报, 2021, 35(9): 10-18.
	YANG Liwei, FU Lixia, WANG Qian, et al. Research on path planning of mobile robot based on multi-level optimization ant colony algorithm[J]. J Electro Meas Instr, 2021, 35(9): 10-18. (in Chinese)
[17]	LI Xiaojing, YU Dongman. Study on an optimal path planning for a robot based on an improved ant colony algorithm[J]. Automatic Contr Compu Sci, 2019, 53(3): 236-243.
[18]	LOU Ping, XU Kun, JIANG Xuemei, et al. Path planning in an unknown environment based on deep reinforcement learning with prior knowledge[J]. J Intell Fuzzy Syst, 2021, 41(6): 5773-5789.
[19]	PAN Jie, WANG Xuesong, CHENG Yuhu, et al. Multisource transfer double dqn based on actor learning[J]. IEEE Trans Neur Networks Learn Syst, 2018 29(6): 2227-2238.
[20]	李文礼, 张友松. 基于深度强化学习的车辆自主避撞决策控制模型[J]. 汽车安全与节能学报, 2021, 12(2): 201-209.
	LI Wenli, ZHANG Yousong. Vehicle autonomous collision avoidance decision control model based on deep reinforcement learning[J]. J Auto Safe Energy 2021, 12(2): 201-209. (in Chinese)
[21]	LI Jianxin, CHEN Yiting, ZHAO Xiuniao, et al. An improved DQN path planning algorithm[J]. J Supercomput, 2022, 78(1): 616-639. doi: 10.1007/s11227-021-03878-2
[22]	PENG Ningyezi, XI Yuliang, RAO Jinmeng. Urban multiple route planning model using dynamic programming in reinforcement learning[J]. IEEE Trans Intel Transp Syst, 2021, 23(7): 8037-8047. doi: 10.1109/TITS.2021.3075221 URL
[23]	Watkins C J C H. Learning from delayed rewards[D]. Cambridge: University of Cambridge, 1989.
[24]	Van Hasselt H. Double Q¬learning[C]// 23rd Adv Neur Info Proc Syst (NeurIPS). Canada, Vancouver, British Columbia, 2010: 1-19.
[25]	Martin M L, Carro B, Esguevillas A. Application of deep reinforcement learning to intrusion detection for supervised problems[J]. Expert Syst Appl, 2020, 141: 1-15.
[26]	黄琰, 张锦. 基于深度强化学习的车辆路径问题求解方法[J]. 交通运输工程与信息学报, 2022, 20(3): 114-127.
	HUANG Yan, ZHANG Jin. Vehicle routing problem solving method based on deep reinforce-ment learning[J]. J Transp Eng Info, 2022, 20(3): 114-127. (in Chinese)

道路编号	道路长度 / m	道路状态	限速 / (km·h^-1)
e₁-e₂	234	1.0	60
e₁-e₃	70	0.9	55
e₂-e₁	234	1.0	50
e₂-e₅	75	0.8	55
…	…	…	…

道路编号	道路长度 / m	道路状态	限速 / (km·h^-1)
e₁-e₂	234	1.0	60
e₁-e₃	70	0.9	55
e₂-e₁	234	1.0	50
e₂-e₅	75	0.8	55
…	…	…	…

学习率	0.1％
批量数量	16
经验池大小	2万
车辆限速	当前道路限速
每回合最大探索次数	30
起点至终点最小距离网络层数	3.0 km 13

学习率	0.1％
批量数量	16
经验池大小	2万
车辆限速	当前道路限速
每回合最大探索次数	30
起点至终点最小距离网络层数	3.0 km 13

场景	通行距离 / km
场景	DQN	SQDQN	Dijkstra
1	4.879	4.879	4.879
2	5.760	5.760	5.760
3	6.320	6.320	6.320
4	3.850	3.850	3.850
5	4.329	4.329	4.329
6	6.783	6.415	6.415
7	3.987	3.987	3.987
8	5.310	5.310	5.310
9	4.982	4.982	4.982
10	5.567	5.567	5.567
11	6.587	6.587	6.587
12	4.872	4.872	4.872
13	6.606	6.350	6.350
14	5.876	5.876	5.876
15	4.469	4.469	4.469
16	5.524	5.310	5.310
17	4.860	4.860	4.860
18	4.187	4.187	4.187
19	4.621	4.621	4.621
20	4.897	4.897	4.897