Journal of Automotive Safety and Energy, 2023, Vol. 14, Issue (2): 202-211. DOI: 10.3969/j.issn.1674-8484.2023.02.007

• Intelligent Driving and Intelligent Transportation •

Global path planning strategy based on improved deep reinforcement learning

HAN Ling, ZHANG Hui, FANG Ruoyu, LIU Guopeng, ZHU Changsheng, CHI Ruifeng

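  1. College of Mechanical and Electrical Engineering, Changchun University of Technology, Changchun 130012, China
  • Received: 2022-09-14  Revised: 2022-11-21  Publication date: 2023-04-30  Release date: 2023-04-27
  • About the author: HAN Ling (born 1984), female, Han ethnicity, from Jilin, associate professor. E-mail: hanling@ccut.edu.cn
  • Funding:
    Natural Science Foundation of Jilin Province (20220101236JC); Science and Technology Program of Jilin Province (2023042064GH); Open Fund of the State Key Laboratory of Automotive Safety and Energy (Tsinghua University)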

Abstract:

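To address the problems of model over-dependence and overestimation, an overestimation-suppressing deep Q-network (SQDQN) algorithm, built on conventional deep reinforcement learning (DRL), is proposed to establish a global path planning strategy. SQDQN combines the deep Q-network (DQN) algorithm with information entropy: the entropy is used to evaluate the update process in real time, which suppresses the tendency of the DQN policy to overestimate action values and thereby degrade performance. An environment model is built, and the global path planning strategy is obtained through the interaction between the SQDQN algorithm and this model. The results show that, compared with DQN, SQDQN was selected as the better strategy in 3 of the 20 experiments; compared with the conventional Dijkstra path planning method, the travel time of the route planned by SQDQN is 11.32% shorter. The proposed global path planning strategy reduces the erroneous actions that DQN outputs because it overestimates the expected value of actions.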
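The abstract does not give the SQDQN update rule, so the following is only a minimal Python sketch under stated assumptions. It illustrates two points: why the max operator in the standard DQN target, r + γ·max_a' Q(s', a'), tends to overestimate (the maximum of noisy estimates is biased upward even when every true value is zero), and one hypothetical way information entropy could temper that target. The function `entropy_tempered_target`, the Boltzmann `temperature`, and the rule that blends the greedy maximum with the mean in proportion to the normalized entropy are illustrative assumptions, not the authors' formula.

```python
import math
import random
from typing import Sequence


def softmax(q_values: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Boltzmann distribution over actions (shifted by the max for stability)."""
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy divided by log(n), so the result lies in [0, 1]."""
    n = len(probs)
    if n < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(n)


def dqn_target(reward: float, next_q: Sequence[float],
               gamma: float, done: bool) -> float:
    """Standard DQN target: r + gamma * max_a' Q(s', a')."""
    return reward if done else reward + gamma * max(next_q)


def entropy_tempered_target(reward: float, next_q: Sequence[float],
                            gamma: float, done: bool,
                            temperature: float = 1.0) -> float:
    """HYPOTHETICAL entropy-weighted target, not the paper's SQDQN rule.

    A flat Boltzmann distribution (high entropy) means no action is clearly
    best, so the greedy max is shrunk toward the mean of the Q-values.
    """
    if done:
        return reward
    h = normalized_entropy(softmax(next_q, temperature))
    mean_q = sum(next_q) / len(next_q)
    bootstrap = (1.0 - h) * max(next_q) + h * mean_q
    return reward + gamma * bootstrap


def max_bias_demo(n_actions: int = 4, noise: float = 1.0,
                  trials: int = 10_000, seed: int = 0) -> float:
    """Mean of max over noisy estimates whose true values are all 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        estimates = [rng.gauss(0.0, noise) for _ in range(n_actions)]
        total += max(estimates)
    return total / trials


if __name__ == "__main__":
    print(max_bias_demo())  # close to 1, although every true value is 0
    print(dqn_target(1.0, [0.0, 2.0], 0.9, False))
    print(entropy_tempered_target(1.0, [0.0, 2.0], 0.9, False))
```

With four actions and unit noise, `max_bias_demo()` returns a value near 1 instead of the true value 0, which is the kind of overestimation the abstract refers to. Because the mean never exceeds the maximum, the hypothetical tempered target is never larger than the standard one, and the two coincide when all next-state Q-values are equal.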

Key words: intelligent transportation, path planning, deep reinforcement learning (DRL), information entropy, overestimation suppression
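The abstract benchmarks SQDQN against Dijkstra's algorithm, the classical shortest-path method. For reference, a minimal Python sketch of Dijkstra's algorithm with a binary heap, treating edge weights as travel times, is given below. The four-node network `ROADS` and its minute values are hypothetical; the sketch does not reproduce the paper's road network or its reported 11.32% travel-time reduction.

```python
import heapq
from typing import Dict, List, Tuple

# Adjacency list: node -> [(neighbor, travel_time), ...]
Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra(graph: Graph, source: str, target: str) -> Tuple[float, List[str]]:
    """Minimum travel-time route from source to target; (inf, []) if unreachable.

    Assumes non-negative edge weights, as Dijkstra's algorithm requires.
    """
    dist: Dict[str, float] = {source: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    settled = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue  # stale heap entry: a shorter distance was already settled
        settled.add(node)
        if node == target:
            break
        for nbr, cost in graph.get(node, []):
            new_d = d + cost
            if new_d < dist.get(nbr, float("inf")):
                dist[nbr] = new_d
                prev[nbr] = node
                heapq.heappush(heap, (new_d, nbr))
    if target not in dist:
        return float("inf"), []
    # Walk predecessor links back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


# Hypothetical road network; weights are travel times in minutes.
ROADS: Graph = {
    "A": [("B", 4.0), ("C", 2.0)],
    "B": [("D", 5.0)],
    "C": [("B", 1.0), ("D", 8.0)],
    "D": [],
}

if __name__ == "__main__":
    print(dijkstra(ROADS, "A", "D"))  # (8.0, ['A', 'C', 'B', 'D'])
```

On this toy network the detour through C wins because 2 + 1 + 5 < 4 + 5. Since Dijkstra's algorithm is optimal for the fixed, non-negative edge weights it is given, a shorter travel time for another planner implies a different cost model or evaluation setting; the abstract does not say which.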
