
Journal of Automotive Safety and Energy, 2021, Vol. 12, Issue 2: 201-209. DOI: 10.3969/j.issn.1674-8484.2021.02.008

• Automotive Safety •

Vehicle autonomous collision avoidance decision control model based on deep reinforcement learning

LI Wenli, ZHANG Yousong, HAN Di, QIAN Hong, SHI Xiaohui

  1. Key Laboratory of Advanced Manufacture Technology for Automobile Parts, Ministry of Education, Chongqing University of Technology, Chongqing 400054, China
  • Received: 2021-03-01 Online: 2021-06-30 Published: 2021-06-30
  • About the authors: LI Wenli (born 1983), male, Han, from Henan, associate professor. E-mail: liwenli@cqut.edu.cn
    ZHANG Yousong (born 1996), male, Han, from Hubei, master's student. E-mail: zhangyousong@2019.cqut.edu.cn
  • Funding:
    Graduate Innovation Project of Chongqing University of Technology (clgycx20202021); Special Project for Transformation and Industrialization of Scientific and Technological Achievements of Banan District, Chongqing (2020TJZ022)

Abstract:

To improve a vehicle's ability to learn from its driving environment and make decisions in it, an autonomous collision avoidance decision control model based on the deep deterministic policy gradient (DDPG) algorithm is proposed. Drawing on reinforcement learning theory for Markov decision processes and on the longitudinal kinematics of the vehicle, a state space containing target-object and ego-vehicle information and an action space consisting of the ego-vehicle deceleration were designed. An end-to-end autonomous collision avoidance decision model was then constructed with a multi-objective reward function that accounts for safety, comfort, and efficiency. An interaction model between the DDPG algorithm and the traffic environment was built in MATLAB/Simulink and tested in the car-to-car rear stationary (CCRs) and car-to-car rear braking (CCRb) scenarios. The results show that the proposed decision algorithm converges well. With limit values imposed on acceleration and jerk, it achieves effective collision avoidance while preserving ride comfort, and it outperforms fuzzy control.

Key words: vehicle safety, autonomous collision avoidance, deep deterministic policy gradient (DDPG), control model, multi-objective reward function
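
The state space, deceleration action, and multi-objective reward described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MATLAB/Simulink model: the `State` fields, the `step`, `reward`, and `rollout` names, the acceleration and jerk limits, the standstill gap, and the reward weights are all hypothetical values chosen for illustration, and a constant deceleration stands in for the trained DDPG actor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    gap: float         # distance to the lead vehicle [m]
    ego_speed: float   # ego vehicle speed [m/s]
    lead_speed: float  # lead vehicle speed [m/s]
    ego_accel: float   # ego acceleration over the last step [m/s^2]


def step(s, decel, lead_accel=0.0, dt=0.1):
    """Advance the longitudinal kinematics one step; `decel` >= 0 is the action."""
    ego_speed = max(0.0, s.ego_speed - decel * dt)       # speed cannot go negative
    lead_speed = max(0.0, s.lead_speed + lead_accel * dt)
    # Trapezoidal integration of the relative speed gives the new gap.
    gap = s.gap + 0.5 * (s.lead_speed + lead_speed - s.ego_speed - ego_speed) * dt
    ego_accel = (ego_speed - s.ego_speed) / dt           # acceleration actually realized
    return State(gap, ego_speed, lead_speed, ego_accel)


def reward(prev, curr, dt=0.1, accel_limit=8.0, jerk_limit=10.0,
           standstill_gap=2.0, weights=(100.0, 1.0, 1.0)):
    """Weighted sum of safety, comfort, and efficiency terms (all values assumed)."""
    w_safe, w_comf, w_eff = weights
    jerk = (curr.ego_accel - prev.ego_accel) / dt
    # Safety: penalize a collision (gap closed to zero or below).
    safety = -1.0 if curr.gap <= 0.0 else 0.0
    # Comfort: penalize acceleration and jerk relative to their limit values.
    comfort = -(abs(curr.ego_accel) / accel_limit + abs(jerk) / jerk_limit)
    # Efficiency: penalize stopping much farther from the target than needed.
    stopped = curr.ego_speed == 0.0
    efficiency = -max(0.0, curr.gap - standstill_gap) / standstill_gap if stopped else 0.0
    return w_safe * safety + w_comf * comfort + w_eff * efficiency


def rollout(s, decel, max_steps=1000, dt=0.1):
    """Apply a constant deceleration until the ego vehicle stops or collides."""
    total = 0.0
    for _ in range(max_steps):
        nxt = step(s, decel, dt=dt)
        total += reward(s, nxt, dt=dt)
        s = nxt
        if s.gap <= 0.0 or s.ego_speed == 0.0:
            break
    return s, total


# Stationary-lead-vehicle case with illustrative initial values.
start = State(gap=40.0, ego_speed=13.89, lead_speed=0.0, ego_accel=0.0)
for decel in (0.0, 4.0):
    final, total = rollout(start, decel)
    print(f"decel={decel:.1f} m/s^2  final gap={final.gap:.2f} m  return={total:.2f}")
```

In a learning setup, the constant `decel` would be replaced by the output of the actor network, and `reward` would supply the training signal for each transition. Writing the comfort term as a ratio to the limit values is one simple way to express the acceleration and jerk limits mentioned in the abstract, not necessarily the form used in the paper.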
