
Journal of Automotive Safety and Energy ›› 2024, Vol. 15 ›› Issue (6): 923-933. DOI: 10.3969/j.issn.1674-8484.2024.06.014

• Intelligent Driving and Smart Transportation •

Multi-vehicle cooperative control in ramp merging area based on MADDPG algorithm

CAI Tianmao1, KONG Weiwei1,*, LUO Yugong2, SHI Jia2, JI Pengxiao1, LI Congmin1

  1. College of Engineering, China Agricultural University, Beijing 100083, China
  2. School of Vehicle and Transportation, Tsinghua University, Beijing 100083, China
  • Received: 2023-10-20 Revised: 2024-07-09 Online: 2024-12-31 Published: 2025-01-01
  • Corresponding author: *KONG Weiwei, associate professor. E-mail: kongweiwei@cau.edu.cn
  • About the author: CAI Tianmao (1999—), male (Han), from Hebei, master's degree candidate. E-mail: caitianmao@cau.edu.cn
  • Funding:
    Open Fund of the State Key Laboratory of Automotive Safety and Energy (KFY2210); Foundation for Innovative Research Groups of the National Natural Science Foundation of China (52221005); Beijing Nova Program (20220484040)



Abstract:

A multi-vehicle cooperative control method based on a multi-agent reinforcement learning algorithm was proposed to ensure safe and efficient traffic through the ramp merging area. A distributed training framework based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm was designed to improve the computational efficiency of the system. To address the difficulty that agent models have in coping with continuous traffic-flow scenarios, a relatively stationary environment was constructed and the policy update gradient was improved, which keeps the agents stable in a continuous traffic-flow environment. The ramp merging area was split into a preparation zone and a merging zone, and the state and action spaces and the reward functions were designed separately according to the control objectives of the two zones. The results show that, under different traffic volumes, the proposed method shortens the total delay time through the merging area by an average of 25.46% compared with a rule-based method; compared with a global optimization method, the delay time differs by 8.47%, but the control duration does not grow with the number of vehicles. The proposed multi-vehicle cooperative control method for the ramp merging area therefore better balances the improvement of traffic efficiency with the real-time performance of the system.
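The centralized-training, decentralized-execution structure that MADDPG gives the framework above can be sketched in toy form: each vehicle's actor sees only its own observation, while a shared critic is trained on all observations and actions. The dimensions, linear function approximators, and single-transition update below are illustrative assumptions for exposition, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 2   # e.g. one mainline and one ramp vehicle (assumed for the sketch)
OBS_DIM = 3    # per-agent observation, e.g. gap, speed, position (illustrative)
ACT_DIM = 1    # per-agent acceleration command

# Decentralized linear actors: a_i = W_i @ o_i (each agent uses only its own obs)
actors = [rng.normal(scale=0.1, size=(ACT_DIM, OBS_DIM)) for _ in range(N_AGENTS)]

# Centralized linear critic: Q(o_all, a_all) = w . concat(o_all, a_all)
critic_w = rng.normal(scale=0.1, size=N_AGENTS * (OBS_DIM + ACT_DIM))


def act(obs):
    """Decentralized execution: each actor maps its own observation to an action."""
    return [W @ o for W, o in zip(actors, obs)]


def q_value(obs, acts):
    """Centralized critic evaluates the joint observation-action vector."""
    return critic_w @ np.concatenate(obs + acts)


def train_step(obs, reward, lr=1e-2):
    """One simplified MADDPG-style update on a single transition (no replay
    buffer, target networks, or bootstrapping -- toy setting only)."""
    global critic_w
    acts = act(obs)
    # Critic: regress Q toward the observed reward
    x = np.concatenate(obs + acts)
    td_err = reward - critic_w @ x
    critic_w = critic_w + lr * td_err * x
    # Actors: gradient ascent on Q through each agent's own action
    for i, W in enumerate(actors):
        # dQ/da_i is the critic weight slice matching agent i's action
        start = N_AGENTS * OBS_DIM + i * ACT_DIM
        dq_da = critic_w[start:start + ACT_DIM]
        # chain rule: dQ/dW_i = dq_da (outer) o_i
        actors[i] = W + lr * np.outer(dq_da, obs[i])
```

During training the critic conditions on every agent's observation and action, so each actor's gradient accounts for the other vehicles' behavior; at execution time only the per-agent actors are needed, which is what keeps the control duration from growing with the number of vehicles.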

Key words: multi-agent deep deterministic policy gradient (MADDPG), multi-agent reinforcement learning, multi-vehicle cooperative control, ramp merging

CLC number: