
Journal of Automotive Safety and Energy ›› 2024, Vol. 15 ›› Issue (6): 923-933. DOI: 10.3969/j.issn.1674-8484.2024.06.014

• Intelligent Driving and Smart Transportation •

Multi-vehicle cooperative control in ramp merging area based on MADDPG algorithm

CAI Tianmao1, KONG Weiwei1,*, LUO Yugong2, SHI Jia2, JI Pengxiao1, LI Congmin1

  1. College of Engineering, China Agricultural University, Beijing 100083, China
  2. School of Vehicle and Transportation, Tsinghua University, Beijing 100083, China
  • Received: 2023-10-20 Revised: 2024-07-09 Online: 2024-12-31 Published: 2025-01-01
  • Corresponding author: *KONG Weiwei, associate professor. E-mail: kongweiwei@cau.edu.cn
  • About the author: CAI Tianmao (1999—), male (Han), from Hebei, master's degree candidate. E-mail: caitianmao@cau.edu.cn
  • Funding:
    Open Fund of the State Key Laboratory of Automotive Safety and Energy (KFY2210); Foundation for Innovative Research Groups of the National Natural Science Foundation of China (52221005); Beijing Nova Program (20220484040)



Abstract:

A multi-vehicle cooperative control method based on a multi-agent reinforcement learning algorithm was proposed to ensure safe and efficient traffic through the ramp merging area. A distributed training framework based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm was designed to improve the computational efficiency of the system. To address the difficulty that agent models have in coping with continuous traffic-flow scenarios, a relatively stationary environment was constructed and the policy update gradient was improved, which keeps the agents stable in a continuous traffic-flow environment. The ramp merging area was split into a preparation zone and a merging zone, and the state and action spaces and the reward functions were designed separately according to the control objectives of the two zones. The results show that, under different traffic volumes, the proposed method shortens the total delay time through the merging area by an average of 25.46% compared with a rule-based method; compared with a global optimization method, the delay time differs by 8.47%, but the control duration does not grow with the number of vehicles. The proposed multi-vehicle cooperative control method for the ramp merging area therefore better balances the improvement of traffic efficiency with the real-time performance of the system.
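The centralized-training, decentralized-execution structure that MADDPG gives the framework above can be sketched in toy form: each vehicle's actor sees only its own observation, while a shared critic is trained on all observations and actions. The dimensions, linear function approximators, and single-transition update below are illustrative assumptions for exposition, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 2   # e.g. one mainline and one ramp vehicle (assumed for the sketch)
OBS_DIM = 3    # per-agent observation, e.g. gap, speed, position (illustrative)
ACT_DIM = 1    # per-agent acceleration command

# Decentralized linear actors: a_i = W_i @ o_i (each agent uses only its own obs)
actors = [rng.normal(scale=0.1, size=(ACT_DIM, OBS_DIM)) for _ in range(N_AGENTS)]

# Centralized linear critic: Q(o_all, a_all) = w . concat(o_all, a_all)
critic_w = rng.normal(scale=0.1, size=N_AGENTS * (OBS_DIM + ACT_DIM))


def act(obs):
    """Decentralized execution: each actor maps its own observation to an action."""
    return [W @ o for W, o in zip(actors, obs)]


def q_value(obs, acts):
    """Centralized critic evaluates the joint observation-action vector."""
    return critic_w @ np.concatenate(obs + acts)


def train_step(obs, reward, lr=1e-2):
    """One simplified MADDPG-style update on a single transition (no replay
    buffer, target networks, or bootstrapping -- toy setting only)."""
    global critic_w
    acts = act(obs)
    # Critic: regress Q toward the observed reward
    x = np.concatenate(obs + acts)
    td_err = reward - critic_w @ x
    critic_w = critic_w + lr * td_err * x
    # Actors: gradient ascent on Q through each agent's own action
    for i, W in enumerate(actors):
        # dQ/da_i is the critic weight slice matching agent i's action
        start = N_AGENTS * OBS_DIM + i * ACT_DIM
        dq_da = critic_w[start:start + ACT_DIM]
        # chain rule: dQ/dW_i = dq_da (outer) o_i
        actors[i] = W + lr * np.outer(dq_da, obs[i])
```

During training the critic conditions on every agent's observation and action, so each actor's gradient accounts for the other vehicles' behavior; at execution time only the per-agent actors are needed, which is what keeps the control duration from growing with the number of vehicles.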

Key words: multi-agent deep deterministic policy gradient (MADDPG), multi-agent reinforcement learning, multi-vehicle cooperative control, ramp merging

CLC number: