Journal of Automotive Safety and Energy ›› 2025, Vol. 16 ›› Issue (4): 587-597. DOI: 10.3969/j.issn.1674-8484.2025.04.009

• Intelligent Driving and Smart Transportation •

Deep reinforcement learning-based strategy for freeway ramp metering

HAN Yu1, CHEN Zhixuan1, WANG Yixuan1,*, LI Chunjie1, LEI Wei2, JIAO Yanli2, LIU Pan1

  1. School of Transportation, Southeast University, Nanjing 211189, China
  2. Research and Development Center of Transport Industry of Self-Driving Technology, Hebei Provincial Communications Planning, Design and Research Institute Co. Ltd., Shijiazhuang 050011, China
  • Received: 2024-11-29 Revised: 2025-04-27 Online: 2025-08-30 Published: 2025-08-27
  • Corresponding author: *WANG Yixuan, assistant researcher. E-mail: yixuanseu@126.com
  • About the author: HAN Yu (born 1989), male, Han ethnicity, from Shandong, associate professor. E-mail: yuhan@seu.edu.cn
  • Funding: National Natural Science Foundation of China (52232012); National Natural Science Foundation of China (52402384); National Natural Science Foundation of China (52131203)

Abstract:

Existing research on reinforcement learning (RL)-based ramp metering has not adequately examined the learning cost of policy training or the transferability of the trained policy, which makes the resulting control strategies hard to deploy in practice. To address this problem, this paper proposes an RL method for optimizing ramp metering strategies and investigates its transferability through extensive simulation experiments. A ramp metering model is constructed, and a training method based on deep reinforcement learning is proposed. A merging bottleneck on the Rongwu Expressway, part of the external trunk road network of Xiong'an New Area, is selected as the experimental scenario. The model is trained with a deep RL algorithm, and the performance of the control strategy during training is compared with that of a classical ramp metering method in order to quantify the learning cost. Different simulation models and multiple sets of model parameters are then used as test environments to analyze how differences between the training and test environments affect the control strategy. The results show that when the difference between the training and test environments is within 20%, the RL method significantly outperforms the classical ramp metering method in improving traffic efficiency; when the difference exceeds 20%, the two methods show no obvious difference in performance.

Key words: ramp metering, reinforcement learning, transferability, learning cost
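
For readers unfamiliar with the kind of baseline the abstract refers to, the following is a minimal sketch of one well-known classical ramp metering law, ALINEA, which updates the metering rate by integral feedback on downstream occupancy: r(k) = r(k-1) + K_R (o_target - o(k)). The abstract does not name the classical method used in the paper, so the choice of ALINEA is an assumption, and the class name, gain, target occupancy, and rate bounds below are illustrative values only.

```python
from dataclasses import dataclass


@dataclass
class AlineaController:
    """Illustrative ALINEA-style controller (assumed baseline, not confirmed by the paper).

    Update law: r(k) = r(k-1) + K_R * (o_target - o(k)), clipped to [min_rate, max_rate].
    """

    gain: float = 70.0              # K_R, veh/h per occupancy percent (illustrative)
    target_occupancy: float = 20.0  # desired downstream occupancy in percent (illustrative)
    min_rate: float = 200.0         # lower bound on the metering rate, veh/h (illustrative)
    max_rate: float = 1800.0        # upper bound on the metering rate, veh/h (illustrative)

    def step(self, previous_rate: float, measured_occupancy: float) -> float:
        """Return the metering rate for the next control interval."""
        # Integral feedback: raise the rate when occupancy is below target, lower it otherwise.
        rate = previous_rate + self.gain * (self.target_occupancy - measured_occupancy)
        # Keep the rate within the physically admissible range.
        return min(self.max_rate, max(self.min_rate, rate))


if __name__ == "__main__":
    controller = AlineaController()
    rate = 1000.0
    # Hypothetical occupancy measurements (percent) over successive control intervals.
    for occupancy in [25.0, 22.0, 18.0, 15.0]:
        rate = controller.step(rate, occupancy)
        print(f"occupancy={occupancy:.1f}%  metering rate={rate:.0f} veh/h")
```

An RL-based controller would replace the fixed feedback law in `step` with a learned policy mapping traffic measurements to a metering rate, which is why its performance can depend on how closely the test environment matches the training environment.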

CLC number: