Journal of Automotive Safety and Energy ›› 2022, Vol. 13 ›› Issue (4): 705-717. DOI: 10.3969/j.issn.1674-8484.2022.04.012
FENG Yao1, JING Shoucai1,3,*, HUI Fei1, ZHAO Xiangmo1, LIU Jianbei2,3
Received: 2021-11-27
Revised: 2022-07-18
Online: 2022-12-31
Published: 2023-01-01
Contact: JING Shoucai
*JING Shoucai (1991—), male (Han), born in Gansu Province; lecturer. E-mail: scjing@che.edu.cn.
Abstract:
To improve the lane-changing safety and efficiency of intelligent and connected vehicles (ICVs) and to reduce fuel consumption, this paper proposes a deep reinforcement learning-based lane-changing trajectory planning method for ICVs. The lane-changing functional requirements of ICVs in complex traffic scenarios were analyzed, and a hierarchical lane-changing trajectory planning architecture was designed. To balance vehicle safety against lane-changing efficiency, a lane-changing decision model based on a complete-information pure-strategy game was designed. The longitudinal and lateral motion states of the vehicle were decoupled, a joint optimization function targeting fuel consumption and passenger comfort was constructed, and a longitudinal-lateral lane-changing trajectory planning method based on the twin delayed deep deterministic policy gradient (TD3) algorithm was proposed to obtain optimized longitudinal and lateral lane-changing trajectories. The algorithm was validated in three typical simulated lane-changing scenarios. The results show that, compared with the deep deterministic policy gradient (DDPG) algorithm, the proposed method improves training efficiency by about 10.5% on average across the left and right lane-changing experiments and reduces average fuel consumption by 65% and 44%, respectively; single-step trajectory planning takes less than 10 ms, so safe, fuel-efficient, and comfortable lane-changing trajectories can be obtained in real time.
FENG Yao, JING Shoucai, HUI Fei, ZHAO Xiangmo, LIU Jianbei. Deep reinforcement learning-based lane-changing trajectory planning method of intelligent and connected vehicles[J]. Journal of Automotive Safety and Energy, 2022, 13(4): 705-717.
| Target-lane vehicle Di | Lane-changing vehicle M: m1 (change lanes) | Lane-changing vehicle M: m2 (do not change lanes) |
|---|---|---|
| di1 (allow) | [RM(di1, m1), RD(di1, m1)] | [RM(di1, m2), RD(di1, m2)] |
| di2 (refuse) | [RM(di2, m1), RD(di2, m1)] | [RM(di2, m2), RD(di2, m2)] |
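The decision layer selects a pure-strategy Nash equilibrium of this two-player payoff matrix. As a minimal illustration of that search, the sketch below enumerates the four strategy pairs and keeps those from which neither vehicle can profitably deviate; the payoff numbers and the function name are hypothetical placeholders, not values from the paper:

```python
import itertools

# Payoff bimatrix for the complete-information pure-strategy game:
# rows = target-lane vehicle D_i (d1 = allow, d2 = refuse),
# cols = lane-changing vehicle M (m1 = change lanes, m2 = keep lane).
# Each cell is (R_M, R_D); the numbers below are illustrative only.
payoffs = {
    ("d1", "m1"): (1.0, -0.2),
    ("d1", "m2"): (0.0, 0.3),
    ("d2", "m1"): (-1.0, -0.5),
    ("d2", "m2"): (0.0, 0.5),
}

def pure_strategy_equilibria(payoffs):
    """Return all (d, m) cells where neither player gains by deviating."""
    d_actions, m_actions = ("d1", "d2"), ("m1", "m2")
    equilibria = []
    for d, m in itertools.product(d_actions, m_actions):
        r_m, r_d = payoffs[(d, m)]
        # M considers switching its action while D holds d fixed.
        m_best = all(r_m >= payoffs[(d, m_alt)][0] for m_alt in m_actions)
        # D considers switching its action while M holds m fixed.
        d_best = all(r_d >= payoffs[(d_alt, m)][1] for d_alt in d_actions)
        if m_best and d_best:
            equilibria.append((d, m))
    return equilibria

print(pure_strategy_equilibria(payoffs))  # e.g. [('d1', 'm1'), ('d2', 'm2')]
```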
1  Initialize parameters: μφ, Qθ1, Qθ2, μφ′, Qθ1′, Qθ2′, ReplayBuffer, T
2  for episode = 1 to T do:
3      st = env.reset()
4      while not done do:
5          at = μφ(st) + ε, ε ~ N(0, σ)
6          st+1, rt, done = env.step(at)
7          ReplayBuffer.append(st, at, rt, st+1, done)
8          sample a mini-batch of N transitions from ReplayBuffer
9          at+1 = μφ′(st+1) + ε, ε ~ clip[N(0, σ), -c, c]
10         $Q_{\text{targ}} = r_t + \gamma \min_{i=1,2} Q_{\theta i}^{\prime}\left(s_{t+1}, a_{t+1}\right)$
11         update Critics: $Loss = \frac{1}{N}\sum_{t=1}^{N}\left[Q_{\text{targ}} - Q_{\theta i}\left(s_t, a_t\right)\right]^{2}, i = 1, 2$
12         update Actor every 2 steps: $\nabla_{\varphi} J(\varphi) = \frac{1}{N}\sum_{t=1}^{N}\nabla_{\varphi} Q_{\theta 1}\left[s_t, \mu_{\varphi}(s_t)\right]$
13         soft-update target networks, τ ∈ (0, 1):
14         $\varphi^{\prime} = \tau\varphi + (1-\tau)\varphi^{\prime}$
15         $\theta_{i}^{\prime} = \tau\theta_{i} + (1-\tau)\theta_{i}^{\prime}, i = 1, 2$
16     end while
17 end for
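For readers who want the update rule in executable form, the following is a minimal PyTorch sketch of steps 8–15 above (clipped double-Q target, delayed actor update, soft target update). The module interfaces, hyperparameter values, and batch layout are assumptions for illustration, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

GAMMA, TAU, NOISE_STD, NOISE_CLIP = 0.99, 0.005, 0.2, 0.5  # assumed values

def td3_update(batch, actor, actor_targ, critics, critics_targ,
               actor_opt, critic_opt, step):
    # critic_opt is assumed to hold the parameters of both critics.
    s, a, r, s_next, done = batch  # tensors sampled from the replay buffer

    # Steps 9-10: target-policy smoothing noise, clipped double-Q target.
    with torch.no_grad():
        noise = (torch.randn_like(a) * NOISE_STD).clamp(-NOISE_CLIP, NOISE_CLIP)
        a_next = (actor_targ(s_next) + noise).clamp(-1.0, 1.0)
        q1, q2 = (q(s_next, a_next) for q in critics_targ)
        q_targ = r + GAMMA * (1.0 - done) * torch.min(q1, q2)

    # Step 11: move both critics toward the shared target.
    critic_loss = sum(F.mse_loss(q(s, a), q_targ) for q in critics)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Steps 12-15: delayed actor update and soft target updates every 2 steps.
    if step % 2 == 0:
        actor_loss = -critics[0](s, actor(s)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()
        with torch.no_grad():
            for net, net_targ in [(actor, actor_targ),
                                  *zip(critics, critics_targ)]:
                for p, p_t in zip(net.parameters(), net_targ.parameters()):
                    p_t.mul_(1 - TAU).add_(TAU * p)
```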
| Network | Layer | Dimensions | Activation |
|---|---|---|---|
| Actor | L1 | (s, 30) | ReLU |
| | L2 | (30, 30) | ReLU |
| | L3 | (30, a) | Tanh |
| Critic1 | L1 | (s+a, 30) | ReLU |
| | L2 | (30, 30) | ReLU |
| | L3 | (30, 1) | None |
| Critic2 | L1 | (s+a, 30) | ReLU |
| | L2 | (30, 30) | ReLU |
| | L3 | (30, 1) | None |
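To make the layer sizes concrete, here is one way the table could translate into PyTorch modules; the class names, constructor signatures, and the torch.cat input layout are my assumptions, and TD3 would instantiate one Actor and two independent Critics:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a state (dim s) to an action (dim a) in [-1, 1], per the table."""
    def __init__(self, s_dim, a_dim, hidden=30):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim, hidden), nn.ReLU(),   # L1: (s, 30)
            nn.Linear(hidden, hidden), nn.ReLU(),  # L2: (30, 30)
            nn.Linear(hidden, a_dim), nn.Tanh(),   # L3: (30, a)
        )

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Maps a (state, action) pair to a scalar Q value; no output activation."""
    def __init__(self, s_dim, a_dim, hidden=30):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim + a_dim, hidden), nn.ReLU(),  # L1: (s+a, 30)
            nn.Linear(hidden, hidden), nn.ReLU(),         # L2: (30, 30)
            nn.Linear(hidden, 1),                         # L3: (30, 1)
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))
```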
| Scenario | Algorithm | Total training time / s | Mean training time per episode / s | Model loading and planning time / s | Single-step planning time / ms |
|---|---|---|---|---|---|
| Left lane change | TD3 | 935.27 | 1.87 | 1.23 (144 steps) | 8.48 |
| | DDPG | 1065.12 | 2.13 | 1.11 (129 steps) | 8.54 |
| Right lane change | TD3 | 918.40 | 1.84 | 1.26 (132 steps) | 9.47 |
| | DDPG | 1012.24 | 2.02 | 1.17 (122 steps) | 9.51 |
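As a sanity check, the roughly 10.5% average training-efficiency gain quoted in the abstract can be recovered from the mean per-episode training times in this table:

```python
# Relative reduction in mean per-episode training time, TD3 vs. DDPG,
# using the values from the table above.
left = (2.13 - 1.87) / 2.13    # ≈ 0.122 (left lane change)
right = (2.02 - 1.84) / 2.02   # ≈ 0.089 (right lane change)
print((left + right) / 2)      # ≈ 0.106, i.e. about 10.5 % on average
```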