Deep Q-Networks Signal Control Optimization Algorithm Combined with Expert Experience

Abstract

A scientifically sound traffic signal control scheme optimizes the allocation of right-of-way, effectively reduces vehicle waiting time, and improves intersection throughput, thereby making more efficient use of road resources. With the continuous development of reinforcement learning and artificial intelligence, traffic signal control algorithms have been steadily improved. However, previous studies on reinforcement-learning-based signal control optimization have mostly focused on improving the agent's learning process and have paid little attention to the signal control scheme supplied as input. To improve intersection traffic efficiency more effectively, this study proposes a Deep Q-Network (DQN) signal control optimization algorithm combined with expert experience. The algorithm first designs a fixed-time signal timing scheme and then trains a DQN agent in the microscopic traffic simulation environment SUMO to obtain the optimal signal control execution scheme for the intersection. Because the fixed-time timing scheme is calculated from the intersection's road conditions, flow directions, and actual traffic volumes, the input signal control scheme is more reasonable. Experiments were conducted at the intersection of Huasui Road and Huacheng Avenue, with the following results: compared with the signal control scheme deployed in the field, both the plain DQN algorithm and the expert-experience DQN algorithm increased the average speed at the intersection by 7.9%; the plain DQN algorithm reduced the average number of waiting vehicles by 23.1%, while the expert-experience DQN algorithm reduced it by 69.2%. The results show that both optimization algorithms effectively improve intersection traffic efficiency, and the DQN signal control optimization algorithm combined with expert experience performs best among all the algorithms compared.
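The input scheme is a fixed-time plan computed from the intersection's geometry, flow directions, and measured volumes. The abstract does not name the exact design method; a standard expert approach of this kind is Webster's timing model, shown here purely as an assumed illustration. With total lost time $L$ per cycle and critical flow ratios $y_i$ summing to $Y$, the optimal cycle length $C_0$ and the effective green $g_i$ of phase $i$ are

$$C_0 = \frac{1.5L + 5}{1 - Y}, \qquad g_i = (C_0 - L)\,\frac{y_i}{Y}.$$

The sketch below shows one plausible shape of the training setup the abstract describes: a DQN agent that selects among the green phases of such an expert plan, trained against SUMO through its TraCI API. It is a minimal sketch, not the authors' code; the scenario file huasui.sumocfg, the traffic-light id tls0, the state layout, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch (not the authors' released code): a DQN agent that selects
# among the green phases of an expert fixed-time plan in SUMO via TraCI.
# "huasui.sumocfg", the traffic-light id "tls0", the state layout, and all
# hyperparameters are illustrative assumptions.
import random
from collections import deque

import torch
import torch.nn as nn
import traci  # shipped with SUMO (SUMO_HOME/tools must be on PYTHONPATH)

STATE_DIM = 8     # queue length on 8 incoming lanes (assumed layout)
N_PHASES = 4      # green phases of the expert plan (assumed)
GREEN_STEPS = 10  # simulation steps a chosen phase is held

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, N_PHASES))

    def forward(self, x):
        return self.net(x)

def get_state(lanes):
    # State: number of halting vehicles per monitored incoming lane.
    return torch.tensor([traci.lane.getLastStepHaltingNumber(l) for l in lanes],
                        dtype=torch.float32)

def apply_phase(phase, lanes):
    # Hold one green phase of the plan, then observe the resulting queues.
    traci.trafficlight.setPhase("tls0", phase)
    for _ in range(GREEN_STEPS):
        traci.simulationStep()
    nxt = get_state(lanes)
    return nxt, -float(nxt.sum())  # reward: negative total queue length

q_net, target_net = QNet(), QNet()
target_net.load_state_dict(q_net.state_dict())
optim = torch.optim.Adam(q_net.parameters(), lr=1e-3)
buffer = deque(maxlen=10_000)

traci.start(["sumo", "-c", "huasui.sumocfg"])  # assumed scenario file
lanes = sorted(set(traci.trafficlight.getControlledLanes("tls0")))[:STATE_DIM]
state = get_state(lanes)
for t in range(5_000):
    eps = max(0.05, 1.0 - t / 2_000)  # linearly decayed epsilon-greedy
    if random.random() < eps:
        action = random.randrange(N_PHASES)
    else:
        with torch.no_grad():
            action = int(q_net(state).argmax())
    nxt, reward = apply_phase(action, lanes)
    buffer.append((state, action, reward, nxt))
    state = nxt
    if len(buffer) >= 64:  # one gradient step per signal decision
        s, a, r, ns = zip(*random.sample(buffer, 64))
        s, ns = torch.stack(s), torch.stack(ns)
        a, r = torch.tensor(a), torch.tensor(r)
        with torch.no_grad():
            y = r + 0.99 * target_net(ns).max(1).values  # Bellman target
        pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(pred, y)
        optim.zero_grad()
        loss.backward()
        optim.step()
    if t % 200 == 0:
        target_net.load_state_dict(q_net.state_dict())  # periodic target sync
traci.close()
```

Restricting the action space to the phases of a well-formed expert plan, rather than to arbitrary signal states, is one natural reading of "combined with expert experience": the agent only reorders and re-times phases that are already safe by construction.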

DOI: 10.48014/ais.20250526001
Article Type: Research Article
Received: 2025-05-26
Accepted: 2025-06-06
Published: 2025-06-28
Keywords: Reinforcement learning, deep Q-networks, microscopic traffic simulation, SUMO, fixed-time signal timing scheme
Authors: WANG Hui*, YE Xihao
Affiliation: PCI Technology Group Co., Ltd., Guangzhou 510660, China
Citation: WANG Hui, YE Xihao. Deep Q-networks signal control optimization algorithm combined with expert experience[J]. Acta Interdisciplinary Science, 2025, 2(2): 86-92.