Deep reinforcement learning-driven optimal scheduling for hydro-photovoltaic-storage complementary systems

doi:10.11660/slfdxb.20260405

Journal of Hydroelectric Engineering, 2026, Vol. 45, Issue 4: 59-72.


Publication date: 2026-04-25; Online date: 2026-04-25



Abstract: When hydropower, photovoltaic (PV), and energy storage stations operate independently, their output is constrained by transmission channel capacity, which frequently leads to spilled water and curtailed PV power and limits the grid's capacity to accommodate clean energy. To address this issue, this paper proposes an optimal scheduling method for hydro-PV-storage complementary systems based on the Asynchronous Advantage Actor-Critic (A3C) algorithm, applicable to the coordinated operation of large-scale hydro-PV-storage systems. First, an operational scenario of hydro-PV-storage stations is constructed, and an optimal scheduling model is built on a complementary guidance mechanism linking short-term and medium- to long-term operation. Second, the optimal scheduling problem of the hydro-PV-storage complementary system is formulated as a Markov decision process, and deep reinforcement learning is used to explore and learn scheduling policies efficiently. Finally, the method is validated on the hydro-PV-storage complementary system in the Yarkant River basin, Xinjiang. Results show that the A3C algorithm converges stably to high reward values, significantly increases the electricity absorbed by the system, and requires notably less computation time than the other algorithms compared, demonstrating good engineering application value.

Key words: hydro-photovoltaic-storage system, complementary mechanism, deep reinforcement learning, Yarkant River basin, asynchronous advantage actor-critic algorithm, Markov decision process
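To make the Markov decision process formulation in the abstract more concrete, a minimal sketch follows. It is not the authors' model: the state variables (hour, reservoir storage, battery state of charge), the action split into hydro output and battery power, the reward (energy delivered through the transmission channel minus a penalty on spilled water and curtailed PV), the names `State`, `Params`, `step`, and `n_step_advantages`, and all parameter values are hypothetical assumptions chosen for illustration. The second function shows the bootstrapped n-step return and advantage that each A3C worker computes from a rollout using the critic's value estimates; the asynchronous workers and the actor and critic networks themselves are not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class State:
    hour: int             # time step within the scheduling horizon
    reservoir_mwh: float  # usable reservoir storage, as MWh-equivalent (assumption)
    battery_mwh: float    # battery state of charge (MWh)


@dataclass
class Params:
    # Hypothetical placeholder values, not data from the paper.
    line_cap_mw: float = 100.0        # transmission channel capacity
    hydro_max_mw: float = 80.0        # maximum hydro output
    reservoir_max_mwh: float = 500.0  # reservoir capacity (MWh-equivalent)
    battery_cap_mwh: float = 50.0     # battery energy capacity
    battery_p_mw: float = 25.0        # battery charge/discharge power limit
    eta: float = 0.95                 # one-way battery efficiency
    waste_penalty: float = 0.5        # weight on spilled water + curtailed PV


def step(s: State, action: Tuple[float, float], pv_mw: float,
         inflow_mwh: float, p: Params) -> Tuple[State, float, float, float]:
    """One-hour transition of a toy hydro-PV-storage MDP (hypothetical).

    action = (hydro_mw, battery_mw); battery_mw > 0 discharges, < 0 charges.
    With 1-hour steps, MW and MWh are numerically interchangeable.
    Returns (next_state, reward, delivered_mwh, curtailed_mwh).
    """
    hydro_mw, battery_mw = action
    # Hydro output is bounded by unit capacity and by available water.
    hydro_mw = max(0.0, min(hydro_mw, p.hydro_max_mw, s.reservoir_mwh + inflow_mwh))
    battery_mw = max(-p.battery_p_mw, min(battery_mw, p.battery_p_mw))
    if battery_mw >= 0:
        # Discharge is bounded by the energy actually stored.
        battery_mw = min(battery_mw, s.battery_mwh * p.eta)
        soc = s.battery_mwh - battery_mw / p.eta
    else:
        # Charge is bounded by battery headroom and by on-site generation.
        charge = min(-battery_mw, (p.battery_cap_mwh - s.battery_mwh) / p.eta,
                     hydro_mw + pv_mw)
        battery_mw = -charge
        soc = s.battery_mwh + charge * p.eta

    generation = hydro_mw + pv_mw + battery_mw
    delivered = min(generation, p.line_cap_mw)  # transmission channel limit
    curtailed = generation - delivered          # energy the line cannot carry

    # Reservoir water balance; anything above capacity is spilled.
    reservoir = s.reservoir_mwh + inflow_mwh - hydro_mw
    spill = max(0.0, reservoir - p.reservoir_max_mwh)
    reservoir = min(reservoir, p.reservoir_max_mwh)

    reward = delivered - p.waste_penalty * (curtailed + spill)
    return State(s.hour + 1, reservoir, soc), reward, delivered, curtailed


def n_step_advantages(rewards: List[float], values: List[float],
                      bootstrap: float, gamma: float = 0.99) -> List[float]:
    """A3C-style advantages for one rollout: R_t = r_t + gamma * R_{t+1},
    with R_T = V(s_T) from the critic, and A_t = R_t - V(s_t)."""
    ret = bootstrap
    advs = []
    for r, v in zip(reversed(rewards), reversed(values)):
        ret = r + gamma * ret
        advs.append(ret - v)
    return advs[::-1]
```

With the default placeholder parameters, 80 MW of hydro plus 40 MW of PV exceeds the 100 MW line, so `step` reports 20 MWh curtailed and a reward of 90; charging the battery at 20 MW instead lets the line carry all remaining output, which is the kind of complementary behavior such a reward is meant to encourage.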


XIANG Cong, HUANG Xianfeng, LI Junchen, ZHOU Shihao, FANG Guohua, ZHOU Lun. Deep reinforcement learning-driven optimal scheduling for hydro-photovoltaic-storage complementary systems[J]. Journal of Hydroelectric Engineering, 2026, 45(4): 59-72.

