Journal of Hydroelectric Engineering (水力发电学报)

Journal of Hydroelectric Engineering ›› 2026, Vol. 45 ›› Issue (4): 59-72. doi: 10.11660/slfdxb.20260405


Deep reinforcement learning-driven optimal scheduling for hydro-photovoltaic-storage complementary systems

  • Online: 2026-04-25  Published: 2026-04-25

Abstract: When hydropower, photovoltaic (PV), and energy storage stations operate independently, limited transmission channel capacity often forces water spillage and PV curtailment, which restricts the grid's capacity to accommodate clean energy. To address this problem, this paper proposes an optimal scheduling method for hydro-PV-storage complementary systems based on the Asynchronous Advantage Actor-Critic (A3C) algorithm; the method is suited to large-scale coordinated operation of hydro, PV, and storage stations. First, an operating scenario for the hydro-PV-storage stations is constructed, and an optimal scheduling model is built on a short-term and medium- to long-term complementary guidance mechanism. Second, the optimal scheduling problem is formulated as a Markov decision process, and a deep reinforcement learning algorithm is used to explore and learn scheduling policies efficiently. Finally, the method is validated on the hydro-PV-storage complementary system in the Yarkant River basin, Xinjiang. The results show that the A3C algorithm converges stably to high reward values, significantly increases the energy the system delivers to the grid, and requires significantly less computation time than the other algorithms compared, indicating good value for engineering application.

Key words: hydro-photovoltaic-storage system, complementary mechanism, deep reinforcement learning, Yarkant River basin, asynchronous advantage actor-critic algorithm, Markov decision process
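
The abstract does not give the state, action, or reward definitions, so the following is only a minimal sketch of how a scheduling problem of this kind might be cast as a Markov decision process, together with the n-step advantage estimate commonly used in A3C-style training. Everything in it is an assumption made for illustration: the `State` fields, the `step` and `n_step_advantages` functions, the energy-equivalent reservoir model, the capacity values, and the penalty weight are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    reservoir_mwh: float  # hydro energy held in the reservoir (energy-equivalent, assumed)
    storage_mwh: float    # energy held in the storage station


def step(state, hydro_mw, charge_mw, pv_mw, inflow_mw,
         line_cap_mw=100.0, reservoir_cap_mwh=500.0,
         storage_cap_mwh=50.0, dt_h=1.0, penalty=0.5):
    """One hypothetical MDP transition: returns (next_state, reward, info)."""
    # Clamp the hydro action to what the reservoir can supply in this step.
    hydro_mw = max(0.0, min(hydro_mw, state.reservoir_mwh / dt_h + inflow_mw))

    # charge_mw > 0 charges the storage, charge_mw < 0 discharges it.
    max_charge = (storage_cap_mwh - state.storage_mwh) / dt_h
    max_discharge = state.storage_mwh / dt_h
    charge_mw = max(-max_discharge, min(charge_mw, max_charge))
    # Storage can only charge from power that is actually generated.
    charge_mw = min(charge_mw, hydro_mw + pv_mw)

    # Power offered to the shared line; the part above its capacity is curtailed.
    offered_mw = hydro_mw + pv_mw - charge_mw
    delivered_mw = min(offered_mw, line_cap_mw)
    curtailed_mw = offered_mw - delivered_mw

    # Reservoir balance; energy above the reservoir cap is spilled.
    reservoir = state.reservoir_mwh + (inflow_mw - hydro_mw) * dt_h
    spilled_mwh = max(0.0, reservoir - reservoir_cap_mwh)
    reservoir = min(reservoir, reservoir_cap_mwh)
    storage = state.storage_mwh + charge_mw * dt_h

    # Reward: delivered energy minus a penalty on curtailed and spilled energy.
    reward = delivered_mw * dt_h - penalty * (curtailed_mw * dt_h + spilled_mwh)
    info = {"delivered_mw": delivered_mw, "curtailed_mw": curtailed_mw,
            "spilled_mwh": spilled_mwh}
    return State(reservoir, storage), reward, info


def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Advantage A_t = R_t - V(s_t), with R_t = r_t + gamma * R_{t+1}."""
    ret = bootstrap_value  # critic's value estimate for the state after the rollout
    advantages = []
    for r, v in zip(reversed(rewards), reversed(values)):
        ret = r + gamma * ret
        advantages.append(ret - v)
    advantages.reverse()
    return advantages


if __name__ == "__main__":
    s0 = State(reservoir_mwh=200.0, storage_mwh=10.0)
    # Same generation, with and without charging the storage from the surplus.
    _, r_idle, info_idle = step(s0, hydro_mw=80.0, charge_mw=0.0, pv_mw=60.0, inflow_mw=30.0)
    _, r_charge, info_charge = step(s0, hydro_mw=80.0, charge_mw=40.0, pv_mw=60.0, inflow_mw=30.0)
    print(r_idle, info_idle["curtailed_mw"])
    print(r_charge, info_charge["curtailed_mw"])
```

In this toy setting, 140 MW of generation meets a 100 MW line. Leaving the storage idle curtails 40 MW, while charging the storage with the surplus removes the curtailment and raises the reward, which is the kind of complementary behaviour a learned policy would be expected to discover. In an A3C setup, several workers would each roll out transitions like `step`, compute advantages with a function like `n_step_advantages`, and asynchronously update shared actor and critic networks; those networks are omitted here.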

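Copyright © 2013 Editorial Office of the Journal of Hydroelectric Engineering
Editorial office address: Department of Hydraulic Engineering, Tsinghua University, Beijing, China  Postal code: 100084  Tel: 010-62783813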