
Real-time automatic control in a smart energy-hub


Qiu, D., Dong, Z., Zhang, X., Wang, Y., & Strbac, G. (2022). Safe reinforcement learning for real-time automatic control in a smart energy-hub. Applied Energy, 309, 118403.


Energy hubs provide an adaptable platform for meeting energy needs. Multi-energy systems are widely used because they can combine several energy carriers. However, the volatility of renewable energy sources makes it difficult to monitor them in real time and to evaluate their economic and environmental effects.

This study proposes a model-free deep reinforcement learning method that effectively controls a renewable energy hub while respecting operational constraints. Tested on real-world data, it outperforms previous approaches in cost reduction, emission reduction, and computational time. It also generalizes well and handles operational constraints while accounting for storage flexibility and carbon pricing.

Issues in existing methods:

  • Limited Control: Conventional models lack control over renewable variability.
  • Predictability Challenges: Difficulty in accurately predicting renewable outputs.
  • Complex Optimization: Requires comprehensive mathematical models.
  • Scalability Issues: Existing methods struggle with large-scale applications.
  • Inflexible Models: Traditional approaches are rigid and not adaptable.
  • High Computational Demand: Optimization processes are resource-intensive.
  • Carbon Emission Constraints: Limited integration of carbon reduction strategies.
  • Integration Difficulties: Struggles to combine multiple energy sources effectively.


Energy Hub Model

This study proposes a smart Energy Hub (EH) model that integrates different energy sectors and resources to optimize energy production and consumption. The EH model includes electric and heat demands, storage systems (thermal energy storage and a hydrogen storage system), renewable energy sources (solar photovoltaics and wind generators), and conversion units (an electric heat pump, a gas boiler, and combined heat and power). Renewable energy is used first to meet electric demand, and hydrogen storage balances supply and demand instantly. Thermal storage and multiple conversion units provide flexibility in meeting heat demand. Through electrolysis, the hydrogen storage system converts surplus renewable energy into hydrogen, which can be stored and converted back into electricity as needed.
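The electric-side priority rule described above can be sketched as a one-step dispatch function. Renewable output serves electric demand first. Any surplus runs the electrolyser until the hydrogen store is full, and any deficit is covered by the fuel cell before power is imported from the grid. This is a minimal illustration under assumptions, not the paper's model. The function name, the constant electrolyser and fuel-cell efficiencies, the grid fallback, and the omission of the heat side are all hypothetical.

```python
def dispatch_step(renewable_kw, demand_kw, h2_kwh, h2_cap_kwh,
                  eta_electrolyser=0.7, eta_fuel_cell=0.5, dt_h=1.0):
    """One time step of a hypothetical priority dispatch (electric side only).

    Returns (new hydrogen store in kWh, grid import in kW, curtailment in kW).
    Efficiencies are assumed constants, not values from the paper.
    """
    net = renewable_kw - demand_kw  # positive = surplus, negative = deficit
    grid_import = 0.0
    curtailed = 0.0
    if net >= 0:
        # Surplus electricity feeds the electrolyser, limited by free storage.
        room = h2_cap_kwh - h2_kwh
        stored = min(net * dt_h * eta_electrolyser, room)
        h2_kwh += stored
        # Electricity that could not be converted and stored is curtailed.
        curtailed = net - stored / (eta_electrolyser * dt_h)
    else:
        deficit = -net
        # The fuel cell converts stored hydrogen back into electricity.
        available = h2_kwh * eta_fuel_cell / dt_h
        from_fc = min(deficit, available)
        h2_kwh -= from_fc * dt_h / eta_fuel_cell
        # Any remaining deficit is bought from the grid (assumed fallback).
        grid_import = deficit - from_fc
    return h2_kwh, grid_import, curtailed
```

For example, `dispatch_step(20, 60, 10, 1000)` drains the 10 kWh hydrogen store to cover 5 kW of the 40 kW deficit and imports the remaining 35 kW.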

Optimization and Methodology

Mathematical models of energy storage and conversion are developed so that the demand-supply balance is guaranteed while operating costs and carbon emissions are minimized. The methodology uses LSTM-SDDPG, a deep reinforcement learning method that combines a long short-term memory (LSTM) module with a safe variant of deep deterministic policy gradient (DDPG), to improve real-time energy management decisions and handle uncertainties in the system.
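A common way to turn a cost-and-emissions objective into an RL reward is to return the negative step cost, with emissions priced at the carbon price and a penalty for constraint violations. The sketch below assumes that form. The argument names, the emission factors, and the penalty weight are hypothetical. The paper's safe DDPG enforces constraints with its own mechanism rather than this simple penalty.

```python
def step_reward(grid_kwh, gas_kwh, elec_price, gas_price, carbon_price,
                ef_grid=0.4, ef_gas=0.2, violation=0.0, penalty=100.0):
    """Negative operating cost (energy + carbon) minus a constraint penalty.

    ef_grid and ef_gas are hypothetical emission factors in kgCO2/kWh;
    carbon_price is in currency per kgCO2; violation is the total amount
    by which any operating limit is exceeded (0 when feasible).
    """
    energy_cost = grid_kwh * elec_price + gas_kwh * gas_price
    emissions = grid_kwh * ef_grid + gas_kwh * ef_gas
    carbon_cost = emissions * carbon_price
    return -(energy_cost + carbon_cost) - penalty * violation
```

With `step_reward(10, 20, 0.2, 0.05, 0.1)` the energy cost is 3.0 and the carbon cost is 0.8, giving a reward of -3.8.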

Overall structure of the proposed SDDPG method from the study by Qiu, D., Dong, Z., Zhang, X., Wang, Y., & Strbac, G. (2022).



  • The suggested LSTM-SDDPG approach satisfies operating restrictions and efficiently reduces energy expenses and carbon emissions.
  • Uncertainties related to demand and renewable energy are better managed when the LSTM module and DDPG are integrated.
  • According to experimental results, LSTM-SDDPG reduces carbon emissions and renewable energy curtailment more effectively than state-of-the-art methods.
  • Higher carbon prices encourage a low-carbon transition by shifting energy purchases from natural gas to electricity.

GCN for UAV coverage control


Dai, A., Li, R., Zhao, Z., & Zhang, H. (2020, October). Graph convolutional multi-agent reinforcement learning for UAV coverage control. In 2020 International Conference on Wireless Communications and Signal Processing (WCSP) (pp. 1106-1111). IEEE.


A growing number of unmanned aerial vehicles (UAVs) are being used as mobile base stations because of their adaptability and capacity for dynamic coverage. However, because UAVs have limited computing and energy resources, they must work together in cooperative groups. These groups form dynamic, graph-like local networks in which UAVs stay linked to exchange data and maximize efficiency. This work presents a new method for controlling UAV groups using graph convolutional multi-agent reinforcement learning (MARL). By exploiting the mutual interactions between UAVs, the approach reduces energy consumption, ensures fairness, and improves signal coverage. Simulations show considerable gains in the effectiveness and efficiency of UAV network management.

Issues in existing methods:

  • Limited Coverage Range: Each UAV's signal coverage is limited, so effective collaboration is necessary.
  • Energy Constraints: Limited power restricts the performance and duration of UAV operations.
  • Dynamic Topology: The network structure changes continuously as UAV locations fluctuate.
  • Quality of Service (QoS): Maintaining high communication quality can be difficult.
  • Scalability Issues: Many algorithms scale poorly as the number of UAVs grows.
  • Environmental Adaptability: It’s vital to be able to adjust to challenging or disaster-prone situations.


Graph-Based Modeling and Encoding of UAV Observations:

UAVs can be controlled as detachable base stations by treating them as nodes in a graph and connecting them according to their communication range and distances. In this setup, each UAV obtains local observations, such as its position and velocity and the positions of ground users, which are crucial for decision-making. Based on these observations, each UAV acts according to its policy and then receives a reward. To improve learning, these experiences are stored in a replay buffer so that they can be revisited in later training. A multi-layer perceptron (MLP) encodes the observations into feature vectors, enabling more efficient processing of the collected data.
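The graph construction and observation encoding described above can be sketched in a few lines of NumPy. Two UAVs are linked when their distance is within the communication range, and a two-layer ReLU MLP maps each local observation to a feature vector. The function names, the 5-dimensional observation, the layer sizes, and the random weights are hypothetical placeholders. Trained parameters would come from the learning process.

```python
import numpy as np

def build_adjacency(positions, comm_range):
    """Link UAVs i and j (i != j) when their distance is within comm_range."""
    pos = np.asarray(positions, dtype=float)
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist <= comm_range).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops in the communication graph
    return adj

def mlp_encode(obs, w1, b1, w2, b2):
    """Two-layer MLP with ReLU that maps each row of obs to a feature vector."""
    hidden = np.maximum(0.0, obs @ w1 + b1)
    return np.maximum(0.0, hidden @ w2 + b2)

rng = np.random.default_rng(0)
positions = [[0.0, 0.0], [3.0, 4.0], [20.0, 0.0]]
adj = build_adjacency(positions, comm_range=6.0)
obs = rng.normal(size=(3, 5))  # hypothetical 5-dim local observation per UAV
feats = mlp_encode(obs, rng.normal(size=(5, 16)), np.zeros(16),
                   rng.normal(size=(16, 8)), np.zeros(8))
```

Here UAVs 0 and 1 are 5 units apart and become neighbours, while UAV 2 is out of range of both.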

The framework of DGN from the study by Dai, A., Li, R., Zhao, Z., & Zhang, H. (2020, October)

Training with Convolutional Layers and Q Networks:

Each convolutional layer aggregates features from neighboring nodes using multi-head dot-product attention. Attention weights are first computed to capture feature correlations, and the outputs of the attention heads are then concatenated and passed through a non-linear function. The resulting features are fed into the Q network, which is trained with deep Q-learning, using estimates of future value as targets for the current actions. The loss function includes both a Q-value error term and a term that keeps the attention weight distributions consistent, which makes learning more robust. This approach provides dynamic and scalable network control, facilitating UAV collaboration while improving coverage and fairness and reducing energy consumption.
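The convolutional layer described above can be sketched as masked multi-head dot-product attention. Each UAV attends only to itself and its graph neighbours, the head outputs are concatenated, and a ReLU is applied. This is a minimal NumPy illustration with hypothetical weight shapes. It omits the Q network and the attention-consistency loss term.

```python
import numpy as np

def attention_conv(h, adj, wq, wk, wv, w_out):
    """One multi-head dot-product attention layer over a graph.

    h: (N, d) node features; adj: (N, N) 0/1 adjacency.
    wq, wk, wv: lists of (d, dk) matrices, one per head; w_out: (H*dk, d_out).
    Each node attends over itself and its neighbours only.
    """
    n = h.shape[0]
    mask = adj + np.eye(n)  # include the node itself
    heads = []
    for q_w, k_w, v_w in zip(wq, wk, wv):
        q, k, v = h @ q_w, h @ k_w, h @ v_w
        scores = q @ k.T / np.sqrt(k.shape[1])
        scores = np.where(mask > 0, scores, -np.inf)  # block non-neighbours
        scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
        weights = np.exp(scores)
        weights /= weights.sum(axis=1, keepdims=True)
        heads.append(weights @ v)
    concat = np.concatenate(heads, axis=1)
    return np.maximum(0.0, concat @ w_out)  # non-linearity after concatenation

rng = np.random.default_rng(1)
n, d, dk, n_heads = 3, 8, 4, 2
h = rng.normal(size=(n, d))
adj = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)  # UAV 2 isolated
wq = [rng.normal(size=(d, dk)) for _ in range(n_heads)]
wk = [rng.normal(size=(d, dk)) for _ in range(n_heads)]
wv = [rng.normal(size=(d, dk)) for _ in range(n_heads)]
w_out = rng.normal(size=(n_heads * dk, 6))
out = attention_conv(h, adj, wq, wk, wv, w_out)
```

An isolated UAV attends only to itself, so its output depends only on its own features. This is the behaviour expected when a UAV loses all links.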


  • When used as detachable base stations, UAVs offer efficient signal coverage, particularly in challenging conditions and for brief communication requirements.
  • To maintain stable and valid signals, UAVs must collaborate effectively over dynamic local networks because of limited communication resources and battery power.
  • The graph convolutional multi-agent reinforcement learning (MARL) technique, DGN, reduces power consumption and improves fairness and signal coverage efficiency.
  • DGN’s decentralized methodology offers strong scalability, enabling UAVs to function well with just local observations.
  • In the future, continuous action control techniques will be investigated. For more accurate UAV movement policies, graph convolution and actor-critic techniques like DDPG may be combined.


Posted on Leave a comment

Multi-energy microgrid through RL


Wang, Y., Qiu, D., Sun, M., Strbac, G., & Gao, Z. (2023). Secure energy management of multi-energy microgrid: A physical-informed safe reinforcement learning approach. Applied Energy, 335, 120759.


Integrating distributed energy resources speeds up the transition to a low-carbon future but also complicates safe and reliable operation. Multi-Energy Microgrids (MEMGs), which combine several energy sources to enhance stability, offer a workable solution. Both standard optimization techniques and model-free learning have been applied to MEMG energy management. However, traditional reinforcement learning (RL) often struggles to respect physical limits, which puts secure operation at risk. To address this, a new safe RL technique is proposed. It includes a dynamic security assessment layer that respects physical boundaries by solving an action-correction formulation, which keeps MEMG operation safe during both training and testing. Extensive studies show that this physical-informed RL method outperforms classic RL and optimization strategies in constraint-compliant, cost-effective MEMG energy management.

Limitations in the existing methods:

  • Accurate system knowledge is impractical to obtain because of privacy concerns and system aging.
  • Optimization-based methods require extensive computation for every possible state.
  • Model-free learning has difficulty representing physical limits.
  • Without full knowledge of the system, model-free agents risk taking unsafe actions.


PI-SPPO Approach for MEMG Energy Management

The proposed PI-SPPO method solves the MEMG energy management problem within physical constraints. It combines a physical-informed safety layer, a model-free PPO control policy, and a security assessment rule. The security assessment rule, which is part of the safety layer, approximates the safe operating region. Using an actor-critic architecture, the PPO control policy handles high-dimensional state and action spaces effectively.
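The PPO control policy mentioned above is normally trained with PPO's clipped surrogate objective. That objective limits how far the probability ratio between the new and old policies can move in a single update. The sketch below computes the standard clipped loss for a batch. The argument names and the 0.2 clipping value are common defaults, not settings taken from the paper.

```python
import numpy as np

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (to be minimised).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: estimated advantages.
    """
    ratio = np.exp(np.asarray(new_logp) - np.asarray(old_logp))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # The pessimistic (minimum) term removes the incentive for large steps.
    return -np.mean(np.minimum(unclipped, clipped))
```

When the ratio grows to e (about 2.72) on a positive advantage, the gain is capped at 1.2 times the advantage. On a negative advantage, the unclipped and more pessimistic value is kept.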

Safety Layer and Training Process

The safety layer automatically corrects actions from the PPO policy by solving an optimization problem based on the security rule. This ensures that the control plan respects all physical constraints while maintaining stability and sample efficiency. The safety layer is first trained comprehensively offline and then refined through continuous online learning, which improves its accuracy and adaptability during training. By balancing exploration and exploitation, this approach ensures safe and efficient energy management for MEMG systems.
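As a simplified illustration of action correction, assume that the security rule has been linearised into a single inequality g·a + c ≤ 0. The closest safe action in Euclidean distance then has a closed form: project the action onto the half-space. The paper learns its rule with supervised learning and solves its own optimization problem. The single-linear-constraint form and the function name below are assumptions made for this sketch.

```python
import numpy as np

def correct_action(action, g, c):
    """Project action onto the half-space {a : g @ a + c <= 0}.

    Solves min ||a - action||^2 subject to g @ a + c <= 0 in closed form.
    This is only valid for a single linear (or linearised) constraint.
    """
    a = np.asarray(action, dtype=float)
    g = np.asarray(g, dtype=float)
    violation = g @ a + c
    if violation <= 0:
        return a  # already safe: leave the policy's action unchanged
    return a - (violation / (g @ g)) * g
```

A safe action passes through unchanged. An unsafe one moves the minimum distance needed to land on the constraint boundary, so the correction perturbs the PPO policy as little as possible.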


Structure of PI-SPPO from the study by Wang, Y., Qiu, D., Sun, M., Strbac, G., & Gao, Z. (2023)



  • Introduced the PI-SPPO method for MEMG energy management.
  • Tuned hyperparameters and improved sample efficiency using PPO.
  • Trained a security assessment rule with supervised learning.
  • Integrated a safety layer for secure MEMG operation.
  • Demonstrated effectiveness in practical energy scheduling.
  • Reduced energy management costs.
  • Ensured that MEMGs operated securely.
  • Future work should cover the cooling and heating sectors and develop learning methods that are robust to exogenous states.

Proactive Routing Strategy in Smart Power Grids Using GRL


Islam, M. A., Ismail, M., Atat, R., Boyaci, O., & Shannigrahi, S. (2023). Software-Defined Network-Based Proactive Routing Strategy in Smart Power Grids Using Graph Neural Network and Reinforcement Learning. E-Prime-Advances in Electrical Engineering, Electronics and Energy, 5, 100187.


Periodic fixed scheduling (FS) packets and emergency-driven (ED) packets generated by sensors and actuators in smart power grids need different QoS support. Current routing algorithms lack QoS differentiation and flexibility. The proposed SDN-based proactive routing approach uses a graph neural network (GNN) model to predict traffic conditions, including congestion, accurately. It prioritizes ED packets by placing them in separate queues. In addition, a reinforcement learning (RL) based method dynamically selects the best routes and adjusts queue service rates according to current and predicted network states. The framework was tested on the IEEE 14-bus and IEEE 39-bus systems. It significantly outperforms current benchmarks, improving network efficiency and QoS support and thereby the performance and reliability of smart grids.

Limitations in the existing methods:

  • The traditional centralized power grids have limited adaptability.
  • Analog control and manual data collection are unreliable.
  • The shift to smart grids is prompted by the integration of renewable energy sources and growing demand.
  • Smart grids incorporate sophisticated automation and monitoring.
  • Data packets fall into two categories: emergency-driven (ED) and fixed scheduling (FS).
  • Both packet types require high dependability and low latency.
  • Conventional routing methods lack the dynamism needed for smart grids.
  • Modeling ED traffic accurately remains difficult.
  • Current AI-based routing ignores potential network changes.
  • The proposed proactive framework combines RL techniques with GNN prediction.


SDN Layers and System Model

The system model includes OpenFlow switches, communication links, and a control center in the cyber layer, and power nodes, field devices, and transmission lines in the physical layer. Packet routing in the smart grid follows an SDN paradigm. The SDN consists of three layers: intelligence, control, and data forwarding. For adaptive queue service rates and optimal routing, the intelligence layer uses RL agents together with a GNN-based prediction model. The GNN model captures temporal and spatial information to forecast future network conditions such as congestion, and the RL agents use these forecasts to avoid sub-optimal paths.
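The paper selects routes with RL agents. The sketch below is a deliberately simpler stand-in: Dijkstra's shortest path over forecast per-link delays. It illustrates the proactive idea only, namely that a predicted congestion event changes the chosen route before the congestion actually occurs. The topology, the delay values, and the congestion forecast are hypothetical.

```python
import heapq

def best_path(links, predicted_delay, src, dst):
    """Dijkstra's shortest path using forecast per-link delays as weights.

    links: dict node -> list of neighbour nodes (directed links).
    predicted_delay: dict (u, v) -> forecast delay in ms for that link.
    Returns (path, total delay), or (None, inf) if dst is unreachable.
    """
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dst:
            break
        for v in links.get(u, []):
            nd = d + predicted_delay[(u, v)]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]

links = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
now = {("A", "B"): 2, ("B", "D"): 2, ("A", "C"): 3, ("C", "D"): 3}
forecast = {**now, ("B", "D"): 20}  # hypothetical predicted congestion on B-D
```

With current delays the route is A-B-D (4 ms). Once congestion is forecast on B-D, the route moves to A-C-D (6 ms) ahead of time.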


Illustration of system model under consideration from the study by Islam, M. A., Ismail, M., Atat, R., Boyaci, O., & Shannigrahi, S. (2023)

Proactive Routing Strategy and Implementation

The proactive routing method is evaluated with emergency events from the DOE's Electric Emergency and Disturbance Report and a practical dataset based on the IEEE 14-bus and 39-bus test systems. It pre-configures queue service rates based on anticipated congestion events. To train the GNN model to forecast future congestion conditions effectively, sparse queue-length data are transformed into a non-sparse timer indication signal. The strategy aims to reduce packet loss and latency so that routing decisions in the smart grid network remain effective.
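The pre-configuration of queue service rates can be illustrated with a hypothetical allocation rule. The ED queue reserves the forecast ED arrival rate plus a safety margin, capped so that FS traffic keeps a guaranteed minimum, and the FS queue receives the rest of the link capacity. The rule, the `headroom` factor, and the packet-per-second units are assumptions. The paper's RL agents learn the rates rather than applying a fixed formula.

```python
def allocate_service_rates(capacity_pps, predicted_ed_pps, headroom=1.2,
                           fs_min_pps=0.0):
    """Split one link's capacity between the ED and FS queues (hypothetical rule).

    Reserves headroom * predicted ED arrival rate for the ED queue, while
    keeping at least fs_min_pps for FS traffic.
    Returns (ed_rate, fs_rate) in packets per second.
    """
    ed_rate = min(capacity_pps - fs_min_pps, headroom * predicted_ed_pps)
    ed_rate = max(ed_rate, 0.0)
    fs_rate = capacity_pps - ed_rate
    return ed_rate, fs_rate
```

For a 1000 pps link and a forecast of 100 ED pps, the ED queue receives 120 pps and FS receives 880 pps. A forecast surge of 900 ED pps with a 200 pps FS floor gives ED 800 pps.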


  • A proactive SDN-based routing framework is proposed for smart power grids to improve the routing of emergency-driven (ED) and fixed scheduling (FS) packets.
  • ED and FS packets are placed in different queues to minimize interference and maintain the intended Quality of Service (QoS).
  • A Graph Neural Network (GNN) model is trained to anticipate future route congestion, so forwarding rules can be updated ahead of time to reduce delays.
  • The framework adjusts queue service rates based on anticipated ED traffic volumes.
  • The framework was evaluated on the IEEE 14-bus and IEEE 39-bus test systems using a dataset of actual emergency events in U.S. power networks.
  • The proactive routing architecture outperformed the Q-learning (QL) and Bellman-Ford benchmarks.
  • The results showed no packet loss for either ED or FS packets, fewer than 1% of packets exceeding their latency thresholds, and an average delay under 60 milliseconds.
  • Limitation: topology changes require retraining the GNN and Reinforcement Learning (RL) models, which introduces delays.

Adaptive Dispatch for Renewable Energy


Bai, Y., Chen, S., Zhang, J., Xu, J., Gao, T., Wang, X., & Gao, D. W. (2023). An adaptive active power rolling dispatch strategy for high proportion of renewable energy based on distributed deep reinforcement learning. Applied Energy, 330, 120294.


In this paper, we address the uncertainties associated with a high proportion of renewable energy by presenting an adaptive active power rolling dispatch approach based on distributed deep reinforcement learning. To improve the generalization of multiple agents in active power flow regulation, we integrate graph attention layers and recurrent neural network layers into each agent's network structure. We also propose a regional graph attention network technique that helps agents efficiently combine local data from their communities, improving their information gathering. By adopting a "centralized training, distributed execution" structure, agents can adapt effectively to dynamic environments. Case studies show good generalization across temporal granularities and network topologies, allowing multiple agents to learn efficient active power control strategies. We believe this strategy will improve the applicability and flexibility of distributed AI techniques for power system control problems.

Limitations in the existing methods:

  • The uncertainty of dispersed renewable energy poses a challenge to current approaches.
  • Centralized dispatch techniques fail to coordinate flexible resources effectively.
  • Traditional optimization requires accurate mathematical models.
  • Traditional active power rolling dispatch (APRD) approaches require complete topological information.
  • Changes in grid topology require rebuilding the mathematical models.
  • Power systems are highly unpredictable (N-1 failures, load variations).
  • DRL faces dimensionality problems in complex power systems.
  • Although DDRL tackles distributed dispatch, it has computational issues.
  • RL-based techniques have trouble extracting state features effectively.
  • Standard GAT models aggregate only first-order neighbors, yet they need high-order neighbor information to work well on sparse power networks.
  • In large grids, traditional methods cannot guarantee the effectiveness of control actions.


Improved GAT for Power Systems

The suggested approach addresses the problem of sparse topological connections in power systems by extending the aggregation range of the Graph Attention Network (GAT) to include K-order neighbors. We achieve this improvement by adding a spatial discount factor, which modifies neighbor nodes’ contributions according to distance. Additionally, we use a multi-head attention method to improve data capture and stability.
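The K-order aggregation with a spatial discount factor can be sketched as follows. Hop distances up to K are found by breadth-first expansion. Attention logits follow the usual GAT form, LeakyReLU(aᵀ[Wh_i ‖ Wh_j]), and each neighbour's weight is additionally scaled by gamma raised to its hop distance. The exact way the paper applies the discount is not given here, so the `gamma ** hops` form, the single head, and the function names are assumptions.

```python
import numpy as np

def hop_distances(adj, k):
    """Hop count between every pair of nodes, or inf beyond k hops (BFS)."""
    n = adj.shape[0]
    dist = np.full((n, n), np.inf)
    np.fill_diagonal(dist, 0)
    reach = np.eye(n, dtype=bool)
    frontier = np.eye(n, dtype=bool)
    for hop in range(1, k + 1):
        # Nodes adjacent to the current frontier that were not reached before.
        frontier = ((frontier.astype(int) @ adj) > 0) & ~reach
        dist[frontier] = hop
        reach |= frontier
    return dist

def k_order_gat(h, adj, w, a, k=2, gamma=0.5):
    """Single-head GAT aggregation over K-order neighbours with distance discount.

    h: (N, F) features; w: (F, F') projection; a: (2F',) attention vector.
    alpha_ij is proportional to gamma**d_ij * exp(e_ij) (assumed form).
    """
    z = h @ w
    fp = z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]), split into source and target parts.
    e = (z @ a[:fp])[:, None] + (z @ a[fp:])[None, :]
    e = np.where(e > 0, e, 0.2 * e)
    d = hop_distances(adj, k)
    finite = np.isfinite(d)
    logits = np.where(finite, e + np.where(finite, d, 0.0) * np.log(gamma), -np.inf)
    logits -= logits.max(axis=1, keepdims=True)
    alpha = np.exp(logits)
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha
```

On a four-bus chain with K = 2, bus 0 attends to buses 0, 1 and 2. A first-order neighbour receives twice the weight of a second-order one when gamma = 0.5 and the raw scores are equal, and bus 3 receives none.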

Structure and Training of Neural Networks

Each agent estimates Q-values with a fully connected neural network, chooses the actions that maximize these values, and stores its experiences in a replay buffer for later training. To avoid overestimating the value of sub-optimal actions, the method uses a target network in addition to the main Q-network. Training minimizes a loss function with stochastic gradient descent, updating the parameters iteratively.
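The target-network mechanism can be made concrete by computing bootstrapped targets for a batch. The sketch includes the double-DQN variant, in which the main network selects the next action and the target network scores it. That is the usual way a target network is used to curb overestimation. Which variant the paper uses is not stated here, so both options are shown and the names are hypothetical.

```python
import numpy as np

def td_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99,
               double=True):
    """Bootstrapped Q-learning targets for a batch of transitions.

    q_next_online / q_next_target: (B, A) next-state Q-values from the main
    and target networks. double=True: the main network picks the action and
    the target network evaluates it (double DQN); otherwise the target
    network does both (standard DQN).
    """
    r = np.asarray(rewards, dtype=float)
    done = np.asarray(dones, dtype=float)
    q_on = np.asarray(q_next_online, dtype=float)
    q_tg = np.asarray(q_next_target, dtype=float)
    if double:
        best = np.argmax(q_on, axis=1)
        next_v = q_tg[np.arange(len(best)), best]
    else:
        next_v = np.max(q_tg, axis=1)
    # Terminal transitions do not bootstrap.
    return r + gamma * (1.0 - done) * next_v

def td_loss(q_taken, targets):
    """Mean squared TD error, minimised with stochastic gradient descent."""
    return float(np.mean((np.asarray(q_taken) - targets) ** 2))
```

In the test batch, standard DQN uses the target network's larger maximum (7), whereas double DQN evaluates the main network's choice (4) and gives a lower, less optimistic target.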

R-GAT layer schematic diagram from the study by Bai, Y., Chen, S., Zhang, J., Xu, J., Gao, T., Wang, X., & Gao, D. W. (2023)

Distributed Execution and Multi-Agent Scenarios

In multi-agent scenarios, QMIX factorizes the joint value function and merges the agents' local value functions, improving performance in decentralized systems. Centralized training uses global state information, while distributed execution relies on partial observations to improve computational efficiency. By combining these processes, the Distributed Active Power Rolling Dispatching Algorithm (DAPRDA) enables centralized training and distributed execution, supporting effective decision-making in power grid management.
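The value factorization can be illustrated with a QMIX-style mixer. Hypernetworks map the global state to mixing weights, and taking their absolute value keeps Q_tot monotonic in every agent's Q-value. This monotonicity is what lets each agent act greedily on its own Q during distributed execution. The single-layer hypernetworks, the shapes, and the random weights are simplifications for illustration. The original QMIX uses a small MLP for the final bias.

```python
import numpy as np

def elu(x):
    """Exponential linear unit (monotonically non-decreasing)."""
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))

def qmix(agent_qs, state, hyper, embed_dim):
    """Mix per-agent Q-values into Q_tot with state-conditioned, non-negative weights.

    agent_qs: (n_agents,) chosen-action Q-values; state: (S,) global state.
    hyper: dict of single-layer hypernetwork weights (simplified).
    """
    n = agent_qs.shape[0]
    w1 = np.abs(state @ hyper["W1"]).reshape(n, embed_dim)  # abs keeps dQ_tot/dQ_i >= 0
    b1 = state @ hyper["B1"]
    hidden = elu(agent_qs @ w1 + b1)
    w2 = np.abs(state @ hyper["W2"])
    b2 = state @ hyper["B2"]
    return float(hidden @ w2 + b2)

rng = np.random.default_rng(0)
n_agents, state_dim, embed_dim = 3, 4, 8
hyper = {
    "W1": rng.normal(size=(state_dim, n_agents * embed_dim)),
    "B1": rng.normal(size=(state_dim, embed_dim)),
    "W2": rng.normal(size=(state_dim, embed_dim)),
    "B2": rng.normal(size=state_dim),
}
state = rng.normal(size=state_dim)
qs = rng.normal(size=n_agents)
q_tot = qmix(qs, state, hyper, embed_dim)
```

Raising any single agent's Q-value never lowers `q_tot`, which is the monotonicity property that QMIX relies on.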

Structure of R-GAT – QMIX algorithm from the study by Bai, Y., Chen, S., Zhang, J., Xu, J., Gao, T., Wang, X., & Gao, D. W. (2023)


  • A multi-agent RL technique for adaptive active power dispatch is proposed.
  • Learns economical regulation strategies.
  • Generalizes well across network topologies and time granularities.
  • RNN improves generalization at various temporal granularities.
  • The GAT network becomes more resilient to changes in topology.
  • Captures network feature data efficiently and adaptably.
  • Q-MIX struggles as the number of discrete actions grows.
  • Future research will examine continuous control strategies.

Optimal energy management strategies for energy internet


Hua, H., Qin, Y., Hao, C., & Cao, J. (2019). Optimal energy management strategies for energy Internet via deep reinforcement learning approach. Applied Energy, 239, 598-609.


This study addresses a critical energy management issue in the Energy Internet (EI) using interdisciplinary methods. Although the EI framework is established, many fundamental and technological challenges remain. The study formulates a novel energy control problem grounded in economic intuition and operating principles and solves the resulting constrained optimal control problem without relying on explicit mathematical models of renewable power generation and loads. Because standard techniques fall short on a problem of this complexity, a model-free deep reinforcement learning approach is used to obtain the desired control scheme. Numerical simulations demonstrate the efficiency and viability of this approach for optimizing energy management in the dynamic context of the Energy Internet.

Limitations in the existing methods:

  • Power outages and voltage violations in distribution networks are made worse by the integration of distributed energy resources.
  • Conventional Volt-Var control methods have trouble with unpredictable behavior and quick reaction times, which lowers system reliability.
  • Slow processing and missing network data pose serious problems for model-based optimization.
  • Single-agent reinforcement learning systems face scalability issues and heavy communication loads.
  • Decentralized reinforcement learning techniques may not guarantee global data access or scale well.
  • In decentralized control systems, network partitioning can reduce the effectiveness and reliability of optimization.
  • Heuristic methods explore inefficiently in large search spaces.
  • Grid-based dynamic programming techniques require heavy computation to obtain exact results.
  • Conventional methods require detailed system modeling, which increases complexity.
  • Many heuristic algorithms have trouble exploring high-dimensional spaces directly.


System description:

The Energy Internet (EI) network consists of interconnected sub-grids, each managed by an Energy Router (ER). ERs standardize operations and exchange information with cloud data centers and other ERs, enabling unified control of devices such as Distributed Generators (DGs) and Battery Energy Storage systems (BESs). Sub-grids include photovoltaic panels (PVs), wind turbine generators (WTGs), microturbines (MTs), fuel cells (FCs), diesel engine generators (DEGs), BESs, and loads. The power outputs of PVs and WTGs, and the loads, are uncontrollable and are modeled from historical data. DGs and BESs maintain the energy balance, and their control signals adjust their power output. All control inputs are bounded to prevent over-control. Power flows in the EI network are calculated with the pandapower package.
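The bounded control inputs and the energy balance of one sub-grid can be sketched for a single step. DG and BES commands are clipped to their limits, the BES command is further limited by its state of charge, and the residual is exchanged with the rest of the network. This ignores losses and the network power flow, which the paper computes with pandapower. The sign convention (positive BES power means discharging), the lossless battery, and all names are assumptions.

```python
def subgrid_balance(pv_kw, wt_kw, load_kw, dg_cmd_kw, bes_cmd_kw,
                    soc_kwh, dg_max_kw, bes_max_kw, soc_min_kwh, soc_max_kwh,
                    dt_h=1.0):
    """Clip DG/BES control signals to their bounds and compute the exchange.

    bes_cmd_kw > 0 means discharging, < 0 charging (assumed convention).
    Returns (dg_kw, bes_kw, new_soc_kwh, exchange_kw); exchange_kw > 0 is
    power imported into the sub-grid, < 0 is power exported.
    """
    dg = min(max(dg_cmd_kw, 0.0), dg_max_kw)
    bes = min(max(bes_cmd_kw, -bes_max_kw), bes_max_kw)
    # Keep the state of charge within its limits (lossless battery assumed).
    bes = min(bes, (soc_kwh - soc_min_kwh) / dt_h)
    bes = max(bes, -(soc_max_kwh - soc_kwh) / dt_h)
    new_soc = soc_kwh - bes * dt_h
    exchange = load_kw - (pv_kw + wt_kw + dg + bes)
    return dg, bes, new_soc, exchange
```

A DG command of 80 kW against a 60 kW limit is clipped to 60 kW. A 50 kW discharge request is cut to the 20 kWh left above the minimum state of charge, so the sub-grid ends up exporting 30 kW.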

A typical sub-grid, from the study by Hua, H., Qin, Y., Hao, C., & Cao, J. (2019)


This paper optimizes an Energy Internet (EI) system made up of linked sub-grids. It minimizes costs and satisfies operating constraints while balancing power generation and consumption. The approach controls power flows, keeps battery energy storage systems (BESs) within their charge/discharge limits, and accounts for the operating costs of distributed generators (DGs) such as microturbines, diesel generators, and fuel cells. The total cost function is a weighted sum of power transmission, DG operation, and BES management costs. The Asynchronous Advantage Actor-Critic (A3C) reinforcement learning method solves the optimal control problem by adjusting the power outputs of energy resources to reduce costs. Simulations confirm that the method achieves balanced power flows, cost efficiency, and system reliability.
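Two pieces of the approach above can be made concrete. The first is the reward, taken here as the negative weighted sum of the cost terms. The second is the n-step return and advantage that each A3C worker computes from a short rollout before sending gradients to the shared actor-critic. The weight values and function names are hypothetical. The n-step advantage itself is the standard A3C formulation.

```python
def weighted_reward(costs, weights):
    """Reward as the negative weighted sum of cost terms (weights assumed),
    e.g. costs = [transmission, DG operation, BES management]."""
    return -sum(w * c for w, c in zip(weights, costs))

def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """n-step returns and advantages for one rollout segment, as in A3C.

    rewards, values: per-step lists for the segment; bootstrap_value is the
    critic's estimate V(s_{t+n}) (use 0 if the episode ended).
    """
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running  # discounted return, bootstrapped
        returns.append(running)
    returns.reverse()
    # Advantage = n-step return minus the critic's baseline estimate.
    advantages = [g - v for g, v in zip(returns, values)]
    return returns, advantages
```

The actor is updated in the direction of log-probability gradients scaled by these advantages, and the critic regresses its values toward the returns.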

Flow of Simulation process from the study by Hua, H., Qin, Y., Hao, C., & Cao, J. (2019)


  • Studied energy management for a broader EI system.
  • Applied deep reinforcement learning to obtain optimal control.
  • Exhibited better performance than the ideal power flow approach.
  • Strictly limited power exchange with the external grid.
  • Improved local power sharing among sub-grids.
  • Made reasonable use of BESs under the proposed management scheme.
  • For improved control, future work will incorporate power flows and spatial linkages.

Transient stability preventive control based on GCN and DRL


Wang, T., & Tang, Y. (2023). Transient stability preventive control based on graph convolution neural network and transfer deep reinforcement learning. CSEE Journal of Power and Energy Systems. 


Maintaining transient stability remains a major concern in power systems. Conventional optimization techniques often struggle to converge, and training artificial intelligence solutions can be slow. This work introduces a transient stability preventive control (TSPC) method that combines transfer deep reinforcement learning (DRL) with graph convolutional neural networks (GCNNs). The method first assesses the current power-flow state with a GCNN-based transient stability assessor (TSA). It then identifies the main factors influencing stability and incorporates this knowledge into an improved Markov decision process. Entropy-enhanced TD3 algorithms and knowledge transfer greatly increase learning efficiency. Validated through simulations on the IEEE 39-bus system and an actual power grid, the approach shows improved control effects.

Issues in the existing methods:

  1. Complexity of Modern Power Systems: Growing complexity as a result of power electronics and integration of renewable energy. 
  2. Non-convergence in Conventional Methods: Conventional optimization-based techniques suffer from local optima and non-convergence.
  3. High Computational Complexity: Computational cost limits scalability to large-scale power grids.
  4. Limited Efficiency of Existing Approaches: Earlier techniques are not efficient enough to meet the demands of real-time control.
  5. Long Training Times in AI-based Solutions: Deep reinforcement learning (DRL) requires long training times.
  6. Underutilization of Transfer Learning: Current transfer learning techniques have limited reach and application in power systems.


GCNN-Based Transient Stability Assessor (TSA): 

This part of the study develops a TSA based on Graph Convolutional Neural Networks (GCNN). The TSA learns the complex relationship between transient stability and power-flow conditions. It predicts the Transient Stability Index (TSI) from input parameters such as load level, fault location, and generator output power. Because a power system is naturally represented as a graph, a GCNN can learn system information efficiently and capture how robust the system is after a fault.
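A GCNN-based assessor of this kind can be sketched with the standard GCN propagation rule D^-1/2 (A + I) D^-1/2 H W. Two graph-convolution layers are followed by mean pooling over buses and a linear readout to a scalar TSI. The per-bus features, the layer sizes, the pooling choice, and the random weights are hypothetical. A real assessor would be trained on simulated fault cases.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric GCN normalisation D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_tsi(x, adj, w1, w2, w_out):
    """Two GCN layers, mean pooling over buses, linear readout to a scalar TSI."""
    a = normalized_adjacency(adj)
    h = np.maximum(0.0, a @ x @ w1)
    h = np.maximum(0.0, a @ h @ w2)
    graph_vec = h.mean(axis=0)  # pooling makes the output independent of bus order
    return float(graph_vec @ w_out)

rng = np.random.default_rng(0)
ring = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
x = rng.normal(size=(4, 3))  # hypothetical per-bus features (load, generation, fault flag)
w1 = rng.normal(size=(3, 8))
w2 = rng.normal(size=(8, 8))
w_out = rng.normal(size=8)
tsi = gcn_tsi(x, ring, w1, w2, w_out)
```

Because the GCN only mixes features along network edges and pools at the end, relabelling the buses does not change the predicted index.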

TSPC Based on Transfer DRL:

This part of the study formulates Transient Stability Preventive Control (TSPC) as a Markov Decision Process (MDP) solved with transfer Deep Reinforcement Learning (DRL). The MDP consists of state, action, policy, and reward components that support TSPC decision-making. The actions chosen by the DRL agent are adjustments to generator output power, selected so that stability criteria are met. The reward is designed to favor actions that improve stability and is evaluated under N-1 contingency scenarios. By combining the GCNN-based TSA with transfer DRL, the study delivers efficient and effective TSPC. It reduces computational complexity and training time while ensuring the stability and security of the power system.
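The TD3 algorithm mentioned in the abstract trains its critics toward a target that adds clipped noise to the target policy's action (target-policy smoothing) and takes the smaller of two target critics (clipped double Q). The sketch below computes that standard TD3 target for one transition. The entropy enhancement and the knowledge-transfer step used in the paper are not modelled. The callables and the action bounds are placeholders.

```python
import numpy as np

def td3_target(reward, done, next_state, actor_target, critic1_target,
               critic2_target, gamma=0.99, noise_std=0.2, noise_clip=0.5,
               a_low=-1.0, a_high=1.0, rng=None):
    """Standard TD3 critic target (entropy term from the paper not included).

    actor_target(s) -> action array; critic*_target(s, a) -> float.
    """
    rng = np.random.default_rng() if rng is None else rng
    a = actor_target(next_state)
    # Target-policy smoothing: clipped Gaussian noise on the target action.
    noise = np.clip(rng.normal(0.0, noise_std, size=np.shape(a)),
                    -noise_clip, noise_clip)
    a_next = np.clip(a + noise, a_low, a_high)
    # Clipped double Q: the smaller critic estimate limits overestimation.
    q_min = min(critic1_target(next_state, a_next),
                critic2_target(next_state, a_next))
    return reward + gamma * (1.0 - done) * q_min
```

In the TSPC setting, the actions would be the generator output adjustments, and the reward would come from the stability assessment under N-1 contingencies.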


Schematic diagram of TSPC based on GCNN and transfer DRL, from the study by Wang, T., & Tang, Y. (2023)


  1. Combined TSA and Transfer DRL: The method integrates TSA with GCNN and transfer DRL for improved control.
  2. Effective Power Flow Control: Demonstrated effective control of power flow in both standard and actual power grid scenarios.
  3. Enhanced Learning Efficiency: Transfer learning accelerates learning and improves effectiveness.
  4. Identification of Influential Generators: Method identifies generators with significant influence for targeted action.
  5. Future Research Directions: Future studies will consider simultaneous rotor angle and transient voltage instability and extend control to multiple regions.

PV inverter-based Volt/VAR control using GRL


Yan, R., Xing, Q., & Xu, Y. (2023). Multi agent safe graph reinforcement learning for PV inverters based real-time decentralized volt/VAR control in zoned distribution networks. IEEE Transactions on Smart Grid.


This research presents a multi-agent safe graph reinforcement learning approach that optimizes the reactive power output of PV inverters, enabling real-time voltage/var control (VVC) in active distribution networks (ADNs). Each zone of the network adopts a decentralized architecture for coordinating reactive power regulation, managing voltage profiles, and reducing energy loss. The VVC problem is formulated as a multi-agent, decentralized, partially observable, constrained Markov decision process. The central control agent in each zone uses graph convolutional networks (GCNs) to improve decision-making by extracting features from the ADN topology, filtering noise, and imputing missing data. Primal-dual policy optimization ensures that voltage safety requirements are met. Simulations on a 141-bus distribution system show that the method reduces network energy loss and voltage variations.

Limitations in the existing methods: 

  • The high resistance-to-reactance (R/X) ratio of distribution lines affects voltage stability.
  • Intermittent photovoltaic production causes voltage variations.
  • Traditional VVC relies on slow mechanical devices.
  • On-load tap changers (OLTCs) and capacitor banks (CBs) are not fast enough to follow PV fluctuations.
  • Effective real-time VVC techniques are required.
  • Better control models are needed for PV inverters.
  • The VVC techniques currently in use are rigid.
  • IEEE 1547.8 encourages the use of inverters, but its guidance is vague.


The MAPDGRL Approach  

The Multi-Agent Primal-Dual Graph Reinforcement Learning (MAPDGRL) technique addresses the decentralized partially observable constrained Markov decision process (Dec-POCMDP) for PV inverter-based voltage/var control (VVC) in active distribution networks (ADNs). The agents' policies are trained centrally and then executed in a decentralized manner based on local observations. Graph Convolutional Networks (GCNs) extract the graph-structured properties of the power network, using a layer-wise propagation rule that normalizes the graph structure and aggregates data from neighboring nodes.

The actor network applies a multi-layer GCN followed by a fully connected neural network to map partial observations to control actions. Deep neural networks (DNNs) in the reward and cost critic networks estimate expected rewards and constraint costs. Dual variables are introduced and updated with sampled dual gradients to ensure that constraints are met.
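The dual-variable mechanism can be illustrated with the standard Lagrangian primal-dual scheme. The actor maximises reward minus lambda times cost. Lambda is then updated by projected gradient ascent on the sampled constraint violation, so it grows while voltage limits are violated on average and decays toward zero otherwise. The learning rate, the cost limit, and the function names are hypothetical.

```python
def lagrangian_objective(reward_value, cost_value, lam):
    """Objective the actor maximises: expected reward minus lambda * cost."""
    return reward_value - lam * cost_value

def dual_update(lam, avg_cost, cost_limit, lr=0.05):
    """Projected dual (sub)gradient ascent step on the Lagrange multiplier.

    avg_cost: sampled estimate of the constraint cost (e.g. voltage violation).
    lambda grows while the constraint is violated and is kept >= 0.
    """
    return max(0.0, lam + lr * (avg_cost - cost_limit))
```

If the sampled cost stays above its limit, repeated updates raise lambda, which in turn makes the actor weigh voltage violations more heavily.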

Robustness and Training  

The training process consists of parameter initialization, centralized training using replay buffers, and periodic updates of the dual variables and network parameters. Robustness against noise and missing data comes from two properties of GCNs. First, their message-passing mechanism fills in missing data using information from neighboring nodes. Second, they filter signals in a way that can be described with the Graph Fourier Transform. In short, the technique combines GCNs for feature extraction, dual variables for constraint satisfaction, and a multi-agent DRL framework for optimal VVC in ADNs.
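To illustrate the filtering idea, the sketch below applies an ideal low-pass filter in the graph Fourier domain. It eigendecomposes the Laplacian L = D - A, keeps only the lowest-frequency components of a node signal, and transforms back. This is a generic graph signal processing example, not the paper's GCN. The 10-bus path feeder, the linear voltage profile, the noise level, and the cutoff `keep=3` are hypothetical.

```python
import numpy as np

def graph_lowpass(adj: np.ndarray, x: np.ndarray, keep: int) -> np.ndarray:
    """Keep only the `keep` lowest graph frequencies of node signal x (ideal low-pass filter)."""
    lap = np.diag(adj.sum(axis=1)) - adj        # combinatorial Laplacian L = D - A
    _, u = np.linalg.eigh(lap)                  # eigenvectors, sorted by ascending graph frequency
    x_hat = u.T @ x                             # graph Fourier transform
    x_hat[keep:] = 0.0                          # discard high-frequency components
    return u @ x_hat                            # inverse graph Fourier transform

# Hypothetical 10-bus path feeder with a smooth voltage profile plus measurement noise
n = 10
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
clean = np.linspace(1.02, 0.96, n)              # voltage drops along the feeder (p.u.)
rng = np.random.default_rng(1)
noisy = clean + rng.normal(scale=0.01, size=n)
filtered = graph_lowpass(adj, noisy, keep=3)
print(f"mean abs error: noisy={np.abs(noisy - clean).mean():.4f}, "
      f"filtered={np.abs(filtered - clean).mean():.4f}")
```

Smooth signals, such as a voltage profile along a feeder, are concentrated in the low graph frequencies, while independent measurement noise spreads over all frequencies. Discarding the high frequencies therefore removes much of the noise. GCN layers behave like a soft version of this filter.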


  • The proposed MAPDGRL method significantly improves real-time voltage/var control (VVC) in distribution networks with high PV penetration.
  •  By incorporating GCNs into the policy network of agents, it enhances decision-making and automatically handles imperfect measurement data.
  • Additionally, it uses Lagrangian relaxation to manage voltage constraints effectively, merging topology and feature information for better knowledge representation.
  • Furthermore, it acts as a graph-structured “low-pass filter” to reduce noise and fill in missing information through message passing.
  •  Consequently, simulations on a 141-bus testing network show that MAPDGRL outperforms benchmark algorithms like MADDPG and MAPDDDPG in terms of efficiency, effectiveness, and robustness, even with incomplete or noisy feature information.

Decentralized Volt-VAR control using Multi-agent GRL


Hu, D., Li, Z., Ye, Z., Peng, Y., Xi, W., & Cai, T. (2024). Multi-agent graph reinforcement learning for decentralized Volt-VAR control in power distribution systems. International Journal of Electrical Power & Energy Systems, 155, 109531.


Volt/Var control (VVC) is crucial in power distribution systems for reducing power loss and keeping voltages within acceptable ranges. However, standard model-based approaches struggle when comprehensive network knowledge is unavailable. To handle VVC under partial observability, the study presents MASAC-HGRN, a new multi-agent graph-based deep reinforcement learning (DRL) technique. Unlike traditional methods, MASAC-HGRN uses a decentralized training and execution paradigm in which agents receive information locally and make decisions on their own. Numerical studies on IEEE test feeders show that MASAC-HGRN outperforms both conventional VVC techniques and state-of-the-art RL algorithms. Comprehensive robustness experiments further demonstrate the flexibility and robustness of the decentralized system.

Limitations in the existing Methods: 

  • The integration of distributed energy resources aggravates power losses and voltage violations in distribution networks.
  • Traditional Volt-VAR control techniques struggle with the need for rapid response and with uncertainty, which reduces system reliability.
  • Model-based optimization suffers from incomplete network data and slow computation.
  • Single-agent reinforcement learning techniques suffer from scalability problems and heavy communication loads.
  • Decentralized reinforcement learning techniques try to mitigate the constraints of centralized control, but they might not be scalable or provide universal access to data.  
  • Network partitioning for decentralized control may impair optimization efficiency and dependability.  


Methodology overview:  

  • This research presents a new approach to multi-agent voltage regulation in power distribution networks. 
  •  It uses a decentralized training and execution (DTDE) infrastructure with deep reinforcement learning (DRL) algorithms.  
  • The method seeks to solve issues with voltage control’s partial observability and scalability.  


Flowchart of the training process of MASAC-HGRN from the study by Zhao, Z. Y., Che, Y., Luo, S., Wu, K., & Leung, V. C. (2023, August)

Communication and Network Structure: 

  • A hierarchical graph recurrent network (HGRN) structure is used in the approach. 
  • To facilitate communication between heterogeneous agents, this structure combines graph attention networks (GAT) with deep recurrent Q networks (DRQN).  
  • The communication system facilitates efficient information flow between agents, enhancing voltage control coordination.
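The GAT part of the HGRN communication structure can be illustrated with a single-head graph attention layer in the style of Veličković et al. Each agent scores its neighbors, normalizes the scores with a softmax restricted to its neighborhood, and aggregates their projected features. The DRQN recurrence is omitted. The three-agent chain, the observation values, the random weights, and the tanh output are assumptions for illustration only.

```python
import numpy as np

def leaky_relu(x: np.ndarray, slope: float = 0.2) -> np.ndarray:
    return np.where(x > 0, x, slope * x)

def gat_layer(adj: np.ndarray, h: np.ndarray, w: np.ndarray, a: np.ndarray):
    """Single-head graph attention layer; returns new features and attention weights."""
    z = h @ w                                        # linear projection, shape (N, F')
    f = z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]), computed for all (i, j) pairs by broadcasting
    e = leaky_relu((z @ a[:f])[:, None] + (z @ a[f:])[None, :])
    mask = (adj + np.eye(len(adj))) > 0              # neighbors plus self-loop
    e = np.where(mask, e, -np.inf)                   # non-neighbors get zero weight
    e = e - e.max(axis=1, keepdims=True)             # stabilize the softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True) # softmax over each node's neighborhood
    return np.tanh(alpha @ z), alpha

# Hypothetical: three sub-network agents in a chain 0-1-2, each with a 3-dim local observation
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
obs = np.array([[0.98, 0.2, 0.1], [1.01, 0.5, 0.0], [1.04, 0.7, 0.3]])
rng = np.random.default_rng(0)
out, alpha = gat_layer(adj, obs, rng.normal(size=(3, 4)), rng.normal(size=8))
print(np.round(alpha, 3))
```

Unlike a GCN, the weight each agent gives a neighbor is learned from their features rather than fixed by node degrees. This lets heterogeneous agents emphasize the most relevant messages.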

Process of Training and Optimization:

  • Multi-agent actor-critic techniques, such as multi-agent soft actor-critic hierarchical graph recurrent network (MASAC-HGRN), are used in the training process.  
  • The integration of maximum entropy learning guarantees steady and resilient policy optimization.  
  • During training, experience replay mechanisms are used to improve data stability and efficiency.  
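Two of the ingredients listed above, experience replay and maximum-entropy learning, can be sketched as follows. The buffer samples stored transitions uniformly. The critic target is the standard soft actor-critic form y = r + γ(1 - d)(min(Q1', Q2') - α log π(a'|s')). The hyperparameters (γ = 0.99, α = 0.2) and the toy transitions are assumptions. The multi-agent and hierarchical-graph parts of MASAC-HGRN are not shown.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Fixed-capacity experience replay; the oldest transitions are evicted first."""

    def __init__(self, capacity: int, seed: int = 0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, obs, action, reward, next_obs, done):
        self.buffer.append((obs, action, reward, next_obs, float(done)))

    def sample(self, batch_size: int):
        """Uniform sample without replacement, returned as one array per field."""
        batch = self.rng.sample(list(self.buffer), batch_size)
        return [np.array(field) for field in zip(*batch)]

    def __len__(self):
        return len(self.buffer)

def soft_q_target(reward, done, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2):
    """Maximum-entropy Bellman target: r + gamma * (1 - done) * (min Q' - alpha * log pi)."""
    soft_value = np.minimum(q1_next, q2_next) - alpha * logp_next  # clipped double-Q plus entropy bonus
    return reward + gamma * (1.0 - done) * soft_value

buf = ReplayBuffer(capacity=3)
for t in range(5):                               # only the last three transitions survive
    buf.push([t], [0.0], float(t), [t + 1], False)
obs, act, rew, next_obs, done = buf.sample(2)
y = soft_q_target(rew, done, q1_next=np.array([2.0, 2.0]), q2_next=np.array([3.0, 3.0]),
                  logp_next=np.array([-1.0, -1.0]))
print(rew, y)
```

The entropy term (-α log π) raises the value of actions taken by more random policies. This keeps exploration alive and tends to make policy optimization steadier.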


  • A MASAC-HGRN algorithm based on the DTDE paradigm is proposed for VVC in power distribution systems.
  • Each sub-network acts as an agent that optimizes the set points of smart inverters and SVCs based on its own observations.
  • The method achieves better optimality and robustness than state-of-the-art RL-based algorithms.
  • A computing speed test verified its lower training costs.


  • VVC: Volt/Var Control
  • MASAC-HGRN: Multi-Agent Soft Actor-Critic Hierarchical Graph Recurrent Network
  • DTDE: Decentralized Training and Execution
  • HGRN: Hierarchical Graph Recurrent Network
  • DRQN: Deep Recurrent Q Networks
  • SVC: Static Var Compensator

DRL for Energy optimization in buildings


 Qin, Y., Ke, J., Wang, B., & Filaretov, G. F. (2022). Energy optimization for regional buildings based on distributed reinforcement learning. Sustainable Cities and Society, 78, 103625.


Because of their affordability and scalability, model-free control methods such as Reinforcement Learning (RL) are highly regarded in energy management. However, RL has difficulty providing coordinated and efficient control across regional buildings, which leads to higher energy consumption. This blog looks at Distributed Reinforcement Learning (DRL), a technique for optimizing energy use across several buildings while preserving occupant comfort. By exchanging parameters and coordinating optimization, the system effectively reduces energy use. A case study of nine university buildings showed that DRL achieved lower overall energy consumption than Rule-Based Control (RBC), the Soft Actor-Critic (SAC) approach, Model Predictive Control (MPC), and the Non-dominated Sorting Genetic Algorithm II (NSGA-II). The proposed strategy also demonstrated high accuracy and robustness in evaluations of multi-building energy consumption, error analysis, load factor, power demand, and net power consumption.

Limitations in the existing methods: 

  • RBC relies on simple "If X, then Y" rules that are rarely adjusted adequately, which often prevents it from reaching its full potential.
  • Although MPC works well in simulations, its real-world applicability is limited because it has rarely been implemented in large-scale, fully occupied buildings.
  • Data-driven models such as random forest predictive control suffer from overfitting and high variance, which lead to inaccurate results.
  • Due to their complexity, methods like MPC, RBC, and GA are difficult to scale for use in regional or local applications. 
  • Difficulties in gathering data, extended sample periods, and environmental disruptions lead to incomplete datasets, which compromise the precision and efficacy of models. 
  • Ineffective building-to-building coordination caused by current systems results in energy coupling and increased total consumption.  


Scheme for Control Optimization Employing DRL:  

The study offers a control optimization strategy based on Distributed Reinforcement Learning (DRL) to reduce energy consumption in regional buildings. Built on the CityLearn framework, the proposed method enhances the MARLISA algorithm in two ways. It adds an improved Least Squares Boosting (LSBoost) algorithm for energy prediction and an incentive mechanism that encourages staggered power usage. Multiple reinforcement learning agents use a sequential action-selection strategy, iteratively choosing actions to control each building's energy usage.
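Sequential action selection can be pictured as agents deciding one after another while seeing what the others have already planned. The sketch below replaces MARLISA's learned SAC policies with a hypothetical greedy rule: place a flexible load block in the hour that minimizes the aggregate peak. The three identical buildings, four hours, and 2 kW block are made up. The example only illustrates how sharing planned actions leads to staggered usage.

```python
import numpy as np

def sequential_selection(base_loads: np.ndarray, block: float, rounds: int = 3) -> list:
    """Agents choose, one after another, the hour for a flexible load block.

    Each agent sees the aggregate load including the blocks already placed by
    the other agents and greedily picks the hour that minimizes the resulting
    peak (a hypothetical stand-in for a learned policy).
    """
    n_agents, horizon = base_loads.shape
    choice = [0] * n_agents                      # uncoordinated start: every block at hour 0
    for _ in range(rounds):                      # repeat so earlier agents can react to later ones
        for i in range(n_agents):
            others = base_loads.sum(axis=0).astype(float)
            for j in range(n_agents):
                if j != i:
                    others[choice[j]] += block   # shared information: other agents' planned blocks
            peaks = []
            for h in range(horizon):
                trial = others.copy()
                trial[h] += block
                peaks.append(trial.max())
            choice[i] = int(np.argmin(peaks))    # ties go to the earliest hour
    return choice

base = np.ones((3, 4))                           # hypothetical: 3 buildings, 4 hours, 1 kW base load each
plan = sequential_selection(base, block=2.0)
aggregate = base.sum(axis=0)
for h in plan:
    aggregate[h] += 2.0
print(plan, aggregate.max())
```

Without coordination, all three blocks land in hour 0 and the aggregate peak is 9 kW. After the sequential passes, they are spread over hours 1, 2, and 0, and the peak drops to 5 kW.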

Improvements and Iterative Learning:

The LSBoost technique is improved to increase prediction accuracy, while the Soft Actor-Critic (SAC) method is used for its scalability and coordination capabilities. Agents interact with the environment and optimize energy use through performance-based rewards and penalties. Through an iterative learning process, agents forecast and share their energy usage, coordinating consumption across buildings to optimize regional energy use while preserving occupant comfort. By lowering energy use and carbon emissions, this strategy aims to promote sustainable urban growth.
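For reference, plain least-squares boosting (LSBoost) starts from the mean and repeatedly fits a weak learner to the current residuals, adding a shrunken copy of it to the model. Here the weak learner is a one-split regression stump. The paper's improvements to LSBoost are not reproduced. The single hour-of-day feature, the step-shaped occupied-hours load, the 50 rounds, and the learning rate of 0.1 are hypothetical.

```python
import numpy as np

def fit_stump(x: np.ndarray, r: np.ndarray):
    """Best single-split regression stump on a 1-D feature x for residuals r (least squares)."""
    best = None
    for thr in np.unique(x)[:-1]:               # every split leaves both sides non-empty
        left = x <= thr
        lv, rv = r[left].mean(), r[~left].mean()
        sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    _, thr, lv, rv = best
    return thr, lv, rv

def lsboost(x: np.ndarray, y: np.ndarray, n_rounds: int = 50, learning_rate: float = 0.1):
    """Least-squares boosting: each stump is fitted to the current residuals."""
    f0 = y.mean()
    pred = np.full_like(y, f0, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        thr, lv, rv = fit_stump(x, y - pred)    # residual = negative gradient of squared loss
        pred += learning_rate * np.where(x <= thr, lv, rv)
        stumps.append((thr, lv, rv))

    def predict(x_new: np.ndarray) -> np.ndarray:
        out = np.full(len(x_new), f0, dtype=float)
        for thr, lv, rv in stumps:
            out += learning_rate * np.where(x_new <= thr, lv, rv)
        return out

    return predict

hours = np.arange(24, dtype=float)
load = np.where((hours >= 8) & (hours < 18), 60.0, 50.0)   # hypothetical occupied-hours load (kW)
predict = lsboost(hours, load)
mse = float(np.mean((predict(hours) - load) ** 2))
print(round(mse, 4), round(float(load.var()), 4))
```

Because each stump is the least-squares fit to the residuals, every shrunken step lowers the training error. The learning rate trades the number of rounds against overfitting.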


Schematic representation of the building optimal control closed-loop system using the MPC/GA controller from the study by Qin, Y., Ke, J., Wang, B., & Filaretov, G. F. (2022) 


  • Developed a regional building energy optimization using distributed multi-agent reinforcement learning. 
  • Improved LSBoost algorithm for more accurate energy consumption predictions. 
  • Layer normalization of critic networks speeds up training. 
  • Huber loss avoids exploding gradients and oversensitivity to outliers. 
  • Numerical simulations show DRL reduces energy consumption by 6.72% over RBC and 3.67% over SAC. 
  • DRL method is scalable and provides good energy optimization. 
  • Decentralized distributed network with no central control enhances system structure. 
  • System coordinates energy consumption among agents for scalability.
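The Huber loss behaves like the squared error for small errors and like the absolute error for large ones, so its gradient is clipped to [-δ, δ]. Layer normalization rescales each feature vector to zero mean and unit variance. The NumPy sketch below shows both on hypothetical critic outputs. It assumes δ = 1 and omits the learnable gain and bias of layer normalization.

```python
import numpy as np

def huber_loss(pred: np.ndarray, target: np.ndarray, delta: float = 1.0) -> float:
    """Quadratic for |error| <= delta, linear beyond it; averaged over the batch."""
    err = pred - target
    quadratic = 0.5 * err ** 2
    linear = delta * (np.abs(err) - 0.5 * delta)
    return float(np.mean(np.where(np.abs(err) <= delta, quadratic, linear)))

def huber_grad(pred: np.ndarray, target: np.ndarray, delta: float = 1.0) -> np.ndarray:
    """Per-element derivative w.r.t. pred (before averaging): the error clipped to [-delta, delta]."""
    return np.clip(pred - target, -delta, delta)

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each row to zero mean and unit variance (learnable gain/bias omitted)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

td_pred = np.array([0.5, 10.0])        # hypothetical critic outputs; the second is an outlier
td_target = np.zeros(2)
print(huber_loss(td_pred, td_target), huber_grad(td_pred, td_target))
```

For the outlier error of 10, a squared loss would produce a gradient of 10, whereas the Huber gradient is capped at 1. This cap keeps a single bad target from dominating a critic update.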

GRL for inverter-based active voltage control


Mu, C., Liu, Z., Yan, J., Jia, H., & Zhang, X. (2023). Graph multi-agent reinforcement learning for inverter-based active voltage control. IEEE Transactions on Smart Grid.


To address voltage fluctuations caused by distributed generation such as PV systems in ADNs, this paper formulates active voltage control (AVC) as the problem of minimizing voltage deviation and network loss by adjusting the reactive power of PV inverters. The AVC problem aims to hold voltages at specified levels through reactive power injection while minimizing network loss. The Dec-POMDP framework is adopted for multi-agent cooperation. It defines the problem as a tuple of the state space, joint action and observation spaces, transition probabilities, a shared reward function, and a discount factor. Within the multi-agent actor-critic framework, centralized training is coupled with decentralized execution, enabling efficient exploration and convergence to optimal policies.

Issues addressed and contributions:

  • Renewable Energy Challenges: Although the usage of renewable energy is increasing, distributed generation such as photovoltaics (PVs) presents issues with voltage stability and network loss.
  • Innovative Voltage Control: To effectively regulate voltage, the MAGRL algorithm combines the voltage barrier function and GCN.
  • Advanced Decision-Making: By handling voltage variations from quick changes in photovoltaic systems, MAGRL outperforms conventional techniques.
  • Enhanced Safety Measures: By reducing voltage variations, the exponential voltage barrier function guarantees safe power system operation.
  • Robust Performance: Comparative experiments on the IEEE 33-bus and 141-bus systems show MAGRL's effectiveness across different graph topologies.



 Formulation of the AVC Issue and Safety Mechanisms:

The proposed technique formulates the AVC problem as a Markov game and solves it with an improved MARL algorithm called graph multi-agent reinforcement learning (MAGRL). Each PV inverter is treated as an agent, and the problem is formulated as a Dec-POMDP in which shared observations ensure regional coordination. The formulation includes the state, observation, and action spaces and a reward function that combines voltage deviation and network loss reduction. To ensure safety, a voltage barrier function is developed that penalizes voltages outside a safe range, keeping them within that range.
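The exact exponential voltage barrier function is not reproduced in this summary, so the sketch below uses a hypothetical form. It applies a linear penalty on the deviation from 1.0 p.u. inside a ±0.05 p.u. band and an exponentially growing penalty outside it, with the two pieces joined continuously at the band edge. The steepness `k`, the loss weight, and the shared team reward combining the barrier with network loss are all assumptions.

```python
import numpy as np

V_REF = 1.0   # nominal voltage (p.u.)
BAND = 0.05   # hypothetical safe band: 0.95-1.05 p.u.

def exp_voltage_barrier(v: np.ndarray, k: float = 50.0) -> np.ndarray:
    """Hypothetical exponential barrier: linear penalty inside the safe band,
    exponentially growing penalty outside it (continuous at the band edge)."""
    dev = np.abs(v - V_REF)
    outside = BAND + np.expm1(k * (dev - BAND))      # equals BAND exactly at dev == BAND
    return np.where(dev <= BAND, dev, outside)

def shared_reward(v: np.ndarray, p_loss: float, loss_weight: float = 10.0) -> float:
    """Shared team reward: penalize voltage deviation (via the barrier) and network loss."""
    return float(-exp_voltage_barrier(v).sum() - loss_weight * p_loss)

for v in (1.0, 1.04, 1.06, 1.08):
    print(v, round(float(exp_voltage_barrier(np.array([v]))[0]), 4))
print(shared_reward(np.array([1.0, 1.04]), p_loss=0.01))
```

The barrier keeps the penalty gentle while voltages are safe but makes leaving the band sharply costly, which is the behavior the safety mechanism is meant to produce.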

Integration of GCN and Training Methodology in MAGRL:

A Graph Convolution Network (GCN) is also included to improve feature extraction from the distribution network topology and, in turn, agent performance. By integrating MADDPG with the GCN, the MAGRL algorithm supports centralized training with decentralized execution. During training, the actor and critic networks are updated using experience replay and target networks. During testing, only the actor networks are used to generate control actions. The method's smooth learning and reduced computational load make real-time voltage control within practical limits possible.
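Two of the MADDPG mechanics mentioned above can be sketched in a few lines. First, the centralized critic receives every agent's observation and action during training, while each actor sees only its own observation, which is why only the actors are needed at test time. Second, target networks track the online networks by Polyak averaging. Parameter dictionaries stand in for real networks here, and the agent count, dimensions, and τ values are hypothetical.

```python
import numpy as np

def soft_update(target: dict, online: dict, tau: float = 0.01) -> None:
    """Polyak averaging, in place: theta_target <- tau * theta + (1 - tau) * theta_target."""
    for name, param in online.items():
        target[name] = tau * param + (1.0 - tau) * target[name]

def critic_input(all_obs: list, all_actions: list) -> np.ndarray:
    """Centralized critic (training only): sees every agent's observation and action."""
    return np.concatenate([*all_obs, *all_actions])

def actor_input(all_obs: list, agent_id: int) -> np.ndarray:
    """Decentralized actor (training and testing): sees only its own observation."""
    return all_obs[agent_id]

# Hypothetical: 3 PV-inverter agents, 4-dim local observations, 1-dim reactive-power action
obs = [np.full(4, float(i)) for i in range(3)]
actions = [np.array([0.1 * i]) for i in range(3)]
print(critic_input(obs, actions).shape, actor_input(obs, 1).shape)

online = {"w": np.ones((2, 2))}
target = {"w": np.zeros((2, 2))}
soft_update(target, online, tau=0.5)
```

A small τ makes the target networks change slowly, which stabilizes the critic's bootstrapped targets.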

The overall structure of the proposed multi agent GRL from the study by Mu, C., Liu, Z., Yan, J., Jia, H., & Zhang, X. (2023).


  • Voltage quality in renewable energy systems is improved by the proposed multi-agent graph reinforcement learning (MAGRL).
  • Distribution network topology is better represented with the introduction of the Graph Convolution Network (GCN).
  • The voltage is stabilized for safe distribution network operation using the exponential voltage barrier function.
  • Simulation using IEEE 33-bus and 141-bus instances shows that MAGRL is better than MARL and conventional techniques.
  • Future work may examine attention mechanisms to improve the effectiveness of the MARL algorithm.

GRL for user preference-aware energy sharing systems


Timilsina, A., Khamesi, A. R., Agate, V., & Silvestri, S. (2021). A reinforcement learning approach for user preference-aware energy sharing systems. IEEE Transactions on Green Communications and Networking, 5(3), 1138-1153.


By allowing customers with renewable generation capabilities to sell energy, Energy Sharing Systems (ESS) have the potential to transform power grids. This work proposes an ESS that, in a novel way, accounts for bounded rationality, engagement, and customer preferences. Maximizing the energy exchanged while accounting for this user model is NP-Hard. To handle this, two heuristics are given. The first is based on Reinforcement Learning and has bounded regret. The second, called BPT-K, is more efficient and has guaranteed termination and accuracy. An experimental study on real-world datasets shows that, compared with state-of-the-art techniques, the proposed algorithms achieve 25% higher efficiency and 27% more transferred energy. Furthermore, the learning algorithms converge to within 5% of the optimal solution in less than three months.

Limitations in the previous Energy sharing systems:

  • Simplified Human Behavior Models: Many previous studies unrealistically assume that users are always present and engaged.
  • Assumed Parameter Knowledge: A number of studies make the unrealistic assumption that the parameters of user behavior models are known in advance.
  • Homogeneous Consumer Preferences: Previous studies frequently overlook the variability in consumer preferences for energy sources.
  • Limited Participation Consideration: Many models do not account for user participation in energy management systems, which varies widely.
  • Potential System Failure in Real-world Application: When used in real-world circumstances, models with overly simplistic assumptions run the risk of causing system breakdowns.


This research presents a reinforcement learning technique that learns user preferences in energy exchange systems in order to optimize overall performance. Accurately representing user preferences as probabilities is essential for optimizing energy exchanges, but eliciting these preferences directly from users often yields inaccurate values. To overcome this, the study reformulates the problem as a combinatorial multi-armed bandit problem. This lets the system learn preferences dynamically by observing how users respond to recommendations. The primary optimization goal is to maximize the total energy traded, thereby improving system efficiency and user satisfaction.


User Preference Learning (UPL) algorithm’s operation:

The UPL algorithm operates in two phases: setup and optimization. During setup, the method chooses random actions so that each variable is observed at least once. During the optimization phase, it balances exploration and exploitation with an Upper Confidence Bound (UCB) method. The goal is to maximize energy exchange and minimize regret, that is, the gap between the optimal and the expected rewards.
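The two UPL phases can be illustrated with a CUCB-style combinatorial bandit. The learner recommends k producer-consumer pairs per round, observes whether each recommendation is accepted, and ranks pairs by energy times an optimistic acceptance estimate. This is a generic sketch, not the paper's exact algorithm. The exploration constant 1.5, the five acceptance probabilities, k = 2, and the simulated users are assumptions.

```python
import math
import random

def cucb_recommend(true_p, energy, k, rounds, seed=0):
    """CUCB-style preference learner: recommend k producer-consumer pairs per round.

    true_p is hidden from the learner and only used to simulate accept/reject.
    Returns the learned acceptance estimates and how often each pair was tried.
    """
    rng = random.Random(seed)
    n_arms = len(true_p)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    # Setup phase: random order, so that every pair is observed at least once
    order = list(range(n_arms))
    rng.shuffle(order)
    setup = [order[i:i + k] for i in range(0, n_arms, k)]
    for t in range(1, rounds + 1):
        if t <= len(setup):
            chosen = setup[t - 1]
        else:
            # Optimization phase: optimistic estimate = mean + exploration bonus (capped at 1)
            ucb = [min(1.0, means[a] + math.sqrt(1.5 * math.log(t) / counts[a])) for a in range(n_arms)]
            chosen = sorted(range(n_arms), key=lambda a: energy[a] * ucb[a], reverse=True)[:k]
        for a in chosen:
            accepted = 1.0 if rng.random() < true_p[a] else 0.0   # simulated user response
            counts[a] += 1
            means[a] += (accepted - means[a]) / counts[a]         # running average
    return means, counts

# Hypothetical acceptance probabilities for five candidate pairs, equal energy per pair
means, counts = cucb_recommend(true_p=[0.9, 0.2, 0.4, 0.1, 0.8], energy=[1.0] * 5, k=2, rounds=2000)
print([round(m, 2) for m in means], counts)
```

After the setup rounds, every pair has been tried once. From then on, recommendations concentrate on the pairs with the highest acceptance probabilities, while rarely tried pairs still receive occasional exploration. This is the mechanism behind bounded regret.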

To improve performance, the BiParTite-K algorithm (BPT-K) tackles the generalized matching problem with a restriction that prevents overwhelming consumers with recommendations. In addition, the Faster Initialization Algorithm (FIA) reduces the number of days required to observe all variables. BPT-K discretizes energy demands and capacities into units and iteratively improves matches using Maximum Weighted Bipartite Matching (MWBM) to maximize the energy exchanged.
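The MWBM step can be sketched on a toy instance. Each producer's surplus and each consumer's demand are split into unit nodes, edges are weighted by the estimated acceptance probability of the pair, and the matching with maximum total weight is selected. Exhaustive search is used here only because the instance is tiny. For realistic sizes, a Hungarian-algorithm solver such as SciPy's `scipy.optimize.linear_sum_assignment` would be used instead. The users, unit counts, and probabilities are hypothetical, and BPT-K's per-consumer recommendation limit is not modeled.

```python
from itertools import permutations

def expand_units(amounts: dict) -> list:
    """Discretize each user's energy into one node per whole unit (e.g. per kWh)."""
    return [user for user, units in amounts.items() for _ in range(units)]

def max_weight_matching(left: list, right: list, weight) -> tuple:
    """Exhaustive maximum-weight bipartite matching; only for tiny instances with len(left) <= len(right)."""
    best_value, best_pairs = float("-inf"), []
    for perm in permutations(range(len(right)), len(left)):
        value = sum(weight(left[i], right[j]) for i, j in enumerate(perm))
        if value > best_value:
            best_value = value
            best_pairs = [(left[i], right[j]) for i, j in enumerate(perm)]
    return best_value, best_pairs

producers = {"P1": 2, "P2": 1}           # hypothetical surplus, in energy units
consumers = {"C1": 2, "C2": 2}           # hypothetical demand, in energy units
p_accept = {("P1", "C1"): 0.9, ("P1", "C2"): 0.5, ("P2", "C1"): 0.8, ("P2", "C2"): 0.3}
value, pairs = max_weight_matching(expand_units(producers), expand_units(consumers),
                                   lambda p, c: p_accept[(p, c)])
print(round(value, 2), sorted(pairs))
```

Greedily giving both of P1's units to its preferred consumer C1 would score 2.1. The optimal matching instead sends one of P1's units to C2 so that P2 can serve C1, for a total of 2.2.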

Overall, this strategy guarantees bounded regret. Even with initially unknown preferences, the system therefore eventually finds optimal matches and maximizes energy exchange.

Energy sharing system overview from the study by Xu, P., Pei, Y., Zheng, X., & Zhang, J. (2020, October)



  • Using Virtual Power Plants (VPPs) to create an Energy Sharing System (ESS) for the exchange of locally generated energy, this study formulates the issue as a Mixed-Integer Linear Programming (MILP) and shows that it is NP-Hard.
  • In contrast to other research, this study integrates a realistic user behavioral model that considers consumer preferences, engagement, and bounded rationality.
  • In practical evaluations using actual energy data, the proposed RL-based User Preference Learning (UPL) algorithm and the BPT-K heuristic demonstrate near-optimal performance and outperform existing approaches.
  • The assumptions that user behavior is independent and stable may need refinement to account for changing behavior and the influence of past actions on future decisions.
  • Future studies should examine how pricing affects user preferences, if semi-automated systems can sustain long-term user involvement, and whether more sophisticated models that take into account the irrationality of human decision-making are feasible.

GRL for line flow control


Xu, P., Pei, Y., Zheng, X., & Zhang, J. (2020, October). A simulation-constraint graph reinforcement learning method for line flow control. In 2020 IEEE 4th Conference on Energy Internet and Energy System Integration (EI2) (pp. 319-324). IEEE.


In power markets, the unknown strategies of other producers create system uncertainty that complicates the search for optimal bidding strategies. Despite its popularity, distributed optimization cannot cope with this uncertainty. Deep reinforcement learning (DRL) is promising, but it does not exploit the spatial (topological) structure of the network. This research presents a semi-distributed learning strategy that combines a graph convolutional neural network (GCN) with DRL. Generation units adjust their bids to handle uncertainty using feedback from the environment. Because the GCN takes both node states and node connections as inputs, units can learn the system structure and improve their strategies. Evaluations on the IEEE 30-bus and 39-bus systems show better generalization and profit potential than standard DRL.

Issues in previous methods:

  • Previous RL algorithms did not take network topology into account.
  • Difficulties with distributed decision-making in intricate network systems, such as dynamic uncertainty and non-cooperative behavior.
  • Past studies concentrated on conventional centralized decision-making, which raises privacy issues and increases the computational load.
  • Locating the Nash equilibrium is challenging when decision-making is distributed and uncertainty is dynamic.
  • Because the strategy and state spaces are continuous, standard RL algorithms face computational difficulties.
  • Prior research on power market bidding did not specifically address system concerns.
  • Previous RL approaches had limited capacity to generalize across different system topologies.
  • Previous research did not fully define bidding as a bi-level optimization problem.


The proposed methodology tackles incomplete information challenges in a bi-level optimization problem, where generation units aim to maximize revenue without knowledge of competitors’ bids or market prices. A novel Deep Reinforcement Learning (DRL) algorithm enables units to learn competitors’ strategies through feedback. Units submit bids to the ISO, which then clears the market and determines prices. Units adjust bids based on feedback until convergence to maximum profit. DRL employs Actor-Critic networks, with state space comprising historical data like price and demand. Action involves bidding strategies, and rewards are based on revenue minus generation costs. The algorithm integrates Graph Convolutional Neural Networks (GCN) to incorporate system topology. GCN-based Critic networks use demand graph and market prices to update weights, enhancing learning efficiency and system awareness.
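The bid, clear, and feedback loop can be illustrated with a simplified market. The ISO clears offers in merit order at a uniform price set by the marginal accepted offer, and each unit's reward is revenue minus generation cost, as described above. The sketch ignores network constraints, so there are no locational prices. It uses made-up offers, costs, and demand, and it replaces the DRL agent with a manual sweep over one unit's offer price.

```python
def clear_market(offers: list, demand: float):
    """Uniform-price merit-order clearing (network constraints ignored).

    offers: list of (unit, offer price in $/MWh, quantity in MW).
    Returns the clearing price and each unit's dispatched quantity.
    """
    dispatch = {unit: 0.0 for unit, _, _ in offers}
    remaining, price = demand, 0.0
    for unit, offer_price, qty in sorted(offers, key=lambda o: o[1]):
        if remaining <= 0:
            break
        take = min(qty, remaining)
        dispatch[unit] = take
        remaining -= take
        price = offer_price          # the last (marginal) accepted offer sets the price
    return price, dispatch

def unit_reward(unit: str, price: float, dispatch: dict, marginal_cost: dict) -> float:
    """Reward used for learning: revenue minus generation cost."""
    return (price - marginal_cost[unit]) * dispatch[unit]

cost = {"G1": 15.0, "G2": 25.0, "G3": 35.0}   # hypothetical marginal costs, $/MWh
rewards_g2 = {}
for g2_bid in (30.0, 40.0, 50.0):             # G2 tries different offer prices
    offers = [("G1", 20.0, 50.0), ("G2", g2_bid, 40.0), ("G3", 45.0, 60.0)]
    price, dispatch = clear_market(offers, demand=80.0)
    rewards_g2[g2_bid] = unit_reward("G2", price, dispatch, cost)
print(rewards_g2)
```

Raising G2's offer from 30 to 40 $/MWh lifts the clearing price and its profit, but offering 50 $/MWh prices it out of the market. This is the kind of feedback from which the units learn competitors' behavior.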

The architecture of the proposed method from the study by Xu, P., Pei, Y., Zheng, X., & Zhang, J.(2020, October)


  • Using GCN and DRL together enhances bidding tactics.
  • Addresses uncertainty in the system and missing information.
  • Improves decision-making by enabling the learning of system topology.
  • DRL with GCN performs better than traditional DRL because it adapts more readily to changing topologies.
  • Future studies should incorporate uncertainties related to wind turbines and assess on broader networks.

GRL for service restoration


Fan, B., Liu, X., Xiao, G., Kang, Y., Wang, D., & Wang, P. (2023). Attention-Based Multi-Agent Graph Reinforcement Learning for Service Restoration. IEEE Transactions on Artificial Intelligence


Distributed energy resources are revolutionizing power restoration because they allow recovery without transmission support. Although conventional techniques need accurate models, deep reinforcement learning (DRL) offers a viable substitute. In this paper, a multi-agent graph reinforcement learning strategy, utilizing attention mechanisms, is proposed for service restoration, which is modeled as a partially observable Markov decision process. Specifically, agents improve strategy formulation by extracting features using graph convolutional networks. Moreover, inter-agent collaboration is improved by centralized, attentive training. This innovative method, which combines energy storage, photovoltaics, and dispatchable generators, has been validated on the IEEE-118 system, thereby demonstrating how DRL may significantly improve power distribution resilience.

Issues in existing methodologies:

  • Inadequate Handling of High-Dimensional Continuous Domains: Traditional Q-learning cannot handle high-dimensional continuous domains, which limits its use in large-scale power systems.
  • Simplistic Application of DRL Algorithms: Numerous studies employ simple DRL algorithms that do not improve scalability or performance.
  • Single-Agent Policy Learning: Current methods rely on a single agent, which is unsuitable for large active distribution networks because centralized data collection is costly and time-consuming.
  • Ignoring Network Topology: Prior research ignores the topology of the distribution network, omitting important geographical links and data.
  • Overestimation Problems with DQN: In complex settings, DQN combined with graph learning overestimates values and fails to extract features effectively.
  • Multi-Agent System (MAS) Difficulties: Because agents update independently, standard DRL algorithms such as DDPG become unstable, which makes it difficult to reach Nash equilibrium and convergence.


This research presents a multi-agent graph reinforcement learning approach for service restoration (SR) in active distribution networks (ADNs). In the event of a power outage, ADNs switch to island mode, forming microgrids for SR. The system operator manages the public DERs based on real-time ADN states.

The SR process is modeled as a Partially Observable Markov Decision Process (POMDP) with three agents. One minimizes DG generation costs, another lowers ES degradation costs, and a third optimizes load restoration. The agents control DGs, ESs, and loads by monitoring local conditions and generating actions. PYPOWER models the state transitions while accounting for uncertainty in load and DER output.

Agents use graph convolutional networks (GCNs) for feature extraction, while self-attention-enhanced critic networks are used during centralized training. By combining graph learning with reinforcement learning in this way, the strategy improves the robustness and efficiency of SR.
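The self-attention used in centralized training can be sketched as scaled dot-product attention across agents. Each agent's embedding forms a query, attends to the other agents' keys, and receives a weighted summary of their values. This summary is concatenated with the agent's own embedding as critic input. The sketch follows the general attention-critic pattern rather than the paper's exact network. The three agents (DG, ES, load), the dimensions, and the random weights are hypothetical.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_to_other_agents(enc: np.ndarray, wq: np.ndarray, wk: np.ndarray, wv: np.ndarray):
    """Scaled dot-product self-attention across agents.

    enc: (n_agents, d) embeddings of each agent's (observation, action) pair.
    Returns, for every agent, a weighted summary of the other agents.
    """
    q, k, v = enc @ wq, enc @ wk, enc @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    np.fill_diagonal(scores, -np.inf)            # exclude self: the critic already has its own embedding
    weights = softmax(scores, axis=1)
    return weights @ v, weights

# Hypothetical: DG, ES and load agents with 6-dim embeddings, 4-dim attention space
rng = np.random.default_rng(0)
enc = rng.normal(size=(3, 6))
context, weights = attend_to_other_agents(enc, *(rng.normal(size=(6, 4)) for _ in range(3)))
critic_features = np.concatenate([enc, context], axis=1)   # own embedding + attended context
print(np.round(weights, 3))
```

The attention weights let each agent's critic focus on the agents whose behavior matters most for its own value estimate, which supports the improved collaboration described above.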

The structure of actor-critic based multi-agent graph reinforcement learning method for service restoration from the study conducted by Fan, B (2023).


  • Proposed attention-based multi-agent graph reinforcement learning for service restoration.
  • Defined power network state using graph data, integrating topology and power flow information.
  • Enabled state perception with graph learning using graph convolutional networks.
  • Introduced self-attention into centralized training, enhancing agent collaboration.
  • Experimental results showed agents with graph learning achieved higher rewards.
  • Attention-based training improved efficiency and convergence of multi-agent deep reinforcement learning.
  • Future work will focus on enhancing deep reinforcement learning’s generalizability to various network scenarios.


  • DQN: Deep Q-Network
  • DDPG: Deep Deterministic Policy Gradient
  • MAS: Multi-Agent Systems
  • ADN: Active Distribution Network
  • DER: Distributed Energy Resource
  • DG: Distributed Generator
  • ES: Energy Storage
  • POMDP: Partially Observable Markov Decision Process
  • SR: Service Restoration



Resource management in dense cellular network using GRL


Shao, Y., Li, R., Hu, B., Wu, Y., Zhao, Z., & Zhang, H. (2021). Graph attention network-based multi-agent reinforcement learning for slicing resource management in dense cellular network. IEEE Transactions on Vehicular Technology, 70(10), 10792-10803.


This blog examines the challenges of controlling network slicing (NS) in dense cellular networks with multiple base stations (BSs). The study proposes an approach in which each BS acts as an agent in a multi-agent reinforcement learning (MARL) framework. It integrates a graph attention network (GAT) into deep reinforcement learning (DRL) to enhance collaboration among agents. Taking frequent BS handovers and fluctuating service demands into account, the approach manages inter-slice resources in real time. Two techniques are used to assess GAT's impact on DRL effectiveness: a value-based technique, the deep Q-network (DQN), and a hybrid policy- and value-based technique, advantage actor-critic (A2C). Extensive simulations confirm the superiority of GAT-based MARL algorithms for these complex network management problems.

Limitations in other existing methods for resource management in dense 5G networks:

  • Overlooking collaboration among neighboring BSs in dense 5G networks ignores crucial efficiency boosts.
  •  Traditional resource allocation fails to adapt to real-time changes in subscriber mobility.
  • Erratic 5G RAN demands surpass capabilities of traditional resource management techniques.
  •  Scalability and complexity concerns limit certain RL-based solutions in dense 5G networks.
  •  The MARL model optimizes system efficiency and service-specific dependability.
  • Integrating GAT into MARL framework enhances base station cooperation and performance.

problem formulation:

System utility J is made up of the successful service ratio (SSR) and spectral efficiency (SE) added together in a weighted manner. Optimization is formulated as:

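$$\max \; J_m, \qquad J_m = \alpha \, \mathrm{SE}_m(d_m, w_m) + \sum_{n} \beta_n \, \mathrm{SSR}_{m,n}(d_m, w_m)$$

where $J_m$ is the objective function (system utility) to maximize, and $\alpha$ and $\beta_n$ are the weights on the spectral efficiency and on each successful service ratio term.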
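As a worked example of the weighted utility, the snippet below evaluates J_m for one base station. The numbers are hypothetical. The snippet assumes that n indexes the three service slices and that SE and SSR have already been computed for the chosen allocation.

```python
def system_utility(se: float, ssr: list, alpha: float, beta: list) -> float:
    """J_m = alpha * SE_m + sum_n beta_n * SSR_{m,n} for one base station m."""
    return alpha * se + sum(b * s for b, s in zip(beta, ssr))

# Hypothetical values: SE in bit/s/Hz, SSR per service slice (VoLTE, eMBB, URLLC)
j_m = system_utility(se=5.0, ssr=[0.98, 0.90, 0.95], alpha=0.1, beta=[1.0, 1.0, 1.0])
print(round(j_m, 3))
```

Here α scales the spectral efficiency down so that it is comparable with the success ratios. Changing α or β_n shifts the trade-off the agents learn.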


The case study simulates 19 BSs covering a 160 m × 160 m area with 2000 subscribers. Following 3GPP specifications and considering two bandwidth granularities (coarse and fine), it simulates VoLTE, eMBB, and URLLC services, each with specific requirements. Bandwidth is reallocated every second, with round-robin scheduling every 0.5 ms within each service slice. The hyper-parameters c1, c2, and c3 are set to define the reward. GATs enhance BS collaboration, and the GAT-DQN and GAT-A2C algorithms are compared with conventional DRL algorithms, with DQN and A2C serving as baselines. These algorithms aim to guarantee QoS while optimizing resource allocation to increase system utility. Performance is assessed by convergence speed, stability after convergence, SE, and SSR. The results show that the GAT-based DRL algorithms outperform the conventional ones, improving stability and system utility.

GAT – DRL network from the study by Smith, J., & Doe, A. (2022)


  • GAT is combined with A2C and DQN to manage resources.
  • GAT-DRL algorithms effectively satisfy SLA requirements.
  • The algorithms attain optimal policies with increased system utility.
  • GAT strengthens base station (BS) cooperation.
  • Future work: refine the neural network architectures and handle complex mobility patterns and interference.



PV-Rich Networks: Voltage Management Boosting


Chen, Y., Liu, Y., Zhao, J., Qiu, G., Yin, H., & Li, Z. (2023). Physical-assisted multi-agent graph reinforcement learning enabled fast voltage regulation for PV-rich active distribution network. Applied Energy, 351, 121743.


Voltage violations stem from the intricate nature of active distribution networks, arising from the proliferation of distributed PV systems. Traditional systems encounter challenges in both efficiency and flexibility. To address this, a novel edge intelligence approach combines a graph attention network with multi-agent deep reinforcement learning. This innovative method effectively captures network dynamics and spatial correlations, optimizing voltage regulation. By incorporating an accurate physical model, it enhances learning speed. Demonstrated on IEEE 33-node and 136-node systems, this technique outperforms conventional approaches in both convergence and control effectiveness.

Limitations in other approaches for voltage regulation:

  • Traditional equipment like OLTCs and CBs effectively regulate voltage long-term but struggle in emergencies.
  • Power electronic devices such as STATCOMs and SVCs respond quickly but usually provide only reactive power.
  • ESS offers flexible and rapid voltage regulation, but centralized control struggles with real-time adjustment.
  • Conventional MADRL algorithms struggle to efficiently train in dynamic distribution networks.

Proposal and problem formulation:

To tackle this, the author proposes an edge intelligence method that integrates a graph attention network (GAT) into MADRL, uses an accurate physical model to generate reference voltage regulation experiences, and adopts a cloud-edge collaborative architecture. The optimization problem is formulated as:

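$$\min_{P^{\mathrm{ess}}_{j,t},\; Q^{\mathrm{svc}}_{j,t}} \; \sum_{t \in T} \sum_{j \in N,\; ij \in \mathcal{E}} l_{ij,t}\, r_{ij}$$

Here, $P^{\mathrm{ess}}_{j,t}$ and $Q^{\mathrm{svc}}_{j,t}$ are the decision variables representing the active power supplied by the ESS and the reactive power supplied by the SVC at node $j$ and time $t$, respectively. The summation runs over the time slots $T$ and the network branches $ij \in \mathcal{E}$. In the usual branch-flow notation, $l_{ij,t}$ is the squared current magnitude on branch $ij$ and $r_{ij}$ its resistance, so the objective is the total network power loss.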
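The loss objective above can be evaluated directly from simulated or measured branch quantities. The sketch assumes per-time-slot dictionaries of squared branch currents `l_ij,t` and a dictionary of branch resistances `r_ij`; the data layout and the function name are hypothetical.

```python
def total_line_loss(currents_sq, resistance):
    """Sum of l_ij,t * r_ij over all time slots and branches.

    currents_sq: list over time slots; each entry maps a branch (i, j) to its
        squared current magnitude l_ij,t (hypothetical per-unit values).
    resistance: dict mapping branch (i, j) to its resistance r_ij.
    """
    return sum(
        l * resistance[branch]
        for slot in currents_sq
        for branch, l in slot.items()
    )
```

An RL agent controlling ESS and SVC setpoints would see this quantity, typically negated, as part of its reward, while voltage limits are handled separately.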


For effective voltage control, the proposed system integrates cloud and edge computing. At the edge control level, the network is divided into sub-networks, each represented by an agent that controls ESS and SVCs to provide dynamic voltage regulation. The agents make decisions locally and only periodically upload their experiences to the cloud for training, which lowers communication overhead. Centralized training takes place at the cloud learning level, where the agents are trained on edge experiences stored in a replay buffer. The network partitioning approach uses electrical distance and modularity to decompose the centralized problem into smaller, more manageable subproblems.
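Modularity-based partitioning scores how well a candidate split groups strongly coupled nodes into the same sub-network. The sketch computes standard weighted Newman modularity for a given partition; how the study turns electrical distance into edge weights, and how it searches over partitions, is not specified here, so the weights are simply an input (an assumption).

```python
def modularity(adj, community):
    """Weighted Newman modularity Q of a network partition.

    adj: symmetric dict-of-dicts of edge weights, adj[i][j] == adj[j][i].
         Deriving these weights from electrical distance is assumed here.
    community: dict node -> sub-network label.
    Q = (1 / 2m) * sum_ij [A_ij - k_i * k_j / 2m] * delta(c_i, c_j)
    """
    degree = {i: sum(adj[i].values()) for i in adj}  # weighted degree k_i
    two_m = sum(degree.values())  # 2m: total edge weight counted in both directions
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] == community[j]:
                q += adj[i].get(j, 0.0) - degree[i] * degree[j] / two_m
    return q / two_m
```

Higher Q means more weight falls inside sub-networks than a random graph with the same degrees would give, so a partitioning routine would prefer splits with larger Q.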

Markov game process:

The multi-agent voltage regulation is modeled as a Markov game, in which agents interact with their environment and choose actions based on their observations. GAT-MASAC combines multi-agent soft actor-critic (MASAC) for policy learning with a GAT that captures topological interdependence. Finally, to improve learning efficiency, a physical-assisted mechanism generates reference experiences from an accurate physical model.
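GAT-MASAC builds on soft actor-critic. For reference, the standard single-agent SAC critic target (clipped double-Q with an entropy bonus) is sketched below. The multi-agent extension, in which critics also see other agents' information, and the GAT encoder are omitted, and the default `gamma` and `alpha` values are illustrative assumptions.

```python
def soft_q_target(reward, next_q1, next_q2, next_log_prob, gamma=0.99, alpha=0.2, done=False):
    """Soft Bellman target used by SAC-style critics.

    next_q1, next_q2: target-critic values Q(s', a') with a' sampled from pi(.|s').
    next_log_prob: log pi(a'|s'); alpha weights the entropy bonus.
    """
    if done:
        return reward  # no bootstrapping past a terminal state
    # Take the smaller critic estimate to curb overestimation, then add entropy.
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + gamma * soft_value
```

Both critics are regressed toward this target, and reference experiences from the physical model would simply enter the replay buffer alongside the agents' own transitions.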

GAT-MASAC framework from the study by Chen, Y. (2023)

Comparison results of different approaches:


Case study results of different approaches compared with the proposed GAT-MASAC, from the study by Chen, Y. (2023)


  • In distribution networks with strong PV penetration, the suggested edge intelligence technology efficiently reduces power losses and mitigates voltage violations.
  • Graph Attention Networks (GAT) and MASAC together increase learning efficiency and agent adaptation to changes in network topology.
  • The method scales to large distribution grids while maintaining voltage stability and improving operational efficiency.
  • Future studies should focus on improving ESS performance, investigating data-driven techniques for voltage control, planning the best possible resource allocation, and taking dynamic partitioning tactics into account.

GCRL for Advanced Energy-Aware Process Planning


Xiao, Q., Niu, B., Xue, B., & Hu, L. (2022). Graph convolutional reinforcement learning for advanced energy-aware process planning. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 53(5), 2802-2814.


This paper discusses the challenges in executing advanced energy-aware process planning (AEPP), emphasizing how advanced machining systems are susceptible to disruptions. These issues are addressed by a novel approach based on graph convolutional reinforcement learning (GCRL). A graph convolutional policy network trained on a variety of jobs demonstrates that AEPP can adapt across machines, processes, and cutting tools. To represent the dynamic character of process plan development, the problem is reformulated as a Markov decision process (MDP). A graph convolutional network (GCN) encodes the input graph topology, and reinforcement learning (RL) supports robust learning. A two-phase multitask training method that accounts for both task-specific rules and inter-task commonalities improves the versatility of GCRL.

Issues in the existing methodologies:

  • Conventional approaches, like expert systems, lack flexibility and struggle with changing production conditions.
  • Metaheuristics such as simulated annealing and evolutionary algorithms are limited by static programming.
  • Expert availability requirements and case-specific background knowledge hinder conventional methods.
  • These approaches often fail to reduce energy and time consumption in manufacturing operations.
  • Machine learning (ML) techniques encounter challenges in obtaining optimal operation sequence labels.
  • ML techniques, while promising, face difficulties in real-world application.


The paper presents a new approach to AEPP called graph convolutional reinforcement learning (GCRL). By transforming the process plan formulation into a graph-based Markov decision process (MDP) and combining planning with reinforcement learning, the method improves flexibility and scalability.


This article presents Graph Convolutional Reinforcement Learning (GCRL), an advanced energy-aware process planning (AEPP) technique. The resulting process plans reduce both energy consumption and production time. Constraints on feature sequencing and resource selection ensure that the generated AEPP plans are feasible. The GCRL framework uses graph convolutional networks to integrate graph embedding, multitask training, and reinforcement learning. AEPP is represented as a Markov decision process (MDP) in which an RL agent selects nodes and actions based on rewards and transition dynamics, progressively improving the process plan. Multitask training improves generalization across different planning tasks, which increases the success of energy-efficient process planning.
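The graph convolution at the core of GCRL can be illustrated with the widely used Kipf and Welling propagation rule, H' = ReLU(D̂^-1/2 Â D̂^-1/2 H W) with Â = A + I. This is a generic sketch, not the paper's policy network: the adjacency matrix, node features, and weights are hypothetical placeholders for a process-plan graph.

```python
import math


def gcn_layer(adj, H, W):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n 0/1 adjacency matrix (list of lists), e.g. links between
        machining features or operations in a process-plan graph (hypothetical).
    H: n x f node feature matrix; W: f x f' weight matrix.
    """
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbor features: norm @ H
    agg = [[sum(norm[i][k] * H[k][c] for k in range(n)) for c in range(len(H[0]))] for i in range(n)]
    # Linear projection followed by ReLU: relu(agg @ W)
    return [
        [max(0.0, sum(agg[i][c] * W[c][d] for c in range(len(W)))) for d in range(len(W[0]))]
        for i in range(n)
    ]
```

Stacking such layers lets information about neighboring features and resources reach each node before the policy head scores candidate actions.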

GCN-RL framework from the study by Xiao, Q., Niu, B., Xue, B., & Hu, L. (2023).

The case study makes use of a Python 3 simulation framework for Internet of Things-based energy monitoring. The important evaluation results are displayed in the graphic below.

Comparison results of the case study from the study by Xiao, Q., Niu, B., Xue, B., & Hu, L. (2023).


  • GCRL uses a graph-based network approach to solve the stochastic AEPP problem efficiently.
  • GCRL combines GCN, RL, and multitask training for a comprehensive solution.
  • Comparative analysis favors GCRL over metaheuristics in convergence, stability, and solution quality.
  • Prioritizing low-power processes, GCRL resolves goal conflicts for energy-efficient manufacturing.




A GRL for rapid charging stations


Xu, P., Zhang, J., Gao, T., Chen, S., Wang, X., Jiang, H., & Gao, W. (2022). Real-time fast charging station recommendation for electric vehicles in coupled power-transportation networks: A graph reinforcement learning method. International Journal of Electrical Power & Energy Systems, 141, 108030.


As electric vehicle usage expands, fast charging demand becomes a major factor affecting coupled power-transportation networks. To address this, the research develops a multi-objective, system-level recommendation mechanism that dynamically assigns vehicles to appropriate stations. The task is framed as a deep reinforcement learning sequential decision-making problem. Graph attention networks integrate information from power grid buses, traffic nodes, and charging stations to represent the system state. A double-prioritized DQN(𝜆) training method improves efficiency and copes with long delays in action execution. Tested on a power-transportation simulation platform, the approach handles real-time requests efficiently and improves the viability and resilience of urban systems.

Key issues of other papers highlighted in this paper include:

  • Increased EV charging strains power grids, threatening stability.
  • Traffic congestion rises due to extended charging times and station queues.
  • Studies propose pricing, dynamic charging, and route optimization.
  • More efficient solutions are needed to manage EV proliferation.
  • Current algorithms face challenges in meeting real-time demands.
  • Coordinating EV charging with power grids and transport adds complexity.


These problems led the author to propose a Graph Reinforcement Learning (GRL) approach in which Graph Attention Networks (GATs) operate on a graph formulation based on physical connections, while DQN(𝜆) uses a double-prioritized training approach to handle action delays.

The author formulates the multi-objective fast charging station recommendation problem so that the Dijkstra algorithm and the objective function form the foundation for constructing objective-oriented high-level features and goal-oriented rewards, which drive the agent's self-evolution.
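Dijkstra's algorithm supplies the travel-time information mentioned above. The sketch below computes shortest travel times on a hypothetical road graph and, as an illustrative helper only, picks the nearest station. The study's recommendation also weighs charging and grid conditions, which are not modeled here, and `nearest_station` is a hypothetical baseline, not the paper's policy.

```python
import heapq


def dijkstra(graph, source):
    """Shortest travel times from source to every reachable node.

    graph: dict node -> list of (neighbor, travel_time), times non-negative
        (a hypothetical traffic network).
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def nearest_station(graph, ev_node, stations):
    """Hypothetical baseline: reachable station with the shortest travel time."""
    dist = dijkstra(graph, ev_node)
    reachable = [s for s in stations if s in dist]
    return min(reachable, key=dist.__getitem__) if reachable else None
```

In a GRL setting, distances like these would become node or edge features rather than the final decision, leaving the agent to trade travel time against queueing and grid impact.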


For time-varying, stochastic coupled systems, the research proposes a deep graph reinforcement learning (GRL) methodology for fast charging station selection. It aims to improve awareness of multi-dimensional states, in which charging stations act as mediators between traffic networks and power grids. This is achieved with a graph neural network (GNN), specifically a graph attention network (GAT), that operates on the adjacency matrix and node features. A novel off-policy technique integrates experience replay with the 𝜆-return. Prioritized replay uses a large-scale replay memory to store and refresh precomputed 𝜆-returns, while the 𝜆-return handles credit assignment for delayed action execution. In addition, the attention-prioritized cache improves training effectiveness by maintaining the proportion of actual recommendation transitions. This helps avoid poor decisions during both training and deployment.
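The 𝜆-return recursion that DQN(𝜆) stores and refreshes in replay memory can be written compactly. This sketch uses the common recursive form G_t = r_t + γ[(1 − λ) max_a Q(s_{t+1}, a) + λ G_{t+1}], bootstrapping fully at the end of a stored segment. Prioritization and the attention-prioritized cache are omitted, and the study's exact variant may differ.

```python
def lambda_returns(rewards, next_max_q, dones, gamma=0.99, lam=0.8):
    """Recursive lambda-returns over a stored trajectory segment.

    rewards[t]: reward after step t.
    next_max_q[t]: max_a Q(s_{t+1}, a) from the current network.
    dones[t]: True if s_{t+1} is terminal.
    """
    n = len(rewards)
    returns = [0.0] * n
    g_next = next_max_q[-1]  # bootstrap beyond the segment end
    for t in reversed(range(n)):
        if dones[t]:
            g = rewards[t]  # terminal: no future value
        else:
            # Blend the one-step bootstrap with the longer lambda-return.
            mixed = (1 - lam) * next_max_q[t] + lam * g_next
            g = rewards[t] + gamma * mixed
        returns[t] = g
        g_next = g
    return returns
```

Because each return depends on Q-values, precomputed returns go stale as the network learns, which is why they are periodically refreshed in the replay memory.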

Framework of GRL Adapted from “Real-time fast charging station recommendation for electric vehicles in coupled power-transportation networks” by Xu et al. (2022).


  • The paper proposes a GRL strategy for fast charging station suggestion.
  • GAT combined with graph construction handles irregular environment characteristics.
  • DQN (𝜆) training mechanism enhances recommendation effectiveness.
  • The approach reduces user time costs, maintains traffic conditions, and prevents voltage variations.
  • It balances service among fast charging stations.
  • Outperforms opportunistic methods in dynamic recommendation systems.
  • Further refinement is needed for practical applicability.
  • Adjustments required for handling concurrent requests and large-scale systems.

GRL approach for smart building energy and comfort optimization

Reference:

Haidar, N., Tamani, N., Ghamri-Doudane, Y., & Boujou, A. (2023). Selective reinforcement graph mining approach for smart building energy and occupant comfort optimization. Building and Environment, 228, 109806.


Optimizing building energy usage is crucial because it significantly impacts the environment. Unfortunately, current techniques’ inability to accurately forecast occupant behavior often leads to ineffective HVAC management, causing inhabitants to feel uncomfortable or waste energy. The author proposes using information technology to gather data on energy use and tenant behavior in buildings by installing sensors. The authors present an optimization technique based on graph mining that combines selective reinforcement learning with a model for predicting occupant behavior. Real-time occupancy sensors help identify prediction flaws, enabling remedial action to be taken.

Limitations of previous HVAC management techniques:

  • Traditional methods proposed in some papers are limited because they rely on fixed operation schedules.
  • Although occupants have a major impact on HVAC energy consumption, there is little research on detecting and analyzing occupant behavior in building systems.
  • Conventional methods either predict occupant movements to selectively heat or cool rooms, or leave HVAC systems running in every room, wasting energy while rooms are empty.
  • On the other hand, inaccurate forecasts may make occupants uncomfortable.
  • Existing models may not handle these prediction errors well.
  • Although Reinforcement Learning (RL) has shown potential in many domains, it has rarely been incorporated into building systems for energy optimization.
  • Earlier research may not have fully exploited RL to improve prediction accuracy and energy efficiency.

To tackle these issues the author proposes:

The author presents a graph mining-based optimization technique that combines a selective reinforcement learning method with an occupant behaviour prediction model to analyse user behaviour, identify prediction mistakes, and fix the model.


Three primary algorithms make up the methodology of this paper: Selective Prediction Reinforcement (SPR), Real-time Room Occupancy based on Prediction Reinforcement (OPR), and Occupant Movement Prediction (OMP).

Simplified diagram of the proposed method from the study by Haidar, N. (2023)

OMP estimates room occupancy from past movement data and forecasts future movements. It describes movements with a graph-based representation and trains a model using the Graph Learning algorithm. OPR detects and corrects forecast errors by comparing predicted occupancy with actual occupancy from sensors. SPR optimizes the prediction model, giving more weight to certain predictions based on occupancy duration. It saves energy by identifying rooms with brief occupancy and not switching on their HVAC systems.
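A very simplified stand-in for the OMP and SPR ideas is a room-transition graph built from past movements, plus a duration threshold for switching on HVAC. The sketch is not the paper's Graph Learning algorithm: predicting the most frequent successor room and the `min_minutes` threshold are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict


def build_transition_graph(trajectories):
    """Weighted directed graph of room-to-room moves.

    trajectories: list of room sequences, one per observed occupant path.
    Returns dict room -> Counter of next rooms (edge weights = move counts).
    """
    edges = defaultdict(Counter)
    for path in trajectories:
        for src, dst in zip(path, path[1:]):
            edges[src][dst] += 1
    return edges


def predict_next_room(edges, current_room):
    """Most frequently observed successor room, or None if none was seen."""
    successors = edges.get(current_room)
    if not successors:
        return None
    return successors.most_common(1)[0][0]


def should_condition(predicted_minutes, min_minutes=15):
    """Hypothetical SPR-style rule: skip HVAC for briefly occupied rooms."""
    return predicted_minutes >= min_minutes
```

An OPR-like step would compare `predict_next_room` against the occupancy sensor reading and feed mismatches back into the model.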


  • Combining OMP, OPR, and SPR reduces HVAC energy consumption by up to 80.1% while maintaining occupant satisfaction levels of up to 97%.
  • Future scope: further trials on larger datasets are needed to evaluate and improve the technique, and the prediction model and system parameters should be fine-tuned for different building types.




Power Optimization using DRL for the Energy Internet


Xu, S., & Guo, S. (2024). Distributed Reactive Power Optimization for Energy Internet via Multiagent Deep Reinforcement Learning With Graph Attention Networks. IEEE Transactions on Industrial Informatics.


Voltage sags threaten the stability of the Energy Internet (EI), which has varying demand and distributed power supply. In these situations, reactive power must be adjusted quickly and precisely to preserve stability. To tackle this, the author proposes a reactive power optimization framework for the EI during voltage sags based on multiagent deep reinforcement learning (DRL). Using real-time awareness of the EI state, the framework coordinates multiple reactive power compensation devices to maintain voltage stability, with a multiagent DRL model linking the devices. The proposed system is validated on the IEEE 9-bus system and an industrial zone.

Issues in existing methods for managing voltage stability in EI systems:

  • Reactive power methods lack a swift response to fault conditions.
  • Traditional techniques may not handle abrupt voltage sags efficiently.
  • Their focus on local bus systems limits their scope.
  • Abundant data are not fully utilized for optimal control.
  • Heuristic algorithms may not capture system dynamics accurately.
  • Simplified equations prevent fully optimal control.
  • Methods are too slow to support online optimization.
  • Optimizing multiple devices simultaneously is challenging.
  • A wide action space makes it hard to arrive at better solutions.

To address these issues, the author proposes a novel framework for optimizing reactive power coordination based on graph attention networks (GAT) and multiagent deep reinforcement learning (DRL).

The expected discounted reward for each action is calculated as:

$$R_k(s) = \sum_{i=k}^{n} \gamma^{\,i-k}\, r_i$$

  • Where:
    `R_k(s)` is the expected (discounted cumulative) reward of the kth action.
    `γ` is the reward attenuation (discount) coefficient.
    `r_i` is the reward of the ith reactive power compensation action.
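The discounted sum above is straightforward to compute. The sketch evaluates R_k directly and also computes every R_k in one backward pass, which is how such returns are usually obtained in practice; indexing the rewards from 0 is an assumption.

```python
def discounted_return(rewards, k, gamma):
    """R_k = sum_{i=k}^{n} gamma^(i-k) * r_i, with rewards indexed 0..n."""
    return sum(gamma ** (i - k) * rewards[i] for i in range(k, len(rewards)))


def all_returns(rewards, gamma):
    """Every R_k at once via the recursion R_k = r_k + gamma * R_{k+1}."""
    out, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]
```

A γ close to 1 makes later compensation actions count almost as much as the current one, while a small γ focuses each agent on the immediate voltage response.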


The EI model is constructed from numerical simulations and field data from real systems. Voltage sags, disturbances, and faults are simulated to reproduce EI behavior, and SVG controllers at the nodes monitor conditions during sags. SCADA data from all nodes and PMU data from nearby nodes are used. High-frequency PMU data are processed with a GRU to capture sequential properties, while GAT dynamically reveals the EI topology; together, GRU and GAT extract EI features. A multiagent DRL model based on A2C is then developed. Each SVG controller serves as an actor-network agent, and cloud servers merge SCADA and PMU data to form the critic network. Agents use the state data available to them to make compensation decisions, and the EI simulation applies the agents' compensation strategies. A reward system evaluates each compensation scheme, and during training the reward-based losses are back-propagated through the networks. The trained multiagent DRL model then adjusts reactive power online to maintain voltage stability during sags.
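The A2C update behind each SVG agent can be summarized by its one-step advantage. The sketch shows the scalar quantities for a single transition: TD target, advantage, policy loss, and critic loss. The discount default and the one-step target are illustrative assumptions, and in practice the losses would be minimized with an autodiff framework rather than computed as plain floats.

```python
def a2c_losses(reward, value, next_value, log_prob, gamma=0.95, done=False):
    """One-step advantage actor-critic quantities for a single agent.

    value, next_value: critic estimates V(s) and V(s').
    log_prob: log pi(a|s) of the compensation action chosen by the actor.
    Returns (advantage, actor_loss, critic_loss).
    """
    target = reward if done else reward + gamma * next_value  # TD target
    advantage = target - value
    actor_loss = -log_prob * advantage  # advantage treated as a constant here
    critic_loss = advantage ** 2  # squared TD error
    return advantage, actor_loss, critic_loss
```

A positive advantage raises the probability of the chosen compensation action, and a negative one lowers it, while the critic is pulled toward the TD target.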


Flowchart of the proposed framework from the study by Xu, S. (2024)


Results obtained during evaluation on the IEEE 9-bus system from the study by Xu, S. (2024)


  • The suggested framework efficiently improves voltage stability in the Energy Internet (EI) by combining multiagent DRL with Graph Attention Networks (GAT).
  • Compared with other techniques, it achieves higher accuracy in reactive power adjustment.
  • Future research will focus on improving control mechanisms for catastrophic failures and natural disasters, in order to optimize network configuration for EI safety.





A GRL for Dynamic Renewable Energy Dispatch

Reference:

Chen, Junbin, et al. “A scalable graph reinforcement learning algorithm based stochastic dynamic dispatch of power system under high penetration of renewable energy.” International Journal of Electrical Power & Energy Systems 152 (2023): 109212.


The study discusses how increasing renewable energy integration introduces uncertainty into power systems, which challenges secure and economical operation. Dynamic economic dispatch (DED) is particularly affected by these uncertainties. RL techniques can learn dispatch rules, but they rely on Euclidean data representations. To improve scalability and computational efficiency, the author proposes a novel GRL algorithm that enhances scalability and generalization and yields higher-quality solutions online.

Issues in existing methods:

  • Determining the best dispatch strategy in dynamic economic dispatch (DED) is challenging due to the vast state and action space.
  • Existing model predictive control (MPC) methods depend on accurate models and sufficient data, which are difficult to obtain and maintain.
  • DED problems involve a large state and action space with nonstationary variations in load and renewable energy sources (RES), which makes prediction challenging.
  • Extracting the latent connections between system topology and system state is difficult in DED.
  • Deep neural network (DNN) approaches face scalability issues when applied to large-scale systems.
  • Current methods that use matrix or vector inputs fail to capture the topological information of the system.
  • Low sample efficiency and poor scalability in existing methods lead to large training data requirements.

To tackle these limitations, the author proposes a novel GRL algorithm tailored for DED in power systems.

Problem statement:

The author formulates DED as a multistage stochastic sequential decision-making problem and proposes a GRL technique to solve it. The goal is to find the dispatch strategy that minimizes the expected cumulative cost over the dispatch horizon:

$$\min \; \mathbb{E}\left[\sum_{t=1}^{T} F(t)\right] = \mathbb{E}\left[\sum_{t=1}^{T} \left( F_{G}(t) + F_{RES}(t) + F_{ESS}(t) \right)\right]$$

`min`: minimize the expression.
`E`: the expectation operator.
`∑[t=1 to T]`: summation over the stages from 1 to T.
`F(t)`: total operation cost at stage t, determined by the power output decisions at that stage.
`F_G(t)`: operation cost of conventional generation at stage t.
`F_RES(t)`: cost of renewable energy curtailment at stage t.
`F_ESS(t)`: operation cost of the grid-level energy storage system at stage t.
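Because the expectation in the DED objective is taken over uncertain load and renewable scenarios, it is commonly estimated by averaging over sampled trajectories. The sketch below performs that sample average; the tuple layout `(F_G, F_RES, F_ESS)` per stage is a hypothetical data format.

```python
def stage_cost(f_g, f_res, f_ess):
    """F(t) = F_G(t) + F_RES(t) + F_ESS(t) for one dispatch stage."""
    return f_g + f_res + f_ess


def expected_cumulative_cost(scenarios):
    """Sample-average estimate of E[sum_t F(t)] over simulated scenarios.

    scenarios: non-empty list of trajectories; each trajectory is a list of
        (F_G, F_RES, F_ESS) tuples, one per stage (hypothetical format).
    """
    if not scenarios:
        raise ValueError("need at least one scenario")
    totals = [sum(stage_cost(*costs) for costs in traj) for traj in scenarios]
    return sum(totals) / len(totals)
```

An RL formulation typically uses the negative stage cost as the per-step reward, so maximizing the return corresponds to minimizing this expectation.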



First, they frame DED as a dynamic sequential decision problem and formulate it as an MDP that can be solved with RL algorithms; the soft actor-critic (SAC) algorithm is chosen as the basis and improved. Second, they provide a graph-based representation of the system state that integrates the implicit correlations present in the system topology while capturing the non-Euclidean features of dispatch operation data. Third, they develop a GRL algorithm that learns the best mapping from the graph-represented system state to DED decisions. To optimize the dispatch rules, the SAC algorithm is trained iteratively on historical data: the actor network learns to choose actions based on the graph-represented system state, while the critic network assesses the quality of those actions.

Testing:

Test data are used to evaluate the trained model, which generates dispatch decisions that reduce costs while satisfying operational requirements and maintaining system stability.

Framework of GRL-based DED from the study by Chen (2023)

Major evaluation results of the case study conducted on the IEEE 39-bus system:


  • GRL outperformed conventional reinforcement learning methods in DED problems.
  • GRL has strong scalability, which allows it to adapt well to state space changes by using graph representation of states.
  • GRL is valuable for large systems and extended optimization due to its computationally efficient, nearly optimal solutions compared to algorithms like MPC.
  • Future scope: as energy systems grow more complex and integrate more controllable resources, future work should look for more effective ways to represent the GRL input in order to reduce the computational burden.

Physics-assisted GRL for PV voltage regulation


Cao, D., Zhao, J., Hu, J., Pei, Y., Huang, Q., Chen, Z., & Hu, W. (2023). Physics-informed graphical representation-enabled deep reinforcement learning for robust distribution system voltage control. IEEE Transactions on Smart Grid.


This study deals with optimization problems in distribution systems caused by anomalous data and flawed models. It proposes a robust voltage control method that combines a representation network with surrogate-model-based DRL. Deep learning extracts important characteristics from real-time measurements and pseudo-measurements using graph-based analysis of the tree topology. These features are fed to a soft actor-critic algorithm trained on a power flow surrogate model. The technique improves robustness to anomalies and reduces dependence on exact system parameters. Its efficacy is validated on the IEEE 33-node and 119-node systems.

Issues in the existing voltage control methods:

  • Because renewable energy is unpredictable, conventional methods have difficulty controlling voltage changes, which might cause grid instability.
  • Stochastic programming (SP) and robust optimization (RO) techniques cannot adequately handle the uncertainty of load demand and renewable output, which limits their usefulness for voltage regulation.
  • Current DRL-based techniques need accurate physical models for training, which is difficult in practice because such models may be imprecise or unavailable.
  • Many DRL strategies assume perfect observability of the distribution system, which is often unrealistic in real-world settings.
  • These shortcomings highlight the need for methods that work well with partial system observations and depend less on accurate distribution system models.

Proposal of physics-assisted GRL:

Therefore, the author proposes a novel robust voltage control method for distribution systems that combines a GGAT-based surrogate model with DAE and SAC methods.

Problem formulation

The author's problem formulation lays the foundation for robust voltage management in distribution networks using a combination of reinforcement learning and physics-based approaches.

The system regulates voltage with three asset categories: energy storage systems (ESS), PV inverters, and static var compensators (SVCs). Active and reactive power balance constraints are imposed at every network node.

The active and reactive power flow constraints are:

$$p^{g}_{i,t} + p^{e}_{i,t} - p^{c}_{i,t} = |v_{i,t}| \sum_{j=1}^{N} |v_{j,t}| \left( G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij} \right)$$

$$q^{g}_{i,t} + q^{s}_{i,t} - q^{c}_{i,t} = |v_{i,t}| \sum_{j=1}^{N} |v_{j,t}| \left( G_{ij}\sin\theta_{ij} - B_{ij}\cos\theta_{ij} \right)$$

Here $|v_{i,t}|$ is the voltage magnitude at node $i$ and time $t$, $\theta_{ij}$ is the voltage angle difference between nodes $i$ and $j$, $G_{ij}$ and $B_{ij}$ are the conductance and susceptance entries of the network admittance matrix, and $N$ is the number of nodes.
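The right-hand sides of the two balance equations can be computed for every node from voltage magnitudes, angles, and the admittance matrix. The sketch below is a direct implementation of those sums; the input layout (Python lists, angles in radians, θ_ij = θ_i − θ_j) is an assumption, and comparing the result with the left-hand sides gives the power mismatch at each node.

```python
import math


def power_injections(v_mag, theta, G, B):
    """AC power-flow injections P_i and Q_i at every node.

    P_i = |V_i| * sum_j |V_j| (G_ij cos(theta_ij) + B_ij sin(theta_ij))
    Q_i = |V_i| * sum_j |V_j| (G_ij sin(theta_ij) - B_ij cos(theta_ij))
    v_mag: voltage magnitudes; theta: voltage angles in radians.
    G, B: real and imaginary parts of the admittance matrix (lists of rows).
    """
    n = len(v_mag)
    p, q = [], []
    for i in range(n):
        p_sum = q_sum = 0.0
        for j in range(n):
            t = theta[i] - theta[j]  # angle difference theta_ij
            p_sum += v_mag[j] * (G[i][j] * math.cos(t) + B[i][j] * math.sin(t))
            q_sum += v_mag[j] * (G[i][j] * math.sin(t) - B[i][j] * math.cos(t))
        p.append(v_mag[i] * p_sum)
        q.append(v_mag[i] * q_sum)
    return p, q
```

A surrogate model such as the one described below learns to approximate this kind of calculation from data, which matters when the line parameters in `G` and `B` are uncertain.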

Voltage constraints keep node voltages within allowable bounds. To achieve full observability, the paper formulates the problem as an MDP.

Methodology of physics-assisted GRL:

The physics-assisted multi-agent GRL methodology develops a robust voltage control strategy for distribution networks by combining surrogate modeling, reinforcement learning, and neural network architectures. GGAT captures the structural relationships between nodes in the distribution network: it assigns attention weights to neighboring nodes according to their significance, thereby integrating structural information into the neural network. A CNN then extracts features from the GGAT output, and the DAE reduces the dimensionality of these features. The voltage regulation problem, formulated as an MDP, is solved with the SAC algorithm. A surrogate model trained on historical operating data emulates the power flow calculation. It contains a GGAT module, which extracts features from pseudo-measurements and real-time data, and a voltage prediction module, which uses a CNN and fully connected layers to forecast node voltages. To train the representation network, the surrogate model's parameters are optimized with supervised learning. The SAC parameters are then optimized using rewards calculated by the surrogate model.

Preliminary tests are conducted on the IEEE 33-node system and comparative tests on the IEEE 119-node system. Some of the major results are as follows:

Comparison results from the study by Cao, D. (2023)


  • The proposed voltage control technique combines SAC-based control with a physics informed GGAT and DAE-based representation network, providing resilience against anomalous observations.
  • Comparative studies show how effective the approach is against faulty data, preserving control performance throughout a range of anomalous observations.
  • The proposed method reduces dependence on the physical model and achieves performance that is comparable to that determined by precise line parameters.
  • The suggested approach outperforms the conventional stochastic optimization technique in handling rapid voltage fluctuations.
  • Compared to the graph-based policy technique, the suggested method can train a robust voltage control strategy via the supervised learning-aided representation network.



Job scheduling in manufacturing systems using GRL.


Liu, Z., Mao, H., Sa, G., Liu, H., & Tan, J. (2024). Dynamic job-shop scheduling using graph reinforcement learning with auxiliary strategy. Journal of Manufacturing Systems, 73, 1-18.


Unpredictable dynamic events in manufacturing systems limit effective job-shop scheduling (JSP), and present methods find it difficult to balance solution efficiency with dynamic adaptation. To tackle this, the author suggests an approach for dynamic job-shop scheduling (DJSP) using GRL. The paper expands the state representation, introduces a mixed graph Transformer network (MGTN) to adapt to changes, and develops a novel Phase Proximal Policy Optimization algorithm with Rollback (P3OR).

Limitations in the existing DJSP methods:

In earlier works, robust scheduling aimed to maintain performance under increasing uncertainty, but it often involves resource redundancy and cannot effectively account for all perturbations, while reactive scheduling increases computational cost.

Priority dispatching rules (PDRs) proposed in other papers require little computational effort, but they are not optimal and can be strongly affected by instance-related factors. Previous attempts to apply DRL to DJSP lacked generalization across instances. Some other papers use graph-based DRL methods to tackle this; although they provide valuable insights, they may not cope with the growing diversity of instances and the complexity of environments.


To overcome these issues the author proposes a GRL-AS framework.

The author formulates the DJSP to capture its complexity by considering uncertainties in processing time and machine failures, with the objective of improving manufacturing efficiency.

The reward function is based on the average machine utilization.


The methodology combines graph representation and reinforcement learning with auxiliary strategies to address the challenges of dynamic job-shop scheduling with stochastic processing times and machine breakdowns.


General GRL-AS framework for solving DJSPs, from the study by Liu, Z. (2024)

The scheduling problem is represented by a disjunctive graph: nodes represent operations, conjunctive edges represent precedence constraints, and disjunctive edges connect operations processed on the same machine. The objective is to minimize the length of the longest path in the resulting directed acyclic graph, that is, the makespan. The MGTN transforms disjunctive graphs into fixed-length feature vectors to capture the topological and feature interactions between nodes. The Graph Reinforcement Learning with Auxiliary Strategy (GRL-AS) framework combines this graph representation module with a reinforcement learning technique.
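Once the disjunctive edges are oriented into a machine order, the objective described above is the longest path through the resulting directed acyclic graph. The sketch computes it with a topological pass; the operation names and data layout are hypothetical, and neither MGTN nor the RL agent is modeled.

```python
from collections import defaultdict, deque


def makespan(durations, edges):
    """Longest-path length through a DAG of operations (the schedule makespan).

    durations: dict operation -> processing time.
    edges: list of (before, after) pairs: job precedence arcs plus the
        disjunctive arcs after they have been oriented into a machine order.
    Raises ValueError if the oriented graph has a cycle (infeasible order).
    """
    succ = defaultdict(list)
    indeg = {op: 0 for op in durations}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    start = {op: 0.0 for op in durations}  # earliest start times
    ready = deque(op for op, d in indeg.items() if d == 0)
    seen = 0
    while ready:
        u = ready.popleft()
        seen += 1
        finish = start[u] + durations[u]
        for v in succ[u]:
            start[v] = max(start[v], finish)  # wait for the latest predecessor
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if seen != len(durations):
        raise ValueError("cycle: disjunctive arcs give an infeasible order")
    return max(start[op] + durations[op] for op in durations)
```

Each scheduling action in the RL loop effectively orients more disjunctive arcs, so a makespan-style evaluation like this one can score complete schedules.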

The framework has two parts: offline learning and online deployment. To train the RL agent offline, DJSP instances are transformed into disjunctive graphs and passed to the graph representation module. The agent uses policy-gradient algorithms to continually learn the best scheduling strategies, and the P3OR algorithm is used to improve training stability. Once trained, the model can be deployed for online applications.
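The name P3OR indicates a variant of Proximal Policy Optimization. Its phase and rollback mechanisms are not described here, so the sketch shows only the standard PPO clipped surrogate that such variants start from; the `eps = 0.2` default is the commonly used value, not necessarily the paper's.

```python
import math


def ppo_clip_objective(log_prob_new, log_prob_old, advantage, eps=0.2):
    """Standard PPO clipped surrogate for a single (state, action) sample.

    ratio r = pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
    Returns min(r * A, clip(r, 1 - eps, 1 + eps) * A); training maximizes
    the average of this value over a batch.
    """
    ratio = math.exp(log_prob_new - log_prob_old)
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)
```

Taking the minimum removes any incentive to push the probability ratio beyond the clip range, which keeps each update close to the previous policy; that is the stability property PPO-style methods rely on.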


  • The author proposes a GRL-AS framework to address DJSP issues in manufacturing systems.
  • MGTN transforms disjunctive graphs into fixed-length feature vectors.
  • P3OR algorithm utilized for stable training.
  • The proposed model shows a better performance when handling both static as well as dynamic DJSP instances.



Energy Harvesting for Dynamic Computation Offloading using GRL


J. Chen and Z. Wu, “Dynamic Computation Offloading with Energy Harvesting Devices: A Graph-Based Deep Reinforcement Learning Approach,” in IEEE Communications Letters, vol. 25, no. 9, pp. 2968-2972, Sept. 2021,

doi: 10.1109/LCOMM.2021.3094842.


The author presents a technique called GCN-DDPG for joint partial offloading and resource allocation (JPORA) in mobile edge computing (MEC) systems with energy harvesting (EH). The main difficulty is the dynamic nature of the MEC environment, with changing numbers of mobile devices (MDs), computation tasks, and accumulated energy. Because traditional DDPG can only extract latent representations from Euclidean data, it does not generalize well under such dynamic conditions. To overcome this limitation, the author proposes GCN-DDPG, whose goal is to improve the JPORA decisions of MDs, including uplink transmission power, local computing capacity, and offloading ratio.

Issues the author identifies in existing JPORA methods for MEC with energy harvesting:

  • Some papers propose optimization methods such as convex relaxation and heuristic search. Although these succeed in some situations, they take a long time to find workable solutions, so they may not suit the dynamic nature of large-scale MEC systems.
  • Other researchers propose algorithms based on Deep Q-Networks (DQN), which struggle with the continuous action spaces of dynamic JPORA problems. Choosing the right degree of discretization is difficult: a fine discretization increases dimensionality, while a coarse one loses behavioral information.
  • DDPG agents that use neural networks such as FCNs, CNNs, and LSTMs can only extract latent representations from Euclidean data. These networks cannot adequately capture the graph-like structure of task offloading and resource allocation in MEC systems.
  • Existing neural networks cannot recognize graph-based relations among network components, which is required to handle JPORA problems properly.

These issues point to the need for a more flexible and effective JPORA method for MEC with EH devices, so the author proposes a GCN-DDPG agent.

Problem statement:

The objective of this study is to minimize, over time, the average weighted cost of task computation time and energy consumption on the MDs. The author formulates the problem as:


$$\min \; \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \frac{1}{M} \sum_{m=1}^{M} \left( \omega_1 \tau_t^m + \omega_2 E_t^m \right)$$

  • $T$: the time horizon over which the average tradeoff is considered.
  • $M$: the number of devices.
  • $\omega_1$ and $\omega_2$: tradeoff weights that set the relative importance of computation time ($\tau_t^m$) and energy consumption ($E_t^m$).
  • $\tau_t^m$: computation time of device $m$ at time $t$.
  • $E_t^m$: energy consumption of device $m$ at time $t$.
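A minimal sketch of how the objective above can be estimated over a finite horizon, assuming the per-step computation times and energies are already available as T × M arrays (the limit is approximated by a finite T). The function name and inputs are hypothetical.

```python
import numpy as np


def average_weighted_cost(tau, energy, w1, w2):
    """Finite-horizon estimate of the long-run average weighted cost.

    tau, energy: arrays of shape (T, M) holding tau_t^m and E_t^m.
    w1, w2: the tradeoff weights omega_1 and omega_2.
    """
    tau = np.asarray(tau, dtype=float)
    energy = np.asarray(energy, dtype=float)
    if tau.shape != energy.shape or tau.ndim != 2:
        raise ValueError("tau and energy must both have shape (T, M)")
    per_device = w1 * tau + w2 * energy  # omega1 * tau + omega2 * E for each (t, m)
    # Inner (1/M) sum over devices, then outer (1/T) sum over time steps.
    return per_device.mean(axis=1).mean()
```

Raising `w2` relative to `w1` shifts the optimum toward energy-saving decisions at the cost of longer computation times.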



Figure: the proposed GCN-DDPG framework (from the study by J. Chen, 2021)

The methodology offers a novel solution to the dynamic JPORA problem faced by MEC systems with energy-harvesting devices. In GCN-DDPG, the GCN captures the graph structure, while DDPG makes the decisions. The MEC system with EH MDs is initialized by setting parameters such as the number of MDs, the computation tasks, the maximum battery capacity, and the harvested energy levels. The state of the MEC system consists of the task data, uplink channel gains, battery energy levels, and harvested energy levels of each MD. Based on the observed state, the GCN-DDPG agent determines continuous actions for each MD: the offloading ratio, local computing capacity, and uplink transmission power. Thorough performance evaluations and analyses of the experimental data validate the effectiveness and robustness of the proposed technique in solving the JPORA problem in MEC with EH devices.
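The paper's actor network is not reproduced here. As an illustration only, the sketch below shows one common way a DDPG-style actor can turn unbounded per-MD outputs into the three bounded continuous actions. The tanh squashing, the Gaussian exploration noise, and the bounds `f_max` and `p_max` are assumptions. Because the mapping is applied row by row, it works for any number of MDs, matching the per-node outputs a GCN produces.

```python
import numpy as np


def scale_actions(raw, f_max, p_max):
    """Map unbounded actor outputs (one row per MD) to the three JPORA actions.

    raw: array (num_mds, 3) of pre-activation actor outputs.
    Returns the offloading ratio in [0, 1], local CPU capacity in [0, f_max],
    and uplink transmit power in [0, p_max].
    """
    raw = np.asarray(raw, dtype=float)
    unit = (np.tanh(raw) + 1.0) / 2.0  # squash every output into [0, 1]
    ratio = unit[:, 0]
    freq = unit[:, 1] * f_max
    power = unit[:, 2] * p_max
    return ratio, freq, power


def explore(raw, sigma, rng):
    """Exploration: add Gaussian noise to the raw outputs before squashing,
    so the perturbed actions still respect the bounds."""
    return np.asarray(raw, dtype=float) + rng.normal(0.0, sigma, size=np.shape(raw))
```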

Results obtained during validation:


Figure: results from the study by J. Chen (2021)


  • GCN-DDPG is used to address dynamic decision-making in MEC with EH MDs.
  • It reduces the average weighted cost of energy consumption and task completion time.
  • The GCN makes it easier to learn the MEC network topology and make effective decisions.
  • Experimental results show that GCN-DDPG outperforms state-of-the-art techniques.
  • Future scope: investigate how to allocate computing resources at edge servers for further optimization.



GRL for Sequential Distribution System Restoration Learning


T. Zhao and J. Wang, “Learning Sequential Distribution System Restoration via Graph-Reinforcement Learning,” in IEEE Transactions on Power Systems, vol. 37, no. 2, pp. 1601-1611, March 2022, doi: 10.1109/TPWRS.2021.3102870.


Distribution system restoration (DSR) algorithms are a basic resilience tool for system operators, providing well-coordinated restoration and improved restoration performance. Solving the restoration problem improves coordination between control switches and generators. Model-based control systems are often built to solve it, but they depend on accurate system models, which results in poor scalability.

To address these constraints, the author develops a novel GRL architecture for the restoration problem. The power system topology is mapped onto a graph convolutional network (GCN), which models the interplay between controllable devices and captures the network restoration process. Graph convolutional layers generate latent features across the graph of the power network, and these features are then used to train the network restoration control policy through DRL.


Issues in the previous methods:

  • Previous methods depend on accurate or approximate power system models, so solutions are not guaranteed.
  • Computation time grows with the large number of controllable components.
  • Solutions often lack scalability.
  • Decentralized methods rely on precise information, which limits their effectiveness.
  • Some studies consider DRL-based approaches, but these struggle as distribution system scale, parameters, feasible actions, and reconfiguration steps grow.
  • Using a single agent to gather centralized data is expensive and time-consuming.


The author therefore proposes a GRL architecture that tackles the aforementioned challenges.

The author formulates the restoration problem as a routing model within a multi-agent system (MAS) framework. The formulation covers the sequential restoration model, one of the DSR problems. The reward in this case is:

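$$R_{i,t}(s_t, a_t) = P^{L}_{l,t} \times \Delta t$$

where $P^{L}_{l,t}$ denotes the power of load $l$ at time $t$ and $\Delta t$ is the length of the time step.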


Without requiring knowledge of the power system characteristics, a novel GRL framework is developed that learns efficient sequential control policies for DSR problems by exploiting the graph model of the power system and the model-free nature of RL. It combines the power system topology with the GCN architecture, so that latent graph features are extracted and the mutual relations among distributed generators (DGs) are abstracted, which helps the agent understand the intricate restoration process.


Figure: learning framework proposed in the study by T. Zhao (2022)


Case studies confirm the efficiency of the learning-based restoration framework in terms of scalability and optimality. The paper first outlines the test systems and algorithm setups, then illustrates the framework's scalability and optimality and contrasts them with benchmark methods. Comparative studies on modified IEEE 123-node and IEEE 8500-node test systems demonstrate the method's performance, and the IEEE 8500-node system is used to confirm its scalability.

Figure: results from the study by T. Zhao (2022)


  • Proposed a GRL framework to solve DSR problems.
  • GCNs capture the impact of the power network topology and network reconfiguration on controllable devices. Using the latent features generated by the GCN, the DRL method then learns an efficient control policy for DSR.
  • Case studies on the IEEE 123- and 8500-node test systems show that the proposed GRL framework significantly improves the efficiency and scalability of conventional RL algorithms (DQN and MARL).

GCN – Topology embedded DRL for Voltage Stability Control


R. R. Hossain, Q. Huang and R. Huang, “Graph Convolutional Network-Based Topology Embedded Deep Reinforcement Learning for Voltage Stability Control,” in IEEE Transactions on Power Systems, vol. 36, no. 5, pp. 4848-4851, Sept. 2021, doi: 10.1109/TPWRS.2021.3084469.



To effectively capture topological fluctuations and spatial correlations, the author proposes a Graph Convolutional Network (GCN)-based Deep Reinforcement Learning (DRL) approach. This approach differs from existing models such as FCN (Fully Connected Network)-DRL. When tested on the IEEE 39-bus system, the proposed model outperforms FCN-DRL models in both performance and convergence.

Limitations of previous methods:

  • Modern power grids face disruptions from renewable energy integration and dynamic demands.
  • Model Predictive Control (MPC) approaches have computational limitations.
  • Convolutional techniques in other papers overlook topological variations.

Proposed approach:

  • The author proposes a GCN-based DRL framework for voltage stability control.
  • The GC-DDQN method takes the power system topology into account when deciding load-shedding actions.
  • It addresses short-term voltage stability problems caused by fault-induced delayed voltage recovery (FIDVR).
  • The DRL agent embeds grid topology information using the GCN.


Figure: framework of GCN-DRL for voltage control (from the study by R. R. Hossain, 2021)


Based on the reward signal $r_t$ received from the environment and the underlying state representation (the observation $s_t$), the agent network (figure above) generates actions $a_t$. The agent's goal is to find a policy $\pi_\theta(s_t, a_t)$ that maximizes the expected cumulative reward. The policy network has two phases: 1) feature extraction, performed by the GCN, and 2) policy approximation, performed by an FCN. The state, action, and reward of the MDP formulation are:

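  • State at time $t$: $s_t = \{X_t, A_t\}$
  • Action: $a_t \in \{0, 1\}$
  • Reward $r_t$ at time $t$:

$$r_t = \begin{cases} -10000, & \text{if } V_i(t) < 0.95 \text{ and } t > T_{pf} + 4 \\ c_1 \sum_i \Delta V_i(t) - c_2 \sum_j \Delta P_j \ (\text{p.u.}) - c_3, & \text{otherwise} \end{cases}$$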
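A small sketch of this piecewise reward, assuming the penalty applies when any bus voltage is below 0.95 p.u. after $T_{pf} + 4$, and that the $\Delta V_i(t)$ and $\Delta P_j$ terms are computed elsewhere exactly as the paper defines them. The coefficients are left as inputs because their values are not given here; the function name is hypothetical.

```python
def voltage_reward(voltages, t, t_pf, dv, dp, c1, c2, c3, v_min=0.95, penalty=-10000.0):
    """Piecewise reward of the voltage-control MDP.

    voltages: bus voltages V_i(t) in p.u.
    dv: per-bus terms Delta V_i(t); dp: per-load-bus shedding amounts Delta P_j (p.u.).
    """
    # Large penalty if voltage has still not recovered 4 s after T_pf.
    if t > t_pf + 4 and any(v < v_min for v in voltages):
        return penalty
    # Otherwise: reward voltage terms, penalize load shedding, constant action cost.
    return c1 * sum(dv) - c2 * sum(dp) - c3
```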

A GCN layer on its own is only a feature-extraction model. To train it jointly with decision-making, the author extends the classical Q-network to a GCN-based Q-network $Q_{gcn}(s, a)$ by adding the GCN model to the optimization loop.

The GC-DDQN algorithm is applied to the IEEE 39-bus system. During the GC-DDQN training phase, a short-circuit fault is introduced at one of five locations (buses 4, 15, 21, 26, and 7) at t = 0.05 s, with a fault duration of 0.08 s. For the training scenario, the effectiveness of GC-DDQN is compared with the traditional FCN-based DDQN (FC-DDQN) method.
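GC-DDQN builds on double DQN. The sketch below shows only the generic double-DQN target computation in NumPy, in which the online network selects the next action and the target network evaluates it. In GC-DDQN both Q-value arrays would come from GCN-based Q-networks; the array shapes and names here are assumptions.

```python
import numpy as np


def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Double DQN targets for a batch of transitions.

    q_online_next, q_target_next: (batch, num_actions) Q-values at the next states
    from the online and target networks; dones: 1.0 where the episode ended.
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    q_online_next = np.asarray(q_online_next, dtype=float)
    q_target_next = np.asarray(q_target_next, dtype=float)
    best_next = np.argmax(q_online_next, axis=1)                      # selection: online net
    evaluated = q_target_next[np.arange(len(best_next)), best_next]  # evaluation: target net
    return rewards + gamma * (1.0 - dones) * evaluated
```

Decoupling selection from evaluation is what reduces the overestimation bias of plain DQN.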



  • For one action step decision-making, the GC-DDQN and FC-DDQN approaches have average computation times of 0.90 and 0.88 milliseconds, respectively.


  • Compared with FC-DDQN, GC-DDQN achieves better voltage recovery with around 30% less total load shedding, as the figure below shows.

Figure: testing results of the trained GC-DDQN and FC-DDQN models. (a) Voltage of bus 7; the dashed line denotes the performance requirement for voltage recovery. (b) Total load shedding amount. From the study by R. R. Hossain (2021).


  • GC-DDQN outperforms FC-DDQN, with fewer failed cases.








Reference: Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby



The problem targeted in the paper is the limited application of the Transformer architecture in computer vision tasks. While the Transformer architecture has been widely adopted for natural language processing, its usage in computer vision has been constrained. Specifically, attention mechanisms are typically used alongside convolutional networks or to replace certain components of these networks while maintaining their overall structure. The paper aims to demonstrate that this reliance on convolutional neural networks (CNNs) is unnecessary and that a pure Transformer architecture applied directly to sequences of image patches can achieve excellent performance on image classification tasks.

Key Contribution:

The key contribution of the paper lies in introducing the Vision Transformer (ViT) model, which applies the Transformer architecture directly to sequences of image patches for image classification tasks. By pre-training the ViT model on large datasets and transferring it to various mid-sized or small image recognition benchmarks (such as ImageNet, CIFAR-100, and VTAB), the authors demonstrate that ViT achieves remarkable results compared to state-of-the-art convolutional networks while requiring fewer computational resources for training.
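The core step of ViT is turning an image into a token sequence. The NumPy sketch below splits an image into non-overlapping patches, projects them linearly, prepends a class token, and adds position embeddings. Weights are passed in as plain arrays; this illustrates the idea and is not the reference implementation.

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) image into a sequence of flattened patch x patch tiles."""
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    tiles = image.reshape(h // patch, patch, w // patch, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return tiles.reshape(-1, patch * patch * c)  # (N, patch * patch * C)


def embed(image, patch, w_proj, cls_token, pos_emb):
    """Linear patch projection, prepended class token, added position embeddings."""
    tokens = patchify(image, patch) @ w_proj          # (N, D)
    tokens = np.vstack([cls_token[None, :], tokens])  # (N + 1, D)
    return tokens + pos_emb                           # pos_emb has shape (N + 1, D)
```

For example, a 224 × 224 × 3 image with 16 × 16 patches yields 196 patch vectors of length 768, plus the class token.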


  • Demonstrating that ViT achieves excellent performance on image classification tasks when pre-trained on large datasets and transferred to multiple benchmarks.
  • Analyzing the usage of self-attention in ViT through metrics such as attention distance and attention maps, providing insights into how ViT integrates information across images.
  • Reporting results on the ObjectNet benchmark, where the ViT-H/14 model achieves 82.1% top-5 accuracy and 61.7% top-1 accuracy.
  • Presenting scores attained on various tasks within the VTAB-1k benchmark, showcasing the performance of ViT across different image recognition tasks.


The paper addresses the limitation of the Transformer architecture in computer vision tasks by introducing the Vision Transformer (ViT) model, which applies the Transformer directly to sequences of image patches for image classification. Through extensive experiments on various benchmarks, including ImageNet, CIFAR-100, VTAB, ObjectNet, and VTAB-1k tasks, the authors demonstrate that ViT achieves excellent performance compared to state-of-the-art convolutional networks while requiring fewer computational resources. Additionally, the paper provides insights into how ViT utilizes self-attention through analyses of attention distance and attention maps, further enhancing our understanding of its functioning in image recognition tasks.



Research of spatial context convolutional neural networks for early diagnosis of Alzheimer’s disease


Reference: Yinsheng Tong, Zuoyong Li, Hui Huang, Libin Gao, Minghai Xu & Zhongyi Hu



The problem targeted in the paper is the early and effective diagnosis of Alzheimer’s Disease (AD) and Mild Cognitive Impairment (MCI) using structural MRI images. Current deep learning methods often overlook the contextual spatial information within these images, potentially missing crucial structural details and impacting the accuracy and generalization ability of the models. Therefore, the paper aims to develop a new network model to effectively detect or predict AD by leveraging deeper spatial contextual structural information.

Key contribution:

The key contribution is the proposal of a spatial context network based on 3D Convolutional Neural Network (CNN) to learn multi-level structural features of brain MRI images for AD classification. This network is designed to capture spatial contextual relationships between slices, enhancing feature representation and improving model stability, accuracy, and generalization ability.
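To illustrate why a 3D convolution can capture context between slices while a per-slice 2D convolution cannot, here is a naive single-channel 3D convolution (a cross-correlation, as in CNN layers). It is a generic operation shown for illustration, not the authors' network.

```python
import numpy as np


def conv3d_valid(volume, kernel):
    """Naive single-channel 3D cross-correlation with 'valid' padding.

    volume: (D, H, W) stack of slices; kernel: (kd, kh, kw).
    Each output voxel mixes kd neighbouring slices, so inter-slice context is captured.
    """
    d, h, w = volume.shape
    kd, kh, kw = kernel.shape
    out = np.zeros((d - kd + 1, h - kh + 1, w - kw + 1))
    for z in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                out[z, y, x] = np.sum(volume[z:z + kd, y:y + kh, x:x + kw] * kernel)
    return out
```

With a kernel depth of 3, a feature present in only one slice still influences the outputs computed around the two neighbouring slice positions.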


The experimental results demonstrate the effectiveness of the proposed spatial context network model. It achieved high classification accuracy rates, including 92.6% in AD/CN comparison, 74.9% in AD/MCI comparison, and 76.3% in MCI/CN comparison. Ablation experiments further validate the effectiveness of the spatial context network, showing improvements in model performance compared to traditional 2D CNNs and other methods in the literature.



The paper introduces a novel spatial context network model for the early detection and classification of Alzheimer’s Disease using structural MRI images. This model effectively captures spatial contextual relationships between image slices, enhancing feature representation and improving classification accuracy. Experimental results demonstrate the superiority of the proposed model over traditional methods, highlighting its potential for clinical applications in disease diagnosis. However, the study acknowledges limitations, such as the focus on structural MRI data only and suggests future research directions, including the integration of data from multiple modalities like positron emission computed tomography (PET). Overall, the paper underscores the significance of deep learning methods in disease diagnosis and highlights the promising results achieved with the spatial context network approach.



Graph RL for High Renewable Energy Penetration


Li, P., Huang, W., Dai, Z., Hou, J., Cao, S., Zhang, J., & Chen, J. (2022, March). A Novel Graph Reinforcement Learning Approach for Stochastic Dynamic Economic Dispatch under High Penetration of Renewable Energy. In 2022 4th Asia Energy and Electrical Engineering Symposium (AEEES) (pp. 498-503). IEEE.


To improve the decision quality of economic dispatch, the author proposes graph reinforcement learning (GRL) to handle the increased uncertainty introduced by a large number of distributed generators; GRL also represents the structure of the system more accurately. The limitations of previous work motivated this study, which uses GRL to overcome them.

The limitations from other studies: 

  • MIQP methods significantly improve the quality of the global optimum search for economic dispatch (ED). Some researchers also combine heuristic algorithms with sequential quadratic programming (SQP), proposing hybrids such as EP-SQP, PSO-SQP, and AIS-SQP.
  • Heuristic algorithms are severely constrained by their poor optimization efficiency and sensitivity to parameters, whereas MILP depends on accurate system modeling, which is difficult to achieve in practice.


To address these limitations, the author proposes GRL, which extends soft actor-critic (SAC) with a graph convolutional network (GCN). Because GRL is built on SAC, it inherits SAC's ability to avoid local optima during decision-making, and entropy regularization makes learning progress more quickly.
When the algorithm runs online, GRL needs only the real-time operating conditions, which enables extremely fast decision-making. A modified IEEE 39-bus case validates the effectiveness and speed of GRL.
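For readers unfamiliar with SAC, the sketch below shows the standard soft Bellman target used to train SAC critics, in which the $-\alpha \log \pi$ term is the entropy regularization mentioned above. It uses the common clipped double-Q form; whether the paper uses exactly this variant is an assumption.

```python
import numpy as np


def sac_target(rewards, dones, q1_next, q2_next, logp_next, alpha, gamma=0.99):
    """Soft Bellman target for SAC critics.

    q1_next, q2_next: target-critic values Q(s', a') for a' sampled from pi(.|s').
    logp_next: log pi(a'|s'); subtracting alpha * log pi adds an entropy bonus.
    """
    soft_value = np.minimum(q1_next, q2_next) - alpha * np.asarray(logp_next, dtype=float)
    return np.asarray(rewards, dtype=float) + gamma * (1.0 - np.asarray(dones, dtype=float)) * soft_value
```

A larger temperature `alpha` rewards more random (higher-entropy) policies, which is what helps the agent avoid settling early into a local optimum.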

Methodology for validating GRL on the modified IEEE 39-bus system:

In this study, node characteristics are defined as node operating features, represented as $[P_{load}, Q_{load}, P_G, P_{WT}, P_{PV}, E_{BS}, t, a]$, so the feature dimension is $f = 8$. The topological structure of the system is used as the graph structure. The following figure uses the IEEE 5-bus case as an illustration:

Figure: power system operation state graph representation (from the study by Li et al., 2022)

To extract the structural characteristics of the graph data, the model first applies a fully connected layer for feature transformation and then stacks a two-layer graph convolutional network. Three fully connected layers then realize the nonlinear mapping from system state to action policy, as shown in the picture below:
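A forward-pass sketch of the layer stack just described: one fully connected feature transform, two GCN layers, and three fully connected layers. The symmetric adjacency normalization, ReLU activations, flattening of node embeddings, and all layer sizes are assumptions chosen for illustration; only `f = 8` comes from the node features above.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def relu(x):
    return np.maximum(x, 0.0)


def grl_forward(x, adj, params):
    """FC feature transform -> two GCN layers -> three FC layers (hypothetical sizes)."""
    a_norm = normalized_adjacency(adj)
    h = relu(x @ params["fc_in"])          # per-node feature transformation
    h = relu(a_norm @ h @ params["gcn1"])  # graph convolution 1
    h = relu(a_norm @ h @ params["gcn2"])  # graph convolution 2
    z = h.reshape(-1)                      # flatten node embeddings into one state vector
    z = relu(z @ params["fc1"])
    z = relu(z @ params["fc2"])
    return z @ params["fc3"]               # policy output (e.g. action means)


def init_params(num_nodes, f=8, hidden=16, dense=32, num_actions=4, seed=0):
    """Random weights for illustration; all sizes except f are hypothetical."""
    rng = np.random.default_rng(seed)
    shapes = {"fc_in": (f, hidden), "gcn1": (hidden, hidden), "gcn2": (hidden, hidden),
              "fc1": (num_nodes * hidden, dense), "fc2": (dense, dense),
              "fc3": (dense, num_actions)}
    return {name: rng.normal(0.0, 0.1, size=shape) for name, shape in shapes.items()}
```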


Figure: layer structure of the proposed framework (from the study by Li et al., 2022)

IEEE 39-bus framework and operation:

As shown in the figure below, a modified IEEE 39-bus system with 10 conventional generators, 2 WTs, 2 PVs, and 1 BS is used for 24-hour real-time operation. The real-time dispatch runs every 15 minutes, giving 96 periods per day. All generators, conventional and renewable, are assumed to be always available. Training covers 8,000 days. All simulations are run in Python on a computer with an Intel Core i7 processor and 16 GB of RAM.


Figure: IEEE 39-bus structure (from the study by Li et al., 2022)

Training consisted of 8,000 simulated days with 96 stages each, giving 768,000 data points. The generation parameters are listed in the table below. Both β_BS coefficients are fixed at $100/MWh; the power system pays $300/MWh to buy electricity from the external grid and receives only $30/MWh when selling power.

Figure: evaluation results (from the study by Li et al., 2022)


  • A novel GRL method was developed for dynamic economic dispatch under high RES penetration.
  • GRL uses the system topology to aggregate operating data.
  • Representing operating data as graphs reveals strong relationships among them.
  • GRL increases system flexibility and economy.
  • The offline pre-learning period is shorter than with current techniques.
  • Because the state graph is sparse, the algorithm scales well.
  • Future scope: GRL shows promising potential for high-renewable energy systems.


  • GRL: Graph Reinforcement Learning
  • RES: Renewable Energy Sources
  • GCN: Graph Convolutional Network
  • SAC: Soft Actor-Critic
  • WT: Wind Turbine
  • PV: Photovoltaic
  • BS: Battery Storage
  • RE: Renewable Energy

Graph-Based RL for Dynamic Network Scheduling


Xing, Q., Chen, Z., Zhang, T., Li, X., & Sun, K. (2023). Real-time optimal scheduling for active distribution networks: A graph reinforcement learning method. International Journal of Electrical Power & Energy Systems145, 108637.


The author combines a GAT (Graph Attention Network) with DDPG (Deep Deterministic Policy Gradient) to provide real-time online collaborative optimization of control equipment, improving the economy and safety of active distribution networks (ADNs). The GAT extracts complex, unstructured graph information from the ADN, with power information on the nodes and topological relations on the edges. These extracted features are passed to the DRL (Deep Reinforcement Learning) agent, where DDPG generates optimized scheduling actions that are fed back to the ADN. Case studies compare the proposed GRL model with conventional DRL approaches.
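The following NumPy sketch shows a single-head graph attention layer in the style of the original GAT formulation: attention scores are computed only over each node's neighbours (and itself) and normalized with a softmax. The single head, the LeakyReLU slope, and the omission of the output nonlinearity are assumptions, not details from this paper.

```python
import numpy as np


def gat_layer(h, adj, w, a, slope=0.2):
    """Single-head graph attention layer.

    h: (N, F) node features; adj: (N, N) 0/1 adjacency; w: (F, F'); a: (2 * F',).
    Returns the aggregated features (N, F') and the attention matrix (N, N).
    """
    z = h @ w                                       # linear projection of every node
    f_out = z.shape[1]
    # e_ij = a^T [z_i || z_j], split into source and target halves of a
    e = (z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :]
    e = np.where(e > 0, e, slope * e)               # LeakyReLU
    mask = (adj + np.eye(adj.shape[0])) > 0         # attend to neighbours and self only
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)            # numerical stability
    att = np.exp(e)
    att = att / att.sum(axis=1, keepdims=True)      # softmax over each neighbourhood
    return att @ z, att
```

Because the weights are learned per neighbour pair rather than fixed by node degree, buses with stronger electrical influence can receive more attention.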

Limitations of other papers on network scheduling:

The large number of distributed generators connected to modern distribution networks such as ADNs produces uncertain, random output. Some papers study optimal ADN policies, but they rely on precise forecast data, such as renewable energy outputs, and neglect source-load uncertainty in their models, so the optimization results depend heavily on forecast accuracy. Making quick online dispatch decisions for the growing amount of controllable equipment is also challenging; recent developments in DRL offer a way to manage this computational complexity.
However, none of the previous work sufficiently evaluates the feasibility and scalability of DRL-based optimal dispatch models for ADNs under topology variations.


Nevertheless, two main problems remain:
  • First, most DRL-based ADN optimization techniques (such as DQN and DDPG) do not fully exploit the inherent graph-structured characteristics of distribution systems.
  • Second, emergencies frequently cause changes in topology and operating patterns in the distribution network.



Figure: structure of the proposed ADN scheduling method (from the study by Xing, Q., Chen, Z., Zhang, T., Li, X., & Sun, K., 2023)

The author's technique extracts unstructured graph data from the ADN and feeds it into DDPG, which generates an action output for optimal scheduling. The ADN applies this action output, improving its economy and safety.

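$$E^{ess}_{t,k} = E^{ess}_{t-1,k} + v^{ess,ch}_{t,k} \frac{\eta^{ess,ch}_k \Delta t}{Q^{ess}_k} P^{ess,ch}_{t,k} - \left(1 - v^{ess,ch}_{t,k}\right) \frac{\Delta t}{\eta^{ess,disc}_k Q^{ess}_k} P^{ess,disc}_{t,k}$$

$$E^{ess,min}_k \le E^{ess}_{t,k} \le E^{ess,max}_k$$

$$E^{ess,end}_k = E^{ess,pref}_k$$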

These constraints keep the state of charge (SOC) of each ESS within its operating range, avoiding battery damage from overcharging and overdischarging during scheduling.
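A minimal sketch of one SOC update following the charging/discharging equation above, with the binary charging variable selecting the mode and a check against the SOC bounds. Parameter names are hypothetical; SOC is treated as a fraction of the capacity $Q^{ess}_k$, with power in MW, capacity in MWh, and Δt in hours.

```python
def update_soc(soc_prev, charging, p_ch, p_dis, eta_ch, eta_dis, capacity, dt, soc_min, soc_max):
    """One-step ESS state-of-charge update.

    charging: the binary v^{ess,ch}_{t,k} (True = charging, False = discharging).
    Charging stores eta_ch * P * dt; discharging draws P * dt / eta_dis from storage.
    """
    v = 1 if charging else 0
    soc = (soc_prev
           + v * eta_ch * dt / capacity * p_ch
           - (1 - v) * dt / (eta_dis * capacity) * p_dis)
    if not soc_min <= soc <= soc_max:
        raise ValueError(f"SOC {soc:.3f} violates [{soc_min}, {soc_max}]")
    return soc
```

In an RL setting, such a violation would typically be handled by clipping the action or adding a penalty to the reward rather than raising an error; the exception here simply makes the constraint visible.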

The author then formulates the energy management decision-making issue as an FMDP using the features that were taken from GAT. The primary variables and elements of decision-making included in the FMDP include states, actions, and rewards. 

Conclusion:

  • The paper presents GRL, a technique for optimizing ADNs that combines GAT and DDPG.
  • GRL aims for distribution networks to operate safely and economically.
  • GAT extracts the ADN's load characteristics and topological structure.
  • For controlled equipment, DDPG develops energy management plans.
  • Under steady conditions, the results demonstrate lower operating costs and better decision-making.
  • In cases of unknown faults, GRL performs better than DDPG and GDDPG.
  • One area of future research will be GNN embedding in reinforcement learning.



Electrical System for Varying Topologies using Graph based DRL

Reference Paper

Zhao, Y., Liu, J., Liu, X., Yuan, K., Ren, K., & Yang, M. (2022, December). A graph-based deep reinforcement learning framework for autonomous power dispatch on power systems with changing topologies. In 2022 IEEE Sustainable Power and Energy Conference (iSPEC) (pp. 1-5). IEEE.


Modern energy systems face increasing complexity due to the growth of distributed power supply and variable energy loads, and traditional autonomous policies struggle with changing topologies. To address this, the author proposes a graph-based DRL framework that combines a Graph Convolutional Network with DRL and GraphSAGE. This ensures adaptability to topologies that evolve because of emergencies, maintenance, and grid development. The Proximal Policy Optimization (PPO) algorithm enables effective power dispatch by recognizing network characteristics. The approach feeds graph-structured data into DRL to optimize outcomes.

Limitations of previous methods:

  •  Traditional OPF policies overlook power system expansion and operational variations.
  • Some studies integrate DRL into OPF using PPO for distributed networks but are limited by fixed network topology.
  • To address this, recent research employs GCN in DQN for voltage control in varying topologies.
  • However, these approaches often use fixed-size input matrices, limiting their adaptability to new bus additions.

This research proposes a novel graph-based DRL framework for autonomous power dispatch that accounts for topology changes. The problem is formulated as a Markov decision process (MDP), a discrete-time control process model, with the following objective:

$$\min \; c_{P,i} = \alpha P_i + \beta P_A + \gamma \sum_j \text{loss}_{dis,i,j} + \delta \sum_k \text{loss}_{dis,i,k}$$

This formula seems to represent a cost function for power distribution considering power values and losses between nodes.

Graph-based DRL methodology

Figure 1: graph-based DRL framework (Zhao et al., 2022)

The figure shows the proposed graph-based DRL architecture for power dispatch, including the structure of the critic NN and of the actor NN (which is also the structure used for imitation learning, IL). The proposed structure has three branches. In branch 2, the DRL agent interacts with the environment of the power dispatch problem using the PPO algorithm.

The PPO neural networks include the GraphSAGE method to capture the characteristics of dynamic topologies. However, training from scratch may lead to worse results, so in branch 1 the NN parameters of the PPO are initialized using historical data and expert knowledge. Branch 3 shows how the PPO agent is updated during training using a replay buffer.
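GraphSAGE is what lets the network cope with changing topologies: its weights act on a node's own features and on the aggregated features of its neighbours, so the same layer applies to graphs of any size. Below is a NumPy sketch of one GraphSAGE layer with a mean aggregator; the layer sizes and the choice of aggregator are assumptions.

```python
import numpy as np


def sage_layer(h, neighbors, w_self, w_neigh):
    """GraphSAGE layer with a mean aggregator.

    h: (N, F) node features; neighbors: list of neighbour-index lists, one per node.
    The weights do not depend on N, so the same layer runs on any topology.
    """
    agg = np.stack([h[nbrs].mean(axis=0) if nbrs else np.zeros(h.shape[1])
                    for nbrs in neighbors])
    out = np.maximum(h @ w_self + agg @ w_neigh, 0.0)  # equivalent to W [h || agg], then ReLU
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.where(norms > 0, norms, 1.0)      # L2-normalize each node embedding
```

Adding a bus only adds a row to `h` and an entry to `neighbors`; no weight matrix changes shape, unlike a dense network with a fixed-size input.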

At every time step t, the reinforcement learning (RL) agent interacts with the environment and receives a reward r_t. The author runs multiple cases, and the actions that receive the highest reward, found through search, are used to create the IL dataset.

  • Architecture: Three-branch, graph-based DRL architecture.
  • Actor-Critic NN: Captures dynamic topologies by using GraphSAGE and the PPO method.
  • Initialization: Expert knowledge and historical data are used to initialize the PPO’s NN parameters.
  • Training: A replay buffer is used to update the PPO agent.
  • Case Study: Performed using a modified IEEE 118-bus design.
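The PPO update in branch 2 optimizes a clipped surrogate objective. A minimal NumPy sketch of that generic loss follows; the clip range of 0.2 is a common default, not a value reported in the paper.

```python
import numpy as np


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective (to be minimized)."""
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # The elementwise minimum removes any incentive to move the policy ratio
    # outside [1 - eps, 1 + eps] in the direction that would increase the objective.
    return -np.mean(np.minimum(unclipped, clipped))
```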



  • The paper proposes a graph-based DRL structure for autonomous power dispatch that takes changes in power system topology into account.
  • The study combines the GraphSAGE method with DRL and makes use of imitation learning.
  • The case study shows that, compared with the conventional dense-network PPO algorithm, the proposed graph-based PPO approach is more successful at handling changing topologies.

VGG-TSwinformer: Transformer-based deep learning model for early Alzheimer’s disease prediction.


Patient suffering from Alzheimer’s disease


Zhentao Hu, Zheng Wang, Yong Jin, Wei Hou, School of Artificial Intelligence, Henan University, Zhengzhou, 450046, China , College of Computer and Information Engineering, Henan University, Kaifeng, 475004, China


Mild cognitive impairment (MCI) is a transitional phase between normal aging and Alzheimer's disease (AD); about 44% of people with MCI go on to develop AD within three years.
Patients with MCI show anatomical brain abnormalities as well as cognitive deterioration, and those who go on to develop AD have the fastest rate of brain atrophy.
Because progressive MCI (pMCI) patients have a higher risk of developing AD, differentiating pMCI from stable MCI (sMCI) is essential for early AD diagnosis and treatment.
Classifying pMCI versus sMCI is harder than differentiating AD from cognitively normal (CN) individuals, because the changes in cognition and brain structure are small.

Problem and Limitations:

The field of aided diagnosis for Alzheimer’s disease (AD) has made great progress thanks to deep learning techniques, which have even surpassed manual diagnosis.
Two Groups of AD Diagnostic Techniques: AD diagnostic methods fall into classification models and progression models. Classification models predict diagnosis labels, such as Alzheimer's disease (AD) versus cognitively normal (CN), while progression models try to quantify the course of the disease, for example by differentiating progressive MCI (pMCI) from stable MCI (sMCI).
Current Methodologies for Progression Models: A number of approaches, such as 3D CNN architectures, unsupervised learning models, and sparse regression coupled with deep learning, have been put forth for progression models. The accuracy of these methods in differentiating between pMCI and sMCI has varied.
Problems with Longitudinal Studies of MCI: Processing lengthy sequences of 3D medical image data presents difficulties for longitudinal studies of MCI, which can result in high-dimensional data volumes and overfitting with short datasets. Current methods, including manually extracting features and combining them with RNN, might not fully capture the features of brain disorders associated with AD.
Transformer-Based Models’ Potential: Transformer-based models have showed promise in resolving the difficulties associated with longitudinal studies; they were first used in natural language processing. Particularly, the Swin Transformer has demonstrated cutting-edge performance in computer vision applications, providing a viable method for handling short-term longitudinal MRI data.
VGG-TSwinformer Model Overview: In the final paragraph, the suggested VGG-TSwinformer model is presented. This model combines CNN and Transformer topologies to capture both spatial and temporal information for accurate prediction of MCI progression.


VGG-16 Architecture as discussed in the study by Zhentao Hu, Zheng Wang, Yong Jin, Wei Hou

The VGG-16 model is used for brain image processing because its convolutional neural network architecture is well known for effective feature extraction. It has 13 convolutional layers and 5 pooling layers, with additional convolutional layers for channel extension and feature mapping. Convolution with ReLU activation maps each input slice from slice series T1 and T2 to a 3 × 3 × 512 output feature map, which is then mapped to a token. Max pooling produces downsampled feature maps whose size is determined by the pooling window size and the stride. The convolutions extract 256 features per slice, and mapping these features to tokens produces two token series (one per slice series) that represent spatial features. Position embeddings are then added to the token series, and the tokens are finally passed through a temporal attention block for further analysis, which supports the prediction of disease progression.
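The pooling step can be illustrated with the usual output-size rule for a window of size k and stride s without padding, floor((n - k) / s) + 1. The sketch below is a generic, hypothetical illustration rather than the authors’ code: the 2 × 2 window with stride 2 is the standard VGG-16 setting and is assumed here, and the helper names `pool_output_size` and `max_pool_2d` are invented for this example.

```python
def pool_output_size(n_in: int, window: int, stride: int) -> int:
    """Length of one spatial axis after pooling without padding."""
    return (n_in - window) // stride + 1


def max_pool_2d(image: list[list[float]], window: int = 2, stride: int = 2) -> list[list[float]]:
    """Naive 2D max pooling over a single-channel image given as a list of rows."""
    h_out = pool_output_size(len(image), window, stride)
    w_out = pool_output_size(len(image[0]), window, stride)
    out = []
    for i in range(h_out):
        row = []
        for j in range(w_out):
            # Collect the window anchored at (i * stride, j * stride) and keep its maximum.
            patch = [
                image[i * stride + di][j * stride + dj]
                for di in range(window)
                for dj in range(window)
            ]
            row.append(max(patch))
        out.append(row)
    return out


# A 182 x 218 slice shrinks to 91 x 109 after one 2x2, stride-2 pooling layer.
print(pool_output_size(182, 2, 2), pool_output_size(218, 2, 2))  # 91 109
print(max_pool_2d([[1, 2, 5, 0], [3, 4, 1, 1], [0, 0, 7, 8], [9, 1, 6, 2]]))  # [[4, 5], [9, 8]]
```

With a 2 × 2 window and stride 2, each pooling layer roughly halves both spatial dimensions, which is why the stride and window size dictate the size of the resulting feature map.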


Self-attention process

Self-Attention (SA):
SA captures internal correlations in data sequences, such as tokens or image patches.
Queries, keys, and values are computed from the input tokens.
The relevance of each token is given by weight coefficients computed from the similarity between queries and keys.
The weighted sum of the value vectors represents the fused information from all tokens.
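The four SA steps above map directly onto a few matrix operations. This NumPy sketch assumes the standard scaled dot-product formulation from the original Transformer (similarity QKᵀ/√d followed by a softmax); the post does not state the scaling, and the projection matrices here are random placeholders, not trained weights.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row maximum for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    tokens: (n, c) token series; w_q, w_k, w_v: (c, d) projection matrices.
    Returns the (n, d) fused values and the (n, n) attention weights.
    """
    q = tokens @ w_q  # queries
    k = tokens @ w_k  # keys
    v = tokens @ w_v  # values
    d = q.shape[-1]
    # Similarity between every query and every key, scaled by sqrt(d).
    scores = q @ k.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    # Each output token is a weighted sum of all value vectors.
    return weights @ v, weights


rng = np.random.default_rng(0)
n, c, d = 6, 16, 8  # toy sizes, not the paper's
x = rng.standard_normal((n, c))
w_q, w_k, w_v = (rng.standard_normal((c, d)) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (6, 8) (6, 6)
```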

Multi-Head Self-Attention (MSA):
MSA extends SA with multiple independent groups of queries, keys, and values (heads).
Each head can capture distinct characteristics of the data, which improves the model’s understanding.
The outputs of all heads are concatenated to form a full representation.
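A multi-head version splits the token dimension across heads, runs the SA steps in each head independently, and concatenates the results. The sketch assumes 8 heads and, as in the standard Transformer, a final output projection `w_o` after concatenation; both are assumptions, and only the token dimension C = 256 comes from the study’s settings.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads):
    """tokens: (n, c); w_q, w_k, w_v, w_o: (c, c); c must be divisible by num_heads."""
    n, c = tokens.shape
    assert c % num_heads == 0, "token dimension must be divisible by num_heads"
    d = c // num_heads

    def split(m):
        # (n, c) -> (num_heads, n, d): each head works on its own d-dimensional slice.
        return m.reshape(n, num_heads, d).transpose(1, 0, 2)

    q, k, v = split(tokens @ w_q), split(tokens @ w_k), split(tokens @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # (heads, n, n)
    heads = softmax(scores) @ v  # (heads, n, d)
    # Concatenate the heads back to (n, c), then apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(n, c)
    return concat @ w_o


rng = np.random.default_rng(1)
n, c, h = 10, 256, 8  # c = 256 echoes the token dimension C; h = 8 is an assumption
x = rng.standard_normal((n, c))
ws = [rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(4)]
y = multi_head_self_attention(x, *ws, num_heads=h)
print(y.shape)  # (10, 256)
```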

Transformer Block:
Consists of layer normalization (LN), SA or MSA, and a multi-layer perceptron (MLP).
LN stabilizes gradients, and residual connections combine the input with the SA/MSA output.
The MLP refines features with GELU activation, increasing the representational capacity of the model.
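Putting the pieces together, a Transformer block applies LN, MSA, and an MLP with GELU, each wrapped in a residual connection. This sketch uses the pre-norm ordering common in ViT and the Swin Transformer (LN before each sub-layer), a tanh approximation of GELU, an MLP hidden width of 4C, and LN without a learned scale and shift; all of these are assumptions for illustration, not details confirmed by the study.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each token to zero mean and unit variance (no learned scale/shift here).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def gelu(x):
    # Tanh approximation of GELU.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def msa(x, w_qkv, w_o, num_heads):
    n, c = x.shape
    d = c // num_heads
    # One (c, 3c) matrix produces queries, keys and values in a single product.
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)
    q, k, v = (m.reshape(n, num_heads, d).transpose(1, 0, 2) for m in (q, k, v))
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))
    return (attn @ v).transpose(1, 0, 2).reshape(n, c) @ w_o


def transformer_block(x, params, num_heads):
    # Sub-layer 1: LN -> MSA, plus a residual connection from the block input.
    x = x + msa(layer_norm(x), params["w_qkv"], params["w_o"], num_heads)
    # Sub-layer 2: LN -> MLP (Linear -> GELU -> Linear), plus a residual connection.
    h = gelu(layer_norm(x) @ params["w1"] + params["b1"])
    return x + h @ params["w2"] + params["b2"]


rng = np.random.default_rng(2)
n, c, hidden, heads = 10, 256, 1024, 8  # hidden = 4c and heads = 8 are assumptions
params = {
    "w_qkv": rng.standard_normal((c, 3 * c)) / np.sqrt(c),
    "w_o": rng.standard_normal((c, c)) / np.sqrt(c),
    "w1": rng.standard_normal((c, hidden)) / np.sqrt(c),
    "b1": np.zeros(hidden),
    "w2": rng.standard_normal((hidden, c)) / np.sqrt(hidden),
    "b2": np.zeros(c),
}
tokens = rng.standard_normal((n, c))
out = transformer_block(tokens, params, heads)
print(out.shape)  # (10, 256)
```

Because both sub-layers are residual, a block whose weights are all zero simply passes its input through unchanged, which is one reason residual connections make deep stacks easier to train.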

VGG-TSwinformer model:

High-level features are extracted from the slices in T1 and T2 with the VGG-16 CNN to prepare them for further processing.

Attention Mechanisms: To improve the model’s capacity to capture longitudinal changes, ten attention blocks (five spatial and five temporal) are used for feature integration.

Temporal Attention and MSA: The temporal attention block performs MSA to extract local longitudinal features, which are essential for identifying subtle changes in brain morphology over time.

Refinement with RSwin and LSwin Blocks: RSwin and LSwin blocks divide the token series into windows and perform MSA within them, integrating features across spatial dimensions and thereby refining feature fusion.
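Window-based MSA restricts attention to tokens inside the same window, and shifting the windows between blocks lets information cross window boundaries. The post does not define RSwin and LSwin beyond this, so the sketch below assumes, hypothetically, that they shift the 1D token series to the right and to the left before partitioning. It uses the tokens directly as queries, keys, and values to stay short, and unlike the Swin Transformer it does not mask the cyclic wrap-around created by the roll.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def window_attention(tokens, window, shift=0):
    """Self-attention restricted to non-overlapping windows of a token series.

    tokens: (n, c) with n divisible by window. shift > 0 rolls the series to the
    right before partitioning, shift < 0 rolls it to the left; the roll is undone
    afterwards so each output row stays aligned with its input token.
    For brevity, queries, keys and values are the tokens themselves (no projections).
    """
    n, c = tokens.shape
    assert n % window == 0, "series length must be a multiple of the window size"
    x = np.roll(tokens, shift, axis=0)
    win = x.reshape(n // window, window, c)  # (num_windows, window, c)
    scores = win @ win.transpose(0, 2, 1) / np.sqrt(c)
    out = (softmax(scores) @ win).reshape(n, c)
    return np.roll(out, -shift, axis=0)


rng = np.random.default_rng(3)
x = rng.standard_normal((12, 4))
plain = window_attention(x, window=4)
right = window_attention(x, window=4, shift=2)  # hypothetical "RSwin"-style shift
left = window_attention(x, window=4, shift=-2)  # hypothetical "LSwin"-style shift
```

Changing one token only affects outputs in its own window, so alternating unshifted and shifted windows is what allows neighbouring windows to exchange information.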

Complete Feature Integration: The model concludes with MSA over every token in T2, completing feature integration for precise predictions in MCI research.


The study used the Alzheimer’s Disease Neuroimaging Initiative (ADNI) [36] database to verify the performance of the proposed model.

Normalization with FSL: Using the FMRIB Software Library (FSL), the sMRI images were normalized to the MNI152 standard space, yielding uniform dimensions of 182 × 218 × 182 (X × Y × Z) with a spatial resolution of 1 × 1 × 1 mm³ per voxel.

Skull Stripping: To improve the quality of later analysis, non-brain tissue was removed from the spatially normalized sMRI images by skull stripping.

Unified Bias Field Correction with ANTs: Unified bias field correction was applied with Advanced Normalization Tools (ANTs) to reduce intensity variations caused by imaging errors and to improve the accuracy of subsequent processing stages.

Axial Slice Selection: Axial slices were selected from each preprocessed sMRI image along the vertical (Z) axis, starting from the middle and extending toward both ends. As a result, each image contributed 40 axial slices, for a total of 80 slices per sample.
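One way to read “starting from the middle and extending toward both ends” is to take consecutive slices centred on the middle of the Z axis. The helper below is a hypothetical sketch under that assumption (contiguous slices, one-voxel spacing); the study may use a different spacing or offset.

```python
def central_slice_indices(depth: int, num_slices: int) -> list[int]:
    """Indices of num_slices consecutive slices centred on the middle of an axis."""
    if not 0 < num_slices <= depth:
        raise ValueError("num_slices must be between 1 and depth")
    start = depth // 2 - num_slices // 2
    return list(range(start, start + num_slices))


# Z = 182 after MNI152 normalization; 40 axial slices per image.
idx = central_slice_indices(182, 40)
print(idx[0], idx[-1], len(idx))  # 71 110 40
```

For a normalized 182 × 218 × 182 volume stored as a NumPy array `vol`, `vol[:, :, idx]` would then give a 182 × 218 × 40 stack of axial slices.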

Formation of Slice Series T1 and T2: After slicing, each sample contained two 3D sMRI images, corresponding to slice series T1 and T2. Both slice series contained the same number of slices, and each slice measured 182 × 218 × 1.


The 823 samples were divided into training, validation, and test subsets containing 65%, 20%, and 15% of the samples, respectively. Because of GPU limitations, the token dimension was set to C = 256 and the number of slices to N = 80 for each of the two slice series in each sample. The model was trained for 100 epochs with a momentum of 9e-5 and a learning rate of 1e-5, using SGD [37] as the optimizer with a weight decay of 0.1 to prevent overfitting. Cross-entropy [38] was used as the loss function. To reduce the effect of randomness, five experiments were conducted, each with a different randomly selected training, validation, and test split.
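The split and the five repeated runs can be sketched with the standard library. The fractions come from the text; rounding the training and validation sizes down and giving the remainder to the test set is an assumption, since the exact subset sizes are not reported here, and `split_indices` is a hypothetical helper name.

```python
import random


def split_indices(num_samples, train_frac=0.65, val_frac=0.20, seed=0):
    """Randomly split sample indices into train/validation/test subsets."""
    idx = list(range(num_samples))
    random.Random(seed).shuffle(idx)
    n_train = int(num_samples * train_frac)
    n_val = int(num_samples * val_frac)
    # The test set takes whatever remains, so rounding never drops a sample.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


# Five repetitions, each with a different random split (one seed per run).
runs = [split_indices(823, seed=s) for s in range(5)]
train, val, test = runs[0]
print(len(train), len(val), len(test))  # 534 164 125
```

Reporting the mean (and spread) of the metrics over the five runs is what reduces the influence of any single lucky or unlucky split.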


Experiment with Replaced Slice Series: In each sample, all slices in slice series T1 were replaced with the slices from slice series T2, so the sample no longer contained any longitudinal change. Across five control experiments, the model’s performance was not competitive with other algorithms when samples did not contain information about changes in the brain anatomy of individuals with MCI.

Model Sensitivity and Specificity Comparison: The original experiment had higher sensitivity than the control experiment, suggesting that the model is more responsive to alterations in brain anatomy and misses fewer diagnoses. Its specificity, however, was lower, indicating a higher rate of misdiagnosis (false positives); in clinical settings, a missed diagnosis is generally considered less acceptable than a misdiagnosis.
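Sensitivity and specificity come straight from the confusion matrix, which makes this trade-off concrete: a missed diagnosis is a false negative and lowers sensitivity, while a misdiagnosis is a false positive and lowers specificity. The sketch treats pMCI as the positive class (an assumption) and uses made-up toy labels.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed diagnoses
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)  # misdiagnoses
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels: 1 = pMCI (positive class), 0 = sMCI.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)  # 0.75 0.5
```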

Impact of Pre-trained CNN: Experiments that loaded pre-trained weights into the VGG-16 of VGG-TSwinformer showed that the model trained from scratch achieved higher accuracy, specificity, and AUC than the pre-trained version. This suggests that using a pre-trained VGG-16 does not improve the model’s prediction performance.

Comparing Various Plane Slices: MRI images can be viewed in three planes: sagittal, coronal, and axial. Comparing the model’s performance on slices from each plane showed that the choice of plane affects performance. Although the model performed best with axial slices, a single plane cannot capture all the information in a 3D MRI, so combining slices from all three planes might work better.


Accuracy graph and Sensitivity vs Specificity as discussed in the study by Zhentao Hu, Zheng Wang, Yong Jin, Wei Hou

Model Superiority: The proposed VGG-TSwinformer model outperforms cross-sectional deep learning methods in accuracy (77.2%), sensitivity (79.97%), and AUC (0.8153).

VGG-TSwinformer Summary and Validation: In longitudinal MCI studies, VGG-TSwinformer extracts sMRI features with the VGG-16 CNN and captures anatomical changes in the brain using sliding-window methods and temporal attention. Validated on the ADNI database, it achieves higher diagnostic efficacy than cross-sectional approaches. Its limitations include problems with feature fusion, and future research should use multimodal biomarkers.

GRL-based residential electricity behavior

Chen, X., Yu, T., Pan, Z., Wang, Z., & Yang, S. (2023). Graph representation learning-based residential electricity behavior identification and energy management. Protection and Control of Modern Power Systems, 8(1), 28.

A home energy management system (HEMS) is an end-user approach to conserving energy and reducing emissions. However, identifying user behaviour and designing an energy management strategy remain open issues for an efficient HEMS. Current HEMS either assume user behaviour or ignore the relationship between users and appliances, which leads to poor management strategies and inadequate explanations of behaviour. To counter these problems and enhance HEMS decision-making, this research proposes combining graph reinforcement learning (GRL) with non-intrusive load monitoring (NILM). NILM identifies user behaviour and energy consumption through a correlation graph, and a multi-label classification approach is used to monitor loads.

Limitations in other methods:

Existing research either uses additional intrusive devices to collect user behaviour data or assumes that usage behaviour is known in advance. These methods are impractical because behaviour changes dynamically. Conventional NILM techniques are limited to load disaggregation and cannot identify behaviour, and current NILM approaches require more equipment and achieve poor disaggregation accuracy. The authors therefore note that developing a practical and accurate online behaviour detection technique to feed a HEMS remains a difficult challenge. Some studies assume that behaviour is known without stating where their behavioural data come from, and they do not adequately consider the dynamic uncertainty of user behaviour. Several studies treat the decision-making for each appliance as independent of the others. Other studies include label correlations in behaviour recognition, but this integration depends on the time-series signal to capture correlations between appliances.

To address these shortcomings, the authors propose an intelligent HEMS technique that uses NILM-assisted graph reinforcement learning (GRL) for behaviour detection and for a strategy that minimizes electricity cost.


HEMS framework as discussed in the study by Chen et al. (2023).

The centre of the figure above depicts the practical implementation of the suggested method. The ML-SGN model performs load disaggregation and behaviour identification using aggregated data from an outdoor electricity meter. To help customers manage their household energy use, the HEMS method produces a set of instructions based on the objective function, the load state, the relationship state, and the environment state. More precisely, the load state is the online status of the loads, the relationship state is the correlation data in the graph, and the environment state is external data such as the electricity price and the outdoor temperature.

The behavioural correlation matrix used in this method is computed as:

$$p_{i,j} = p(a_i \mid a_j) = \frac{N_{i,j}}{N_j}$$

Here $p_{i,j}$ is the probability that appliance $a_i$ operates after appliance $a_j$, $N_j$ is the total number of times appliance $a_j$ is switched on, and $N_{i,j}$ is the number of times appliance $a_i$ operates following $a_j$.
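The correlation matrix can be estimated by counting events in an appliance switch-on log. In this sketch, “operates after” is interpreted as “is the next appliance switched on”, which is an assumption (the paper may use a time window instead); the log, the appliance names, and the helper `behaviour_correlation` are invented for illustration.

```python
from collections import Counter


def behaviour_correlation(events, appliances):
    """p[i][j] = N_ij / N_j: empirical probability that appliance i operates after appliance j.

    events: chronological list of appliance names, one entry per switch-on event.
    "Operates after" is taken here to mean "is the next appliance switched on".
    """
    n_on = Counter(events)  # N_j: how often each appliance is switched on
    n_follow = Counter(zip(events[1:], events[:-1]))  # N_ij, keyed by (i, j)
    return [
        [n_follow[(ai, aj)] / n_on[aj] if n_on[aj] else 0.0 for aj in appliances]
        for ai in appliances
    ]


apps = ["kettle", "toaster", "microwave"]
log = ["kettle", "toaster", "kettle", "microwave", "kettle", "toaster"]
p = behaviour_correlation(log, apps)
# Rows are a_i, columns are a_j; e.g. p[1][0] = P(toaster | kettle) = 2/3.
```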

Behaviour identification determines both the probability of appliance use in each period and the behaviour correlation between appliances. Unlike conventional NILM, behaviour identification requires not only determining which appliance is operating but also extracting how the appliance is used.

Formulating the problem for HEMS:

Residential loads are divided into photovoltaic generation (PV), transferable loads (TL), uncontrollable loads (UL), interruptible loads (IL), and thermostatically controlled loads (TCL), where the TCL category may include several appliances.

The equivalent thermodynamic model of the water heater is expressed as follows:

$$T^{\mathrm{WH}}_{n+1} = \frac{T^{\mathrm{WH}\prime}_{n+1}\,\left(V - V_{n,\mathrm{demand}}\right) + T^{\mathrm{WH}}_{n,\mathrm{inject}}\,V_{n,\mathrm{demand}}}{V}$$
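The update reads as a mixing rule: the volume drawn off in period n is replaced by injected water, and the tank temperature becomes the volume-weighted average of the remaining and injected water. That interpretation of the symbols (V as the tank volume, V_{n,demand} as the hot water drawn, TWH_{n,inject} as the injected-water temperature, TWH'_{n+1} as the tank temperature before mixing), the units, and the helper name `next_tank_temperature` are assumptions.

```python
def next_tank_temperature(t_prev, t_inject, v_tank, v_demand):
    """Mixing update for the water-heater tank (interpretation assumed, see lead-in).

    t_prev   -- tank temperature TWH'_{n+1} before mixing (deg C, assumed unit)
    t_inject -- temperature of the injected water TWH_{n,inject} (deg C)
    v_tank   -- tank volume V (litres, assumed unit)
    v_demand -- hot-water volume V_{n,demand} drawn in period n (litres)
    """
    if not 0 <= v_demand <= v_tank:
        raise ValueError("demand must lie between 0 and the tank volume")
    # Volume-weighted average of the water left in the tank and the injected water.
    return (t_prev * (v_tank - v_demand) + t_inject * v_demand) / v_tank


print(next_tank_temperature(60.0, 10.0, 100.0, 20.0))  # 50.0
```

With no demand the tank temperature is unchanged, and drawing the whole tank would leave it at the injected-water temperature, which matches the limiting cases of the formula.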

The reward combines the cost of energy use, user comfort, and a penalty for violating constraints, and can be expressed as:

$$r_n = R_o + C_n - \alpha E_n, \quad S_n \in S_c$$

where $E_n$, $C_n$, and $R_o$ represent the energy-consumption cost, the comfort term, and the penalty, respectively.

Performance evaluation:

The REDD dataset provides energy consumption data for six houses, while the REFIT dataset provides electrical power measurements for twenty households.

The remaining REDD and REFIT data are used to build the prior behaviour correlation with the ML-SGN model.

Graphical correlation as discussed in the study by Chen et al. (2023).

The proposed ML-SGN model is compared with classical multi-label classification models, namely the multi-label k-nearest neighbour algorithm (MLKNN) and the random k-label sets algorithm (RAKEL), and with the single-label model (SGN). The comparison shows that it not only outperforms conventional multi-label classification methods but also exhibits a stronger understanding ability than the original single-label model. Its effectiveness is further shown by comparison with the load disaggregation with attention model (LDWA), which is based on SGN and is more sophisticated because of its attention mechanism.

The results of behaviour identification as discussed in the study by Chen et al. (2023).


  • The suggested approach builds and modifies a graph that records customers’ power use patterns, then uses an enhanced multi-label NILM technique to pinpoint such patterns.
  • Simulations are used to assess the suggested approach, showing its improved performance in HEMS and behaviour detection.
The suggested approach offers these key benefits:
  • In the studies, it attains a high average recognition accuracy of 93.2%, demonstrating its efficacy in behaviour identification.
  • It keeps consumer comfort and satisfaction levels high while lowering average power costs for users by 18.3%.
  • The technology offers a better balance between energy cost and user comfort than earlier approaches.

Remembering the sacrifice of Freedom Fighters of India

India is celebrating its 74th Independence Day on 15 August. Independence Day is an annual observance on which India celebrates its independence and remembers the sacrifices our freedom fighters made during the struggle against the British Empire. 15 August is a national gazetted holiday marking the country’s independence from British rule.

On 15 August 1947, Pandit Jawaharlal Nehru, who became the first Prime Minister of independent India, raised the Indian national flag above the Lahori Gate of the Red Fort in Delhi. Since then, the serving Prime Minister has hoisted the national flag from the same platform every year.

Mahatma Gandhi was the leader who guided India towards independence. India had been under British rule for nearly 200 years. Gandhi returned to India from South Africa in 1915 at the request of Gopal Krishna Gokhale.

Gandhi’s contribution to the Indian freedom movement cannot be put into words. Together with other freedom fighters, he compelled the British to leave India. His policies and agendas were non-violent, and his words inspired millions.

Gandhi’s Salt March is considered a pivotal event in the history of the freedom struggle. At the Calcutta Congress of 1928, Gandhi declared that the British must grant India dominion status or the country would erupt in a revolution for complete independence. The British paid no heed to this.

A man is but a product of his thoughts. What he thinks, he becomes.

By Mahatma Gandhi

As a result, on 31 December 1929, the Indian flag was unfurled in Lahore, and the following 26 January was celebrated as Indian Independence Day. Then, in March 1930, Gandhi launched a Satyagraha campaign against the salt tax, marching 388 kilometres from Ahmedabad to Dandi in Gujarat to make salt. Thousands of people joined him, making it one of the biggest marches in Indian history.

Many other freedom fighters also played a crucial role in India’s struggle for independence. Bhagat Singh was born on 28 September 1907 into a family that had been involved in revolutionary activities against the British Raj. Bhagat means ‘bhakt’ or ‘devotee’, and in the years to come he proved just how true he was to his name: from an early age, he was devoted to the cause of his motherland and her freedom.

The tragic death of Lala Lajpat Rai was a turning point in Bhagat Singh’s life. The independence movement was in full swing, and the Simon Commission had been set up to examine the state of Indian constitutional affairs. The Indian public was outraged and insulted that the Commission, which was to determine the future of India, did not include a single Indian member. When the Commission visited Lahore on 30 October 1928, Lala Lajpat Rai led a non-violent protest against it and was grievously injured when the police, under orders from James A. Scott, lathi-charged the unsuspecting crowd. Lalaji died on 17 November 1928.

Singh vowed to take revenge for Lalaji’s murder and joined other revolutionaries, Shivaram Rajguru, Sukhdev Thapar and Chandrashekhar Azad, in a plot to kill Scott. However, in a case of mistaken identity Bhagat Singh and Rajguru shot Assistant Superintendent of Police John P. Saunders on 17 December 1928. 

Bhagat Singh fought valiantly and sacrificed his life for the freedom of this country at the young age of 23. Yet, responding to an inquiry under the Right to Information Act, the Home Ministry said that it possesses no record showing that Bhagat Singh has been declared a martyr. There may be no record declaring him a martyr, but that does not take away from his greatness, and we can never forget the cries of “Inquilab Zindabad!”

Criticism and independent thinking are the two indispensable qualities of a revolutionary

By Bhagat Singh

Like Mahatma Gandhi in India’s struggle for independence, we should take a stand for what we believe in. Another important lesson we can learn from freedom fighters such as Bhagat Singh, Lala Lajpat Rai, Chandrashekhar Azad, Subhas Chandra Bose and Rani Lakshmibai is their absolute belief in the power of team building and teamwork. It is a lovely and necessary thing to dream, and to dream big, and we would be wise to teach our children the same life lesson. The ultimate lesson to draw from our incredible freedom fighters is that, whatever plans they drew up in their lives, they had the courage, the will, the strength, and the sheer guts to execute them. We follow the principles of our freedom fighters: teamwork, an action-oriented approach, and innovation in research ideas.

Just as other countries around the world value their Independence Days, India celebrates its own with happiness and pomp. Come and be a part of this joy. You’re always welcome.