
GRL for user preference-aware energy sharing systems


Timilsina, A., Khamesi, A. R., Agate, V., & Silvestri, S. (2021). A reinforcement learning approach for user preference-aware energy sharing systems. IEEE Transactions on Green Communications and Networking, 5(3), 1138-1153.


By allowing consumers with renewable energy generation capabilities to sell energy, Energy Sharing Systems (ESS) have the potential to transform power grids. Unlike previous systems, the ESS proposed in this work takes consumer preferences, engagement, and bounded rationality into account. Maximizing energy exchange under this user model is shown to be NP-Hard. To handle this, two heuristics are proposed: a Reinforcement Learning-based algorithm with bounded regret, and a more computationally efficient heuristic, BPT-K, with guaranteed termination and correctness. An experimental evaluation on realistic datasets shows that, compared with state-of-the-art techniques, the proposed algorithms achieve 25% higher efficiency and 27% more transferred energy. Furthermore, the learning algorithms converge to within 5% of the optimal solution in less than three months.

Limitations of previous energy sharing systems:

  • Simplified Human Behavior Models: Many previous studies unrealistically assume that users are always present and engaged.
  • Assumed Parameter Knowledge: Several studies make the unrealistic assumption that the parameters of user behavior models are known in advance.
  • Homogeneous Consumer Preferences: Previous studies often overlook how much consumers' preferences for energy sources vary.
  • Limited Consideration of Participation: Many models do not account for how users engage with energy management systems, even though engagement varies widely.
  • Potential System Failure in Real-World Applications: Models built on overly simplistic assumptions risk causing system breakdowns when deployed in real-world conditions.


To learn user preferences and optimize overall performance in energy sharing systems, this research presents a reinforcement learning approach. Representing user preferences accurately as probabilities is essential for optimizing energy exchanges. However, asking users directly for these preferences often yields inaccurate values. To overcome this, the study reformulates the problem as a combinatorial multi-armed bandit (CMAB) problem, which lets the system learn preferences over time by observing how users respond to its recommendations. The primary optimization goal is to maximize the total traded energy, thereby improving system efficiency and user satisfaction.
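To make the probabilistic view concrete, here is a minimal Python sketch with made-up numbers. It assumes, for illustration only, that each recommendation pairs a consumer with a provider, that the consumer accepts it independently with some probability (the unknown preference), and that an accepted recommendation trades a fixed amount of energy; provider capacity is ignored. The names `p`, `energy`, `expected_energy` and `simulate_day` are hypothetical. In bandit terms, each (consumer, provider) pair is an arm, a day's set of recommendations is a super-arm, and the accept/reject outcomes are the feedback the learner observes.

```python
import random

# Hypothetical toy data: p[(consumer, provider)] is the probability (unknown to
# the system) that the consumer accepts a recommendation of that provider, and
# energy[(consumer, provider)] is the energy in kWh traded if it is accepted.
p = {("c1", "pA"): 0.8, ("c1", "pB"): 0.3,
     ("c2", "pA"): 0.5, ("c2", "pB"): 0.9}
energy = {("c1", "pA"): 2.0, ("c1", "pB"): 4.0,
          ("c2", "pA"): 3.0, ("c2", "pB"): 1.0}


def expected_energy(recommendations):
    """Expected traded energy of a set of recommendations (a 'super-arm'),
    assuming each one is accepted independently with probability p."""
    return sum(p[r] * energy[r] for r in recommendations)


def simulate_day(recommendations, rng):
    """One day of feedback: 1 if a recommendation is accepted, else 0."""
    return {r: int(rng.random() < p[r]) for r in recommendations}


greedy_on_p = [("c1", "pA"), ("c2", "pB")]   # highest acceptance probabilities
swapped = [("c1", "pB"), ("c2", "pA")]
print(expected_energy(greedy_on_p), expected_energy(swapped))
print(simulate_day(swapped, random.Random(42)))
```

With these numbers, recommending the pairs with the highest acceptance probabilities yields about 2.5 kWh in expectation, while the swapped set yields about 2.7 kWh, so preferences and energy amounts have to be weighed together rather than separately.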


User Preference Learning (UPL) algorithm’s operation:

The algorithm runs in two phases: setup (initialization) and optimization. During setup, it chooses random actions so that every variable is observed at least once. During optimization, it balances exploration and exploitation with an Upper Confidence Bound (UCB) method, aiming to maximize energy exchange and minimize regret, i.e. the gap between the expected reward of the optimal solution and the expected reward actually obtained.
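The two phases can be illustrated with a small CUCB-style simulation in Python. This is a sketch under simplifying assumptions, not the paper's UPL implementation: arms are (consumer, provider) pairs with unknown acceptance probabilities, each consumer receives at most `K` recommendations per day, provider capacity is ignored, and the per-day "oracle" simply picks each consumer's top-`K` providers by optimistic expected energy. The exploration bonus `sqrt(1.5 ln t / n)` follows the common CUCB form; the function `ucb_recommend` and its parameters are hypothetical.

```python
import math
import random


def ucb_recommend(consumers, providers, energy, true_p, K, days, seed=0):
    """CUCB-style sketch (hypothetical, not the paper's UPL).

    consumers, providers: lists of ids; energy/true_p: dicts keyed by
    (consumer, provider). true_p is only used to simulate user feedback.
    Returns (empirical acceptance rates, recommendation counts, traded energy).
    """
    rng = random.Random(seed)
    arms = [(c, q) for c in consumers for q in providers]
    count = {a: 0 for a in arms}    # how often each pair was recommended
    mean = {a: 0.0 for a in arms}   # empirical acceptance rate of each pair
    total = 0.0                     # energy actually traded in the simulation

    def index(a, t):
        # Optimistic estimate of expected energy, clipped to a valid probability.
        bonus = math.sqrt(1.5 * math.log(t) / count[a])
        return energy[a] * min(1.0, mean[a] + bonus)

    for t in range(1, days + 1):
        if min(count.values()) == 0:
            # Setup phase: random recommendations until every pair is seen once.
            chosen = [(c, q) for c in consumers
                      for q in rng.sample(providers, min(K, len(providers)))]
        else:
            # Optimization phase: each consumer gets its top-K optimistic pairs.
            chosen = []
            for c in consumers:
                ranked = sorted(((c, q) for q in providers),
                                key=lambda a: index(a, t), reverse=True)
                chosen.extend(ranked[:K])
        for a in chosen:
            accepted = rng.random() < true_p[a]          # simulated user response
            count[a] += 1
            mean[a] += (accepted - mean[a]) / count[a]   # incremental mean update
            total += energy[a] * accepted
    return mean, count, total
```

For instance, with one consumer, three providers whose acceptance probabilities are 0.9, 0.5 and 0.1, equal energy per acceptance and `K = 1`, the best provider ends up recommended on most days while the others are tried only occasionally; that shrinking share of exploratory picks is what keeps regret low.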

Two further components improve performance. The Faster Initialization Algorithm (FIA) reduces the number of days required to observe every variable at least once. The BiParTite-K Algorithm (BPT-K) addresses the generalized matching problem under a constraint that prevents consumers from being flooded with suggestions. BPT-K discretizes energy demands and capacities into units and iteratively improves the matches, using Maximum Weighted Bipartite Matching (MWBM) to maximize the exchanged energy.
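The unit-discretization idea behind BPT-K can be sketched in Python as follows. This is a hypothetical simplification, not the paper's algorithm: demands and capacities are integer numbers of energy units, the weight between a consumer unit and a provider unit is the (estimated) acceptance probability of that pair, each MWBM is solved exactly with the Hungarian algorithm, and the cap on suggestions is modelled as "at most `K` distinct providers per consumer". Whenever a consumer exceeds the cap, its lowest-value pairs are banned and the matching is solved again; each round bans at least one new pair, so the loop terminates. The names `max_weight_assignment` and `bpt_k_sketch` are our own.

```python
def max_weight_assignment(w):
    """Hungarian algorithm, O(n^3), on a square weight matrix w.
    Returns match[i] = column assigned to row i, maximizing total weight."""
    n = len(w)
    INF = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)   # dual potentials (1-based)
    p, way = [0] * (n + 1), [0] * (n + 1)     # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (n + 1), [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = -w[i0 - 1][j - 1] - u[i0] - v[j]   # cost = -weight
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                     # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * n
    for j in range(1, n + 1):
        match[p[j] - 1] = j - 1
    return match


def bpt_k_sketch(demand, capacity, weight, K):
    """Hypothetical BPT-K-style heuristic: unit-level MWBM plus a per-consumer
    cap of K distinct providers. Returns {(consumer, provider): units}."""
    rows = [(c, k) for c, d in demand.items() for k in range(d)]
    cols = [(q, k) for q, s in capacity.items() for k in range(s)]
    n = max(len(rows), len(cols))
    banned = set()
    while True:
        # Zero-padded n x n unit graph; banned pairs get weight 0.
        w = [[0.0] * n for _ in range(n)]
        for i, (c, _) in enumerate(rows):
            for j, (q, _) in enumerate(cols):
                if (c, q) not in banned:
                    w[i][j] = weight.get((c, q), 0.0)
        match = max_weight_assignment(w)
        plan = {}                     # aggregate unit matches per pair
        for i, (c, _) in enumerate(rows):
            j = match[i]
            if j < len(cols) and w[i][j] > 0:
                pair = (c, cols[j][0])
                plan[pair] = plan.get(pair, 0) + 1
        violated = False
        for c in demand:              # ban pairs beyond the K most valuable
            mine = sorted((pr for pr in plan if pr[0] == c),
                          key=lambda pr: plan[pr] * weight[pr], reverse=True)
            for pair in mine[K:]:
                banned.add(pair)
                violated = True
        if not violated:
            return plan
```

For example, a consumer needing 2 units and two providers with 1 unit each (acceptance probabilities 0.9 and 0.6) first get a matching that uses both providers; with `K = 1` the 0.6 pair is banned and the final plan keeps one unit from the first provider.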

Overall, this strategy guarantees bounded regret: even though preferences are initially unknown, the system progressively converges toward optimal matches and maximizes energy exchange.
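In the usual bandit notation (our own symbols, not necessarily the paper's), let $S_t$ be the set of recommendations made on day $t$, $r(S)$ the expected energy traded by a set $S$ under the true preferences, and $S^\ast$ the best feasible set. The regret after $T$ days is

$$
R(T) = T\, r(S^\ast) - \mathbb{E}\left[\sum_{t=1}^{T} r(S_t)\right]
$$

where the expectation is over the random user responses and the algorithm's resulting choices. A regret guarantee of this kind means $R(T)$ grows sublinearly in $T$ (for CUCB-type algorithms with an exact oracle, typically logarithmically), so the average daily shortfall $R(T)/T$ tends to zero; this is the precise sense in which the system eventually finds optimal matches. This sketch does not restate the paper's specific bound.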

Energy sharing system overview from the study by Xu, P., Pei, Y., Zheng, X., & Zhang, J. (2020, October)



  • The study uses Virtual Power Plants (VPPs) to build an Energy Sharing System (ESS) for exchanging locally generated energy, formulates the problem as a Mixed-Integer Linear Program (MILP), and shows that it is NP-Hard.
  • Unlike previous research, the study integrates a realistic user behavior model that accounts for consumer preferences, engagement, and bounded rationality.
  • In evaluations on realistic energy datasets, the RL-based User Preference Learning (UPL) algorithm and the BPT-K heuristic achieve near-optimal performance and outperform existing approaches.
  • The assumption that user behavior is independent and stable may need refinement to capture changing behavior and the influence of past actions on future decisions.
  • Future studies should examine how pricing affects user preferences, whether semi-automated systems can sustain long-term user involvement, and whether more sophisticated models that account for the irrationality of human decision-making are feasible.

Sakthivel R

I am a first-year M.Sc. (AIML) student at SASTRA University.

I am proficient in several programming languages and frameworks, including Python. I have experience with cloud and database technologies, as well as tools such as SQL, Excel, Pandas, scikit-learn, TensorFlow, Git, and Power BI.
