
sinergym's Introduction

Sinergym




Welcome to Sinergym!



⚠️ Warning: Pytype is temporarily disabled because it is not yet compatible with Python 3.12.

The goal of this project is to create an environment, following the Gymnasium interface, that wraps simulation engines (EnergyPlus) for building control using deep reinforcement learning or any other external control.

For more information about Sinergym, please visit our documentation.

To ask questions or report issues, please use our issue tracker. We appreciate your feedback and contributions. Check out our CONTRIBUTING.md for more details on how to contribute.

The main functionalities of Sinergym are the following:

  • Simulation Engine Compatibility: Uses the EnergyPlus Python API for Python-EnergyPlus communication. Future plans include support for more engines, such as OpenModelica.

  • Benchmark Environments: Provides environments for benchmarking and testing deep RL algorithms or other external control strategies, in the spirit of Atari or MuJoCo.

  • Customizable Environments: Allows easy modification of experimental settings. Users can create their own environments or modify pre-configured ones in Sinergym.

  • Customizable Components: Enables the creation of new custom components (reward functions, wrappers, controllers, etc.), making Sinergym easily extensible.

  • Automatic Building Model Adaptation: Sinergym automates the process of adapting the building model to user changes in the environment definition.

  • Automatic Actuator Control: Controls actuators through the Gymnasium interface based on the user's specification; only the actuator names are required, and Sinergym does the rest.

  • Extensive Environment Information: Provides comprehensive information about Sinergym background components from the environment interface.

  • Stable Baselines 3 Integration: Provides customized functionalities for easy testing of environments with SB3 algorithms, such as callbacks and customizable real-time training logging. However, Sinergym is agnostic to the DRL algorithm used.

  • Google Cloud Integration: Offers guidance on using Sinergym with Google Cloud infrastructure.

  • Weights & Biases Compatibility: Automates and facilitates training, reproducibility, and comparison of agents in simulation-based building control problems. WandB assists in managing and monitoring the model lifecycle.

  • Notebook Examples: Provides code in notebook format for user familiarity with the tool.

  • Extensive Documentation, Unit Tests, and GitHub Actions Workflows: Ensures Sinergym is an efficient ecosystem for understanding and development.

  • And much more!

This is a project in active development. Stay tuned for upcoming releases.



Project Structure

This repository is organized into the following directories:

  • sinergym/: Contains the source code for Sinergym, including the environment, modeling, simulator, and tools such as wrappers and reward functions.
  • docs/: Online documentation generated with Sphinx and written in reStructuredText (RST).
  • examples/: Jupyter notebooks illustrating use cases with Sinergym.
  • tests/: Unit tests for Sinergym to ensure stability.
  • scripts/: Scripts for various tasks such as agent training and performance checks, allowing configuration using JSON format.

Available Environments

For a complete and up-to-date list of available environments, please refer to our documentation.

Installation

Please visit INSTALL.md for detailed installation instructions.

Usage example

If you used our Dockerfile during installation, the try_env.py file will already be in your workspace as soon as you enter the container. If you installed everything directly on your local machine, place the file inside the cloned repository. In either case, we assume you have a terminal with the appropriate Python version and a working Sinergym installation.

Sinergym uses the standard Gymnasium API, so a basic control loop looks like this:

import gymnasium as gym
import sinergym
# Create the environment
env = gym.make('Eplus-datacenter-mixed-continuous-stochastic-v1')
# Initialize the episode
obs, info = env.reset()
truncated = terminated = False
R = 0.0
while not (terminated or truncated):
    a = env.action_space.sample() # random action selection
    obs, reward, terminated, truncated, info = env.step(a) # get new observation and reward
    R += reward
print('Total reward for the episode: %.4f' % R)
env.close()

A folder will be created in the working directory after creating the environment. It will contain the Sinergym outputs produced during the simulation.

For more examples and details, please visit our usage examples documentation section.

Google Cloud Platform support

For more information about this functionality, please visit our documentation here.

Projects using Sinergym

The following are some of the projects benefiting from the advantages of Sinergym:

📝 If you want to appear in this list, do not hesitate to send us a PR and include the following badge in your repository:


Citing Sinergym

If you use Sinergym in your work, please cite our paper:

@inproceedings{2021sinergym,
    title={Sinergym: A Building Simulation and Control Framework for Training Reinforcement Learning Agents}, 
    author={Jiménez-Raboso, Javier and Campoy-Nieves, Alejandro and Manjavacas-Lucas, Antonio and Gómez-Romero, Juan and Molina-Solana, Miguel},
    year={2021},
    isbn = {9781450391146},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3486611.3488729},
    doi = {10.1145/3486611.3488729},
    booktitle = {Proceedings of the 8th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation},
    pages = {319–323},
    numpages = {5},
}

sinergym's People

Contributors

actions-user, ahmed2bp, alejandrocn7, biemann, jajimer, manjavacas, melon-pieldesapo, miguems, mmdecastro


sinergym's Issues

Create environment logger

This logger could record observation values, action values, rewards, temperature, power, timestep, simulation time (seconds), and any other useful extra information.

It would also be convenient to create another file recording summarized per-episode information, such as mean reward, total simulation timesteps, etc.

Finally, we could record the Energym terminal output, which already exists, into a log.txt file (optional?).
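
A minimal sketch of such a per-step logger, assuming the current Gymnasium wrapper API and an illustrative monitor.csv output path (names are hypothetical, not the actual Sinergym logger):

import csv
import gymnasium as gym


class CSVLoggerWrapper(gym.Wrapper):
    """Illustrative wrapper that appends one row per simulation step to a CSV file."""

    def __init__(self, env, path='monitor.csv'):
        super().__init__(env)
        self.path = path
        self.timestep = 0

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.timestep += 1
        with open(self.path, 'a', newline='') as f:
            csv.writer(f).writerow([self.timestep, list(obs), action, reward, info])
        return obs, reward, terminated, truncated, info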

Document the use with vscode

Everything is in place for cloning the repo, opening it in VS Code, and getting it running through the "developing inside a container" functionality. However, this is currently not documented anywhere (particularly in the README).

Execution time test

Since we keep adding more and more functionality to Energym, I think we should include a test that measures the execution time of the simulation.

Maybe it is enough to launch some environments with random actions and assert that the simulation completes N steps in less than a given number of seconds.
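
A possible shape for such a test, assuming a pytest setup; the environment id and time budget below are illustrative placeholders:

import time

import gymnasium as gym
import sinergym  # noqa: F401  (registers the Eplus-* environments)

MAX_SECONDS = 60  # arbitrary budget for a short random rollout


def test_random_rollout_is_fast_enough():
    env = gym.make('Eplus-demo-v1')  # placeholder environment id
    env.reset()
    start = time.time()
    for _ in range(100):  # a short slice of an episode
        _, _, terminated, truncated, _ = env.step(env.action_space.sample())
        if terminated or truncated:
            break
    env.close()
    assert time.time() - start < MAX_SECONDS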

Stochasticity

Currently the environment is fully deterministic, since the weather is fixed. A sense of randomness could be included to improve the diversity of the simulations (a sketch of the observation-noise option is shown after the following list):

  • Change weather from year to year
  • Add noise to the observations and/or actions
  • Include forecasting
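
For the observation-noise option, a minimal sketch with a standard Gymnasium wrapper (the noise level is an arbitrary example):

import gymnasium as gym
import numpy as np


class GaussianObservationNoise(gym.ObservationWrapper):
    """Illustrative wrapper that adds zero-mean Gaussian noise to every observation."""

    def __init__(self, env, sigma=0.05):
        super().__init__(env)
        self.sigma = sigma

    def observation(self, obs):
        return obs + np.random.normal(0.0, self.sigma, size=obs.shape)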

Division by zero when resetting the environment

The following unhandled exception occurs when resetting the environment:

Traceback (most recent call last):
  File "./A2C.py", line 85, in <module>
    obs = env.reset()
  File "/usr/local/lib/python3.6/dist-packages/gym/core.py", line 264, in reset
    observation = self.env.reset(**kwargs)
  File "/workspaces/energym/energym/envs/eplus_env.py", line 225, in reset
    self.logger.log_episode(episode=self.simulator._epi_num)
  File "/workspaces/energym/energym/utils/common.py", line 307, in log_episode
    self.comfort_violation_timesteps/self.total_timesteps*100)
ZeroDivisionError: division by zero

I'm using Energym v0.3.0 in Ubuntu. The error occurs in both discrete and continuous environments.

I was able to stop it from occurring by activating the logger beforehand with env.env_method('activate_logger'), but this is an exception that should be handled.

Cheers!
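
A defensive helper along these lines (illustrative, not the actual log_episode code) would avoid the crash when an episode is reset before any timestep has been logged:

def comfort_violation_percentage(violation_steps: int, total_steps: int) -> float:
    """Percentage of timesteps with comfort violation; NaN if no steps were logged yet."""
    if total_steps == 0:
        return float('nan')
    return violation_steps / total_steps * 100.0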

[Feature] New buildings to be included

Feature 🚀

Add new buildings to be controlled for creating a more extensive benchmarking environment.

Motivation

Currently, two buildings are included in Sinergym: a 5Zone office building and a data center. We want to add other buildings in order to create new and more diverse environments.

Ideally, these new buildings would be of different types (e.g. hospitals, restaurants, warehouses) and have different things to control (apart from the HVAC system).

Solution

In order to include a building, its IDF file should be added to the sinergym/data/buildings folder. We then need to understand its main components, modify it to accept control signals from an external interface, and define the observation and action spaces.

Checklist

  • I have checked that there is no similar issue in the repo (required)

Simulation periods and termination conditions

  • Extend the simulation period to one or several years
  • Add the ability to dynamically change the simulation period or start dates, for example from episode to episode.
  • Currently, termination only occurs at the end of the simulation. Other termination conditions (game over) could be added.

Action and observation spaces

Action spaces

  • At least one environment for discrete, continuous and multi-discrete action spaces.
  • Add a restriction on the maximum number of actions that can be performed per time period, for example per hour.

Observation spaces

  • Add new variables into the state
  • Include time and day of the simulation
  • Add wrappers for manipulating the observations: normalization, stacking the last N observations, etc. (a stacking sketch is shown below).
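
A sketch of the observation-stacking wrapper, assuming a flat Box observation space (the number of stacked frames is an arbitrary example):

from collections import deque

import gymnasium as gym
import numpy as np


class StackObservations(gym.ObservationWrapper):
    """Illustrative wrapper that concatenates the last N flat observations."""

    def __init__(self, env, n=4):
        super().__init__(env)
        self.n = n
        self.frames = deque(maxlen=n)
        low = np.tile(env.observation_space.low, n)
        high = np.tile(env.observation_space.high, n)
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self.frames.clear()
        for _ in range(self.n - 1):
            self.frames.append(obs)  # pad the history with the first observation
        return self.observation(obs), info

    def observation(self, obs):
        self.frames.append(obs)
        return np.concatenate(self.frames).astype(np.float32)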

Dynamic action and observation spaces

We now have several environments in the project, so I think it might be interesting to define the observation and action spaces separately from the code (in XML, for example) and specify them in the environment constructor.

Currently, as mentioned, this specification is hard-coded.
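
A minimal sketch of the idea for continuous spaces, assuming a hypothetical JSON file with low and high lists (the file name and keys are illustrative):

import json

import numpy as np
from gymnasium.spaces import Box


def load_box_space(path: str) -> Box:
    """Build a continuous space from a JSON spec like {"low": [...], "high": [...]}."""
    with open(path) as f:
        spec = json.load(f)
    return Box(low=np.array(spec['low'], dtype=np.float32),
               high=np.array(spec['high'], dtype=np.float32),
               dtype=np.float32)


# action_space = load_box_space('spaces/5zone_actions.json')  # hypothetical path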

Callback error when using DQN, DDPG and SAC

When executing DQN, DDPG and SAC in any environment, the following error appears just before training:

Traceback (most recent call last):
  File "./A2C.py", line 90, in <module>
    model.learn(total_timesteps=timesteps, callback=callback)
  File "/usr/local/lib/python3.6/dist-packages/stable_baselines3/ddpg/ddpg.py", line 131, in learn
    reset_num_timesteps=reset_num_timesteps,
  File "/usr/local/lib/python3.6/dist-packages/stable_baselines3/td3/td3.py", line 204, in learn
    reset_num_timesteps=reset_num_timesteps,
  File "/usr/local/lib/python3.6/dist-packages/stable_baselines3/common/off_policy_algorithm.py", line 273, in learn
    log_interval=log_interval,
  File "/usr/local/lib/python3.6/dist-packages/stable_baselines3/common/off_policy_algorithm.py", line 481, in collect_rollouts
    if callback.on_step() is False:
  File "/usr/local/lib/python3.6/dist-packages/stable_baselines3/common/callbacks.py", line 88, in on_step
    return self._on_step()
  File "/usr/local/lib/python3.6/dist-packages/stable_baselines3/common/callbacks.py", line 192, in _on_step
    continue_training = callback.on_step() and continue_training
  File "/usr/local/lib/python3.6/dist-packages/stable_baselines3/common/callbacks.py", line 88, in on_step
    return self._on_step()
  File "/workspaces/energym/energym/utils/callbacks.py", line 52, in _on_step
    action = self.locals['actions'][-1]
KeyError: 'actions'

There seems to be an error when retrieving self.locals['actions'].

I'm using Energym v0.3.0 in Ubuntu.

Cheers!
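
A defensive workaround could read the action from whatever key the algorithm exposes, as in this illustrative sketch (the exact locals keys differ between SB3 versions and algorithm families):

import numpy as np
from stable_baselines3.common.callbacks import BaseCallback


class TolerantActionCallback(BaseCallback):
    """Illustrative callback that records the last action regardless of the locals key name."""

    def __init__(self, verbose=0):
        super().__init__(verbose)
        self.recorded_actions = []

    def _on_step(self) -> bool:
        # Off-policy algorithms (DQN, DDPG, SAC) may expose the sampled action under a
        # different key than on-policy ones, so look it up tolerantly.
        actions = self.locals.get('actions', self.locals.get('action'))
        if actions is not None:
            self.recorded_actions.append(np.asarray(actions)[-1])
        return True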

Reward function specification in env constructor

Currently only one reward function is implemented, but we will have more in the future. Thus, I think it would be useful to make the reward function a parameter of the environment constructor. We could use the SimpleReward class in __init__.py as the default, and override it in gym.make if desired.
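
The intended usage could look like this sketch (the reward keyword and the custom class shown are hypothetical and only illustrate the proposal):

import gym
import energym


class EnergyOnlyReward:
    """Hypothetical reward class with the same role as SimpleReward."""

    def calculate(self, power, temperature):
        return -power  # e.g. penalize energy consumption only


# Hypothetical keyword: override the default reward class at creation time
env = gym.make('Eplus-demo-v1', reward=EnergyOnlyReward)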

Document tests

As done in the description of PR #21, let's add information about the tests (what they do, how they are run, how to add new tests, etc.) in a README within the test folder. It can also be referenced from the main README.

LoggerWrapper dimension error

I'm executing the following script with Energym v1.0.0 in order to test SAC in continuous environments:

#!/usr/bin/python3

import gym
import energym
import argparse
import uuid
import mlflow

import numpy as np

from energym.utils.callbacks import LoggerCallback, LoggerEvalCallback
from energym.utils.wrappers import NormalizeObservation, LoggerWrapper

from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import EvalCallback, BaseCallback, CallbackList
from stable_baselines3.common.vec_env import DummyVecEnv


parser = argparse.ArgumentParser()
parser.add_argument('--environment', '-env', type=str, default=None)
parser.add_argument('--episodes', '-ep', type=int, default=1)

parser.add_argument('--learning_rate', '-lr', type=float, default=0.0003)
parser.add_argument('--buffer_size', '-bf', type=int, default=1000000)
parser.add_argument('--learning_starts', '-ls', type=int, default=100)
parser.add_argument('--batch_size', '-bs', type=int, default=256)
parser.add_argument('--tau', '-t', type=float, default=.005)
parser.add_argument('--gamma', '-g', type=float, default=.99)
parser.add_argument('--train_freq', '-tf', type=int, default=1)
parser.add_argument('--gradient_steps', '-gs', type=int, default=1)
parser.add_argument('--target_update_interval', '-tu', type=int, default=1)
args = parser.parse_args()

# experiment ID
environment = args.environment
n_episodes = args.episodes
name = 'SAC-' + environment + '-' + str(n_episodes) + '-episodes'

with mlflow.start_run(run_name=name):

    mlflow.log_param('env', environment)
    mlflow.log_param('episodes', n_episodes)  
    mlflow.log_param('learning_rate', args.learning_rate)
    mlflow.log_param('buffer_size', args.buffer_size)
    mlflow.log_param('learning_starts', args.learning_starts)
    mlflow.log_param('batch_size', args.batch_size)
    mlflow.log_param('tau', args.tau)
    mlflow.log_param('gamma', args.gamma)
    mlflow.log_param('train_freq', args.train_freq)
    mlflow.log_param('gradient_steps', args.gradient_steps)
    mlflow.log_param('target_update_interval', args.target_update_interval)
    env = gym.make(environment)
    env = NormalizeObservation(LoggerWrapper(env))

    #### TRAINING ####

    # Build model
    # model = SAC('MlpPolicy', env, verbose=1,
    #             learning_rate=args.learning_rate,
    #             buffer_size=args.buffer_size,
    #             learning_starts=args.learning_starts,
    #             batch_size=args.batch_size,
    #             tau=args.tau,
    #             gamma=args.gamma,
    #             train_freq=args.train_freq,
    #             gradient_steps=args.gradient_steps,
    #             target_update_interval=args.target_update_interval,
    #             tensorboard_log='./tensorboard_log/' + name)

    # n_timesteps_episode = env.simulator._eplus_one_epi_len / \
    #     env.simulator._eplus_run_stepsize
    # timesteps = n_episodes * n_timesteps_episode + 501

    # env = DummyVecEnv([lambda: env])

    # # Callbacks
    # freq = 5  # evaluate every N episodes
    # eval_callback = LoggerEvalCallback(env, best_model_save_path='./best_models/' + name + '/',
    #                                    log_path='./best_models/' + name + '/', eval_freq=n_timesteps_episode * freq,
    #                                    deterministic=True, render=False, n_eval_episodes=2)
    # log_callback = LoggerCallback()
    # callback = CallbackList([log_callback, eval_callback])

    # # Training
    # model.learn(total_timesteps=timesteps, callback=callback)
    # model.save(name)

    #### LOAD MODEL ####

    model = SAC.load('best_models/' + name + '/best_model.zip')

    for i in range(n_episodes - 1):
        obs = env.reset()
        rewards = []
        done = False
        current_month = 0
        while not done:
            a, _ = model.predict(obs)
            obs, reward, done, info = env.step(a)
            rewards.append(reward)
            if info['month'] != current_month:
                current_month = info['month']
                print(info['month'], sum(rewards))
        print('Episode ', i, 'Mean reward: ', np.mean(rewards), 'Cumulative reward: ', sum(rewards))
    env.close()

    mlflow.log_metric('mean_reward', np.mean(rewards))
    mlflow.log_metric('cumulative_reward', sum(rewards))

    mlflow.end_run()

When executing, the following error appears:

ValueError: Error: Unexpected observation shape (16,) for Box environment, please use (19,) or (n_env, 19) for the observation shape.

I suppose the reason for this error is that the dimensions used in training and execution differ.

This only happens when using LoggerWrapper. Digging into the code of this wrapper, I found the following line:

# We added some extra values (month,day,hour) manually in env, so we need to delete them.
obs = obs[:-3]

Commenting out this line avoids the error. In fact, I don't see the reason for deleting month, day and hour from the observation.

Cheers 😃

PS: I attach the models so the issue can be replicated:
best_models.zip

Develop a specific Dockerfile for Google Cloud

It would be convenient to develop a Dockerfile very similar to the current one, but without the tests and other files that are not necessary for running experiments on Google Cloud.

Add custom reward weights for energy consumption and comfort

I propose to include customizable weights for energy consumption and comfort in the environment constructor.

This may be the code:

class EplusEnv(gym.Env):
    """
    Environment with EnergyPlus simulator.
    """

    metadata = {'render.modes': ['human']}

    def __init__(
        self,
        idf_file,
        weather_file,
        variables_file,
        spaces_file,
        env_name='eplus-env-v1',
        discrete_actions=True,
        weather_variability=None,
        energy_weight=0.5
    ):

        ...

        # Reward class
        self.cls_reward = SimpleReward(energy_weight=self.energy_weight)

This energy_weight can be set either when registering the environment or in the agents' scripts with:

env = gym.make(environment, energy_weight=ew)

Cheers

Reward with several zones

The reward is currently calculated from a single temperature and power value in the reward function. Instead of a single temperature, it should accept a list of temperatures in order to handle several zones (a sketch is shown below).
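
A sketch of what a multi-zone comfort term could look like (the function name and comfort range are illustrative):

def multi_zone_comfort_penalty(temperatures, low=20.0, high=23.5):
    """Sum of comfort violations (in degrees) over all zone temperatures."""
    penalty = 0.0
    for t in temperatures:
        if t < low:
            penalty += low - t
        elif t > high:
            penalty += t - high
    return -penalty


# e.g. multi_zone_comfort_penalty([19.0, 22.0, 25.0]) == -(1.0 + 1.5) == -2.5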

Backend

Currently the backend is taken from Zhang's repository.

This issue is to collect all the improvements and functionalities related to the backend or to communication with simulation engines. Some of them could be:

  • Refactor of the code
  • Add new simulation engines
  • ...

Being able to decide each action value separately in discrete environments

In discrete environments, the action is a single value. We use that value as an index into an action mapping to obtain the tuple defining the discrete values that compose the action. Could we choose these discrete values separately instead of from a predefined tuple? Maybe define an action mapping for each discrete value (all of them with the same length)?
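
One way to express this is a multi-discrete space with an independent mapping per actuator, as in this illustrative sketch (the setpoint values are made up):

from gymnasium.spaces import MultiDiscrete

# One independent value mapping per actuator (here: heating and cooling setpoints)
heating_setpoints = [15.0, 17.5, 20.0, 22.5]
cooling_setpoints = [22.5, 25.0, 27.5, 30.0]

action_space = MultiDiscrete([len(heating_setpoints), len(cooling_setpoints)])

# A sampled action is a pair of indices, each decoded independently
idx_heat, idx_cool = action_space.sample()
setpoints = (heating_setpoints[idx_heat], cooling_setpoints[idx_cool])
print(setpoints)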

New exponential reward function

The current reward function penalizes comfort as the distance between the temperature and a valid range. This part of the function is linear: for example, if the distance is 4, the penalty is double what it would be for a distance of 2. In order to penalize larger distances more, it would be interesting to propose an exponential function instead of a linear one.
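
An illustrative comparison of the two penalty shapes (the exact exponential form and scale are open design choices):

import math


def linear_comfort_penalty(distance):
    return -distance


def exponential_comfort_penalty(distance, scale=1.0):
    """Penalizes large deviations much more heavily than the linear version."""
    return -(math.exp(scale * distance) - 1.0)


# linear:      distance 2 -> -2,     distance 4 -> -4
# exponential: distance 2 -> ~-6.39, distance 4 -> ~-53.60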

Documentation

Document the package in Sphinx or a similar format.

Include docstrings, type hints, and any other self-documenting code options.

Weather files

Include several weather files, so different combinations of building and climate can be simulated.

The main sources for these files are:

  • DOE's 19 climate zones, most of them in USA (Link)
  • Climate.OneBuilding files, from most of the world (Link).

When integrating these files, the design day must also be taken into account.

TO DOs:

  • Design days have to be included manually; find a way to do this automatically.
  • Modify the weather from one year to another.
  • Include more types of weather files and more locations (not only the USA).

Create DRL Logger

This logger would record information specifically about the DRL algorithms' training processes.
