
sinergym's Introduction

Sinergym




Welcome to Sinergym!



⚠️ Warning: Pytype is temporarily disabled because it is not yet compatible with Python 3.12.

The goal of this project is to create an environment, following the Gymnasium interface, that wraps simulation engines (EnergyPlus) for building control using deep reinforcement learning or any other external control.

For more information about Sinergym, please visit our documentation.

To ask questions or report issues, please use our issue tracker. We appreciate your feedback and contributions. Check out our CONTRIBUTING.md for more details on how to contribute.

The main functionalities of Sinergym are the following:

  • Simulation Engine Compatibility: Uses the EnergyPlus Python API for Python-EnergyPlus communication. Future plans include support for more engines, such as OpenModelica.

  • Benchmark Environments: Provides environments for benchmarking and testing deep RL algorithms or other external control strategies, in the spirit of Atari or MuJoCo.

  • Customizable Environments: Allows easy modification of experimental settings. Users can create their own environments or modify pre-configured ones in Sinergym.

  • Customizable Components: Enables the creation of new custom components (reward functions, wrappers, controllers, etc.), making Sinergym easily extensible.

  • Automatic Building Model Adaptation: Sinergym automates the process of adapting the building model to user changes in the environment definition.

  • Automatic Actuator Control: Controls actuators through the Gymnasium interface based on the user's specification; only the actuator names are required, and Sinergym does the rest.

  • Extensive Environment Information: Provides comprehensive information about Sinergym background components from the environment interface.

  • Stable Baselines 3 Integration: Provides customized functionalities for easy testing of environments with SB3 algorithms, such as callbacks and customizable real-time training logging. However, Sinergym is agnostic to the DRL algorithm used.

  • Google Cloud Integration: Offers guidance on using Sinergym with Google Cloud infrastructure.

  • Weights & Biases Compatibility: Automates and facilitates training, reproducibility, and comparison of agents in simulation-based building control problems. WandB assists in managing and monitoring the model lifecycle.

  • Notebook Examples: Provides code in notebook format for user familiarity with the tool.

  • Extensive Documentation, Unit Tests, and GitHub Actions Workflows: Ensures Sinergym is an efficient ecosystem for understanding and development.

  • And much more!

This is a project in active development. Stay tuned for upcoming releases.



Project Structure

This repository is organized into the following directories:

  • sinergym/: Contains the source code for Sinergym, including the environment, modeling, simulator, and tools such as wrappers and reward functions.
  • docs/: Online documentation generated with Sphinx and written in reStructuredText (RST).
  • examples/: Jupyter notebooks illustrating use cases with Sinergym.
  • tests/: Unit tests for Sinergym to ensure stability.
  • scripts/: Scripts for various tasks such as agent training and performance checks, allowing configuration using JSON format.

Available Environments

For a complete and up-to-date list of available environments, please refer to our documentation.

Installation

Please visit INSTALL.md for detailed installation instructions.

Usage example

If you used our Dockerfile during installation, the try_env.py file will already be in your workspace as soon as you enter the container. If you installed everything directly on your local machine, place the file inside the cloned repository. In either case, we assume you have a terminal with the appropriate Python version and a working Sinergym installation.

Sinergym uses the standard Gymnasium API, so a basic control loop looks like this:

import gymnasium as gym
import sinergym
# Create the environment
env = gym.make('Eplus-datacenter-mixed-continuous-stochastic-v1')
# Initialize the episode
obs, info = env.reset()
truncated = terminated = False
R = 0.0
while not (terminated or truncated):
    a = env.action_space.sample() # random action selection
    obs, reward, terminated, truncated, info = env.step(a) # get new observation and reward
    R += reward
print('Total reward for the episode: %.4f' % R)
env.close()

A folder will be created in the working directory after creating the environment. It will contain the Sinergym outputs produced during the simulation.

For more examples and details, please visit our usage examples documentation section.

Google Cloud Platform support

For more information about this functionality, please visit our documentation here.

Projects using Sinergym

The following are some of the projects benefiting from the advantages of Sinergym:

📝 If you want to appear in this list, do not hesitate to send us a PR and include the following badge in your repository:


Citing Sinergym

If you use Sinergym in your work, please cite our paper:

@inproceedings{2021sinergym,
    title={Sinergym: A Building Simulation and Control Framework for Training Reinforcement Learning Agents}, 
    author={Jiménez-Raboso, Javier and Campoy-Nieves, Alejandro and Manjavacas-Lucas, Antonio and Gómez-Romero, Juan and Molina-Solana, Miguel},
    year={2021},
    isbn = {9781450391146},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3486611.3488729},
    doi = {10.1145/3486611.3488729},
    booktitle = {Proceedings of the 8th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation},
    pages = {319–323},
    numpages = {5},
}

sinergym's People

Contributors

actions-user, ahmed2bp, alejandrocn7, biemann, jajimer, manjavacas, melon-pieldesapo, miguems, mmdecastro


sinergym's Issues

Create environment logger

This logger could record observation values, action values, rewards, temperature, power, timestep, simulation time (seconds), and any other useful extra information.

It would also be convenient to create another file recording summarized per-episode information, such as mean reward, total simulation timesteps, etc.

Finally, we could record the Energym terminal output, which already exists, into a log.txt file (optional?).
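
A minimal sketch of such a per-step logger, assuming the current Gymnasium wrapper API and an illustrative monitor.csv output path (names are hypothetical, not the actual Sinergym logger):

import csv
import gymnasium as gym


class CSVLoggerWrapper(gym.Wrapper):
    """Illustrative wrapper that appends one row per simulation step to a CSV file."""

    def __init__(self, env, path='monitor.csv'):
        super().__init__(env)
        self.path = path
        self.timestep = 0

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.timestep += 1
        with open(self.path, 'a', newline='') as f:
            csv.writer(f).writerow([self.timestep, list(obs), action, reward, info])
        return obs, reward, terminated, truncated, info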

Document the use with vscode

Everything is in place for cloning the repo, opening it in VS Code, and getting it running through the "developing inside a container" functionality. However, this is currently not documented anywhere (particularly in the README).

Execution time test

Since we keep adding more and more functionality to Energym, I think we should include a test that measures the execution time of the simulation.

Maybe it is enough to launch some environments with random actions and assert that the simulation completes N steps in less than a given number of seconds.
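
A possible shape for such a test, assuming a pytest setup; the environment id and time budget below are illustrative placeholders:

import time

import gymnasium as gym
import sinergym  # noqa: F401  (registers the Eplus-* environments)

MAX_SECONDS = 60  # arbitrary budget for a short random rollout


def test_random_rollout_is_fast_enough():
    env = gym.make('Eplus-demo-v1')  # placeholder environment id
    env.reset()
    start = time.time()
    for _ in range(100):  # a short slice of an episode
        _, _, terminated, truncated, _ = env.step(env.action_space.sample())
        if terminated or truncated:
            break
    env.close()
    assert time.time() - start < MAX_SECONDS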

Stochasticity

Currently the environment is fully deterministic, since the weather is fixed. A sense of randomness could be included to improve the diversity of the simulations (a sketch of the observation-noise option is shown after the following list):

  • Change weather from year to year
  • Add noise to the observations and/or actions
  • Include forecasting
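
For the observation-noise option, a minimal sketch with a standard Gymnasium wrapper (the noise level is an arbitrary example):

import gymnasium as gym
import numpy as np


class GaussianObservationNoise(gym.ObservationWrapper):
    """Illustrative wrapper that adds zero-mean Gaussian noise to every observation."""

    def __init__(self, env, sigma=0.05):
        super().__init__(env)
        self.sigma = sigma

    def observation(self, obs):
        return obs + np.random.normal(0.0, self.sigma, size=obs.shape)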

Division by zero when resetting the environment

The following unhandled exception occurs when resetting the environment:

Traceback (most recent call last):
  File "./A2C.py", line 85, in <module>
    obs = env.reset()
  File "/usr/local/lib/python3.6/dist-packages/gym/core.py", line 264, in reset
    observation = self.env.reset(**kwargs)
  File "/workspaces/energym/energym/envs/eplus_env.py", line 225, in reset
    self.logger.log_episode(episode=self.simulator._epi_num)
  File "/workspaces/energym/energym/utils/common.py", line 307, in log_episode
    self.comfort_violation_timesteps/self.total_timesteps*100)
ZeroDivisionError: division by zero

I'm using Energym v0.3.0 in Ubuntu. The error occurs in both discrete and continuous environments.

I was able to stop it from occurring by activating the logger beforehand with env.env_method('activate_logger'), but this is an exception that should be handled.

Cheers!
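
A defensive helper along these lines (illustrative, not the actual log_episode code) would avoid the crash when an episode is reset before any timestep has been logged:

def comfort_violation_percentage(violation_steps: int, total_steps: int) -> float:
    """Percentage of timesteps with comfort violation; NaN if no steps were logged yet."""
    if total_steps == 0:
        return float('nan')
    return violation_steps / total_steps * 100.0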

[Feature] New buildings to be included

Feature 🚀

Add new buildings to be controlled for creating a more extensive benchmarking environment.

Motivation

Currently, two buildings are included in Sinergym: a 5Zone office building and a data center. We want to add other buildings in order to create new and more diverse environments.

Ideally, these new buildings would be of different types (e.g. hospitals, restaurants, warehouses) and have different things to control (apart from the HVAC system).

Solution

In order to include a building, its IDF file should be added to the sinergym/data/buildings folder. We then need to understand its main components, modify it to accept control signals from an external interface, and define the observation and action spaces.

Checklist

  • I have checked that there is no similar issue in the repo (required)

Simulation periods and termination conditions

  • Extend the simulation period to one or several years
  • Add the ability to dynamically change the simulation period or start dates, for example from episode to episode.
  • Currently, termination only occurs at the end of the simulation. Other termination conditions (game over) could be added.

Action and observation spaces

Action spaces

  • At least one environment for discrete, continuous and multi-discrete action spaces.
  • Add a restriction on the maximum number of actions that can be performed per time period, for example per hour.

Observation spaces

  • Add new variables into the state
  • Include time and day of the simulation
  • Add wrappers for manipulating the observations: normalization, stacking the last N observations, etc. (a stacking sketch is shown below).
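
A sketch of the observation-stacking wrapper, assuming a flat Box observation space (the number of stacked frames is an arbitrary example):

from collections import deque

import gymnasium as gym
import numpy as np


class StackObservations(gym.ObservationWrapper):
    """Illustrative wrapper that concatenates the last N flat observations."""

    def __init__(self, env, n=4):
        super().__init__(env)
        self.n = n
        self.frames = deque(maxlen=n)
        low = np.tile(env.observation_space.low, n)
        high = np.tile(env.observation_space.high, n)
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self.frames.clear()
        for _ in range(self.n - 1):
            self.frames.append(obs)  # pad the history with the first observation
        return self.observation(obs), info

    def observation(self, obs):
        self.frames.append(obs)
        return np.concatenate(self.frames).astype(np.float32)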

Dynamic action and observation spaces

We now have several environments in the project, so I think it might be interesting to define the observation and action spaces separately from the code (in XML, for example) and specify them in the environment constructor.

Currently, as mentioned, this specification is hard-coded.
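
A minimal sketch of the idea for continuous spaces, assuming a hypothetical JSON file with low and high lists (the file name and keys are illustrative):

import json

import numpy as np
from gymnasium.spaces import Box


def load_box_space(path: str) -> Box:
    """Build a continuous space from a JSON spec like {"low": [...], "high": [...]}."""
    with open(path) as f:
        spec = json.load(f)
    return Box(low=np.array(spec['low'], dtype=np.float32),
               high=np.array(spec['high'], dtype=np.float32),
               dtype=np.float32)


# action_space = load_box_space('spaces/5zone_actions.json')  # hypothetical path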

Callback error when using DQN, DDPG and SAC

When executing DQN, DDPG and SAC in any environment, the following error appears just before training:

Traceback (most recent call last):
  File "./A2C.py", line 90, in <module>
    model.learn(total_timesteps=timesteps, callback=callback)
  File "/usr/local/lib/python3.6/dist-packages/stable_baselines3/ddpg/ddpg.py", line 131, in learn
    reset_num_timesteps=reset_num_timesteps,
  File "/usr/local/lib/python3.6/dist-packages/stable_baselines3/td3/td3.py", line 204, in learn
    reset_num_timesteps=reset_num_timesteps,
  File "/usr/local/lib/python3.6/dist-packages/stable_baselines3/common/off_policy_algorithm.py", line 273, in learn
    log_interval=log_interval,
  File "/usr/local/lib/python3.6/dist-packages/stable_baselines3/common/off_policy_algorithm.py", line 481, in collect_rollouts
    if callback.on_step() is False:
  File "/usr/local/lib/python3.6/dist-packages/stable_baselines3/common/callbacks.py", line 88, in on_step
    return self._on_step()
  File "/usr/local/lib/python3.6/dist-packages/stable_baselines3/common/callbacks.py", line 192, in _on_step
    continue_training = callback.on_step() and continue_training
  File "/usr/local/lib/python3.6/dist-packages/stable_baselines3/common/callbacks.py", line 88, in on_step
    return self._on_step()
  File "/workspaces/energym/energym/utils/callbacks.py", line 52, in _on_step
    action = self.locals['actions'][-1]
KeyError: 'actions'

There seems to be an error when retrieving self.locals['actions'].

I'm using Energym v0.3.0 in Ubuntu.

Cheers!
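
A defensive workaround could read the action from whatever key the algorithm exposes, as in this illustrative sketch (the exact locals keys differ between SB3 versions and algorithm families):

import numpy as np
from stable_baselines3.common.callbacks import BaseCallback


class TolerantActionCallback(BaseCallback):
    """Illustrative callback that records the last action regardless of the locals key name."""

    def __init__(self, verbose=0):
        super().__init__(verbose)
        self.recorded_actions = []

    def _on_step(self) -> bool:
        # Off-policy algorithms (DQN, DDPG, SAC) may expose the sampled action under a
        # different key than on-policy ones, so look it up tolerantly.
        actions = self.locals.get('actions', self.locals.get('action'))
        if actions is not None:
            self.recorded_actions.append(np.asarray(actions)[-1])
        return True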

Reward function specification in env constructor

Currently only one reward function is implemented, but we will have more in the future. Thus, I think it would be useful to make the reward function a parameter of the environment constructor. We could use the SimpleReward class in __init__.py as the default, and override it in gym.make if desired.
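
The intended usage could look like this sketch (the reward keyword and the custom class shown are hypothetical and only illustrate the proposal):

import gym
import energym


class EnergyOnlyReward:
    """Hypothetical reward class with the same role as SimpleReward."""

    def calculate(self, power, temperature):
        return -power  # e.g. penalize energy consumption only


# Hypothetical keyword: override the default reward class at creation time
env = gym.make('Eplus-demo-v1', reward=EnergyOnlyReward)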

Document tests

As done in the description of PR #21, let's add information about the tests (what they do, how they are run, how to add new tests, etc.) in a README within the test folder. It can also be referenced from the main README.

LoggerWrapper dimension error

I'm executing the following script with Energym v1.0.0 in order to test SAC in continuous environments:

#!/usr/bin/python3

import gym
import energym
import argparse
import uuid
import mlflow

import numpy as np

from energym.utils.callbacks import LoggerCallback, LoggerEvalCallback
from energym.utils.wrappers import NormalizeObservation, LoggerWrapper

from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import EvalCallback, BaseCallback, CallbackList
from stable_baselines3.common.vec_env import DummyVecEnv


parser = argparse.ArgumentParser()
parser.add_argument('--environment', '-env', type=str, default=None)
parser.add_argument('--episodes', '-ep', type=int, default=1)

parser.add_argument('--learning_rate', '-lr', type=float, default=0.0003)
parser.add_argument('--buffer_size', '-bf', type=int, default=1000000)
parser.add_argument('--learning_starts', '-ls', type=int, default=100)
parser.add_argument('--batch_size', '-bs', type=int, default=256)
parser.add_argument('--tau', '-t', type=float, default=.005)
parser.add_argument('--gamma', '-g', type=float, default=.99)
parser.add_argument('--train_freq', '-tf', type=int, default=1)
parser.add_argument('--gradient_steps', '-gs', type=int, default=1)
parser.add_argument('--target_update_interval', '-tu', type=int, default=1)
args = parser.parse_args()

# experiment ID
environment = args.environment
n_episodes = args.episodes
name = 'SAC-' + environment + '-' + str(n_episodes) + '-episodes'

with mlflow.start_run(run_name=name):

    mlflow.log_param('env', environment)
    mlflow.log_param('episodes', n_episodes)  
    mlflow.log_param('learning_rate', args.learning_rate)
    mlflow.log_param('buffer_size', args.buffer_size)
    mlflow.log_param('learning_starts', args.learning_starts)
    mlflow.log_param('batch_size', args.batch_size)
    mlflow.log_param('tau', args.tau)
    mlflow.log_param('gamma', args.gamma)
    mlflow.log_param('train_freq', args.train_freq)
    mlflow.log_param('gradient_steps', args.gradient_steps)
    mlflow.log_param('target_update_interval', args.target_update_interval)
    env = gym.make(environment)
    env = NormalizeObservation(LoggerWrapper(env))

    #### TRAINING ####

    # Build model
    # model = SAC('MlpPolicy', env, verbose=1,
    #             learning_rate=args.learning_rate,
    #             buffer_size=args.buffer_size,
    #             learning_starts=args.learning_starts,
    #             batch_size=args.batch_size,
    #             tau=args.tau,
    #             gamma=args.gamma,
    #             train_freq=args.train_freq,
    #             gradient_steps=args.gradient_steps,
    #             target_update_interval=args.target_update_interval,
    #             tensorboard_log='./tensorboard_log/' + name)

    # n_timesteps_episode = env.simulator._eplus_one_epi_len / \
    #     env.simulator._eplus_run_stepsize
    # timesteps = n_episodes * n_timesteps_episode + 501

    # env = DummyVecEnv([lambda: env])

    # # Callbacks
    # freq = 5  # evaluate every N episodes
    # eval_callback = LoggerEvalCallback(env, best_model_save_path='./best_models/' + name + '/',
    #                                    log_path='./best_models/' + name + '/', eval_freq=n_timesteps_episode * freq,
    #                                    deterministic=True, render=False, n_eval_episodes=2)
    # log_callback = LoggerCallback()
    # callback = CallbackList([log_callback, eval_callback])

    # # Training
    # model.learn(total_timesteps=timesteps, callback=callback)
    # model.save(name)

    #### LOAD MODEL ####

    model = SAC.load('best_models/' + name + '/best_model.zip')

    for i in range(n_episodes - 1):
        obs = env.reset()
        rewards = []
        done = False
        current_month = 0
        while not done:
            a, _ = model.predict(obs)
            obs, reward, done, info = env.step(a)
            rewards.append(reward)
            if info['month'] != current_month:
                current_month = info['month']
                print(info['month'], sum(rewards))
        print('Episode ', i, 'Mean reward: ', np.mean(rewards), 'Cumulative reward: ', sum(rewards))
    env.close()

    mlflow.log_metric('mean_reward', np.mean(rewards))
    mlflow.log_metric('cumulative_reward', sum(rewards))

    mlflow.end_run()

When executing, the following error appears:

ValueError: Error: Unexpected observation shape (16,) for Box environment, please use (19,) or (n_env, 19) for the observation shape.

I suppose the reason for this error is that the dimensions used in training and execution differ.

This only happens when using LoggerWrapper. Digging into the code of this wrapper, I found the following line:

# We added some extra values (month,day,hour) manually in env, so we need to delete them.
obs = obs[:-3]

Commenting out this line avoids the error. In fact, I don't see the reason for deleting month, day and hour from the observation.

Cheers 😃

PS: I attach the models so the issue can be replicated:
best_models.zip

Develop a specific Dockerfile for Google Cloud

It would be convenient to develop a Dockerfile very similar to the current one, but without the tests and other files that are not necessary for running experiments on Google Cloud.

Add custom reward weights for energy consumption and comfort

I propose to include customizable weights for energy consumption and comfort in the environment constructor.

This may be the code:

class EplusEnv(gym.Env):
    """
    Environment with EnergyPlus simulator.
    """

    metadata = {'render.modes': ['human']}

    def __init__(
        self,
        idf_file,
        weather_file,
        variables_file,
        spaces_file,
        env_name='eplus-env-v1',
        discrete_actions=True,
        weather_variability=None,
        energy_weight=0.5
    ):

        ...

        # Reward class
        self.cls_reward = SimpleReward(energy_weight=self.energy_weight)

This energy_weight can be set either when registering the environment or in the agents' scripts with:

env = gym.make(environment, energy_weight=ew)

Cheers

Reward with several zones

The reward is currently calculated from a single temperature and power value in the reward function. Instead of a single temperature, it should accept a list of temperatures in order to handle several zones (a sketch is shown below).
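
A sketch of what a multi-zone comfort term could look like (the function name and comfort range are illustrative):

def multi_zone_comfort_penalty(temperatures, low=20.0, high=23.5):
    """Sum of comfort violations (in degrees) over all zone temperatures."""
    penalty = 0.0
    for t in temperatures:
        if t < low:
            penalty += low - t
        elif t > high:
            penalty += t - high
    return -penalty


# e.g. multi_zone_comfort_penalty([19.0, 22.0, 25.0]) == -(1.0 + 1.5) == -2.5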

Backend

Currently the backend is taken from Zhang's repository.

This issue is to collect all the improvements and functionalities related to the backend or to communication with simulation engines. Some of them could be:

  • Refactor of the code
  • Add new simulation engines
  • ...

Being able to decide each action value separately in discrete environments

In discrete environments, the action is a single value. We use that value as an index into an action mapping to obtain the tuple defining the discrete values that compose the action. Could we choose these discrete values separately instead of from a predefined tuple? Maybe define an action mapping for each discrete value (all of them with the same length)?
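
One way to express this is a multi-discrete space with an independent mapping per actuator, as in this illustrative sketch (the setpoint values are made up):

from gymnasium.spaces import MultiDiscrete

# One independent value mapping per actuator (here: heating and cooling setpoints)
heating_setpoints = [15.0, 17.5, 20.0, 22.5]
cooling_setpoints = [22.5, 25.0, 27.5, 30.0]

action_space = MultiDiscrete([len(heating_setpoints), len(cooling_setpoints)])

# A sampled action is a pair of indices, each decoded independently
idx_heat, idx_cool = action_space.sample()
setpoints = (heating_setpoints[idx_heat], cooling_setpoints[idx_cool])
print(setpoints)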

New exponential reward function

The current reward function penalizes comfort as the distance between the temperature and a valid range. This part of the function is linear: for example, if the distance is 4, the penalty is double what it would be for a distance of 2. In order to penalize larger distances more, it would be interesting to propose an exponential function instead of a linear one.
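
An illustrative comparison of the two penalty shapes (the exact exponential form and scale are open design choices):

import math


def linear_comfort_penalty(distance):
    return -distance


def exponential_comfort_penalty(distance, scale=1.0):
    """Penalizes large deviations much more heavily than the linear version."""
    return -(math.exp(scale * distance) - 1.0)


# linear:      distance 2 -> -2,     distance 4 -> -4
# exponential: distance 2 -> ~-6.39, distance 4 -> ~-53.60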

Documentation

Document the package in Sphinx or a similar format.

Include docstrings, type hints, and any other self-documenting code options.

Weather files

Include several weather files, so different combinations of building and climate can be simulated.

The main sources for these files are:

  • DOE's 19 climate zones, most of them in USA (Link)
  • Climate.OneBuilding files, from most of the world (Link).

When integrating these files, the design day must also be taken into account.

TO DOs:

  • Design days have to be included manually; find a way to do this automatically.
  • Modify the weather from one year to another.
  • Include more types of weather files and more locations (not only the USA).

Create DRL Logger

This logger would record information specifically about the DRL algorithms' training processes.
