OSeMOSYS_PuLP: A Stochastic Modeling Framework for Long-Term Energy Systems Modeling

License: Apache License 2.0

Python 100.00%

energy-system techno-economic optimization stochastic osemosys

osemosys_pulp's Introduction

OSeMOSYS-PuLP

OSeMOSYS-PuLP: A Stochastic Modeling Framework for Long-Term Energy Systems Modeling

Description

OSeMOSYS-PuLP is a methodological framework for an empirical deterministic–stochastic modeling approach that uses real-world datasets in long-term energy systems modeling. An application example is provided (UTOPIA BASE dataset), in which the initial application example of OSeMOSYS is modified to include real-world operation data from a public bus transport system. More information is provided in the scientific article listed in the Citation section.

OSeMOSYS-PuLP

This is the educational version of OSeMOSYS-PuLP

OSeMOSYS-PuLP-HP

This is the high performance (HP) version of OSeMOSYS-PuLP. Additional performance improvements will be developed for this code over time.

Citation

Dennis Dreier, Mark Howells, OSeMOSYS-PuLP: A Stochastic Modeling Framework for Long-Term Energy Systems Modeling. Energies 2019, 12, 1382, https://doi.org/10.3390/en12071382

Getting started

Clone this repository and change the directory:

git clone https://github.com/OSeMOSYS/OSeMOSYS_PuLP.git
cd OSeMOSYS_PuLP

OSeMOSYS-PuLP and OSeMOSYS-PuLP-HP are under continuous development. Always check whether your code is up to date by using:

git fetch origin
git status

Update to latest version:

git fetch origin
git merge origin

osemosys_pulp's People

Forkers

owenhuxley shravankumar23 sanaenit vthaore edoabraham

osemosys_pulp's Issues

Index dict by tuple, rather than string

Remove the ci() function, which is used to create a string of index names, and instead index the dict of parameters using a tuple of indexes as the dictionary key.

You could then loop over the sorted set of keys for each parameter.
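A minimal sketch of the tuple-keyed approach, using made-up parameter values. The name capital_cost and the numbers are hypothetical; only the idea of replacing the string built by ci() with a tuple key comes from the issue.

```python
# Hypothetical parameter data keyed by (REGION, TECHNOLOGY, YEAR) tuples
# instead of a joined string built by a helper such as ci().
capital_cost = {
    ("UTOPIA", "E01", "1991"): 1400.0,
    ("UTOPIA", "E01", "1990"): 1400.0,
    ("UTOPIA", "E21", "1990"): 5000.0,
}

# Tuples sort element by element, so the iteration order is deterministic.
for region, technology, year in sorted(capital_cost):
    value = capital_cost[(region, technology, year)]
    print(region, technology, year, value)

# Partial look-ups need no string parsing: unpack the key directly.
e01_years = sorted(y for (r, t, y) in capital_cost if t == "E01")
print(e01_years)
```

Because the key keeps its components separate, no parsing step is needed to recover the region, technology or year from it.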

Cannot find input data due to Windows specific file-paths

All file paths should be created using os.path.join() which then adapts the file path for whichever operating system is being used.
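A short sketch of building the input path portably. The directory name Input_Data is an assumption made for illustration; UTOPIA_BASE.xlsx is the example file mentioned elsewhere in these issues.

```python
import os

# Hypothetical directory name, used only for illustration.
input_dir = "Input_Data"
input_file = "UTOPIA_BASE.xlsx"

# os.path.join inserts the separator of the current operating system,
# so the same line works on Windows, macOS and Linux.
input_path = os.path.join(os.getcwd(), input_dir, input_file)
print(input_path)
```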

Could make more readable with something like:

index = combined_index(rsy[0:2])  # extract region and storage from index
# fall back to the default discount rate when none is specified for this index
model += DiscountedCapitalInvestmentStorage.get(ci(rsy)) == \
    CapitalInvestmentStorage.get(ci(rsy)) \
    * (1 / ((1 + DiscountRateSto.get(index, dflt.get('DiscountRateSto'))) \
    ** (int(rsy[2]) - int(min(YEAR))))), ""

Originally posted by @willu47 in #14 (comment)

Separate random data generation from random data retrieval

I'm interested in using the PuLP version of OSeMOSYS to run a global sensitivity analysis. This involves running a pre-prepared sample through the model.

I believe that this could be accommodated if the calls to generateRandomData were replaced with a look-up function that retrieves the sample of random data for the current iteration (i) and the particular parameter.

The random sample could then be generated via a number of different approaches, and thus enable different analysis approaches of the results.
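One way the separation could look, as a sketch only. The function names make_sample and get_random_data, the parameter ranges, and the uniform distribution are all assumptions for illustration; they are not taken from the OSeMOSYS-PuLP code.

```python
import random


def make_sample(parameters, iterations, seed=0):
    """Generate the whole sample up front (uniform draws, purely illustrative)."""
    rng = random.Random(seed)
    return {
        (name, i): rng.uniform(low, high)
        for name, (low, high) in parameters.items()
        for i in range(iterations)
    }


def get_random_data(sample, name, i):
    """Look up the pre-prepared value for parameter `name` in iteration `i`."""
    return sample[(name, i)]


# Hypothetical parameter ranges.
ranges = {"CapitalCost": (1000.0, 2000.0), "DiscountRate": (0.03, 0.08)}
sample = make_sample(ranges, iterations=3)

for i in range(3):
    print(i, get_random_data(sample, "CapitalCost", i))
```

With this split, make_sample could be swapped for any other sampling scheme, or for a sample loaded from a file, without touching the model run.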

Suggestions to simplify code, enhance performance, improve interoperability, etc.

I've been looking into using OSeMOSYS PuLP to facilitate a project which needs some Monte Carlo runs. I've met a roadblock when trying to load data from another model programmatically. It's been quite difficult to understand the internal data structure currently used, but once I figured it out, it has proved difficult to modify cleanly.

Recommend using a dict of pandas DataFrames for parameters and sets

Currently, the data is read into a giant ragged pandas DataFrame which contains a lot of empty columns, and different parameters are listed on rows. This is not memory efficient, as you're reserving 13 columns of memory for each row.

I would recommend using a dictionary of dataframes, where the key is the parameter or set name, and the value the pandas DataFrame. Use long format for the dataframe (as you are currently using) e.g. for AccumulatedAnnualDemand you would have four columns: REGION, FUEL, YEAR and VALUE.

This would also match the format used by otoole which provides a number of handy import functions to convert between GNU MathProg models, and would make it very easy to begin using PuLP from other implementations.
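A small sketch of the proposed layout, with invented values. Only the column names REGION, FUEL, YEAR and VALUE for AccumulatedAnnualDemand come from the text above; the set tables and the numbers are assumptions.

```python
import pandas as pd

# Hypothetical values; only the column layout follows the description above.
data = {
    "REGION": pd.DataFrame({"VALUE": ["UTOPIA"]}),
    "YEAR": pd.DataFrame({"VALUE": [1990, 1991]}),
    "AccumulatedAnnualDemand": pd.DataFrame(
        {
            "REGION": ["UTOPIA", "UTOPIA"],
            "FUEL": ["TX", "TX"],
            "YEAR": [1990, 1991],
            "VALUE": [5.2, 5.46],
        }
    ),
}

demand = data["AccumulatedAnnualDemand"]

# Convert one parameter into a tuple-keyed dict for use when building constraints.
demand_by_index = demand.set_index(["REGION", "FUEL", "YEAR"])["VALUE"].to_dict()
print(demand_by_index)
```

Each parameter then carries only the columns it needs, and a missing parameter is simply a missing key rather than a block of empty cells.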

Code structure

The createParameter, createTuple and createVariable helper functions exist primarily to deal with the storage of all the model elements in this giant DataFrame. These functions also obscure the otherwise straightforward syntax offered by PuLP.

The SETS and PARAMETERS AND DATA sections are essentially more data munging: getting your data in shape before passing it into the model. This could all be placed into a load_data function.

I'm a bit suspicious of PERMUTATION OF SETS because it looks like it creates the dense cross-product of all the sets, which means that you're then generating a dense matrix of constraints later on. If so, this is going to be extremely memory inefficient, and there will be many constraints that are not being used. The solver will then need to remove these before going on to solve the model, hurting performance and the ability of PuLP to scale to larger models.

Model construction

In this section, you create a big dict containing all the specifications of the variables, and then create the variables using the createVariable function. Why not just call the PuLP function pulp.LpVariable directly? Do you need to pass in the cross-product of the sets to the variable creation function?

For the constraint generation, is it necessary to iterate over the cross-product of the indexes? In many cases, you only need to generate constraints for the indices for which you have data.
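To illustrate the difference in size, the sketch below compares the dense cross-product with the indices that actually carry data. The set contents and the residual_capacity values are invented for the example.

```python
from itertools import product

REGION = ["UTOPIA"]
TECHNOLOGY = ["E01", "E21", "E31"]
YEAR = ["1990", "1991"]

# Hypothetical sparse data: only two of the six possible combinations exist.
residual_capacity = {
    ("UTOPIA", "E01", "1990"): 0.5,
    ("UTOPIA", "E01", "1991"): 0.5,
}

dense_index = list(product(REGION, TECHNOLOGY, YEAR))
sparse_index = sorted(residual_capacity)

print(len(dense_index), "constraints from the cross-product")
print(len(sparse_index), "constraints from the data")
```

The gap widens quickly as sets grow, since the cross-product scales with the product of the set sizes while the data-driven index scales with the number of entries.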

Also, it might simplify things to create functions for each constraint, each returning a pulp.LpConstraint. This would also allow you to document what's happening in each constraint.

I also recommend wrapping the entire model in a create_model function which returns the model, and takes a package of data (e.g. the dict of DataFrames) as an argument. This would allow the model to be placed in a separate file, and make a clearer distinction between all the data management, running of the model, and model formulation itself.
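A rough sketch of how these suggestions might fit together, assuming tuple-keyed dicts as input. The toy model (meet demand at minimum cost), the names demand_constraint, Demand and Cost, and the data values are hypothetical and far simpler than the real OSeMOSYS formulation. The model is only built here, not solved.

```python
import pulp


def demand_constraint(production, demand, key):
    """Production must meet demand for one (region, fuel, year) key."""
    return production[key] >= demand[key]


def create_model(data):
    """Build and return the model; `data` maps parameter names to tuple-keyed dicts."""
    model = pulp.LpProblem("sketch", pulp.LpMinimize)
    demand = data["Demand"]
    cost = data["Cost"]

    # Only create variables and constraints for indices that have data.
    keys = sorted(demand)
    production = pulp.LpVariable.dicts("Production", keys, lowBound=0)

    model += pulp.lpSum(cost[k] * production[k] for k in keys), "TotalCost"
    for k in keys:
        model += demand_constraint(production, demand, k), "Demand_" + "_".join(k)
    return model


# Hypothetical input data.
data = {
    "Demand": {("UTOPIA", "TX", "1990"): 5.2, ("UTOPIA", "TX", "1991"): 5.46},
    "Cost": {("UTOPIA", "TX", "1990"): 1.0, ("UTOPIA", "TX", "1991"): 1.1},
}
model = create_model(data)
print(len(model.constraints), "constraints")
```

Keeping create_model free of file handling means it can live in its own module and be called with data from any source.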

Should be `max(int(TIMESLICE))`


Originally posted by @willu47 in #14 (comment)
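A brief illustration of why the conversion matters, assuming TIMESLICE is a list of numeric labels stored as strings (an assumption; the actual contents depend on the input data). Note that int() does not accept a list, so in practice the conversion is applied to each element.

```python
# Assumption: TIMESLICE holds numeric labels stored as strings.
TIMESLICE = ["1", "2", "10"]

# Comparing strings is lexicographic, so "2" ranks above "10".
latest_as_text = max(TIMESLICE)

# Converting each element first gives the numeric maximum.
latest_as_number = max(int(l) for l in TIMESLICE)

print(latest_as_text, latest_as_number)
```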

Don't hard code file names into the code - use a command-line argument instead and provide access via an import statement and function call.


Originally posted by @willu47 in #14 (comment)
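A possible shape for this, sketched with argparse from the standard library. The function names, the --output-dir option and its default are hypothetical, and run() is a placeholder standing in for the actual model run.

```python
import argparse


def parse_args(argv=None):
    """Read the input file name from the command line instead of hard-coding it."""
    parser = argparse.ArgumentParser(description="Run OSeMOSYS-PuLP on an input file.")
    parser.add_argument("input_file", help="path to the input workbook")
    parser.add_argument("--output-dir", default="Output_Data",
                        help="directory for results (hypothetical default)")
    return parser.parse_args(argv)


def run(input_file, output_dir):
    """Placeholder for the model run; returns the paths it would use."""
    return {"input": input_file, "output": output_dir}


def main(argv=None):
    args = parse_args(argv)
    return run(args.input_file, args.output_dir)


# A script entry point would call main() with no arguments to read sys.argv.
# Here the arguments are passed explicitly to keep the example self-contained.
print(main(["UTOPIA_BASE.xlsx", "--output-dir", "results"]))
```

Because run() is an ordinary function, other code can also import the module and call it directly without going through the command line.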

Capacity Factor dictionary

Hello,

When I tested OSeMOSYS-PuLP using UTOPIA_BASE.xlsx, the CapacityFactor dictionary was loaded in Python correctly; that is, as described in the Excel file.

Nonetheless, when I started working on a personal file, the code seemed to read the values for only one of the timeslices and assign them to all the remaining ones. This happened when I defined the capacity factor for only one (1) technology. Therefore, I looked into the original code (enumerated by lines):

# CapacityFactor

CapacityFactor_default_value = p_default_df[p_default_df['PARAM'] == "CapacityFactor"].VALUE.iat[0]

CapacityFactor_specified = tuple([(str(r), str(t), str(l), str(y)) 
    for r, t, l, y in zip(p_df[p_df['PARAM'] == "CapacityFactor"].REGION, 
                                 p_df[p_df['PARAM'] == "CapacityFactor"].TECHNOLOGY, 
                                 p_df[p_df['PARAM'] == "CapacityFactor"].TIMESLICE, 
                                 p_df[p_df['PARAM'] == "CapacityFactor"].YEAR)])

CapacityFactor = {str(r): {str(t): {str(l): {str(y): p_df[(p_df['PARAM'] == "CapacityFactor") & (p_df['REGION'] == r) & (p_df['TECHNOLOGY'] == t) & (p_df['YEAR'] == y)].VALUE.iat[0] if (str(r),str(t),str(l),str(y)) in CapacityFactor_specified else CapacityFactor_default_value for y in YEAR} for l in TIMESLICE} for t in TECHNOLOGY} for r in REGION}

I saw that in the third line, the CapacityFactor is not using the condition for TIMESLICE. So I changed it to:

CapacityFactor = {str(r): {str(t): {str(l): {str(y): p_df[(p_df['PARAM'] == "CapacityFactor") &
 (p_df['REGION'] == r) &
 (p_df['TECHNOLOGY'] == t) &
 (p_df['YEAR'] == y) &
 (p_df['TIMESLICE'] == l)].VALUE.iat[0] 
if (str(r),str(t),str(l),str(y)) in CapacityFactor_specified 
else CapacityFactor_default_value for y in YEAR} 
for l in TIMESLICE} for t in TECHNOLOGY} for r in REGION}

See that I added: & (p_df['TIMESLICE'] == l)

And then it read it properly, although this could also be some sort of incompatibility I have been having with Excel.

Please let me know whether adding that is actually necessary.

Best,

Luiscarlos
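The behaviour described in this report can be reproduced on a tiny table. The sketch below uses invented values and the column names from the snippet above; it is an illustration, not a test of the actual input file.

```python
import pandas as pd

# Hypothetical data: one technology with different values in two timeslices.
p_df = pd.DataFrame(
    {
        "PARAM": ["CapacityFactor"] * 2,
        "REGION": ["UTOPIA"] * 2,
        "TECHNOLOGY": ["E01"] * 2,
        "TIMESLICE": ["ID", "IN"],
        "YEAR": ["1990"] * 2,
        "VALUE": [0.8, 0.3],
    }
)
cf = p_df[p_df["PARAM"] == "CapacityFactor"]

# Filter without the TIMESLICE condition: both rows match.
without_timeslice = cf[
    (cf["REGION"] == "UTOPIA") & (cf["TECHNOLOGY"] == "E01") & (cf["YEAR"] == "1990")
]
# Adding the TIMESLICE condition selects the intended row.
with_timeslice = without_timeslice[without_timeslice["TIMESLICE"] == "IN"]

print(without_timeslice.VALUE.iat[0])  # first matching row, whatever its timeslice
print(with_timeslice.VALUE.iat[0])     # value for timeslice "IN"
```

Without the extra condition, .iat[0] returns the first matching row for every timeslice, which is consistent with one timeslice's values being assigned to all the others.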

Cannot run model using data

When trying to run OSeMOSYS PuLP for the first time, I get the following error:

python OSeMOSYS_PuLP.py 
INFO:root:2020-10-16 14:07:00	Script started.
Traceback (most recent call last):
  File "/Users/wusher/miniconda3/envs/otoole/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2891, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'REGION'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "OSeMOSYS_PuLP.py", line 427, in <module>
    sets_df, p_df, p_default_df, mcs_df, mcs_num = loadData(inputPath, sheetSets, sheetParams, sheetParamsDefault, sheetMcs, sheetMcsNum)
  File "OSeMOSYS_PuLP.py", line 90, in loadData
    sets_df['REGION'] = sets_df['REGION'].astype(str)
  File "/Users/wusher/miniconda3/envs/otoole/lib/python3.7/site-packages/pandas/core/frame.py", line 2902, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/wusher/miniconda3/envs/otoole/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2893, in get_loc
    raise KeyError(key) from err
KeyError: 'REGION'

I'm using Python 3.7, pandas 1.1.2 and PuLP 2.3 on OSX.

Looking at the UTOPIA_BASE.xlsx provided, the format of the SET worksheet doesn't seem to match the code. Perhaps the data format needs to be updated?
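A clearer error message could help diagnose this kind of mismatch. The helper below is a hypothetical sketch: the function name, the header row and the list of required columns are assumptions, not the actual layout expected by loadData.

```python
def missing_columns(columns, required):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


# Hypothetical header row of a SETS worksheet that lacks a REGION column.
header = ["SET", "VALUE"]
required = ["REGION", "TECHNOLOGY", "YEAR"]  # illustrative, not the actual list

missing = missing_columns(header, required)
if missing:
    print("SETS worksheet is missing columns: " + ", ".join(missing))
```

The same function accepts the columns attribute of a pandas DataFrame, so the check could run before any column is accessed.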

Add requirements.txt file

Please include a requirements.txt file in the root of the repository which contains the Python dependencies required to run OSeMOSYS-PuLP e.g.

numpy
pandas
pulp
xlrd

Add OSeMOSYS PULP Short

Add the short version of the OSeMOSYS PULP implementation to the GitHub repository.
