spine-tools / spine-database-api Goto Github PK

View Code? Open in Web Editor NEW

6.0 9.0 5.0 3.64 MB

Database interface to Spine generic data model

Home Page: https://www.tools-for-energy-system-modelling.org/

License: GNU Lesser General Public License v3.0

Python 99.96% Mako 0.03% Batchfile 0.01%

spine-database-api's Introduction

Spine Database API

A Python package to access and manipulate Spine databases in a customary, unified way.

License

Spine Database API is released under the GNU Lesser General Public License (LGPL) license. All accompanying documentation and manual are released under the Creative Commons BY-SA 4.0 license.

Getting started

Installation

To install the package run:

$ pip install spinedb_api

To upgrade to the most recent version, run:

$ pip install --upgrade spinedb_api

You can also specify a branch, or a tag, for instance:

$ pip install spinedb_api==0.12.1

To install the latest development version use the Git repository url:

$ pip install --upgrade git+https://github.com/spine-tools/Spine-Database-API.git

Building the documentation

Source files for the documentation can be found in docs/source directory. In order to build the HTML docs, you need to install the developer dependencies by running:

$ pip install -r dev-requirements.txt

This installs Sphinx (among other things), which is required in building the documentation. When Sphinx is installed, you can build the HTML pages from the source files by running:

> bin\build_doc.bat

$ bin/build_doc.py

depending on your operating system.

After running the build, the index page can be found in docs/build/html/index.html.

	This project has received funding from European Climate, Infrastructure and Environment Executive Agency under the European Union’s HORIZON Research and Innovation Actions under grant agreement N°101095998.
	This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 774629.

spine-database-api's People

Contributors

Stargazers

Watchers

Forkers

mhdella suvayu jamesongicl sjvrijn pekkasavolainen

spine-database-api's Issues

Manage ids across different mappings connected to the same database

In GitLab by @manuelma on Sep 29, 2018, 15:26

There needs to be a way for a DiffDatabaseMapping to reserve ids, so that other processes don't steal them from it. An idea would be to create a table next_id that the DiffDatabaseMapping will keep updated. If the other process finds this table, it'll also use it and update it.

Handle the 'maximum number of sql variables in an expression is 1000' error

In GitLab by @manuelma on Nov 28, 2018, 17:14

This happens when committing large changes.

Unique names enforced only within object classes

In GitLab by @jkiviluo on Mar 12, 2019, 16:30

Currently Spine Data store enforces unique names across all objects. As discussed in TIMES/GAMS integration issue (https://gitlab.vtt.fi/spine/toolbox/issues/274), GAMS can have same names between different sets (can be interpreted as object classes in Data Store). There shouldn't be any big issues if we allow objects with same names when they are members of different object classes.

Import error with SpineData.jl

In GitLab by @steffenkaminski on Jul 30, 2018, 12:35

after installing SpineData regarding the installation instructions I get following error using julia REPL in atom:

julia> using SpineData
INFO: Precompiling module SpineData.
WARNING: Module JSON with uuid 1889239032482 is missing from the cache.
This may mean module JSON does not support precompilation but is imported by a module that does.
ERROR: LoadError: Declaring precompile(false) is not allowed in files that are being precompiled
.
Stacktrace:
[1] _require(::Symbol) at .\loading.jl:455
[2] require(::Symbol) at .\loading.jl:405
[3] include_from_node1(::String) at .\loading.jl:576
[4] include(::String) at .\sysimg.jl:14
[5] anonymous at .<missing>:2
while loading C:\Users\u0122947.julia\v0.6\SpineData\src\SpineData.jl, in expression starting on li
ne 13
ERROR: Failed to precompile SpineData to C:\Users\u0122947.julia\lib\v0.6\SpineData.ji.
Stacktrace:
[1] compilecache(::String) at .\loading.jl:710
[2] _require(::Symbol) at .\loading.jl:497
[3] require(::Symbol) at .\loading.jl:405
[4] macro expansion at C:\Users\u0122947.julia\v0.6\Atom\src\repl.jl:118 [inlined]
[5] anonymous at .<missing>:?

Node demand is better as an object propery of node rather than a relationship propery

In GitLab by @DillonJ on Jun 21, 2018, 18:48

I think we should define things at the lowest level we can as this will ultimately reduce the maximum number of relationship nesting levels. Also I think it is preferable to define data parameters against an object when this is possible as this is more intuitive.

In this specific case, we know the commodity associated with the node and we have specified that one and only one commodity can be associated with a node, so we therefore know what node and what commodity the demand relates to if we define the parameter against the node rather than the node_commodity relationship.

I believe this is more intuitive and may make the code required to reference the demand on the Julia side more compact.

Unique constraint over class/object path for relationships

In GitLab by @manuelma on Oct 1, 2018, 12:10

Logically, there shouldn't be two relationships with the exact same object path in the same relationship class.

We need to enforce this constraint at app level.

Database integrity

In GitLab by @PvPer on Aug 27, 2018, 13:48

When work with the excel import there are some things that are not enforced by the database schema or the api.

Adding relationships:
- There is no enforcing the object class order of the objects. As long as you insert valid objects the relationship will be inserted. We should enforce that the objects in a new relationship has the object class and dimension same as in the relationship class
- You can insert duplicate relationships. we should check that the object path is unique for each relationship class
Adding parameters values.
- When adding a parameter value, you only have to have a valid object id or relationship id, it does not check if object or relationship is the right class for that parameter.

Some of these could be api level or some could be triggers in database.

Enable several DiffDatabaseMappings working on the same database

In GitLab by @manuelma on Sep 28, 2018, 21:58

DiffDatabaseMapping works but diff_ tables should be called something specific for the current session.

In this way several DiffDatabaseMappings could work on the same database without overlap.

Problems with PyCall when using user defined python environment

In GitLab by @mihlema on Mar 20, 2019, 14:28

I followed the instruction for using a specific python environment and build PyCall. When I try to run e.g. a testsystem in Julia, I get an error:

ERROR: LoadError: PyError ($(Expr(:escape, :(ccall(#= C:\Users\u0122387\.julia\packages\PyCall\RQjD7\src\pyfncall.jl:44 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'ImportError'>
ImportError('DLL load failed: The specified module could not be found.')

Seems like it doesn't recognize the spinedata_base_api or so. If I use my base environment I don't get an error. Has anyone encountered similar problems?

Archetype allowed parameters

In GitLab by @DillonJ on Sep 4, 2018, 10:34

The implementation of archetypes will require some meta/macro level stuff and I'm wondering how we can best do this

For example, one required functionality is that a unit_archetype would specify a list of allowed unit level parameters that you can define for unit objects that are linked to that archetype.

So let's say I want to specify that members of unit_archtype = "CHP_UNIT" must have a parameter called "ratio_op1_op2". How can we do it?

@poncelet @juha_k @manuelma Thoughts?

Method to add parameter values is slow

In GitLab by @manuelma on Dec 12, 2018, 13:26

It looks like integrity checks implementation is sub-optimal, we will try and improve.

More checks on insert/update

In GitLab by @manuelma on Dec 14, 2018, 21:14

We have implemented basic and not so basic integrity checks in #4 and #17

This issue is a placeholder for more of these tests we can come up with:

Check that input is valid JSON
Spine JSON schema (parameter_value)
Unique names across tables
Forbidden names (such as 'time' for an object class, since time is the name of a function in julia's Base)

Get rid of unused fields

In GitLab by @manuelma on Mar 1, 2019, 16:52

From https://gitlab.vtt.fi/spine/data/issues/38#note_5846

we're gonna get rid of some unused fields in a couple of tables.

object id list from Julia require strings

Hey all,

I try to use the API via Julia and PyCall. When using the object_id_list for e.g. creating a relationship class I get a strange behavior which leads to some inconvenience.

using PyCall

@pyimport spinedatabase_api as sa

db_file = joinpath(@__DIR__, "database.sqlite")
if isfile(db_file)
    rm(db_file)
end

db_url = "sqlite:///" * db_file
sa.create_new_spine_database(db_url)
db_map  = sa.DatabaseMapping(db_url)

###########################################################
# Fill data with own data model (ObjectClass, RelationshipClass, ParameterClass)
###########################################################
# Object Class
(agents_id, ) = db_map[:get_or_add_object_class](name="Agents", description="A representation of an model agent")

(technologies_id,) = db_map[:get_or_add_object_class](name="Technologies", description="A representation of technologies with economic and technical characteristics")


# RelationshipClass
# WORKING, but inconvenient (id as string)
db_map[:get_or_add_wide_relationship_class](name="Agents__Technolgies", object_class_id_list=["$agents_id","$technologies_id"])

# NOT WORKING PROPERLY (id as int)
# db_map[:get_or_add_wide_relationship_class](name="Agents__Technolgies", object_class_id_list=[agents_id,technologies_id])

db_map[:commit_session]("Done initializing data model!")

The issue arises because the ids, when submitted as int-values, are stored as BLOB in the sqlite database. I expect that there is somehow a type cast that goes wrong.

Greetings,
Hanspeter

Choosing constraints: parameter vs. relationship

In GitLab by @jkiviluo on Sep 14, 2018, 11:23

There's been a bit pro-longed discussion over Powerpoint slides, e-mail and telco between @poncelet, @DillonJ and me that has included this problem. There's also been some references to it in other issues like https://gitlab.vtt.fi/spine/model/issues/59#note_2700. We are also preparing another issue on the archetype (so don't discuss that here).

There are at least two major ways to express when a constraint would be used for a particular object in Spine Model.

The specification of parameter values directly trigger Model behavior (e.g., a ramp-rate constraint is generated in Spine model for those units for which the parameter “ramp_rate” would be specified).
There is a separate switch to determine when a certain constraint is to be generated. This switch could be a separate parameter (e.g., use_ramp_constraint), but I (Juha) would prefer to use relationships as a switch.

Let's go over both approaches. I'm trying to keep it simple here (i.e. it can get more complicated). Also, this is just a partial example (the full details don't add much information).

Approach where the specification of the ’common’ parameters are also used as a trigger to generate constraints

     unit_A.max_upward_primary_reserve = 0.05  
     node_Leuven.constant_upward_primary_reserve = 100

Here the existence of a <> Null/NaN parameter value for the upward primary reserve would mean that the unit will participate in the upward primary reserve. Similarly, the upward primary reserve constraint in the node gets established by the existence of the (constant) value for the parameter.

Advantages of this approach:

Intuitive to use
Similar to what most modelers are currently used to
The numeric data directly implies what will happen in the model
Fast to input

Disadvantages of this approach:

Need to be able to distinguish between parameter value existence and 0 values (in JSON this should be possible, but probably not directly with the value column, which is a float)
If one wants to make a change to the model (e.g., do not use ramp constraints), the data would need to be changed. This could be done using the alternative/scenario construct, thereby leaving the base database intact. For the example you might have the above values for the 'base' scenario and then have a 'no reserves' scenario with alternative parameter values for both of them where

     unit_A.max_upward_primary_reserve = 0  
     node_Leuven.constant_upward_primary_reserve = 0

Thus, you would need to have a scenario 'no reserves' that is built by combining the 'base' scenario with the 'no reserves' alternative (resulting in the displacement of original values with the zeros). Alternatively, the archetype could contain a field where a list of parameters which will be ignored could be stored

Sometimes, a parameter can induce/appear in multiple equations. For instance, a ramp rate could restrict the ramping between sequential time steps, but could also directly restrict the provision of reserves in a separate equation. Getting control over one or the other separately (e.g., only keep the ramp constraint related to reserve provision) would not be possible, or separate parameters would be needed for that, e.g., ramp_rate_between_time_steps and ramp_rate_reserves_within_time_steps.
What happens when there is both time series and constant values for e.g. upward_primary_reserve? How to decide which one is used? A separate parameter flag? What if the flag points to a constant and only time series values are provided? Or should they be in separate alternatives (i.e. it is not allowed to have both constant and time series on single record (row))?
Users need to be aware of what behavior different parameters trigger (especially when one parameter can cause multiple behaviors, e.g. min. load could be handled with a linear approximation or with MIP)

Approach where relationships are used as a trigger to generate constraints

    unit_A.max_upward_primary_reserve = 0.05  
    node_Leuven.constant_upward_primary_reserve = 100
    node_Leuven__require_upward_primary_reserve
    unit_A__provide_upward_primary_reserve

require_upward_primary_reserve is what I call a method (relates to the archetype - feature - method chain, but let's not go there here). Here the existence of the node_Leuven__require_upward_primary_reserve causes the upward reserve constraint to be activated for node_Leuven.

Advantages of this approach:

Separation between numeric data and function. This allows easy re-use of databases containing all possible data (e.g. you might want to use the same data in different ways for the investment model and the UCED model that you run iteratively).
upward_primary_reserve can be switched off without using alternatives and scenarios. Just delete the node__upward_primary_reserves relationship and the numeric data remains intact.
To use time series instead of constant could be achieved by having separate methods for them (or possibly with a sub-method, but that might be overly complex) and there is no need for additional code to distinguish situations where both of them have been provided.
Use of methods provides more flexibility to select which constraints are activated and when (i.e. ramp rate constraint between consecutive time steps and ramp constraints for reserve provision).
Using relationships might increase transparency and error checking. It's easier to show what is happening with the relationship approach (e.g. a graph view on the methods in use) and you don't need to worry whether some parameters still have the necessary values, because ten weeks ago you did a model run where you might have taken something out, but you don't quite remember what... Of course Spine Toolbox should make those kinds of mistakes less likely, but will probably not eliminate them.
Using relationships between units/nodes and methods, and additional relationships between methods and parameters, allows checking that all necessary parameter values have been entered. If your method requires a specific parameter, it's possible to build a check on that and give feedback to the user. However, if you're using parameters to define function, this could be done using an archetype which has a field containing a list of obligatory parameters (with the difference that the parameter would be linked to the archetype, not the method - this goes again to the question what is an archetype and is not handled here).
If you really have alternative values for parameters, then it's fine to use the alternative/scenario construct. That's what it's for.

Disadvantages of this approach:

Requires additional objects (the ‘method’ objects), and relationships (unit__method or node__method)
Modelers might have a preconception that numeric data implies that a constraint will be generated (current practice in models such as TIMES, OSeMOSYS) - (e.g., if a user sees ramp_rate = 10 in the database, they might assume that ramping constraints will be generated). It might just need some time for the new philosophy to sink in.
List of methods can become long and therefore hard to use (but this would be handled by the archetype - feature - method chain that limits what the user sees only to relevant choices - that is not to be discussed in this issue).
Need to connect each unit to methods corresponding to all generic constraints? Or should the unit__method relationship be automatically generated for generic constraints when data is there? Or would the unit__generic_constraint_method be always generated and then parameter data will dictate what happens?
Increased complexity of the input Data Store
Might make it more complex for users to add functionality (need to create method (and feature objects), establish relationships between the method objects and the parameters, etc). Archetype - feature - method chain will make this easier, but not to be discussed here...
Splitting up into features and methods could be difficult in some cases (in principle feature is the phenomena/problem, while method is the concrete way to implement it)

Support for migration

In GitLab by @manuelma on Nov 7, 2018, 13:29

We need to be able to upgrade old databases to a new structure (say if we add new tables or modify existing ones).

https://github.com/zzzeek/alembic

Enable working on the same 'differences' from different sessions

In GitLab by @manuelma on Sep 28, 2018, 22:43

Related to #11

Sometimes it could be desirable for two or more users to work together in one database and see the changes made by each other -sort of what google docs does.

Sphinx documentation and tutorial

In GitLab by @manuelma on Nov 7, 2018, 13:37

improve and complete API documentation
switch from SQLAlchemy style (or whatever it's called) to the Google style we use in toolbox
check that our gh-pages still work.
write the tutorial.

JuMP_relationship_parameter_out sensitiv to order of objects

In GitLab by @mihlema on Mar 28, 2019, 19:37

julia> connection_capacity(connection=:GasPipeline, commodity=:Gas, node=:GasNode1, direction=:in)
0

julia> connection_capacity()
Dict{Tuple{Symbol,Symbol,Vararg{Symbol,N} where N},Any} with 2 entries:
  (:Gas, :GasNode1, :GasPipeline, :in) => 200
  (:Gas, :GasNode2, :GasPipeline, :in) => 0

julia> connection_capacity(commodity=:Gas, node=:GasNode1,connection=:GasPipeline, direction=:in)
200

I think it shouldn't matter in which order the different objects of the relationship are given.

Removing parameters from objects (via toolbox) does not results in deleting them from the database

In GitLab by @steffenkaminski on Aug 8, 2018, 10:23

e.g. "Demand" was created earlier by accident. Even if this parameter has never been used, it still remains in the database.
Same for "TransCapa_a" and "TransCapa_b" which were set before, but their parent object was removed. It would be nice, if those parameter keys (?) would been removed of the database as well.

Make all functions that add items return the same datatype

In GitLab by @manuelma on Sep 28, 2018, 22:00

Right now some of these functions return KeyedTuples, other return instances, and other dicts.

They should all return the same type so it's easy to remember.

Rename spinedatabase_api package to something shorter

In GitLab by @manuelma on Oct 26, 2018, 23:18

I'm getting tired of writing spinedatabase_api, I think we could rename the package to spine_dbapi?

Means updating a couple of READMEs too.

@PekkaSavolainen

Missing close() functionality

Hi,

I try to use the spinedb_api to fill a database via a julia->python. After removing the close() function for the DatabaseMapping I run into problems that my database is locked or blocked. As a result I need to restart julia once I run into errors in order to free the database connection.

Here is a simple use case that shows the problems

using PyCall

sa = pyimport("spinedb_api")
db_file = joinpath(@__DIR__, "test.db")
println(db_file)
db_url = "sqlite:///" * db_file

# Create testing database
sa.create_new_spine_database(db_url)
db_map = sa.DatabaseMapping(db_url, upgrade=true)
db_map.reset_mapping()

# do some tests on the database
(object_class_id, ) = db_map.get_or_add_object_class(name = "TestObjectClass", description = "Description")
db_map.add_object(name = "TestObject", class_id = object_class_id)
db_map.commit_session("Test")


# closing database connection
# db_map.close() REMOVED from package

# Remove database after tests
try
    rm(db_file)
catch err
    @show err
end

Database schema version control

In GitLab by @fabianoP on Dec 4, 2018, 12:43

We propose to add database schema version control such as alembic.

Delete diff tables when session crashes

In GitLab by @manuelma on Nov 7, 2018, 13:44

A possibility is to store the diff tables in a separate in-memory database.

Should we support parameter value enumeration

In GitLab by @DillonJ on Sep 4, 2018, 09:49

Hi folks,

Proposal

While considering the implementation of archetypes I came upon the possible need for parameters that will define, for example, model options or methods for doing something. So, they can typically take a number of predefined values. E.g. we might have boolean parameters or something like efficiency_method which might be "linear", "affine" or "piecewise" for example.

Implementation options:

We could create a new JSON field in parameter definition which can include all allowed values (my preferred option - seems easiest?)
We could create a new table which contains allowed values for each parameter

Question: Do we depend on the string value of something - in my example above, do we have in the code: if(mip_method(arch_id)=="affine" ... or do we assign an integer value to each option (like an enum in C)? Surely this would be easy to do with a JSON field.

A step further would be to convert the string values to constants that we can use in Julia using JumpALLOut... so we could write (for example) if(mip_method(arch_id)==affine ...

What do you think @manuelma
Also mentioning @juha_k and @poncelet as not everyone looks at the data issues :-)

API function for accessing parameter classes and parameter

In GitLab by @steffenkaminski on Jan 18, 2019, 17:51

For the another project, which uses the Spine Toolbox, it would be nice to have an API function which returns all parameter classes for a specific relationship (or object class).

def parameter_class_list(self, class_id=None, ordered=True):
   #here some magic happens
   return class_id_list

In a second step it would also be convenient to have the some for actual parameters for a given parameter class

def parameter_list(self, class_id=None, ordered=True):
   #here some magic happens
   return parameter_list

What do you think, would that be possible to implement?

Support for `parameter_tag` and `parameter_enum` tables

In GitLab by @manuelma on Jan 24, 2019, 16:17

Tasks:

Define the parameter tag thing. So far we need one tag only: feature.
~~[ ] Update create_new_spine_database so these tables are created too. At the same time we should do all the renaming concerning our change from parameter to parameter_definition.~~
When a DatabaseMapping is created using an 'old' database, we should automatically 'migrate' it (i.e., create the new tables and do all the renaming.)
Add support to DiffDatabaseMapping for inserting, updating and removing stuff from the new tables.
Add support for querying the new tables (although the exact shape of these queries should be determined according to the needs of the 'clients', i.e., Spine Toolbox and Spine Model.)

Am I missing something @PvPer @fabianoP ? Is there any task you would like to work on?

id_ vs. id of SpineIntergrityError

Hallo,

there is a problem with the usage of id and id_ for the SpineIntegrityError in diff_db_mapping_add_mixin.py

e.g. line 305:

ids.update(set(x.id_ for x in intgr_error_log if x.id_))

  File "C:\git.vito\ELDEST\PyEldestDataModels\venv\lib\site-packages\spinedb_api\diff_db_mapping_add_mixin.py", line 305, in <genexpr>
    ids.update(set(x.id_ for x in intgr_error_log if x.id_))
AttributeError: 'SpineIntegrityError' object has no attribute 'id_'

Support for json datatype

In GitLab by @manuelma on Nov 7, 2018, 13:20

This comes with sqlite 3.8 or something. We need to figure out if we can use it.

Add support to run unittests

In GitLab by @manuelma on Sep 28, 2018, 22:30

We need a basic set of unittests to check that changes don't break existing functionality. For instance the introduction of the DiffDatabaseMapping class was kinda clumsy, several new versions had to be produced. Since this was the first time we introduce a change this big, it can be justified. But we need to be prepared for the next time.

Implement Time Patterned Data

In GitLab by @DillonJ on Nov 5, 2018, 16:30

We want to be able to support defining generic time patterns and then define parameter values against them using the JSON field.

E.g Define generic time pattern using common time identifiers:
Summer: M5-M8
Winter: M1-M4;M9-M12

Use newly defined time_patterns to define data:
JSON: {Time_Pattern: Summer{400},Time_Pattern: Winter{500} }

Actions:

Need to define how to define time_patterns
Need to define JSON Structure
Develop JumpAllOut to handle time patterned data
Identify impacts on other components, e.g. treeview, tabular view

@manuelma Is there a standard reference we can lean on for standard time format identifiers? Can we borrow something from Julia? E.g. "m" = minute, "M" = month, "h"= hour... etc. etc.

@spine Thoughts?

Temporal description for models/tools for the data store

In GitLab by @jkiviluo on Mar 22, 2019, 13:45

Time series data is stored as it is - typically in arrays with associated information when the time series start, what is its resolution, etc (as defined in https://gitlab.vtt.fi/spine/toolbox/wikis/Spine-Toolbox-JSON-specification). On the other hand there is a model that should be solved in a specific way from the temporal perspective. E.g. a single-go investment problem or a looping scheduling and dispatch problem. Both model types might use the same data. For Spine Model, we have come up with a data structure the model will use (https://gitlab.vtt.fi/spine/model/issues/17). The next step is to define a structure for Spine Data Store that can link the data with the model(s). This is a proposal for that.

First we need a 'model' object_class and model objects, e.g. 'Spine_schedule'. This object will serve other purposes than the temporal structure, but here only the temporal structure is discussed.

Next, we need to tell the time frame the model should solve, in absolute terms, e.g. 1.1.2020 - 31.12.2020. For this we need an object_class 'time_frame'. It will have parameters like start_datetime, time_jump, finish_datetime (different models/tools can have different names for these, so these could be model specific or shared between models through parameter name mapping).

Then, we also need a temporal object that can tell us about the relative temporal structure within the time_frame. In a rolling model, this relative structure defines the shape of the time structure that rolls forward. Let's call the object_class for this purpose with 'time_section'. Let's then say we have scheduling model:

The first time_section would define a period of realization: order=1, resolution=1, steps=6, forecast=[f00], etc.
The next time_section would define a first forecast period: order=2, resolution=1, steps=18, forecast=[f01,f02,f03]
The final time_section would define a tail: order=3, resolution=24, steps=364, forecast=[f02]

Together the 'time_frame' and 'time_section'(s) can be used to flexibly define any kind of temporal structure and associated looping structure.

For an investment model, possibly only one time_section would be sufficient. Although e.g. representative weeks could be sliced with a time_section for each representative week. In this case, an additional parameter would be needed to state the starting point of the time_section within the time_frame.

Finally, the time_sections need to be connected with the entities they apply to. This includes both units and nodes. Together they could be called 'block's. In a simple model all nodes and units are in the same block and they all use the same temporal structure. In a more complicated model, an arbitrary system of time_sections and blocks can be related to each other. This allows to arrive in systems like demonstrated in https://docs.google.com/presentation/d/1mwGX6mm3PwbhJ9OvLlnS5Vm17y1HpPdaiouCrFEGonk (slide 4).

An illustration of the object structure (could have e.g. realization in one hour resolution, Ireland forecasts in one hour and GB forecasts in three hour)

The task of JuMP_All_Out would be to pass the structural data to the model specific layer that creates the right sets. In case of Spine Model, the model specific layer would create t_t_overlap, t_in_t, etc. based on the structure it gets from the Data Store. The 'temporal_block' in Spine Model will then present the combination of time slices and geographic entities (nodes or units). The model specific layer will also need to generate a looping structure (more parameters will be needed for more advanced structures like day-ahead markets).

There are some additional things that need to be considered. E.g. one want to run 2030 scenario with 2017 time series profiles. I think this could happen through the time_frame - it should allow to define the target year and the source year separately. The source year could even be multiple different years depending on the data set, although that would be bad practice in most cases. This might be achieved with a separate object_class 'time_source' that is linked with the 'time_frame'. But this really needs some further thought.

Can't insert relationship with same object twice

In GitLab by @PvPer on Dec 18, 2018, 16:49

When trying to insert a relationship with a one object in mulitple levels, an integriyty error is raised.

For example if you have the relationship class unit__commodity_group__commodity_group and you want to add a relationship between hydropower, water_group, water_group, this should be allowed?

from spinedatabase_api import create_new_spine_database, DiffDatabaseMapping
db = create_new_spine_database('sqlite:///test.sqlite')
db_map = DiffDatabaseMapping('sqlite:///test.sqlite', "test")

obj1 = db_map.add_objects({'name':'test_obj1','class_id':1}).first()
obj2 = db_map.add_objects({'name':'test_obj2','class_id':2}).first()
rel_class = db_map.add_wide_relationship_classes({'name':'test_rel', 'object_class_id_list': [1, 2, 2]}).first()
rel = db_map.add_wide_relationships({'class_id':rel_class.id, 'object_id_list':[obj1.id, obj2.id, obj2.id]})

'Import by name' functionality

In GitLab by @manuelma on Dec 13, 2018, 13:02

We have methods to add stuff to the database that perform some integrity checks and use bulk operations --this is nice.

But for example, adding objects or relationships or parameters requires to know the object class id, whereas the application side usually just knows the object class name. So one has to insert object class first, query back the result, extract the id and then insert objects or parameters or whatever.

We would like that 'figuring out the id'-part to happen spinedatabase-api side.

In principle, some code developed Spine Toolbox side (import from excel feature) can be used as a starting point.

Should error be return when trying to insert duplicate item

In GitLab by @PvPer on Mar 12, 2019, 12:53

So when trying to insert a parameter (object_class_1, parameter_1), if that exact parameter exists in the database, an error is raised. Wouldn't it be better to just return that existing item instead of raising an error. The user wants to add an item anyway.

I guess we could just add get_or_add_* to each of the add_* functions.

For the import functions duplicates should just be skipped.

Allow 1-dimensional relationships

In GitLab by @PvPer on Mar 12, 2019, 11:27

So in the TIMES/GAMS you can use sets(relationships) with only 1 dimension. I don't think there's any real reason why we shouldn't allow 1-dimensional relationships. The name relationship might not fit as good but it could be useful if you want to create named groups and things like that.

Any thoughts?

Parameter table names

In GitLab by @jkiviluo on Oct 26, 2018, 14:32

parameter and parameter_value tables have somewhat confusing names. 'parameter' sounds like it would contain the values too. When we make changes to the DB structure (e.g. with archetype/feature/method) it might be a good chance to also update this.

I would suggest to use parameter_definition and parameter_value as table names.

Don't reinvent the wheel

In GitLab by @manuelma on Oct 2, 2018, 11:03

The DiffDatabaseMapping seems to work fine, however I wonder if there are ready-made libraries that we could use instead of it. If there are, they should be more robust that something we implemented in two weeks, just to solve a specific issue. Maybe @fabianoP and @PvPer have some information?

So what DiffDatabaseMapping does is create new tables, specific for a particular session, to hold all the changes the user makes to the 'original' tables -which remain untouched until the user chooses to commit. When they commit, the content from the 'diff' tables is merged into the 'original' ones. It's as simple as that.

With this setup, there can be multiple processes working on the same database without overlap. For instance, Spine Model can write to a database that's also being modified in Spine Toolbox.

So the question is, should we stick to this simple solution, or use something more powerful that's available from third-parties?

Implement database constraints with DiffDatabaseMapping

In GitLab by @manuelma on Sep 28, 2018, 22:24

DiffDatabaseMapping can check constraints that apply on the 'difference' tables, but not across original and difference tables.

For example if an object class is called unit and lives in the original table, the database won't complain if we add another object class called unit, since it will go to the differences table.

So we need to implement this kind of checks at app level.

Database api slow when inserting large amount of data

In GitLab by @PvPer on Sep 21, 2018, 09:00

So when trying to upload a large amount of data to the database via the api it's a bit slow.

What happens is that we only have functions that inserts one row at a time. So basically all the overhead of slqalchemy is there for each row we insert. There is a bulk insert option available and we could use that.

In the excel import functions most of the functionality is there to validate the bulk data before inserting so we could move some of that code into the api.

Make returned fields from database api more consistent

In GitLab by @PvPer on Oct 25, 2018, 15:30

so when getting parameter values from the api it behaves a bit different for relationship classes and objects:

db_map.parameter_value_list().first()._fields
('id',
 'parameter_id',
 'object_id',
 'relationship_id',
 'index',
 'value',
 'json',
 'expression',
 'time_pattern',
 'time_series_id',
 'stochastic_model_id')

db_map.relationship_parameter_value_list().first()._fields
('id',
 'relationship_class_name',
 'object_name_list',
 'parameter_name',
 'index',
 'value',
 'json',
 'expression',
 'time_pattern',
 'time_series_id',
 'stochastic_model_id')

It would be nice if the relationship function would return ids for relationship, parameter and objects so it behaves the same as the object function

Batch update support

In GitLab by @manuelma on Oct 5, 2018, 14:01

It would be nice to support batch update using bulk sqlalchemy operations, as we do for insertions.

We need to be careful though on what do we allow users to update. For instance updating the class_id field of an object item means that all its parameter values are no longer valid. There are many other examples like this.

So what do we do:

Support this kind of updates and take the corresponding measures to keep integrity, eg, remove all parameter values of the object whose class_id changes.
Support trying this kind of updates but throw an error if it actually happens.
Don't support these updates, that is don't even provide a method to change, eg, the class_id of an object.

Subsets

In GitLab by @jkiviluo on Aug 31, 2018, 11:01

There is a need to define subsets for the Spine Model (and other models). These are used in conditionals to restrict a constraint or a term to certain subgroup of the set members. E.g. unit_online(unit) would pick only units with online variable. The discussion was started in https://gitlab.vtt.fi/spine/model/issues/53#note_2140.

Database should somehow facilitate this. One way is to use unit_groups, node_groups, etc. However they are really meant for the users to apply rules for a larger group of units, etc. at a time. The subset need is more under the surface and would be used by developers who define new methods and associate them with archetypes. It would be positive not to clutter user_groups with functional subsets.

Current options:

Create a relationship_class for subsets 'subset_class' and associate parent_object_class or parent_relationship_class with the subset_class. Then one can create subsets for different object/relationship_classes.
Use group classes (we might ensure there is a group_class for all object_classes).

Improve style overall

In GitLab by @manuelma on Jan 18, 2019, 22:02

We have made our best to have a functional API, and sometimes style has been neglected. If this is going to be open we need to clean up a little bit, keep it as PEP8 as possible. Do we want to have a call to discuss this matter? We can come up with some rules that suit all of us.

Type mismatch from Julia

Hey,

we call the spinedb_api from Julia in the following way:

@show id_list = [Int16(db_map.get_or_add_object_class(name=obj_class)[1]) for obj_class in object_classes]
@show typeof(id_list)
@show relationship_class = db_map.add_wide_relationship_class(name = relationship_name, object_class_id_list=id_list)

Unfortunately, when submitting the id_list to python it is translated in a numpy.int64 rather than an int. This leads to the problem that the relationship is not correctly created, as the object_class_id in the sql-db is stored as BLOB.

Would it be possible to integrate the following explicit typecast:

            for wide_kwargs in wide_kwargs_list:
                for dimension, object_class_id in enumerate(wide_kwargs["object_class_id_list"]):
                    narrow_kwargs = {
                        "id": id,
                        "dimension": dimension,
                        "object_class_id": int(object_class_id),
                        "name": wide_kwargs["name"],
                    }
                    item_list.append(narrow_kwargs)

I guess it may also occur at other places. I will collect and report them below.

Thanks
Hanspeter

Link to file

different functionality accessing parameter value

Hallo,

I was wondering why there is difference in required arguments when accessing a parameter value:
https://github.com/Spine-project/Spine-Database-API/blob/master/spinedatabase_api/database_mapping.py#L261-L276

    def single_object_parameter_value(self, id=None, parameter_id=None, object_id=None):
        """Return object and the parameter value, either corresponding to id,
        or to parameter_id and object_id.
        NOTE: The parameter_id, object_id arguments are used by NetworkMap
        """
        qry = self.object_parameter_value_list()
        if id:
            return qry.filter(self.ParameterValue.id == id)
        if parameter_id and object_id:
            return qry.filter(self.ParameterValue.parameter_id == parameter_id).\
                filter(self.ParameterValue.object_id == object_id)
        return self.empty_list()

    def single_relationship_parameter_value(self, id, parameter_id=None, relationship_id=None):
        """Return relationship and the parameter value corresponding to id."""
        return self.relationship_parameter_value_list().filter(self.ParameterValue.id == id)

Why do I need the parameter_value id for a relationship, but not for an object?

Thanks,
Hanspeter

Separate table for timeseries?

In GitLab by @manuelma on Nov 7, 2018, 13:27

Instead of using the json field for timeseries.

DiffDatabaseMapping.wide_relationship_class_list() doesn't preserve dimension order

In GitLab by @PvPer on Sep 27, 2018, 12:19

when using DiffDatabaseMapping.wide_relationship_class_list the fields 'object_class_id_list', 'object_class_name_list' doesn't follow the dimension order.

db_diff = DiffDatabaseMapping("sqlite:///hydro_data.sqlite", "testuser")
db_map = DatabaseMapping("sqlite:///hydro_data.sqlite")

diff_relationships = db_diff.wide_relationship_class_list([8]).all()
map_relationships = db_map.wide_relationship_class_list(22).all()

print(diff_relationships)
print(map_relationships )

prints:

[(8, '22,2,3,5', 'direction,unit,commodity,node', 'commodity__node__unit__direction')]
[(8, '3,5,2,22', 'commodity,node,unit,direction', 'commodity__node__unit__direction')]

Can't update parameter values

In GitLab by @PvPer on Dec 18, 2018, 16:42

Trying to update a parameter value raises and SpineIntegrityError on latest dev (e974bb7)

from spinedatabase_api import create_new_spine_database, DiffDatabaseMapping
db = create_new_spine_database('sqlite:///test.sqlite')
db_map = DiffDatabaseMapping('sqlite:///test.sqlite', "test")

obj = db_map.add_objects({'name':'test','class_id':1})
par = db_map.add_parameters({'name':'test','object_class_id':1})
value = db_map.add_parameter_values({'parameter_id':par.first().id,'object_id':obj.first().id,'value': 3.14})
db_map.update_parameter_values({'id':value.first().id, 'value': 0})