
pandahub's People

Contributors

ascheidl, dlohmeier, jkupka, jthurner, julffers, jwiemer112, lthurner, simonrubendrauz, vogt31337


pandahub's Issues

Install issues and improvements

  • pandahub version: develop
  • Python version: docker
  • Operating System: Debian 12

Description

Hey, I just wanted to look into pandahub to see whether it is beneficial for my work, and ran a quick test. I found some issues and possible improvements, mainly some weirdness regarding the install:

What I Did

docker compose build
docker compose up -d
firefox 0.0.0.0:8002
  • 3. Additionally, the main branch currently does not work at all when built & run with docker. So you might want to release the current develop after adjusting the install requirements :)

fastapi-users does not provide the extra fastapi-users[mongodb], which results in a lot of unnecessary version lookups - probably superseded by fastapi-users-db-mongodb

  • 4. To reproduce, run pip install -e . in a fresh conda env (I used Python 3.11, sorry):
pandahub 0.2.3 depends on fastapi>=0.73.0
fastapi-users[mongodb] 9.2.2 depends on fastapi<0.72.0 and >=0.65.2
  • 5. When running pandahub-login, defaults (http://127.0.0.01:8002) would be beneficial - I did not find any docs on how to add a user, so that's where I stopped for now.
  • 6. docker-compose.yml does not include a mongodb service, but I think it would be quite beneficial to have this included too (I added it myself for my tests - see the sketch below)
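
For reference, a minimal sketch of such a service - the image tag, volume name, and port mapping are assumptions, not taken from the repo:

services:
  mongodb:
    image: mongo:6            # assumed image tag
    ports:
      - "27017:27017"         # MongoDB default port
    volumes:
      - mongodb_data:/data/db # persist data across container restarts
volumes:
  mongodb_data: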

I hope it helps to get some feedback from an external early beta tester ;)

creating a new project without a project id and directly activating it crashes

  • pandahub version: 0.3.3
  • Python version: 3.10
  • Operating System: linux

Description

Creating a new project without a project id and directly activating it crashes.

What I Did

ph.create_project(name=project_name, activate=True)

Output

_______________________ ERROR at setup of test_client_io _______________________
[gw0] linux -- Python 3.9.19 /opt/hostedtoolcache/Python/3.9.19/x64/bin/python
self = <pandahub.client.PandaHubClient.PandaHubClient object at 0x7fbe93998310>
    def __init__(self):
        config = os.path.join(Path.home(), "pandahub.config")
        try:
>           with open(config, "r") as f:
E           FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/pandahub.config'
pandahub/client/PandaHubClient.py:16: FileNotFoundError
During handling of the above exception, another exception occurred:
    @pytest.fixture(scope="session")
    def phc():
>       phc = PandaHubClient()
pandahub/test/conftest.py:26: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <pandahub.client.PandaHubClient.PandaHubClient object at 0x7fbe93998310>
    def __init__(self):
        config = os.path.join(Path.home(), "pandahub.config")
        try:
            with open(config, "r") as f:
                d = json.load(f)
        except FileNotFoundError:
>           raise UserWarning("No pandahub configuration file found - log in first")
E           UserWarning: No pandahub configuration file found - log in first
pandahub/client/PandaHubClient.py:19: UserWarning
______________________ ERROR at setup of test_network_io _______________________
[gw1] linux -- Python 3.9.19 /opt/hostedtoolcache/Python/3.9.19/x64/bin/python
    @pytest.fixture(scope="session")
    def ph():
        ph = PandaHub(connection_url="mongodb://localhost:27017")
    
        project_name = "pytest"
    
        if ph.project_exists(project_name):
            ph.set_active_project(project_name)
            ph.delete_project(i_know_this_action_is_final=True)
    
>       ph.create_project(name=project_name, activate=True)
pandahub/test/conftest.py:16: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandahub/lib/PandaHub.py:244: in create_project
    self.set_active_project_by_id(id)
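
A possible workaround until this is fixed - create the project without activating it, then activate it by name in a second step (a sketch, not verified against the current API):

ph.create_project(name=project_name, activate=False)  # avoid the crashing activation path
ph.set_active_project(project_name)                   # activate by name instead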

set_active_project does not work without user_id

  • pandahub version: 0.2.2 develop
  • Python version: 3.9
  • Operating System: Windows 10

Description

set_active_project calls _get_user, which tries to run UUID4(self.user_id). If the user_id is not set (and therefore is None by default), this results in "TypeError: one of the hex, bytes, bytes_le, fields, or int arguments must be given".

What I Did

  Input In [76] in <module>
    ph.set_active_project("li20ph")
  File c:\users\julffers\git_repos\pandahub\pandahub\lib\PandaHub.py:208 in set_active_project
    self.set_active_project_by_id(project_id)
  File c:\users\julffers\git_repos\pandahub\pandahub\lib\PandaHub.py:211 in set_active_project_by_id
    self.active_project = self._get_project_document({"_id": ObjectId(project_id)})
  File c:\users\julffers\git_repos\pandahub\pandahub\lib\PandaHub.py:260 in _get_project_document
    user = self._get_user()
  File c:\users\julffers\git_repos\pandahub\pandahub\lib\PandaHub.py:141 in _get_user
    {"id": UUID4(self.user_id)}, projection={"_id": 0, "hashed_password": 0}
  File ~\miniconda3\envs\pandahub\lib\uuid.py:171 in __init__
    raise TypeError('one of the hex, bytes, bytes_le, fields, '
TypeError: one of the hex, bytes, bytes_le, fields, or int arguments must be given
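
A minimal sketch of a guard that would fail early with a readable message instead (the query line is copied from the traceback; the collection attribute is a placeholder, and the PandaHubError signature is assumed from other parts of the API):

def _get_user(self):
    if self.user_id is None:
        # UUID4(None) raises the opaque TypeError above - fail early instead
        raise PandaHubError("no user_id set - log in first or pass a user_id", 401)
    return self.users_collection.find_one(  # placeholder for the actual users collection
        {"id": UUID4(self.user_id)}, projection={"_id": 0, "hashed_password": 0}
    )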

pandahub.server_is_available() crashes

  • pandahub version: latest development
  • Python version: 3.11
  • Operating System: Win

Description

Trying to check whether the mongodb is online using pandahub.server_is_available() fails with "self.get_masked_mongodb_url()" not found.
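
As a workaround, the check can be done directly with pymongo - a minimal sketch, independent of the pandahub API:

from pymongo import MongoClient
from pymongo.errors import PyMongoError

def mongodb_is_available(url="mongodb://localhost:27017"):
    try:
        # "ping" is a cheap no-op command; the short timeout makes an
        # offline server fail fast instead of blocking for ~30 s
        MongoClient(url, serverSelectionTimeoutMS=1000).admin.command("ping")
        return True
    except PyMongoError:
        return False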

Possible race condition when parallel processes write networks to database

  • pandahub version: v0.3.2
  • Python version: 3.11.7
  • Operating System: Windows 10

Description

The PandaHub method "write_networks_to_db" causes very undesirable results when multiple parallel processes write pandapower networks to a MongoDB database at the same time (+/- a few seconds).

Currently, the method works this way:

  1. The lowest available _id in the _networks collection is determined:

     max_id_network = db["_networks"].find_one(sort=[("_id", -1)])
     _id = 0 if max_id_network is None else max_id_network["_id"] + 1

  2. All element tables are written to their respective collections (net_bus, net_line, etc.) with the previously determined _id being added to all documents.

  3. A new document is inserted into _networks containing the _id.

Given the wrong timing, the order of these steps leads to very undesirable results. When two processes attempt to write a network to the database at the same time, this is what happens:

  1. Both processes determine the same available _id
  2. All element tables of both processes are written to the database, referencing the same network _id.
  3. One process successfully inserts a document into _networks; the other gets an error that a document with the same _id already exists in that collection.

The main problem, however, is that all element documents have already been written to the database at that point. As a consequence, if the network with this _id is loaded from the database, it contains all elements of both networks - the correct ones as well as the elements of the failed process.

What I Did

As a workaround, I manually specified predetermined _ids to avoid these collisions. As a real fix, I mainly see these options:

  1. Reverse the order of process steps in "write_networks_to_db": create the document in _networks first. Only when this is successful, insert the element documents. If an error occurs during that process, delete all documents that have already been written to the database (see the sketch after this list).

  2. Do not use custom _ids in the _networks collection anymore. If the default MongoDB _ids were used, a collision would be highly unlikely.
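
A minimal sketch of option 1's reservation step (helper name hypothetical): inserting the _networks document before any element tables means the losing process fails immediately and can retry with a fresh _id, instead of leaving orphaned element documents behind:

from pymongo.errors import DuplicateKeyError

def reserve_network_id(db):
    while True:
        max_id_network = db["_networks"].find_one(sort=[("_id", -1)])
        _id = 0 if max_id_network is None else max_id_network["_id"] + 1
        try:
            # MongoDB enforces _id uniqueness, so only one of two racing
            # processes can succeed here - before any elements are written
            db["_networks"].insert_one({"_id": _id})
            return _id
        except DuplicateKeyError:
            continue  # another process claimed this _id; retry with the next one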

compressed data, setting include_metadata = True

  • pandahub version: 0.2.3
  • Python version: 3.8
  • Operating System: Windows

Description

Calling a compressed time series from the database with include_metadata=True leads to an error in PandaHub.py.

What I Did

Traceback (most recent call last):
File "C:\Users\sdrauz\anaconda3\envs\pandahub\lib\site-packages\IPython\core\interactiveshell.py", line 3397, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 155, in <cell line: 142>
ts = ph.get_timeseries_from_db(filter_document={'data_type':"p_mw"},
File "C:\Users\sdrauz\git\pandahub\pandahub\lib\PandaHub.py", line 1299, in get_timeseries_from_db
del data["timestamps"]
KeyError: 'timestamps'
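
The failing line could be made tolerant of compressed documents that have no "timestamps" key - a one-line sketch of the idea:

data.pop("timestamps", None)  # unlike `del data["timestamps"]`, pop with a default never raises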

keyword arguments in get_timeseries_from_db

  • pandahub version: 0.2.3
  • Python version: 3.8
  • Operating System: Windows

Description

If you call the function get_timeseries_from_db, you can also pass additional keyword arguments, which are automatically added to the filter_document dict. I think this might be a bit confusing: if a user sets a filter_document dict, why should they additionally pass keyword arguments? Therefore, I would remove kwargs here.

Why I stumbled over this problem: I wrote compress_ts_data (as in write_timeseries_to_db) instead of compressed_ts_data. Maybe also rethink the name, as this mistake can easily happen to a new user; a consistent name would help here. What happened then was that compress_ts_data was moved into the filter_document dict, so I ran into a pandahub error (illustrated below).
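
To illustrate (values are just examples): the typo'd keyword is silently merged into the filter instead of being rejected:

ts = ph.get_timeseries_from_db(
    filter_document={"data_type": "p_mw"},
    compress_ts_data=True,  # typo for compressed_ts_data
)
# effective filter: {"data_type": "p_mw", "compress_ts_data": True}
# -> no documents match -> the PandaHubError below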

What I Did

Traceback (most recent call last):
File "C:\Users\sdrauz\anaconda3\envs\pandahub\lib\site-packages\IPython\core\interactiveshell.py", line 3397, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 155, in <cell line: 142>
ts = ph.get_timeseries_from_db(filter_document={'data_type':"p_mw"},
File "C:\Users\sdrauz\git\pandahub\pandahub\lib\PandaHub.py", line 1278, in get_timeseries_from_db
raise PandaHubError("no documents matching the provided filter found", 404)
pandahub.lib.PandaHub.PandaHubError: no documents matching the provided filter found

bulk_write_to_db - why global_database=True by default

  • pandahub version: 0.2.2
  • Python version: 3.9
  • Operating System: Windows 10

Description

The parameter global_database is False by default for all Pandahub methods except "bulk_write_to_db". Is there a specific reason for that?

deprecate bulk_get_timeseries_from_db?

  • pandahub version: 0.2.2
  • Python version: 3.9
  • Operating System: Windows 10

Description

I was wondering whether the PandaHub method "bulk_get_timeseries_from_db" is still needed. multi_get_timeseries_from_db also serves the purpose of retrieving multiple timeseries at once. The difference is that "bulk_get_timeseries_from_db" aggregates the timeseries data directly on the database. However, in my experience the MongoDB aggregations used are rather complex, hard to debug, and resource-consuming on the database server. For most use cases it is more efficient and less complex to aggregate the data on the client side.

Are there any important use cases, where it is important to use as little computing as possible client-side, that justify maintaining this code?
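
For comparison, client-side aggregation is typically only a few lines of pandas - a sketch assuming multi_get_timeseries_from_db returns one pandas Series per matching document (call signature and return type not verified):

import pandas as pd

series_list = ph.multi_get_timeseries_from_db(filter_document={"data_type": "p_mw"})
# align all timeseries on their shared timestamp index, then sum across them
aggregated = pd.concat(series_list, axis=1).sum(axis=1)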
