
pandahub's People

Contributors

ascheidl, dlohmeier, jkupka, jthurner, julffers, jwiemer112, lthurner, simonrubendrauz, vogt31337


pandahub's Issues

Install issues and improvements

  • pandahub version: develop
  • Python version: docker
  • Operating System: Debian 12

Description

Hey, I just wanted to look into pandahub to see whether it is beneficial for my work, and ran a quick test. I found some issues and possible improvements, mainly some weirdness regarding the install:

What I Did

docker compose build
docker compose up -d
firefox 0.0.0.0:8002
  • 3. Additionally, the main branch currently does not work at all when built & run with docker. So you might want to release the current develop after adjusting the install requirements :)

fastapi-users does not provide the extra fastapi-users[mongodb], which results in a lot of unnecessary version lookups - probably superseded by fastapi-users-db-mongodb

  • 4. To reproduce, run pip install -e . in a fresh conda env (I used Python 3.11, sorry):
pandahub 0.2.3 depends on fastapi>=0.73.0
fastapi-users[mongodb] 9.2.2 depends on fastapi<0.72.0 and >=0.65.2
  • 5. When running pandahub-login, defaults (http://127.0.0.01:8002) would be beneficial - I did not find any docs on how to add a user, so that's where I stopped for now.
  • 6. docker-compose.yml does not include a mongodb service, but I think it would be quite beneficial to have this included too (I added it myself for my tests - see the sketch below)
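
For reference, a minimal sketch of such a service - the image tag, volume name, and port mapping are assumptions, not taken from the repo:

services:
  mongodb:
    image: mongo:6            # assumed image tag
    ports:
      - "27017:27017"         # MongoDB default port
    volumes:
      - mongodb_data:/data/db # persist data across container restarts
volumes:
  mongodb_data: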

I hope it helps to get some feedback from an external early beta tester ;)

creating a new project without a project id and directly activating it crashes

  • pandahub version: 0.3.3
  • Python version: 3.10
  • Operating System: linux

Description

Creating a new project without a project id and directly activating it crashes.

What I Did

ph.create_project(name=project_name, activate=True)

Output

_______________________ ERROR at setup of test_client_io _______________________
[gw0] linux -- Python 3.9.19 /opt/hostedtoolcache/Python/3.9.19/x64/bin/python
self = <pandahub.client.PandaHubClient.PandaHubClient object at 0x7fbe93998310>
    def __init__(self):
        config = os.path.join(Path.home(), "pandahub.config")
        try:
>           with open(config, "r") as f:
E           FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/pandahub.config'
pandahub/client/PandaHubClient.py:16: FileNotFoundError
During handling of the above exception, another exception occurred:
    @pytest.fixture(scope="session")
    def phc():
>       phc = PandaHubClient()
pandahub/test/conftest.py:26: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <pandahub.client.PandaHubClient.PandaHubClient object at 0x7fbe93998310>
    def __init__(self):
        config = os.path.join(Path.home(), "pandahub.config")
        try:
            with open(config, "r") as f:
                d = json.load(f)
        except FileNotFoundError:
>           raise UserWarning("No pandahub configuration file found - log in first")
E           UserWarning: No pandahub configuration file found - log in first
pandahub/client/PandaHubClient.py:19: UserWarning
______________________ ERROR at setup of test_network_io _______________________
[gw1] linux -- Python 3.9.19 /opt/hostedtoolcache/Python/3.9.19/x64/bin/python
    @pytest.fixture(scope="session")
    def ph():
        ph = PandaHub(connection_url="mongodb://localhost:27017")
    
        project_name = "pytest"
    
        if ph.project_exists(project_name):
            ph.set_active_project(project_name)
            ph.delete_project(i_know_this_action_is_final=True)
    
>       ph.create_project(name=project_name, activate=True)
pandahub/test/conftest.py:16: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandahub/lib/PandaHub.py:244: in create_project
    self.set_active_project_by_id(id)
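
A possible workaround until this is fixed - create the project without activating it, then activate it by name in a second step (a sketch, not verified against the current API):

ph.create_project(name=project_name, activate=False)  # avoid the crashing activation path
ph.set_active_project(project_name)                   # activate by name instead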

set_active_project does not work without user_id

  • pandahub version: 0.2.2 develop
  • Python version: 3.9
  • Operating System: Windows 10

Description

set_active_project calls _get_user, which tries to run UUID4(self.user_id). If the user_id is not set (and therefore is None by default), this results in "TypeError: one of the hex, bytes, bytes_le, fields, or int arguments must be given".

What I Did

  Input In [76] in <module>
    ph.set_active_project("li20ph")
  File c:\users\julffers\git_repos\pandahub\pandahub\lib\PandaHub.py:208 in set_active_project
    self.set_active_project_by_id(project_id)
  File c:\users\julffers\git_repos\pandahub\pandahub\lib\PandaHub.py:211 in set_active_project_by_id
    self.active_project = self._get_project_document({"_id": ObjectId(project_id)})
  File c:\users\julffers\git_repos\pandahub\pandahub\lib\PandaHub.py:260 in _get_project_document
    user = self._get_user()
  File c:\users\julffers\git_repos\pandahub\pandahub\lib\PandaHub.py:141 in _get_user
    {"id": UUID4(self.user_id)}, projection={"_id": 0, "hashed_password": 0}
  File ~\miniconda3\envs\pandahub\lib\uuid.py:171 in __init__
    raise TypeError('one of the hex, bytes, bytes_le, fields, '
TypeError: one of the hex, bytes, bytes_le, fields, or int arguments must be given
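
A minimal sketch of a guard that would fail early with a readable message instead (the query line is copied from the traceback; the collection attribute is a placeholder, and the PandaHubError signature is assumed from other parts of the API):

def _get_user(self):
    if self.user_id is None:
        # UUID4(None) raises the opaque TypeError above - fail early instead
        raise PandaHubError("no user_id set - log in first or pass a user_id", 401)
    return self.users_collection.find_one(  # placeholder for the actual users collection
        {"id": UUID4(self.user_id)}, projection={"_id": 0, "hashed_password": 0}
    )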

pandahub.server_is_available() crashes

  • pandahub version: latest development
  • Python version: 3.11
  • Operating System: Win

Description

Trying to check whether the mongodb is online using pandahub.server_is_available() fails with "self.get_masked_mongodb_url()" not found.
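
As a workaround, the check can be done directly with pymongo - a minimal sketch, independent of the pandahub API:

from pymongo import MongoClient
from pymongo.errors import PyMongoError

def mongodb_is_available(url="mongodb://localhost:27017"):
    try:
        # "ping" is a cheap no-op command; the short timeout makes an
        # offline server fail fast instead of blocking for ~30 s
        MongoClient(url, serverSelectionTimeoutMS=1000).admin.command("ping")
        return True
    except PyMongoError:
        return False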

Possible race condition when parallel processes write networks to database

  • pandahub version: v0.3.2
  • Python version: 3.11.7
  • Operating System: Windows 10

Description

The PandaHub method "write_networks_to_db" causes very undesirable results when multiple parallel processes write pandapower networks to a MongoDB database at the same time (+/- a few seconds).

Currently, the method works this way:

  1. The lowest available _id in the _networks collection is determined:

     max_id_network = db["_networks"].find_one(sort=[("_id", -1)])
     _id = 0 if max_id_network is None else max_id_network["_id"] + 1

  2. All element tables are written to their respective collections (net_bus, net_line, etc.) with the previously determined _id being added to all documents.

  3. A new document is inserted into _networks containing the _id.

Given the wrong timing, the order of these steps leads to very undesirable results. When two processes attempt to write a network to the database at the same time, this is what happens:

  1. Both processes determine the same available _id
  2. All element tables of both processes are written to the database, referencing the same network _id.
  3. One process successfully inserts a document into _networks; the other gets an error that a document with the same _id already exists in that collection.

The main problem, however, is that all element documents have already been written to the database at that point. As a consequence, if the network with this _id is loaded from the database, it contains all elements of both networks - the correct ones as well as the elements of the failed process.

What I Did

As a workaround, I manually specified predetermined _ids to avoid these collisions. As a real fix, I mainly see these options:

  1. Reverse the order of process steps in "write_networks_to_db": create the document in _networks first. Only when this is successful, insert the element documents. If an error occurs during that process, delete all documents that have already been written to the database (see the sketch after this list).

  2. Do not use custom _ids in the _networks collection anymore. If the default MongoDB _ids were used, a collision would be highly unlikely.
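
A minimal sketch of option 1's reservation step (helper name hypothetical): inserting the _networks document before any element tables means the losing process fails immediately and can retry with a fresh _id, instead of leaving orphaned element documents behind:

from pymongo.errors import DuplicateKeyError

def reserve_network_id(db):
    while True:
        max_id_network = db["_networks"].find_one(sort=[("_id", -1)])
        _id = 0 if max_id_network is None else max_id_network["_id"] + 1
        try:
            # MongoDB enforces _id uniqueness, so only one of two racing
            # processes can succeed here - before any elements are written
            db["_networks"].insert_one({"_id": _id})
            return _id
        except DuplicateKeyError:
            continue  # another process claimed this _id; retry with the next one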

compressed data, setting include_metadata = True

  • pandahub version: 0.2.3
  • Python version: 3.8
  • Operating System: Windows

Description

Calling a compressed time series from the database with include_metadata=True leads to an error in PandaHub.py.

What I Did

Traceback (most recent call last):
File "C:\Users\sdrauz\anaconda3\envs\pandahub\lib\site-packages\IPython\core\interactiveshell.py", line 3397, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 155, in <cell line: 142>
ts = ph.get_timeseries_from_db(filter_document={'data_type':"p_mw"},
File "C:\Users\sdrauz\git\pandahub\pandahub\lib\PandaHub.py", line 1299, in get_timeseries_from_db
del data["timestamps"]
KeyError: 'timestamps'
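
The failing line could be made tolerant of compressed documents that have no "timestamps" key - a one-line sketch of the idea:

data.pop("timestamps", None)  # unlike `del data["timestamps"]`, pop with a default never raises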

keyword arguments in get_timeseries_from_db

  • pandahub version: 0.2.3
  • Python version: 3.8
  • Operating System: Windows

Description

If you call the function get_timeseries_from_db, you can also pass additional keyword arguments, which are automatically added to the filter_document dict. I think this might be a bit confusing: if a user sets a filter_document dict, why should they additionally pass keyword arguments? Therefore, I would remove kwargs here.

Why I stumbled over this problem: I wrote compress_ts_data (as in write_timeseries_to_db) instead of compressed_ts_data. Maybe also rethink the name, as this mistake can easily happen to a new user; a consistent name would help here. What happened then was that compress_ts_data was moved into the filter_document dict, so I ran into a pandahub error (illustrated below).
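
To illustrate (values are just examples): the typo'd keyword is silently merged into the filter instead of being rejected:

ts = ph.get_timeseries_from_db(
    filter_document={"data_type": "p_mw"},
    compress_ts_data=True,  # typo for compressed_ts_data
)
# effective filter: {"data_type": "p_mw", "compress_ts_data": True}
# -> no documents match -> the PandaHubError below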

What I Did

Traceback (most recent call last):
File "C:\Users\sdrauz\anaconda3\envs\pandahub\lib\site-packages\IPython\core\interactiveshell.py", line 3397, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 155, in <cell line: 142>
ts = ph.get_timeseries_from_db(filter_document={'data_type':"p_mw"},
File "C:\Users\sdrauz\git\pandahub\pandahub\lib\PandaHub.py", line 1278, in get_timeseries_from_db
raise PandaHubError("no documents matching the provided filter found", 404)
pandahub.lib.PandaHub.PandaHubError: no documents matching the provided filter found

bulk_write_to_db - why global_database=True by default

  • pandahub version: 0.2.2
  • Python version: 3.9
  • Operating System: Windows 10

Description

The parameter global_database is False by default for all Pandahub methods except "bulk_write_to_db". Is there a specific reason for that?

deprecate bulk_get_timeseries_from_db?

  • pandahub version: 0.2.2
  • Python version: 3.9
  • Operating System: Windows 10

Description

I was wondering whether the PandaHub method "bulk_get_timeseries_from_db" is still needed. multi_get_timeseries_from_db also serves the purpose of retrieving multiple timeseries at once. The difference is that "bulk_get_timeseries_from_db" aggregates the timeseries data directly on the database. However, in my experience the MongoDB aggregations used are rather complex, hard to debug, and resource-consuming on the database server. For most use cases it is more efficient and less complex to aggregate the data on the client side.

Are there any important use cases, where it is important to use as little computing as possible client-side, that justify maintaining this code?
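
For comparison, client-side aggregation is typically only a few lines of pandas - a sketch assuming multi_get_timeseries_from_db returns one pandas Series per matching document (call signature and return type not verified):

import pandas as pd

series_list = ph.multi_get_timeseries_from_db(filter_document={"data_type": "p_mw"})
# align all timeseries on their shared timestamp index, then sum across them
aggregated = pd.concat(series_list, axis=1).sum(axis=1)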
