
deutschland's Introduction


Deutschland

A Python package that gives you easy access to the most valuable datasets of Germany.

Installation

pip install deutschland

Supported Python Versions

3.8 - 3.12

Tested on Linux, macOS and Windows.


Development

For development, Poetry >= 1.2.0 is required.
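
A typical workflow (a sketch using standard Poetry commands, nothing project-specific):

poetry install     # install the package and its development dependencies
poetry run pytest  # run the test suite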

Geographic data

Fetch information about streets, house numbers, building outlines, …

from deutschland.geo import Geo
geo = Geo()
# top_right and bottom_left coordinates
data = geo.fetch([52.530116236589244, 13.426532801586827],
                 [52.50876180448243, 13.359631043007212])
print(data.keys())
# dict_keys(['Adresse', 'Barrierenlinie', 'Bauwerksflaeche', 'Bauwerkslinie', 'Bauwerkspunkt', 'Besondere_Flaeche', 'Besondere_Linie', 'Besonderer_Punkt', 'Gebaeudeflaeche', 'Gebaeudepunkt', 'Gewaesserflaeche', 'Gewaesserlinie', 'Grenze_Linie', 'Historischer_Punkt', 'Siedlungsflaeche', 'Vegetationslinie', 'Verkehrsflaeche', 'Verkehrslinie', 'Verkehrspunkt', 'Hintergrund'])

print(data["Adresse"][0])
# {'geometry': {'type': 'Point', 'coordinates': (13.422642946243286, 52.51500157651358)}, 'properties': {'postleitzahl': '10179', 'ort': 'Berlin', 'ortsteil': 'Mitte', 'strasse': 'Holzmarktstraße', 'hausnummer': '55'}, 'id': 0, 'type': 'Feature'}

For the detailed documentation of this API see here

The data is provided by the AdV SmartMapping. The team consists of participants from the German state surveying offices, the Federal Agency for Cartography and Geodesy (BKG), the German Federal Armed Forces (Bundeswehr ZGeoBW) and third parties from research and education.

Company Data

Bundesanzeiger

Get financial reports for all German companies that report to the Bundesanzeiger.

from deutschland.bundesanzeiger import Bundesanzeiger
ba = Bundesanzeiger()
# search term
data = ba.get_reports("Deutsche Bahn AG")
# returns a dictionary with all reports found as fulltext reports
print(data.keys())
# dict_keys(['Jahresabschluss zum Geschäftsjahr vom 01.01.2020 bis zum 31.12.2020', 'Konzernabschluss zum Geschäftsjahr vom 01.01.2020 bis zum 31.12.2020\nErgänzung der Veröffentlichung vom 04.06.2021',

Big thanks to Nico Duldhardt and Friedrich Eichenroth, who supported this implementation with their machine learning model.

Handelsregister

The code for the Handelsregister moved to this repo.

Consumer Protection Data

Lebensmittelwarnung

Get current product warnings provided by the German federal portal lebensmittelwarnung.de.

from deutschland.lebensmittelwarnung import Lebensmittelwarnung
lw = Lebensmittelwarnung()
# search by content type and region, see documentation for all available params
data = lw.get("lebensmittel", "berlin")
print(data)
# [{'id': 19601, 'guid': 'https://www.lebensmittelwarnung.de/bvl-lmw-de/detail/lebensmittel/19601', 'pubDate': 'Fri, 10 Feb 2017 12:28:45 +0000', 'imgSrc': 'https://www.lebensmittelwarnung.de/bvl-lmw-de/opensaga/attachment/979f8cd3-969e-4a6c-9a8e-4bdd61586cd4/data.jpg', 'title': 'Sidroga Bio Säuglings- und Kindertee', 'manufacturer': 'Lebensmittel', 'warning': 'Pyrrolizidinalkaloide', 'affectedStates': ['Baden-Württemberg', '...']}]

Federal Job Openings

NRW

VERENA

Get open substitute teaching positions in NRW from https://www.schulministerium.nrw.de/BiPo/Verena/angebote

from deutschland.verena import Verena
v = Verena()
data = v.get()
print(data)
# full example data can be found at deutschland/verena/example.md
# [{ "school_id": "99999", "desc": "Eine Schule\nSchule der Sekundarstufe II\ndes Landkreis Schuling\n9999 Schulingen", "replacement_job_title": "Lehrkraft", "subjects": [ "Fach 1", "Fach 2" ], "comments": "Bemerkung zur Stelle: Testbemerkung", "duration": "01.01.2021 - 01.01.2022", ...} ...]

Autobahn

Get data from the Autobahn.

from deutschland import autobahn
from deutschland.autobahn.api import default_api

from pprint import pprint

autobahn_api_instance = default_api.DefaultApi()

try:
    # List all Autobahnen (motorways)
    api_response = autobahn_api_instance.list_autobahnen()
    pprint(api_response)

    # Details of a charging station
    station_id = "RUxFQ1RSSUNfQ0hBUkdJTkdfU1RBVElPTl9fMTczMzM="  # example station ID
    api_response = autobahn_api_instance.get_charging_station(station_id)
    pprint(api_response)

except autobahn.ApiException as e:
    print("Exception when calling DefaultApi->get_charging_station: %s\n" % e)

For the detailed documentation of this API see here

Presseportal

Not available for now due to changes in the API.

Auto-Generated API-Clients

bundesrat

For the detailed documentation of this API see here

bundestag

For the detailed documentation of this API see here

destatis

For the detailed documentation of this API see here

dwd

For the detailed documentation of this API see here

interpol

For the detailed documentation of this API see here

jobsuche

For the detailed documentation of this API see here

ladestationen

For the detailed documentation of this API see here

mudab

For the detailed documentation of this API see here

nina

For the detailed documentation of this API see here

polizei_brandenburg

For the detailed documentation of this API see here

risikogebiete

For the detailed documentation of this API see here

smard

For the detailed documentation of this API see here

strahlenschutz

For the detailed documentation of this API see here

travelwarning

For the detailed documentation of this API see here

zoll

For the detailed documentation of this API see here
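
All of the clients above are auto-generated from OpenAPI specs, so they follow the same usage pattern as the Autobahn example: import the package, instantiate the API class from its api module, and call the generated methods. A minimal sketch for the nina client follows; the module name default_api, the method name get_dashboard, and its parameter are assumptions inferred from the dashboard endpoint (see the NINA issue further below), not verified against the generated code:

from deutschland import nina
from deutschland.nina.api import default_api

nina_api_instance = default_api.DefaultApi()

try:
    # warnings dashboard for a Regionalschluessel (RS); method name is an assumption
    api_response = nina_api_instance.get_dashboard("081210000000")
    print(api_response)
except nina.ApiException as e:
    print("Exception when calling the NINA API: %s" % e)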

deutschland's People

Contributors

aarondewes, andreasbossard, asdil12, auchtetraborat, dependabot[bot], devmarcstorm, eichenroth, hundsmuhlen, jugmac00, k0in, lilithwittmann, lukaspanni, mauriceatrops, pjullrich, severinsimmler, t-huyeng, trisnol, weddige, wirthual, zeitschlag


deutschland's Issues

Suggestion: add python version in readme

I was just trying to install this package and couldn't get it to work using Python 3.9 on Windows 10, mostly due to numpy/blas/lapack/mkl errors somewhere deep down. However, reading the pyproject.toml I saw the mention of Python 3.6.2.
=> Using Python 3.6 I was able to install this package without errors.

destatis - result?!

import time
from pprint import pprint

from deutschland import destatis
from deutschland.destatis.api import default_api

with destatis.ApiClient() as api_client:
    # Create an instance of the API class
    api_instance = default_api.DefaultApi(api_client)
    username = "xxx"
    password = "xxx"
    name = "45341-0102"
    area = "all"
    compress = "false"
    transpose = "false"
    startyear = "startyear_example"
    endyear = "endyear_example"
    timeslices = "timeslices_example"
    regionalvariable = "regionalvariable_example"
    regionalkey = "regionalkey_example"
    classifyingvariable1 = "classifyingvariable1_example"
    classifyingkey1 = "classifyingkey1_example"
    classifyingvariable2 = "classifyingvariable2_example"
    classifyingkey2 = "classifyingkey2_example"
    classifyingvariable3 = "classifyingvariable3_example"
    classifyingkey3 = "classifyingkey3_example"
    job = "false"
    stand = "01.01.1970 01:00"
    language = "de"
    format = "csv"

    try:
        api_instance.table(username=username, password=password, name=name, area=area, compress=compress, transpose=transpose, startyear=startyear, endyear=endyear, timeslices=timeslices, regionalvariable=regionalvariable, regionalkey=regionalkey, classifyingvariable1=classifyingvariable1, classifyingkey1=classifyingkey1, classifyingvariable2=classifyingvariable2, classifyingkey2=classifyingkey2, classifyingvariable3=classifyingvariable3, classifyingkey3=classifyingkey3, job=job, stand=stand, language=language)
    except destatis.ApiException as e:
        print("Exception:")
        print(e)
OK... configured it following the standard example - the query runs through without errors.

Probably a stupid question, but how do I get at the result? :)

Saved Model does not exist

I am receiving the following error message; it looks like the model used to solve the captchas does not exist at assets/model.h5: OSError: SavedModel file does not exist at: assets/model.h5/{saved_model.pbtxt|saved_model.pb}

Here's my code:

from deutschland import Bundesanzeiger
ba = Bundesanzeiger()
# search term
data = ba.get_reports("Deutsche Bahn AG")
# returns a dictionary with all reports found as fulltext reports
print(data.keys())
# dict_keys(['Jahresabschluss zum Geschäftsjahr vom 01.01.2020 bis zum 31.12.2020', 'Konzernabschluss zum Geschäftsjahr vom 01.01.2020 bis zum 31.12.2020\nErgänzung der Veröffentlichung vom 04.06.2021',

error when trying to extend bundesanzeiger search

I thought about contributing to your package by adding the extended search functionality (i.e. not only search for all documents but add the possibility to limit the search to certain types of documents).
Unfortunately, this is only working for certain companies while for certain other companies the captcha solver always fails. Any ideas why that might be?
(e.g. it works without errors for "Deutsche Bahn AG" but it keeps failing for "Deutsche Bank AG")

Change: add the value 22 to the search request:

response = self.session.get(
    f"https://www.bundesanzeiger.de/pub/de/start?0-2.-top%7Econtent%7Epanel-left%7Ecard-form=&fulltext={company_name}&area_select=22&search_button=Suchen"
)

Python 3.11 compatibility - Dependency Pillow

Python 3.11 supports the following Pillow versions: Pillow >= 9.3 (see the Pillow documentation on readthedocs).
deutschland 0.3.2 requires Pillow<9.0.0,>=8.3.1.

Error message from prompt:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
deutschland 0.3.2 requires Pillow<9.0.0,>=8.3.1, but you have pillow 9.3.0 which is incompatible.

destatis api does not work due to SSLError

Hello,
when trying to use the destatis API (timeseries_data, to be exact), it throws an SSLCertVerificationError.

MaxRetryError: HTTPSConnectionPool(host='www-genesis.destatis.de', port=443): Max retries exceeded with url: /genesisWS/rest/2020/data/timeseries?username=*******&password=******* (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:997)')))

My code is just the example for timeseries_data, with a valid username and password provided.
The host url is https://www-genesis.destatis.de/genesisWS/rest/2020.
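
A possible workaround sketch, assuming the client was generated with the standard OpenAPI generator (whose Configuration object exposes verify_ssl); disabling certificate verification is insecure and only suitable for debugging:

from deutschland import destatis
from deutschland.destatis.api import default_api

configuration = destatis.Configuration()
configuration.verify_ssl = False  # insecure: skips certificate validation (debugging only)

with destatis.ApiClient(configuration) as api_client:
    api_instance = default_api.DefaultApi(api_client)
    # ... call timeseries_data(...) here as in the package example ...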

paranoia mode: requests' proxy support

When you're in paranoia mode and want to use (anonymous) proxies, replace this line

result = requests.get(url, headers=headers)

with this one

result = requests.get(url, headers=headers, proxies=proxies)

and add this to your main function (with your proxy servers of course):

proxies = {
  'http': 'http://10.10.1.10:3128',
  'https': 'http://10.10.1.10:1080',
}

[NINA] AGS not working for "Gemeinde", only for "Stadtkreis" and "Landkreis"

Note that it is not the AGS ("Amtlicher Gemeindeschlüssel") that is used here, as the DWD does; instead, the RS ("Bundeseinheitlicher Regionalschlüssel") is used.
But only the RS of city districts ("Stadtkreis") and counties ("Landkreis") works, not that of towns ("Gemeinde").

The following request returns all warnings for "Stadtkreis Heilbronn" / "Universitätsstadt Heilbronn" (["081210000000","Heilbronn, Universitätsstadt",null]):
https://warnung.bund.de/api31/dashboard/081210000000.json

But the following request does not return the warnings for "Möckmühl" (["081255007063","Möckmühl, Stadt",null]):
https://warnung.bund.de/api31/dashboard/081255007063.json

Instead, we must use the RS from "Landkreis Heilbronn" (which is not provided in the linked AGS-Table):
https://warnung.bund.de/api31/dashboard/081250000000.json
But in this case, we receive all the warnings from the county district and not only the ones from the targeted town.

How to receive only the warnings from one town (e.g. Möckmühl), as it is done nowadays by the NINA App (warning level == "Gemeinde")?
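
For reference, a minimal sketch of the workaround described above, using plain requests rather than the deutschland client (it still returns all warnings of the Kreis, not only those of the targeted town):

import requests

def kreis_dashboard(rs: str):
    """Fetch NINA dashboard warnings for the Kreis containing the given RS.

    The dashboard endpoint only accepts Kreis-level keys, so the Gemeinde
    part of the RS (everything after the first five digits) is zeroed out.
    """
    kreis_rs = rs[:5] + "0" * 7
    url = f"https://warnung.bund.de/api31/dashboard/{kreis_rs}.json"
    return requests.get(url).json()

print(kreis_dashboard("081255007063"))  # -> warnings for Landkreis Heilbronn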

Bundesanzeiger: query of a string starting with a number returns an error

How to reproduce:

from deutschland.bundesanzeiger import Bundesanzeiger
ba = Bundesanzeiger()
data = ba.get_reports('4steps systems')

Expected result:

  • assignment of the result dict to data.

What I got instead:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[4], line 3
      1 from deutschland.bundesanzeiger import Bundesanzeiger
      2 ba = Bundesanzeiger()
----> 3 data = ba.get_reports('4steps systems')

File ~/miniconda3/envs/uregister/lib/python3.8/site-packages/deutschland/bundesanzeiger/bundesanzeiger.py:186, in Bundesanzeiger.get_reports(self, company_name)
    182 # perform the search
    183 response = self.session.get(
    184     f"https://www.bundesanzeiger.de/pub/de/start?0-2.-top%7Econtent%7Epanel-left%7Ecard-form=&fulltext={company_name}&area_select=&search_button=Suchen"
    185 )
--> 186 return self.__generate_result(response.text)

File ~/miniconda3/envs/uregister/lib/python3.8/site-packages/deutschland/bundesanzeiger/bundesanzeiger.py:120, in Bundesanzeiger.__generate_result(self, content)
    118 """iterate trough all results and try to fetch single reports"""
    119 result = {}
--> 120 for element in self.__find_all_entries_on_page(content):
    121     get_element_response = self.session.get(element.content_url)
    123     if self.__is_captcha_needed(get_element_response.text):

File ~/miniconda3/envs/uregister/lib/python3.8/site-packages/deutschland/bundesanzeiger/bundesanzeiger.py:90, in Bundesanzeiger.__find_all_entries_on_page(self, page_content)
     88 soup = BeautifulSoup(page_content, "html.parser")
     89 wrapper = soup.find("div", {"class": "result_container"})
---> 90 rows = wrapper.find_all("div", {"class": "row"})
     91 for row in rows:
     92     info_element = row.find("div", {"class": "info"})

AttributeError: 'NoneType' object has no attribute 'find_all'

I tried other numerals and non-numerals with the described error pattern.

My env: Ubuntu 22.04, python 3.8.17
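
The traceback shows that soup.find("div", {"class": "result_container"}) returned None, i.e. the results page contains no result container for such queries. A sketch of a possible guard (a hypothetical patch, not the shipped code):

from bs4 import BeautifulSoup

def find_all_entries_on_page(page_content):
    # guarded variant of __find_all_entries_on_page (hypothetical)
    soup = BeautifulSoup(page_content, "html.parser")
    wrapper = soup.find("div", {"class": "result_container"})
    if wrapper is None:
        return  # no result container on the page: yield nothing
    for row in wrapper.find_all("div", {"class": "row"}):
        yield row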

Handelsregister.search does not work (anymore?)

The example from the README does not work:

from deutschland.handelsregister import Handelsregister
hr = Handelsregister()
# search by keywords, see documentation for all available params
hr.search(keywords="Deutsche Bahn Aktiengesellschaft") # This returns None
print(hr)  # This doesn't work anyway; it just prints the Handelsregister object

I looked into this a bit yesterday, and the server returns a 404. Could it be that the endpoint is now https://www.handelsregister.de/rp_web/normalesuche.xhtml? The parameters are also named slightly differently. After my IP was blocked yesterday, I stopped digging.

Fix integration tests for handelsregister

Execute tests with pytest.

Tests fail with:

_______________________ test_for_no_data_handelsregister _______________________

    def test_for_no_data_handelsregister():
        hr = Handelsregister()
        data = hr.search(keywords="foobar", keyword_match_option=3)
>       assert (
            len(data) == 0
        ), "Found registered companies for 'foobar' although none were expected."
E       TypeError: object of type 'NoneType' has no len()

tests/integration_test.py:19: TypeError
___________ test_fetching_handelsregister_data_for_deutsche_bahn_ag ____________

    def test_fetching_handelsregister_data_for_deutsche_bahn_ag():
        hr = Handelsregister()
        data = hr.search(
            keywords="Deutsche Bahn Aktiengesellschaft", keyword_match_option=3
        )
>       assert (
            len(data) > 0
        ), "Found no data for 'Deutsche Bahn Aktiengesellschaft' although it should exist."
E       TypeError: object of type 'NoneType' has no len()

tests/integration_test.py:29: TypeError
___ test_fetching_handelsregister_data_for_deutsche_bahn_ag_with_raw_params ____

    def test_fetching_handelsregister_data_for_deutsche_bahn_ag_with_raw_params():
        r = Registrations()
        data = r.search_with_raw_params(
            {"schlagwoerter": "Deutsche Bahn Aktiengesellschaft", "schlagwortOptionen": 3}
        )
>       assert (
            len(data) > 0
        ), "Found no data for 'Deutsche Bahn Aktiengesellschaft' although it should exist."
E       TypeError: object of type 'NoneType' has no len()

Fix linting with updated spectral version

Spectral removed auto-detection of OpenAPI schemas. Because of this, linting fails when no .spectral.yaml is provided.

See: stoplightio/spectral#1796

In the Autobahn repo there is a .spectral.yaml file to account for that.

However, if we do not expect to use custom linting rulesets, we can simply get rid of this extra file by executing this before the linting process:

echo "extends: spectral:oas" > .spectral.yaml

In geo.py Geo.fetch() top_right and bottom_left seem to be swapped.

In a map oriented to the north, the first coordinate passed to Geo.fetch() is the lower-left one and the second is the upper-right one.

The image attached to the issue shows both coordinates from the documentation plotted in OpenStreetMap.

# top_right and bottom_left coordinates
data = geo.fetch([52.50876180448243, 13.359631043007212], 
                 [52.530116236589244, 13.426532801586827])


Maybe I misunderstand the names top_right and bottom_left. Please correct me if this is the case. Otherwise, I would work on a pull request for this issue, which renames the variables.

Handelsregister tests are failing

I think the Handelsregister code in this repo is deprecated in favor of the Handelsregister repo.

This means we should be able to clean up the repo and move the tests to the Handelsregister repo.

issue with Bundesanzeiger?

I ran the sample code, but had the following error:


OSError                                   Traceback (most recent call last)
<ipython-input> in <module>()
      1 from deutschland import Bundesanzeiger
----> 2 ba = Bundesanzeiger()
      3 # search term
      4 data = ba.get_reports("Deutsche Bahn AG")
      5 # returns a dictionary with all reports found as fulltext reports

3 frames
/usr/local/lib/python3.7/dist-packages/keras/saving/save.py in load_model(filepath, custom_objects, compile, options)
    202     if isinstance(filepath_str, str):
    203         if not tf.io.gfile.exists(filepath_str):
--> 204             raise IOError(f'No file or directory found at {filepath_str}')
    205
    206     if tf.io.gfile.isdir(filepath_str):

OSError: No file or directory found at assets/model.h5

Handelsregister demo code returns error

Running the demo code in the README.md for the Handelsregister module returns an error.

How to reproduce:

>>> from deutschland import Bundesanzeiger
>>> from deutschland import Handelsregister
>>> hr = Handelsregister()
>>> hr.search(keywords="Deutsche Bahn Aktiengesellschaft")

Expected result:

  • Handelsregister infos to be retrieved and stored in 'hr' object.

What I got instead:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xxx/Dokumente/bundesapi/test_bundesapi_2/env/lib/python3.9/site-packages/deutschland/handelsregister/handelsregister.py", line 138, in search
    return self.search_with_raw_params(params)
  File "/home/xxx/Dokumente/bundesapi/test_bundesapi_2/env/lib/python3.9/site-packages/deutschland/handelsregister/handelsregister.py", line 215, in search_with_raw_params
    return self.__find_entries(soup)
  File "/home/xxx/Dokumente/bundesapi/test_bundesapi_2/env/lib/python3.9/site-packages/deutschland/handelsregister/handelsregister.py", line 242, in __find_entries
    data = self.__extract_history(tr)
  File "/home/xxx/Dokumente/bundesapi/test_bundesapi_2/env/lib/python3.9/site-packages/deutschland/handelsregister/handelsregister.py", line 276, in __extract_history
    [position, historical_name] = tds[1].text.strip().split(".) ", 1)
ValueError: not enough values to unpack (expected 2, got 1)

My env:

  • Ubuntu 20.04, venv with 3.9.6 and pip 21.2.4

Test newest version of the openapi generator

A new version of the OpenAPI generator has been released which should improve the quality of the generated Python code (e.g. type hints).

We need to evaluate if our current process of generating the clients still works with this newest release.

Possibly we will need to adapt the post-processing script for the new code.

Model for CAPTCHAs in ONNX format

Hi,

I have exported your CAPTCHA-solving model to the ONNX format. This has the advantage that you get rid of TensorFlow as a dependency (~500 MB) and can use onnxruntime (~5 MB) for inference instead, which would make the whole project much more lightweight. There are also significantly fewer problems updating onnxruntime than TensorFlow without breaking the model.

And I could also fix #8, if you're interested.
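
For illustration, a minimal inference sketch with onnxruntime; the file name model.onnx and the input shape are assumptions, since the actual export is not part of this issue:

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")  # hypothetical exported model file
input_name = session.get_inputs()[0].name  # look up the model's real input name
# dummy batch in the image shape the CAPTCHA model might expect (assumed here)
image = np.zeros((1, 50, 300, 1), dtype=np.float32)
prediction = session.run(None, {input_name: image})[0]
print(prediction.shape)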

Dependency Shapely: Not installed in new env

The installation of Shapely doesn't work in a fresh environment.
How to reproduce:

  • pip install deutschland

Expected result:

  • python package should be installed

What I got instead:

  • Error message:
 ERROR: Cannot install deutschland==0.1.0, deutschland==0.1.1, deutschland==0.1.2, deutschland==0.1.3, deutschland==0.1.4, deutschland==0.1.5, deutschland==0.1.6 and deutschland==0.1.7 because these package versions have conflicting dependencies.

The conflict is caused by:
    deutschland 0.1.7 depends on Shapely<2.0.0 and >=1.7.1

Workaround:

  • first install Shapely manually by running pip install Shapely.

My environment:

  • Mac 11.5.2. Big Sur, Python 3.9.6, pip 21.2.4

Add all openapi spec apis to library

We need to find a way to create API bindings from all the openapi specs to integrate them automatically into the deutschland lib.

Any suggestions on how to tackle this?

Can't get 0.3.0 running

After fresh virtualenv install via:
pipenv install git+https://github.com/bundesAPI/deutschland.git#egg=deutschland
I get:
Python 3.9.5 (default, Nov 23 2021, 15:27:38)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import deutschland as de
>>> from deutschland import Geo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'Geo' from 'deutschland' (unknown location)
>>> geo = Geo()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'Geo' is not defined
>>> data = geo.fetch([52.530116236589244, 13.426532801586827],
...                  [52.50876180448243, 13.359631043007212])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'geo' is not defined

Similar problem with Google Colab (Python 3.7):
!pip install deutschland
I get the following:

Attempting uninstall: urllib3
Found existing installation: urllib3 1.24.3
Uninstalling urllib3-1.24.3:
Successfully uninstalled urllib3-1.24.3
Attempting uninstall: requests
Found existing installation: requests 2.23.0
Uninstalling requests-2.23.0:
Successfully uninstalled requests-2.23.0
Attempting uninstall: regex
Found existing installation: regex 2022.6.2
Uninstalling regex-2022.6.2:
Successfully uninstalled regex-2022.6.2
Attempting uninstall: numpy
Found existing installation: numpy 1.21.6
Uninstalling numpy-1.21.6:
Successfully uninstalled numpy-1.21.6
Attempting uninstall: Pillow
Found existing installation: Pillow 7.1.2
Uninstalling Pillow-7.1.2:
Successfully uninstalled Pillow-7.1.2
Attempting uninstall: pandas
Found existing installation: pandas 1.3.5
Uninstalling pandas-1.3.5:
Successfully uninstalled pandas-1.3.5
Attempting uninstall: lxml
Found existing installation: lxml 4.2.6
Uninstalling lxml-4.2.6:
Successfully uninstalled lxml-4.2.6
Attempting uninstall: beautifulsoup4
Found existing installation: beautifulsoup4 4.6.3
Uninstalling beautifulsoup4-4.6.3:
Successfully uninstalled beautifulsoup4-4.6.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
xarray-einstats 0.2.2 requires numpy>=1.21, but you have numpy 1.19.0 which is incompatible.
tensorflow 2.8.2+zzzcolab20220527125636 requires numpy>=1.20, but you have numpy 1.19.0 which is incompatible.
google-colab 1.0.0 requires requests~=2.23.0, but you have requests 2.28.1 which is incompatible.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.
albumentations 0.1.12 requires imgaug<0.2.7,>=0.2.5, but you have imgaug 0.2.9 which is incompatible.
Successfully installed Pillow-8.4.0 beautifulsoup4-4.11.1 boto3-1.24.34 botocore-1.27.34 dateparser-1.1.1 de-autobahn-1.0.4 de-bundesrat-0.1.0 de-bundestag-0.1.0 de-dwd-1.0.1 de-interpol-0.1.0 de-jobsuche-0.1.0 de-ladestationen-1.0.5 de-mudab-0.1.0 de-nina-1.0.2 de-polizei-brandenburg-0.1.0 de-risikogebiete-0.1.0 de-smard-0.1.0 de-strahlenschutz-1.0.0 de-travelwarning-0.1.0 de-zoll-0.1.0 deutschland-0.3.0 gql-2.0.0 graphql-core-2.3.2 jmespath-1.0.1 lxml-4.9.1 mapbox-vector-tile-1.2.1 numpy-1.19.0 onnxruntime-1.10.0 pandas-1.1.5 pyclipper-1.3.0.post3 pypresseportal-0.1 regex-2022.3.2 requests-2.28.1 rx-1.6.1 s3transfer-0.6.0 slugify-0.0.1 urllib3-1.26.10

WARNING: The following packages were previously imported in this runtime:
[PIL,numpy]
You must restart the runtime in order to use newly installed versions.

>>> import deutschland as de
>>> geo = de.Geo()
AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>()
      1 import deutschland as de
----> 2 geo = de.Geo()
AttributeError: module 'deutschland' has no attribute 'Geo'

Running an older version:
deutschland = "==0.1.9"
I don't get the import errors in Geo(), but an empty result:

Python 3.9.13 (main, May 23 2022, 22:01:06)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> from deutschland import Geo
>>> geo = Geo()
>>> data = geo.fetch([52.530116236589244, 13.426532801586827],
...                  [52.50876180448243, 13.359631043007212])
>>> print(data.keys())
dict_keys([])

With the Bundesanzeiger I get the import error again:

>>> from deutschland import Bundesanzeiger
>>> ba = Bundesanzeiger()
2022-07-22 00:09:52.683533: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-07-22 00:09:52.683555: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/moritz/.local/share/virtualenvs/deutschland-fqErnsp1/lib/python3.9/site-packages/deutschland/bundesanzeiger/bundesanzeiger.py", line 47, in __init__
    self.model = deutschland.bundesanzeiger.model.load_model()
  File "/home/moritz/.local/share/virtualenvs/deutschland-fqErnsp1/lib/python3.9/site-packages/deutschland/bundesanzeiger/model.py", line 36, in load_model
    return keras.models.load_model(
  File "/home/moritz/.local/share/virtualenvs/deutschland-fqErnsp1/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/moritz/.local/share/virtualenvs/deutschland-fqErnsp1/lib/python3.9/site-packages/keras/saving/save.py", line 206, in load_model
    raise IOError(f'No file or directory found at {filepath_str}')
OSError: No file or directory found at assets/model.h5
>>> data = ba.get_reports("Deutsche Bahn AG")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'ba' is not defined

Could you point to what I am doing wrong? Best regards

No module named 'deutschland.geo'; 'deutschland' is not a package

Hi,

trying to get your example https://github.com/bundesAPI/deutschland#geographic-data to run …

pip3 install deutschland on my macOS 12.5 with homebrew results in:

Requirement already satisfied: deutschland in /opt/homebrew/lib/python3.9/site-packages (0.1.4)
Requirement already satisfied: mapbox-vector-tile<2.0.0,>=1.2.1 in /opt/homebrew/lib/python3.9/site-packages (from deutschland) (1.2.1)
Requirement already satisfied: Shapely<2.0.0,>=1.7.1 in /opt/homebrew/lib/python3.9/site-packages (from deutschland) (1.8.2)
Requirement already satisfied: requests<3.0.0,>=2.26.0 in /opt/homebrew/lib/python3.9/site-packages (from deutschland) (2.28.1)
Requirement already satisfied: pyclipper in /opt/homebrew/lib/python3.9/site-packages (from mapbox-vector-tile<2.0.0,>=1.2.1->deutschland) (1.3.0.post3)
Requirement already satisfied: setuptools in /opt/homebrew/lib/python3.9/site-packages (from mapbox-vector-tile<2.0.0,>=1.2.1->deutschland) (63.3.0)
Requirement already satisfied: protobuf in /opt/homebrew/lib/python3.9/site-packages (from mapbox-vector-tile<2.0.0,>=1.2.1->deutschland) (4.21.4)
Requirement already satisfied: future in /opt/homebrew/lib/python3.9/site-packages (from mapbox-vector-tile<2.0.0,>=1.2.1->deutschland) (0.18.2)
Requirement already satisfied: certifi>=2017.4.17 in /opt/homebrew/lib/python3.9/site-packages (from requests<3.0.0,>=2.26.0->deutschland) (2022.6.15)
Requirement already satisfied: idna<4,>=2.5 in /opt/homebrew/lib/python3.9/site-packages (from requests<3.0.0,>=2.26.0->deutschland) (3.3)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/homebrew/lib/python3.9/site-packages (from requests<3.0.0,>=2.26.0->deutschland) (1.26.11)
Requirement already satisfied: charset-normalizer<3,>=2 in /opt/homebrew/lib/python3.9/site-packages (from requests<3.0.0,>=2.26.0->deutschland) (2.1.0)

[notice] A new release of pip available: 22.2.1 -> 22.2.2
[notice] To update, run: python3.9 -m pip install --upgrade pip

(Didn’t save the initial output, but it seems to be successfully installed)

Now running your example code of https://github.com/bundesAPI/deutschland#geographic-data fails early in the game:

% python3 deutschland.py
Traceback (most recent call last):
  File "/Users/ghoffart/src/deutschland.py", line 1, in <module>
    from deutschland.geo import Geo
  File "/Users/ghoffart/src/deutschland.py", line 1, in <module>
    from deutschland.geo import Geo
ModuleNotFoundError: No module named 'deutschland.geo'; 'deutschland' is not a package

Am I overlooking something very obvious? Sorry, Python’s really not part of my knowledge package :-o

Problem with Bundesanzeiger

I am using Python 3.9 in a Docker image and plan to use the bundesAPI in a headless project. But the Bundesanzeiger API uses google-chrome - can this be avoided?


ba = Bundesanzeiger()

====== WebDriver manager ======
====== WebDriver manager ======
/bin/sh: 1: google-chrome: not found
/bin/sh: 1: google-chrome-stable: not found
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/deutschland/bundesanzeiger/bundesanzeiger.py", line 46, in __init__
    self.driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)
  File "/usr/local/lib/python3.9/site-packages/webdriver_manager/chrome.py", line 25, in __init__
    self.driver = ChromeDriver(name=name,
  File "/usr/local/lib/python3.9/site-packages/webdriver_manager/driver.py", line 57, in __init__
    self.browser_version = chrome_version(chrome_type)
  File "/usr/local/lib/python3.9/site-packages/webdriver_manager/utils.py", line 155, in chrome_version
    raise ValueError(f'Could not get version for Chrome with this command: {cmd}')
ValueError: Could not get version for Chrome with this command: google-chrome --version || google-chrome-stable --version

Verena test fails on Windows

The tests fail on Windows, as can be seen here.

The reason is that on Windows the telephone symbol ('\u260e') is somehow not parsed properly.

When I tried to print it I saw something like

E       UnicodeEncodeError: 'charmap' codec can't encode character '\u260e' in position 38: character maps to <undefined>

Unfortunately I do not have access to a Windows machine to dig deeper in a reasonable manner.
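
A possible fix sketch: reconfigure stdout to UTF-8 (available since Python 3.7), which sidesteps the cp1252 default of Windows consoles:

import sys

# Windows consoles default to a legacy code page (e.g. cp1252) that cannot
# encode '\u260e'; printing through a UTF-8 stream avoids the UnicodeEncodeError.
sys.stdout.reconfigure(encoding="utf-8")
print("\u260e 02381 973060")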

Raw error:

================================== FAILURES ===================================
_________________________________ test_verena _________________________________

    def test_verena():
        v = Verena()
>       res = v.get()

tests\verena\test_verena.py:6: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\site-packages\deutschland\verena\verena.py:21: in get
    extract = VerenaExtractor(page).extract()
C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\site-packages\deutschland\verena\verenaextractor.py:38: in extract
    phone, fax, homepage, email, deadline = self.__extract_part4(aus_parts[3])
C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\site-packages\deutschland\verena\verenaextractor.py:158: in __extract_part4
    print(x)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <encodings.cp1252.IncrementalEncoder object at 0x000002AC31A54F10>
input = '\\r\\r\\n\\r\\r\\n                                \u260e 02381 973060\\r\\r\\n                                '
final = False

    def encode(self, input, final=False):
>       return codecs.charmap_encode(input,self.errors,encoding_table)[0]
E       UnicodeEncodeError: 'charmap' codec can't encode character '\u260e' in position 38: character maps to <undefined>

C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\encodings\cp1252.py:19: UnicodeEncodeError
___________________________ test_extractor_content ____________________________

    def test_extractor_content():
        with open("tests/verena/ausschreibung_test_input.html", "r") as f:
            with open("tests/verena/ausschreibung_correct_result.json", "r") as correct:
                content = "<html><body>" + f.read() + "</body></html>"
                ve = VerenaExtractor(content)
                res = ve.extract()
>               assert len(res) == 1 and res[0] == json.loads(correct.read())
E               AssertionError: assert (1 == 1 and {'comments': ...ulingen', ...} == {'comments': ...ulingen', ...}
E                +  where 1 = len([{'comments': 'Bemerkung zur Stelle: Testbemerkung', 'contact': {'fax': '0172 2222 2222', 'homepage': 'http://www.eine...line/': '17.09.2021', 'desc': 'Eine Schule\nSchule der Sekundarstufe II\ndes Landkreis Schuling\n9999 Schulingen', ...}])
E                 Omitting 11 identical items, use -vv to show
E                 Differing items:
E                 {'contact': {'fax': '0172 2222 2222', 'homepage': 'http://www.eine-schule.de/', 'mail': {'adress': 'bewerbung@eineschul...'mailto:[email protected]?subject=Stellenausschreibung in VERENA', 'subject': 'Stellenausschreibung in VERENA'}}} != {'contact': {'fax': '0172 2222 2222', 'homepage': 'http://www.eine-schule.de/', 'mail': {'adress': '[email protected]?subject=Stellenausschreibung in VERENA', 'subject': 'Stellenausschreibung in VERENA'}, 'phone': '0172 1111 1111'}}
E                 Full diff:
E                   {
E                    'comments': 'Bemerkung zur Stelle: Testbemerkung',
E                    'contact': {'fax': '0172 2222 2222',
E                                'homepage': 'http://www.eine-schule.de/',
E                                'mail': {'adress': '[email protected]',
E                                         'raw': 'mailto:[email protected]?subject=Stellenausschreibung '
E                                                'in VERENA',
E                 -                       'subject': 'Stellenausschreibung in VERENA'},
E                 +                       'subject': 'Stellenausschreibung in VERENA'}},
E                 ?                                                                   +
E                 -              'phone': '0172 1111 1111'},
E                    'deadline': '17.09.2021',
E                    'desc': 'Eine Schule\n'
E                            'Schule der Sekundarstufe II\n'
E                            'des Landkreis Schuling\n'
E                            '9999 Schulingen',
E                    'duration': '01.01.2021 - 01.01.2022',
E                    'geolocation': {'coord_system': 'epsg:25832',
E                                    'coordinates': [1111111,
E                                                    1111111],
E                                    'post_adress': 'Eine Straße 1\n'
E                                                   '99999 Schulingen'},
E                    'hours_per_week': '13,5',
E                    'replacement_job_title': 'Lehrkraft',
E                    'replacement_job_type': 'Vertretung',
E                    'replacement_job_type_raw': 'Vertretung für',
E                    'school_id': '99999',
E                    'subjects': ['Fach 1',
E                                 'Fach 2'],
E                   })

tests\verena\test_verenaextractor.py:12: AssertionError
============================== warnings summary ===============================

Documentation - Suggestions

I started to work on the documentation (see #41) and I have a few suggestions:

  • Include source-code documentation into sphinx
  • Provide the documentation in both German and English
  • Rethink the README - currently it is a mix of quickstart, documentation for specific APIs, and links to more documentation on autogenerated API clients; IMO it should only contain a short description of the repo, how to install it, and where to find more documentation
  • Provide an issue template for documentation-related issues and tag them as documentation, which would make it easier to plan and track documentation progress and allow filtering bugs and other issues.
  • Host the documentation on readthedocs (@LilithWittmann that’s something you should look into)

Charging stations

Hi,
I would like to extract all the charging stations in Germany. I guess the ladestationen would give me this info. However, I'm not sure what "geometry" should be provided as input here. Also, the URI is localhost. Could you please provide a valid URI?

Thanks

UnboundLocalError: local variable 'parsed' referenced before assignment

>>> de.fetch([48.51999, 9.07136], [48.51999, 9.07137])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ajung/src/deutschland/lib/python3.9/site-packages/deutschland/geo.py", line 81, in fetch
    return parsed
UnboundLocalError: local variable 'parsed' referenced before assignment

Bundesanzeiger: Only 1 item is returned for reports w/ same name

Some reports share the same names, e.g. Mitteilung von Netto-Leerverkaufspositionen. The problem is that get_reports cannot return multiple items with the same name because it returns a dictionary/map.

#!/opt/homebrew/bin/python3
from deutschland.bundesanzeiger import Bundesanzeiger
ba = Bundesanzeiger()
data = ba.get_reports("DE000A0TGJ55")
print(data.keys())

print(data["Mitteilung von Netto-Leerverkaufspositionen"])
dict_keys(['Mitteilung von Netto-Leerverkaufspositionen'])
{'date': datetime.datetime(2022, 9, 26, 0, 0), 'name': 'Mitteilung von Netto-Leerverkaufspositionen', 'company': 'BlackRock Investment Management (UK) Limited', 'report': '\n\n\n\n\xa0\n\n\n\n\n\n\n\nBlackRock Investment Management (UK) Limited\nLondon\nMitteilung von Netto-Leerverkaufspositionen\nZu folgendem Emittenten wird vom oben genannten Positionsinhaber eine Netto-Leerverkaufsposition\n            gehalten:\n\nVARTA AKTIENGESELLSCHAFT\n\n\nISIN: DE000A0TGJ55\n\nDatum der Position: 23.09.2022\nProzentsatz des ausgegebenen Aktienkapitals: 2,26 %\n\xa0\n\n\n\n\n\n\n\n\n\n\n\n\n'}

On the website, several distinct reports appear under this identical title (screenshot omitted).
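
One way the collision could be avoided (a hedged sketch of a possible library change, keying the result dict by name plus company and date; field names taken from the output above):

import datetime

report = {
    "date": datetime.datetime(2022, 9, 26),
    "name": "Mitteilung von Netto-Leerverkaufspositionen",
    "company": "BlackRock Investment Management (UK) Limited",
    "report": "...",
}
result = {}
# the report name alone is not unique, so extend the key
key = f"{report['name']} ({report['company']}, {report['date']:%Y-%m-%d})"
result[key] = report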

Bundesanzeiger not working properly anymore

Running the following sample code:

from deutschland.bundesanzeiger import Bundesanzeiger
ba = Bundesanzeiger()
data = ba.get_reports("Deutsche Bahn AG")

throws the following error:

/usr/local/lib/python3.10/dist-packages/deutschland/bundesanzeiger/bundesanzeiger.py in __find_all_entries_on_page(self, page_content)
     88         soup = BeautifulSoup(page_content, "html.parser")
     89         wrapper = soup.find("div", {"class": "result_container"})
---> 90         rows = wrapper.find_all("div", {"class": "row"})
     91         for row in rows:
     92             info_element = row.find("div", {"class": "info"})

AttributeError: 'NoneType' object has no attribute 'find_all'

Hacktoberfest

Add the hacktoberfest topic so that participants of Hacktoberfest (more information at https://hacktoberfest.digitalocean.com/) can count pull requests made here.
This could also help draw more people's attention to the repository and the whole organization.

Bundesanzeiger import in Python

Would be great to change the documentation to

from deutschland import bundesanzeiger
ba = bundesanzeiger.Bundesanzeiger()

This worked.

Add Documentation

A proper documentation would be quite useful. Especially if the system keeps growing.
The usage documentation in the Readme is nice to get started but does not show all parameters and advanced usage.

OpenAPI doc for geodata?

I'd like to create another library for .NET, so I thought I'd start with something simple like the geodata API. It turns out I couldn't find the OpenAPI doc.
Is there one? Or is there another kind of documentation about it that just wasn't enough to make a repository yet? A cURL/Postman/etc. example request would be enough for me, as reading Python is not one of my strong suits (although I have already started doing that).

import of Bundesanzeiger and Handelsregister not working

The import of Bundesanzeiger and Handelsregister is not working in a new install of the "deutschland" package.

How to reproduce:

  • pip install Shapely
  • pip install deutschland
  • open python REPL
  • Try to import: from deutschland import Bundesanzeiger

Expected result:

  • python modules should be imported

What I got instead:

>>> from deutschland import Bundesanzeiger
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'Bundesanzeiger' from 'deutschland' (/opt/homebrew/Caskroom/miniforge/base/envs/py_de/lib/python3.9/site-packages/deutschland/__init__.py)
>>> from deutschland import Handelsregister
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'Handelsregister' from 'deutschland' (/opt/homebrew/Caskroom/miniforge/base/envs/py_de/lib/python3.9/site-packages/deutschland/__init__.py)

My environment:

  • Mac 11.5.2. Big Sur, Python 3.9.6, pip 21.2.4, conda 4.10.3 (via Miniforge)

Geo tests are failing

Seems like the API changed:

When I run the test, I get the following:

HTTPSConnectionPool(host='adv-smart.de', port=443): Max retries exceeded with url: /tiles/smarttiles_de_public_v1/15/17605/10747.pbf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1091)')))

To reproduce:

wget https://adv-smart.de/tiles/smarttiles_de_public_v1/15/17605/10745.pbf --no-check-certificate

This gives a 404 error.

@LilithWittmann Do you know more?
