enram / vptstools
Python library to transfer and convert vertical profile time series data
Home Page: https://enram.github.io/vptstools/
License: MIT License
In order to speed up the uploads towards S3, handling multiple files at the same time would be a huge improvement.
A first option would be working with async, but as the boto3 library does not yet support async handling, this approach will not work. Working with multiple threads or in parallel would be a valid option to implement.
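A minimal sketch of the threaded option with boto3 and concurrent.futures (the bucket name, key layout and worker count are placeholder assumptions; boto3 clients can be shared across threads):

from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

import boto3

s3 = boto3.client("s3")

def upload(path: Path) -> None:
    # Key layout is illustrative; the real pipeline derives it from the file path
    s3.upload_file(str(path), "aloft", f"baltrad/hdf5/{path.name}")

file_paths = sorted(Path("./data").rglob("*.h5"))
with ThreadPoolExecutor(max_workers=8) as executor:
    list(executor.map(upload, file_paths))  # list() surfaces any upload exceptions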
Need to be able to remove all CSV files (daily/monthly) in the s3 aloft bucket and replace them with a new run of a given schema-version implementation.
Needs to be operational so we can rerun after enram/vpts-csv#42 is merged.
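A hedged sketch of the removal step with boto3 (the bucket name and daily/monthly prefixes follow the layout used elsewhere in this document; this is not the pipeline's actual code):

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Remove all daily/monthly CSV files for one source before a full rerun
for prefix in ("baltrad/daily/", "baltrad/monthly/"):
    for page in paginator.paginate(Bucket="aloft", Prefix=prefix):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:  # each page holds at most 1000 keys, the delete_objects limit
            s3.delete_objects(Bucket="aloft", Delete={"Objects": keys})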
Logs for entire bucket indicate:
September 06, 2023 at 12:34 (UTC+2:00)[WARNING] - During conversion from HDF5 files of baltrad/bejab at 2018-06-03 to daily VPTS file, the following error occurred: 'vcp'.
sync
September 06, 2023 at 12:33 (UTC+2:00)Create daily VPTS file baltrad/daily/bejab/2018/bejab_vpts_20180603.csv.
sync
What is the 'vcp' error?
vptstools/src/vptstools/vpts.py
Line 476 in 6a69534
-> by only adjusting the URL, so that report = validate_vpts(df_vpts, version="v1") becomes possible
@stijnvanhoey CSV files can be validated with frictionless. Create a datapackage.json file with the following content:
{
"profile": "tabular-data-package",
"resources": [
{
"name": "vpts",
"path": "vpts.csv",
"profile": "tabular-data-resource",
"format": "csv",
"mediatype": "text/csv",
"encoding": "utf-8",
"schema": "https://raw.githubusercontent.com/enram/vpts-csv/main/vpts-csv-table-schema.json"
}
]
}
Place the datapackage.json file in the same directory as your CSV file. Rename the path value if necessary to point to the CSV file (above named vpts.csv). Then run:
frictionless validate datapackage.json
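The same check can also be run from Python with the frictionless package (a small sketch; it relies only on validate() and the report's valid flag):

from frictionless import validate

report = validate("datapackage.json")
print(report.valid)  # True if the CSV matches the referenced table schema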
I notice we have many tags and 1 release. @stijnvanhoey:
Is a tag sufficient to have it be picked up by GitHub Actions and used in the operational pipeline? Or does it require a release?
Is a tag sufficient to have it be published to PyPI? Yes, see https://pypi.org/project/vptstools/
I got an AWS notification email today:
CLI routine 'vph5_to_vpts --modified-days-ago 2' failed raising error: '<class 'ValueError'>: File name uva/hdf5/dbl/2008/02/17/nldbl_vp_20080217t0000_nl50_v0-3-20.h5 is not a valid ODIM h5 file.'.
It now includes the name of the file. The mentioned file is however no longer in the repository. It was deleted August 21 or 22 (2 or 1 days ago). I'll see tomorrow if the issue resolves itself, i.e. the inventory is updated and the file is no longer listed there and no notification is generated.
@stijnvanhoey @TheJenne18 can deleted files linger in the inventory? Should this resolve itself automatically? Not sure we considered this when designing the architecture.
Note this error did not stop the creation of the daily and monthly files.
vph5_to_vpts requires dotenv, but it seems the installation fails:
vph5_to_vpts
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/bin/vph5_to_vpts", line 5, in <module>
from vptstools.bin.vph5_to_vpts import cli
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vptstools/bin/vph5_to_vpts.py", line 10, in <module>
from dotenv import load_dotenv
pip3 install dotenv
Collecting dotenv
Using cached dotenv-0.0.5.tar.gz (2.4 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... error
error: subprocess-exited-with-error
× pip subprocess to install backend dependencies did not run successfully.
│ exit code: 1
╰─> [29 lines of output]
Collecting distribute
Using cached distribute-0.7.3.zip (145 kB)
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'error'
error: subprocess-exited-with-error
× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: setup.py --help [cmd1 cmd2 ...]
or: setup.py --help-commands
or: setup.py cmd --help
error: invalid command 'dist_info'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× pip subprocess to install backend dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
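For what it's worth, the dotenv module imported by vph5_to_vpts is provided by the python-dotenv package, not by the unmaintained dotenv package attempted above, so the import error should be resolved with:

pip3 install python-dotenv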
When running tox -e docs, all works fine, but a warning is raised:
/Users/peter_desmet/Coding/Repositories/enram/vptstools/src/vptstools/__init__.py:docstring of vptstools.vpts.vpts:1: WARNING: duplicate object description of vptstools.vpts, other instance in api/vptstools, use :no-index: for one of them
WARNING: autodoc: failed to import module 'transfer_baltrad' from module 'vptstools.bin'; the following exception was raised:
cannot import name 'report_exception_to_sns' from 'vptstools.bin.click_exception' (/Users/peter_desmet/Coding/Repositories/enram/vptstools/docs/../src/vptstools/bin/click_exception.py)
Using these files that I saved in data:
https://aloftdata.s3-eu-west-1.amazonaws.com/baltrad/hdf5/nldbl/2013/11/23/nldbl_vp_20131123T0000Z.h5
https://aloftdata.s3-eu-west-1.amazonaws.com/baltrad/hdf5/nldbl/2013/11/23/nldbl_vp_20131123T0015Z.h5
https://aloftdata.s3-eu-west-1.amazonaws.com/baltrad/hdf5/nldbl/2013/11/23/nldbl_vp_20131123T0030Z.h5
I get an error if I want to reproduce the second README example:
from pathlib import Path
from vptstools.vpts import vpts
file_paths = sorted(Path("./data").rglob("*.h5")) # Get all h5 files within the data directory
df_vpts = vpts(file_paths)
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 120, in spawn_main
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 129, in _main
prepare(preparation_data)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 240, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/spawn.py", line 291, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen runpy>", line 291, in run_path
File "<frozen runpy>", line 98, in _run_module_code
File "<frozen runpy>", line 88, in _run_code
File "/Users/peter_desmet/Coding/Repositories/enram/vptstools/test.py", line 5, in <module>
df_vpts = vpts(file_paths)
^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vptstools/vpts.py", line 256, in vpts
with multiprocessing.Pool(processes=cpu_count) as pool:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/context.py", line 119, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/pool.py", line 215, in __init__
self._repopulate_pool()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/pool.py", line 306, in _repopulate_pool
return self._repopulate_pool_static(self._ctx, self.Process,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If I print file_paths I get:
[PosixPath('data/nldbl_vp_20131123T0000Z.h5'), PosixPath('data/nldbl_vp_20131123T0015Z.h5'), PosixPath('data/nldbl_vp_20131123T0030Z.h5')]
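The traceback points at multiprocessing's spawn start method (the default on macOS), which re-imports the main script, so an unguarded top-level call to vpts() is executed again in every worker. A likely fix (an assumption based on the traceback, not a confirmed vptstools requirement) is to guard the entry point:

from pathlib import Path
from vptstools.vpts import vpts

if __name__ == "__main__":  # prevents re-execution in spawned worker processes
    file_paths = sorted(Path("./data").rglob("*.h5"))
    df_vpts = vpts(file_paths)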
Cf. #17 @stijnvanhoey, you probably have already started with this?
Note that generate_coverage.py might no longer be needed if we use AWS Inventory.
test.yml: run for every commit, one Python version / for PR, all Python versions
documentation.yml: for every commit to main, run tests, run Sphinx, push to gh-pages branch
release.yml: run for commits with a tag, run tests for all Python versions, push to PyPI
https://github.com/enram/vptstools/blob/main/setup.cfg has been updated to have INBO as author. When I install from PyPI, I still see the old information:
pip3 show vptstools
Name: vptstools
Version: 0.2.2
Summary: Tools to work with vertical profile time series.
Home-page: https://enram.github.io/vptstools/
Author: enram
Author-email:
License: MIT
Location: /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages
Requires: click, frictionless, h5py, pandas, pytz
Required-by:
How can an update be forced?
Monthly data, e.g. https://aloft.s3-eu-west-1.amazonaws.com/baltrad/monthly/bejab/2023/bejab_vpts_202302.csv.gz, returns an error when trying to unzip on Mac:
See enram/vpts-csv#40 (comment)
Did you also encounter NULL (rather than NaN) values for these properties? Need to double-check this for ff, dd, sd_vvp, eta.
From notebook/script -> also CLI run for the cronjob.
Points to cover:
Line 54 in f99dc47
When enram/vpts-csv#41 is approved, the v1 version of the vpts class should include source_file as an additional sort field: dict(radar=str, datetime=str, height=int, source_file=str)
Lines 94 to 97 in 6a69534
@stijnvanhoey I wonder if we can remove the https://github.com/enram/aws repo. Most commits are made by you.
s3fs is a convenient package to interact with S3 (it feels like coding against a normal file system), but since s3fs relies on the async aiobotocore package, some issues arose:
raw_headers which are not according to spec in the moto library used in the testing (see getmoto/moto#3259 for a description and 802b9f5 for the fix)
Hence, these issues have been handled in https://github.com/enram/vptstools/tree/SVH-country-filter, but I'm not sure if the convenience of s3fs is worth having it as a dependency. It might be that excluding s3fs would make things easier to maintain.
I'd like to (sooner rather than later) use MyPy to make this package more robust.
A first try gives me the error: Skipping analyzing "odimh5.reader": found module but no type hints or library stubs, which seems weird since the odimh5 package is type-annotated.
To investigate.
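A likely explanation (an assumption, not verified here): per PEP 561, MyPy ignores the inline annotations of an installed package unless its distribution ships a py.typed marker. If odimh5 lacks one, adding an empty py.typed next to its modules and declaring it as package data should make the annotations visible, e.g. (path is hypothetical):

touch src/odimh5/py.typed  # empty PEP 561 marker, must also be shipped as package data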
In the https://github.com/adokter/vol2bird/wiki/ODIM-bird-profile-format-specification#specification-of-bird-profile-output-in-odim-hdf5-format specification there is a gain and an offset for the datasets/variables. In the conversion from h5 to VPTS CSV, the current implementation does not take these into account. @adokter, should this actually be done by default, storing quantity*gain+offset instead of quantity in the VPTS CSV for each record?
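For illustration, applying the calibration with h5py could look like this sketch (the dataset1/data1 group path and attribute names follow the ODIM convention; defaulting to gain=1 and offset=0 when absent is an assumption):

import h5py

with h5py.File("bejab_vp_20230202T000000Z_0x9.h5", mode="r") as odim_vp:
    what = odim_vp["dataset1/data1/what"].attrs
    gain = what.get("gain", 1.0)      # scale factor
    offset = what.get("offset", 0.0)  # additive offset
    raw = odim_vp["dataset1/data1/data"][()]
    calibrated = raw * gain + offset  # quantity*gain+offset as proposed above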
Are the following issues reported at data-repository still relevant and/or tackled by vptstools?
Add a section "Usage" with a simple reproducible example (Python or command line code) to show how three h5 files (downloaded from aloft) can be converted to VPTS CSV. Tackle in #50
See #50 for a start
The coverage.csv only provides coverage for the hdf5 portion of each source (baltrad, uva, ecog). It would be useful if it also provided the coverage for the daily and monthly portions of each source. That is not a trivial change however, since the coverage is based on the AWS inventory and that inventory is limited to the hdf5 files (because that is the aspect that is important for vph5_to_vpts).
Make sure to add separate unit tests for the duplication handling and the sorting.
@peterdesmet what zip format to use? .zip, .tar.gz, ...?
library(dplyr)
library(readr)
library(bioRad)
files <- list.files("~/Downloads/bejab/aloft/hdf5", full.names = TRUE)
vp <- bioRad::read_vpfiles(files)
vpts <-
bioRad::bind_into_vpts(vp) %>%
as.data.frame(geo = TRUE, suntime = FALSE) %>%
dplyr::arrange(datetime, height)
readr::write_csv(vpts, "vpts.csv") # This converts NaN to NA, I have manually set those back to NaN below
nrow(vpts)
# 35575
radar | datetime | ff | dbz | dens | u | v | gap | w | n_dbz | dd | n | DBZH | height | n_dbz_all | eta | sd_vvp | n_all | lat | lon | height_antenna |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
bejab | 2023-02-02T00:00:00Z | 0 | NaN | -2.578521966934204 | 18.094850540161133 | NaN | NaN | 1 | NaN | 2237 | NaN | 227 | 25.863487243652344 | 8821 | 199.04335021972656 | 2.0527188777923584 | 436 | 51.191700000000004 | 3.0642000000000005 | 50 |
bejab | 2023-02-02T00:00:00Z | 0 | NaN | -3.3399300575256348 | 15.184979438781738 | NaN | NaN | 1 | NaN | 1568 | NaN | 163 | 24.80197525024414 | 8828 | 167.03477478027344 | 2.5198092460632324 | 456 | 51.191700000000004 | 3.0642000000000005 | 50 |
bejab | 2023-02-02T00:00:00Z | 0 | NaN | -3.988534688949585 | 13.078372955322266 | NaN | NaN | 1 | NaN | 2122 | NaN | 223 | 23.728849411010742 | 8862 | 143.8621063232422 | 2.4304895401000977 | 444 | 51.191700000000004 | 3.0642000000000005 | 50 |
bejab | 2023-02-02T00:00:00Z | 200 | 3.0408787727355957 | -7.179384708404541 | 6.272905349731445 | 2.805347204208374 | -1.1734440326690674 | 0 | -21.234346389770508 | 19771 | 112.69898986816406 | 740 | 16.758867263793945 | 22850 | 69.00196075439453 | 2.9160335063934326 | 873 | 51.191700000000004 | 3.0642000000000005 | 50 |
bejab | 2023-02-02T00:00:00Z | 200 | 3.2996182441711426 | -7.66164493560791 | 5.613616466522217 | 2.8031976222991943 | -1.740564227104187 | 0 | 5.670511245727539 | 18076 | 121.8370132446289 | 568 | 18.90127182006836 | 22860 | 61.749778747558594 | 2.4480631351470947 | 866 | 51.191700000000004 | 3.0642000000000005 | 50 |
d1 <- readr::read_csv("~/Downloads/aloft/bejab/bejab_vpts_20230202.csv")
d2 <- readr::read_csv("~/Downloads/aloft/bejab/bejab_vpts_20230203.csv")
d3 <- readr::read_csv("~/Downloads/aloft/bejab/bejab_vpts_20230213.csv")
d4 <- readr::read_csv("~/Downloads/aloft/bejab/bejab_vpts_20230214.csv")
d5 <- readr::read_csv("~/Downloads/aloft/bejab/bejab_vpts_20230215.csv")
daily <- bind_rows(d1, d2, d3, d4, d5)
radar | datetime | height | u | v | w | ff | dd | sd_vvp | gap | eta | dens | dbz | dbz_all | n | n_dbz | n_all | n_dbz_all | rcs | sd_vvp_threshold | vcp | radar_latitude | radar_longitude | radar_height | radar_wavelength | source_file |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
bejab | 2023-02-02T00:00:00Z | 0 | NaN | NaN | NaN | NaN | NaN | 2.0527188777923584 | TRUE | 199.04335021972656 | 18.094850540161133 | -2.578521966934204 | 25.863487243652344 | 227 | 2237 | 436 | 8821 | 11.0 | 2.0 | 51.1917 | 3.0642 | 50 | 5.3 | bejab_vp_20230202T000000Z_0x9.h5 | |
bejab | 2023-02-02T00:00:00Z | 0 | NaN | NaN | NaN | NaN | NaN | 2.5198092460632324 | TRUE | 167.03477478027344 | 15.184979438781738 | -3.3399300575256348 | 24.80197525024414 | 163 | 1568 | 456 | 8828 | 11.0 | 2.0 | 51.1917 | 3.0642 | 50 | 5.3 | bejab_vp_20230202T000500Z_0x9.h5 | |
bejab | 2023-02-02T00:00:00Z | 0 | NaN | NaN | NaN | NaN | NaN | 2.4304895401000977 | TRUE | 143.8621063232422 | 13.078372955322266 | -3.988534688949585 | 23.728849411010742 | 223 | 2122 | 444 | 8862 | 11.0 | 2.0 | 51.1917 | 3.0642 | 50 | 5.3 | bejab_vp_20230202T001000Z_0x9.h5 | |
bejab | 2023-02-02T00:00:00Z | 200 | 2.805347204208374 | -1.1734440326690674 | -21.234346389770508 | 3.0408787727355957 | 112.69898986816406 | 2.9160335063934326 | FALSE | 69.00196075439453 | 6.272905349731445 | -7.179384708404541 | 16.758867263793945 | 740 | 19771 | 873 | 22850 | 11.0 | 2.0 | 51.1917 | 3.0642 | 50 | 5.3 | bejab_vp_20230202T000000Z_0x9.h5 | |
bejab | 2023-02-02T00:00:00Z | 200 | 2.8031976222991943 | -1.740564227104187 | 5.670511245727539 | 3.2996182441711426 | 121.8370132446289 | 2.4480631351470947 | FALSE | 61.749778747558594 | 5.613616466522217 | -7.66164493560791 | 18.90127182006836 | 568 | 18076 | 866 | 22860 | 11.0 | 2.0 | 51.1917 | 3.0642 | 50 | 5.3 | bejab_vp_20230202T000500Z_0x9.h5 |
monthly <- readr::read_csv("~/Downloads/aloft/bejab_vpts_202302.csv.gz")
radar | datetime | height | u | v | w | ff | dd | sd_vvp | gap | eta | dens | dbz | dbz_all | n | n_dbz | n_all | n_dbz_all | rcs | sd_vvp_threshold | vcp | radar_latitude | radar_longitude | radar_height | radar_wavelength | source_file |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
bejab | 2023-02-02T00:00:00Z | 0 | 2.0527188777923584 | TRUE | 199.04335021972656 | 18.094850540161133 | -2.578521966934204 | 25.863487243652344 | 227 | 2237 | 436 | 8821 | 11.0 | 2.0 | 51.1917 | 3.0642 | 50 | 5.3 | bejab_vp_20230202T000000Z_0x9.h5 | ||||||
bejab | 2023-02-02T00:00:00Z | 0 | 2.5198092460632324 | TRUE | 167.03477478027344 | 15.184979438781738 | -3.3399300575256348 | 24.80197525024414 | 163 | 1568 | 456 | 8828 | 11.0 | 2.0 | 51.1917 | 3.0642 | 50 | 5.3 | bejab_vp_20230202T000500Z_0x9.h5 | ||||||
bejab | 2023-02-02T00:00:00Z | 0 | 2.4304895401000977 | TRUE | 143.8621063232422 | 13.078372955322266 | -3.988534688949585 | 23.728849411010742 | 223 | 2122 | 444 | 8862 | 11.0 | 2.0 | 51.1917 | 3.0642 | 50 | 5.3 | bejab_vp_20230202T001000Z_0x9.h5 | ||||||
bejab | 2023-02-02T00:00:00Z | 200 | 2.805347204208374 | -1.1734440326690674 | -21.234346389770508 | 3.0408787727355957 | 112.69898986816406 | 2.9160335063934326 | FALSE | 69.00196075439453 | 6.272905349731445 | -7.179384708404541 | 16.758867263793945 | 740 | 19771 | 873 | 22850 | 11.0 | 2.0 | 51.1917 | 3.0642 | 50 | 5.3 | bejab_vp_20230202T000000Z_0x9.h5 | |
bejab | 2023-02-02T00:00:00Z | 200 | 2.8031976222991943 | -1.740564227104187 | 5.670511245727539 | 3.2996182441711426 | 121.8370132446289 | 2.4480631351470947 | FALSE | 61.749778747558594 | 5.613616466522217 | -7.66164493560791 | 18.90127182006836 | 568 | 18076 | 866 | 22860 | 11.0 | 2.0 | 51.1917 | 3.0642 | 50 | 5.3 | bejab_vp_20230202T000500Z_0x9.h5 |
testthat::expect_equal(daily, monthly)
@TheJenne18 ran a full processing with --modified-days-ago 0, using 8 vCPUs and 16 GB. The job stopped silently, with no data added to the bucket. The only line in the log file is:
Recreate the full set of bucket files (files modified since 401days). This will take a while!
Which I assume is just the start of the process. Did it fail reading the full inventory into memory? It might be useful to provide more messages, to know at what point the processing failed.
I downloaded a set of files from the bejab data as a test case and, while trying out the CSV concatenation (to create a VPTS CSV), I encountered repeated timestamps for multiple files that do not correspond to the timestamp included in the file name:
FILENAME /what/time
bejab_vp_20221111T233000Z_0x9.h5 233000
bejab_vp_20221111T234000Z_0x9.h5 233000
bejab_vp_20221111T234500Z_0x9.h5 234500
bejab_vp_20221111T235000Z_0x9.h5 234500
bejab_vp_20221111T235500Z_0x9.h5 234500
To check, I downloaded some files from the Baltrad SFTP directly and compared the timestamp in the file name with the /what/time timestamp, leading to several of these differences (67% in a quick test on 50 files):
FILE WHAT/TIME FILEPATH
2250 2245 bejab_vp_20221112T225000Z_0x9.h5
0235 0230 bewid_vp_20221113T023500Z_0xb.h5
1635 1630 chppm_vp_20221114T163500Z_0xb.h5
0310 0300 dedrs_vp_20221115T031000Z_0xb.h5
0105 0100 defbg_vp_20221114T010500Z_0xb.h5
1025 1015 deisn_vp_20221115T102500Z_0xb.h5
0125 0115 denhb_vp_20221114T012500Z_0xb.h5
0505 0500 denhb_vp_20221114T050500Z_0xb.h5
1210 1200 eehar_vp_20221113T121000Z_0xb.h5
0410 0400 eehar_vp_20221114T041000Z_0xb.h5
0520 0515 esalm_vp_20221114T052000Z_0xb.h5
1410 1400 esbar_vp_20221113T141000Z_0xb.h5
1420 1415 essse_vp_20221114T142000Z_0xb.h5
1040 1030 esval_vp_20221115T104000Z_0xb.h5
0150 0145 filuo_vp_20221114T015000Z_0xb.h5
0255 0245 finur_vp_20221114T025500Z_0xb.h5
1440 1430 frabb_vp_20221114T144000Z_0xb.h5
0050 0045 frcol_vp_20221115T005000Z_0xb.h5
1835 1830 frmcl_vp_20221114T183500Z_0xb.h5
1340 1330 frmom_vp_20221114T134000Z_0xb.h5
2050 2045 frnim_vp_20221113T205000Z_0xb.h5
0320 0315 frniz_vp_20221113T032000Z_0xb.h5
0640 0630 frtou_vp_20221113T064000Z_0xb.h5
2250 2245 frtra_vp_20221114T225000Z_0xb.h5
0825 0815 frtre_vp_20221113T082500Z_0xb.h5
0605 0600 nohgb_vp_20221115T060500Z_0xb.h5
1555 1545 nosmn_vp_20221113T155500Z_0xb.h5
0020 0015 plram_vp_20221114T002000Z_0xb.h5
2205 2200 sekaa_vp_20221113T220500Z_0xb.h5
0210 0200 sevax_vp_20221113T021000Z_0xb.h5
@peterdesmet is this a known issue or am I stuck on a bug I just can't get around? For the latter experiment I relied only on h5py package as a dependency (I left out the vptstools modules and just tried to extract only the timestamps):
import h5py
from pathlib import Path

file_paths = sorted(Path("../data/raw/baltrad/").rglob("*.h5"))
for path_h5 in file_paths:
    with h5py.File(path_h5, mode="r") as odim_vp:
        time_filename = path_h5.stem.split("_")[2][9:13]  # HHMM part of the file name
        time_h5_what = odim_vp["what"].attrs.get("time").decode("utf-8")[:-2]  # HHMM from /what/time
        if time_filename != time_h5_what:
            print(time_filename, time_h5_what, path_h5)
The time difference might not be an issue if the timestamps are unique among the different files. Or should we rather use the timestamp from the file path of the h5 files?
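If we go with the file-path timestamp, parsing it is straightforward (a sketch; the name format is taken from the files listed above):

from datetime import datetime
from pathlib import Path

path_h5 = Path("bejab_vp_20221111T233000Z_0x9.h5")
timestamp = datetime.strptime(path_h5.stem.split("_")[2], "%Y%m%dT%H%M%SZ")
print(timestamp)  # 2022-11-11 23:30:00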
I'm getting daily AWS notifications with:
CLI routine 'vph5_to_vpts --modified-days-ago 2' failed raising error: '<class 'ValueError'>: File name is not a valid ODIM h5 file.'.
It would be helpful to:
Pseudo code:
h5_files = get_h5_files(radar, start, end, source)  # returns a list of file paths
dfs = [h5_to_df(h5_file) for h5_file in h5_files]   # one data frame per h5 file
df = pandas.concat(dfs, ignore_index=True)
df.to_csv("some/path/name.csv", index=False)
So:
a custom function get_h5_files() that understands the directory structure of the repo. It likely makes use of the s3 library under the hood to get a list of file paths that match a radar, start date, end date and source criterion (see the sketch after this list).
a custom function h5_to_df() that reads a h5 file and converts it to the VPTS CSV format, but as a data frame, not a file. The function can be called many times to build a growing data frame.
a generic write_csv() function (e.g. from pandas) that writes the df to a file at some location. The write_csv() settings should match those of the CSV dialect defined for VPTS CSV.
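A hypothetical sketch of get_h5_files(), assuming the source/hdf5/radar/yyyy/mm/dd/ layout described elsewhere in this document and an "aloft" bucket (the signature and names are illustrative, not an existing vptstools API):

import datetime

import boto3

def get_h5_files(radar, start, end, source):
    # List h5 keys in the bucket for one radar and a date range
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    keys = []
    day = start
    while day <= end:
        prefix = f"{source}/hdf5/{radar}/{day:%Y/%m/%d}/"
        for page in paginator.paginate(Bucket="aloft", Prefix=prefix):
            keys.extend(obj["Key"] for obj in page.get("Contents", []) if obj["Key"].endswith(".h5"))
        day += datetime.timedelta(days=1)
    return keys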
@stijnvanhoey in the call we discovered that the ECS revisions always have latest in their tag. I think that is defined in the release.yml in this repository though:
vptstools/.github/workflows/release.yml
Line 158 in 811e76f
Would you know how to change this to include the version number of vptstools?
Suggestion by @TheJenne18: to avoid confusion about which environment notifications are sent from (#62), it might be useful to include os.environ.get("ENV") in error messages such as:
CLI routine 'vph5_to_vpts --modified-days-ago 2' failed raising error: '<class 'ValueError'>: File name uva/hdf5/dbl/2008/02/17/nldbl_vp_20080217t0000_nl50_v0-3-20.h5 is not a valid ODIM HDF5 file.'.
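A minimal sketch of such a message, assuming an ENV environment variable on the container (the helper name and format are illustrative):

import os

def format_error_message(routine, exc):
    # Prefix the notification with the deployment environment
    env = os.environ.get("ENV", "unknown")
    return f"[{env}] CLI routine '{routine}' failed raising error: '{exc!r}'."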
@stijnvanhoey, you probably have already started with this? The idea is to remove the script at https://github.com/enram/data-repository/tree/master/transfer_2022 and make it part of the vptstools.
See enram/vpts-csv#42; implementation-wise:
s3://aloft/baltrad...
dict(radar=str, datetime=str, height=int, source_file=str)
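A minimal pandas sketch of sorting on those four fields (the frame content is illustrative):

import pandas as pd

df = pd.DataFrame({
    "radar": ["bejab", "bejab"],
    "datetime": ["2023-02-02T00:05:00Z", "2023-02-02T00:00:00Z"],
    "height": [200, 0],
    "source_file": ["bejab_vp_20230202T000500Z_0x9.h5", "bejab_vp_20230202T000000Z_0x9.h5"],
})
df = df.sort_values(by=["radar", "datetime", "height", "source_file"], ignore_index=True)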
The current configuration on the server for the logs does not take into account log rotation:
...>> /home/ubuntu/transfer-`date +%Y%m%d%H%M%S`.log 2>&1
Improve this by writing the logs to a log file and adding a logrotate config on server level.
tox -e dev
source venv/bin/activate
tox
Tests work fine until 98%, after which I get an error:
tests/test_vpts_csv.py::TestVptsCsvV1SupportFun::test_check_source_file PASSED [ 98%]
/Users/peter_desmet/Coding/Repositories/enram/vptstools/.tox/py39/lib/python3.9/site-packages/coverage/data.py:166: CoverageWarning: Couldn't use data file '/Users/peter_desmet/Coding/Repositories/enram/vptstools/.coverage.Peters-MacBook-Air.local.50554.789008-journal': file is not a database
data._warn(str(exc))
tests/test_vpts_csv.py::TestVptsCsvV1SupportFun::test_check_source_file_wrong_file PASSED [100%]
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR> File "/Users/peter_desmet/Coding/Repositories/enram/vptstools/.tox/py39/lib/python3.9/site-packages/_pytest/main.py", line 270, in wrap_session
INTERNALERROR> session.exitstatus = doit(config, session) or 0
INTERNALERROR> File "/Users/peter_desmet/Coding/Repositories/enram/vptstools/.tox/py39/lib/python3.9/site-packages/_pytest/main.py", line 324, in _main
INTERNALERROR> config.hook.pytest_runtestloop(session=session)
INTERNALERROR> File "/Users/peter_desmet/Coding/Repositories/enram/vptstools/.tox/py39/lib/python3.9/site-packages/pluggy/_hooks.py", line 265, in __call__
INTERNALERROR> return self._hookexec(self.name, self.get_hookimpls(), kwargs, firstresult)
INTERNALERROR> File "/Users/peter_desmet/Coding/Repositories/enram/vptstools/.tox/py39/lib/python3.9/site-packages/pluggy/_manager.py", line 80, in _hookexec
INTERNALERROR> return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
INTERNALERROR> File "/Users/peter_desmet/Coding/Repositories/enram/vptstools/.tox/py39/lib/python3.9/site-packages/pluggy/_callers.py", line 55, in _multicall
INTERNALERROR> gen.send(outcome)
INTERNALERROR> File "/Users/peter_desmet/Coding/Repositories/enram/vptstools/.tox/py39/lib/python3.9/site-packages/pytest_cov/plugin.py", line 297, in pytest_runtestloop
INTERNALERROR> self.cov_controller.finish()
INTERNALERROR> File "/Users/peter_desmet/Coding/Repositories/enram/vptstools/.tox/py39/lib/python3.9/site-packages/pytest_cov/engine.py", line 44, in ensure_topdir_wrapper
INTERNALERROR> return meth(self, *args, **kwargs)
INTERNALERROR> File "/Users/peter_desmet/Coding/Repositories/enram/vptstools/.tox/py39/lib/python3.9/site-packages/pytest_cov/engine.py", line 242, in finish
INTERNALERROR> self.cov.stop()
INTERNALERROR> File "/Users/peter_desmet/Coding/Repositories/enram/vptstools/.tox/py39/lib/python3.9/site-packages/coverage/control.py", line 807, in combine
INTERNALERROR> combine_parallel_data(
INTERNALERROR> File "/Users/peter_desmet/Coding/Repositories/enram/vptstools/.tox/py39/lib/python3.9/site-packages/coverage/data.py", line 148, in combine_parallel_data
INTERNALERROR> with open(f, "rb") as fobj:
INTERNALERROR> FileNotFoundError: [Errno 2] No such file or directory: '/Users/peter_desmet/Coding/Repositories/enram/vptstools/.coverage.Peters-MacBook-Air.local.50618.245710-journal'
The odimh5 functionality is integrated in vptstools. Should odimh5 be removed from PyPI? See also enram/odimh5#2. Or be marked as deprecated?
@niconoe when running the current unit tests, there is a reference to sample data, which is not available in the repository. Is there a reference or documentation available on the example data setup? Should I just put 'any' h5 file to make the test_error_non_vp_source_file test work (as this should fail)?
The directory consensus for files (enram/data-repository#65 (comment)) is source/format/radar/yyyy/
I suggest:
# source data
baltrad/hdf5/radar/yyyy/mm/dd/file.h5
# daily unzipped csv
baltrad/daily/radar/yyyy/file.csv
# monthly gzipped csv
baltrad/monthly/radar/yyyy/file.csv.gz