ncasuk / amf-check-writer
Library to write AMF compliance checks
License: BSD 3-Clause "New" or "Revised" License
Tasks:
Check we can install: https://github.com/cedadev/cc-yaml
Update dependencies
Update to Python 3
Discuss appropriate format.
Should it be converted to readthedocs?
Can it be used for a paper?
@gapintheclouds this may overlap with an existing github Issue but just creating it as a reminder to pick up.
AMF_instrument.json
AMF_platform.json
AMF_product.json
AMF_scientist.json
Look at the overall workflow and document all the components and their interactions.
Write a basic document that outlines the entire system with diagram(s).
See if the workflow works okay retrospectively with v1.1 of the vocabs and checks.
Currently the voc-concentration.xlsx spreadsheet contains the variable name mole_fraction_of_<voc_species_name>_in_air. This needs to be addressed by adding a system that can cope with this syntax.
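One way to cope with this syntax is to treat the angle-bracket part as a placeholder and turn the template into a regular expression. The sketch below is only an illustration: the function name template_to_regex and the choice of allowed characters for a species name are assumptions, not part of amf-check-writer.

```python
import re


def template_to_regex(template):
    """Turn a name template with <placeholder> parts into a compiled regex.

    Hypothetical helper: the placeholder is assumed to stand for one or
    more letters, digits or underscores.
    """
    # Split on <...> placeholders, keeping the literal text between them.
    literal_parts = re.split(r"<[^<>]+>", template)
    # Escape each literal part and join them with the placeholder pattern.
    pattern = r"[A-Za-z0-9_]+".join(re.escape(part) for part in literal_parts)
    return re.compile("^" + pattern + "$")


voc_pattern = template_to_regex("mole_fraction_of_<voc_species_name>_in_air")

# A concrete species name matches; an empty species name does not.
print(bool(voc_pattern.match("mole_fraction_of_benzene_in_air")))
print(bool(voc_pattern.match("mole_fraction_of__in_air")))
```

A stricter version could replace the character class with an alternation built from a controlled list of species names, if such a list exists in the vocabularies.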
download-from-drive:
create-yaml-checks:
create-cvs:
Tagging @gap736uk
The tests need checking and updating.
cd compliance-check-lib/
python -m pytest tests
source attribute when comparing to multiple vocab lists (NCAS and community instruments), accessed via data:description properties.
Tagging: @gapintheclouds
The spreadsheets define a possible type as: "Integer"
However, the regex checks expect everything to be a string. This causes a warning when running the test because the check tries to perform a regular expression match against a non-string type.
The simplest fix is just to patch compliance-check-lib so that it will convert all global attrs to strings if doing a regex check:
(amf)$ cd compliance-check-lib/
(amf)$ git diff
diff --git a/checklib/code/nc_util.py b/checklib/code/nc_util.py
index 66553b2..41672e6 100644
--- a/checklib/code/nc_util.py
+++ b/checklib/code/nc_util.py
@@ -77,7 +77,9 @@ def check_global_attr_against_regex(ds, attr, regex):
     """
     if attr not in ds.ncattrs():
         return 0
-    if not re.match("^{}$".format(regex), getattr(ds, attr), re.DOTALL):
+
+    # Always coerce the attribute to a string to do the regex check
+    if not re.match("^{}$".format(regex), str(getattr(ds, attr)), re.DOTALL):
         return 1
     # Success
     return 2
NOTE: The test code for seeing if you get a warning is:
export DATA_DIR=check-data-2021-09-15
VERSION=v2.0
export PYESSV_ARCHIVE_HOME=$DATA_DIR/$VERSION/pyessv-vocabs
TEST_FILE=../NCAS-Data-Project-Training-Data/Data/ncas-anemometer-1_ral_29001225_mean-winds_v0.1.nc
amf-checker --yaml-dir $DATA_DIR/$VERSION/checks $TEST_FILE --version $VERSION
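The effect of the str() coercion in the patch above can be reproduced without a NetCDF file. The sketch below is a standalone illustration, not the compliance-check-lib code: check_attr_against_regex is a hypothetical simplified stand-in that takes the attribute value directly. In plain Python the uncoerced call raises a TypeError; the checker itself reports this as a warning according to the description above.

```python
import re


def check_attr_against_regex(value, regex):
    """Return True if the value, coerced to str, fully matches the regex."""
    # str() makes integer or float attributes safe to pass to re.match.
    return re.match("^{}$".format(regex), str(value), re.DOTALL) is not None


# Passing an integer straight to re.match fails before any matching happens.
try:
    re.match(r"^\d+$", 42)
except TypeError as err:
    print("Without coercion:", type(err).__name__)

# With coercion, an integer attribute can be checked like any string.
print(check_attr_against_regex(42, r"\d+"))
print(check_attr_against_regex("v2.0", r"\d+"))
```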
The initial workflow is as follows:
With a change made to compliance-check-lib this now works. The change was:
Suggested a meeting on Wed 15th Sept (2021!)
Once we have a reproducible installation for the compliance-checker, we want to check that our plugin works with it.
The repository is here:
https://github.com/cedadev/cc-yaml
See README for instructions to getting it started.
The measure of success is whether the command-line setup works okay - i.e. the cchecker.py script is accepting our new command-line option:
--yaml <path-to-YAML-file>
These 2 packages can be installed from external releases (PyPI or conda-forge):
We should package, release and publish our own packages to PyPI - and maybe conda-forge.
Because of the way I've implemented specific global attributes for the YAML check creator, the controlled variable creator doesn't work for specific global attributes.
Get amf-checker working. Stages are:
Update requirements.txt to non-specific versions.
pip install -r requirements.txt - to install all dependencies.
pip install --editable . --no-deps - to install local package as a link so that any local edits are reflected when you run.
Can you now run command-line script: download-from-drive?
To generate the training data and run the checker on it all (in a re-runnable way):
https://github.com/barbarabrooks/NCAS-Data-Project-Training-Data
Download, run it locally and it should generate the training data in a local data/ directory.
A new compliance checking rule for global attributes called "Exact match in vocabulary" needs to check whether the attribute value exists in the corresponding vocabulary.
Eg. In the 'global_attributes' sheet in the file '_common.xlsx', the check called 'source' needs an "Exact match in vocabulary" from the list contained in the 'Descriptor' column of the 'ncas_instrument_name_and_descriptors' (or in this case possibly also 'community-instrument-name-and-descriptors') sheet in the file '_vocabularies.xlsx'.
The file I've been testing with is the training file 'ncas-anemometer-1_ral_29001225_mean-winds_v0.1.nc'. For this file the source global attribute needs to equal "NCAS Mechanicle Anemometer unit 1" as given in cell L37.
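The rule described above can be sketched as a plain membership test against one or more lists of terms. This is only an illustration under stated assumptions: exact_match_in_vocabulary is a hypothetical name, and the two lists are stand-ins for the 'Descriptor' columns rather than data read from '_vocabularies.xlsx' (the community entry is invented for the example).

```python
def exact_match_in_vocabulary(value, *vocab_lists):
    """Return True if value exactly equals a term in any of the given lists."""
    return any(value in terms for terms in vocab_lists)


# Stand-in for the NCAS 'Descriptor' column; the value is the one quoted above.
ncas_descriptors = ["NCAS Mechanicle Anemometer unit 1"]
# Invented placeholder for the community instruments list.
community_descriptors = ["Example community instrument"]

source_attr = "NCAS Mechanicle Anemometer unit 1"

# The match is exact, so a change of case is enough to fail the check.
print(exact_match_in_vocabulary(source_attr, ncas_descriptors, community_descriptors))
print(exact_match_in_vocabulary(source_attr.lower(), ncas_descriptors, community_descriptors))
```

Accepting several lists in one call is a design choice made here so that the NCAS and community vocabularies can be checked together.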
Currently a warning is produced when the spreadsheet_handler is looking for vocabularies sheets. Need to adjust for this.
Mirror other classes, DimensionsCV is probably most useful.
We might be able to completely re-use this class:
Also need to make sure that global attributes worksheets are saved as TSV files in the same way that dimensions-specific and variables-specific are saved, e.g.:
https://github.com/ncasuk/AMF_CVs/tree/master/product-definitions/tsv/moisture-profiles
Something is going wrong when using a relative path for spreadsheets dir with create-cvs:
(venv) vagrant@localhost:/vagrant/AMF_CVs$ create-cvs spreadsheets ./AMF_CVs
WARNING: Expected to find file at 'spreadsheets/spreadsheets/Common.xlsx/Variables - Air.tsv'
WARNING: Expected to find file at 'spreadsheets/spreadsheets/Common.xlsx/Variables - Land.tsv'
WARNING: Expected to find file at 'spreadsheets/spreadsheets/Common.xlsx/Variables - Sea.tsv'
WARNING: Expected to find file at 'spreadsheets/spreadsheets/Common.xlsx/Dimensions - Air.tsv'
WARNING: Expected to find file at 'spreadsheets/spreadsheets/Common.xlsx/Dimensions - Land.tsv'
WARNING: Expected to find file at 'spreadsheets/spreadsheets/Common.xlsx/Dimensions - Sea.tsv'
It is looking under spreadsheets/spreadsheets instead of just spreadsheets. Using absolute path works as expected.
This may also affect create-yaml-checks.
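One plausible explanation, which is an assumption and has not been confirmed against the create-cvs code, is that a path that already starts with the spreadsheets directory is joined onto that directory a second time. The sketch below shows that os.path.join reproduces both symptoms (doubling with a relative root, no doubling with an absolute one) and one possible guard. The helper names are hypothetical and the paths assume a POSIX system.

```python
import os.path


def join_under_root(root, found_path):
    """Mimic joining a discovered path back onto the root directory."""
    return os.path.join(root, found_path)


def path_relative_to_root(root, found_path):
    """Strip the root prefix so the result can be joined onto root safely."""
    return os.path.relpath(found_path, start=root)


found = "spreadsheets/Common.xlsx/Variables - Air.tsv"

# Relative root: the found path already contains the root, so it doubles up.
relative = join_under_root("spreadsheets", found)
print(relative)

# Absolute root: os.path.join drops the first part when the second is absolute.
absolute = join_under_root(
    "/vagrant/AMF_CVs/spreadsheets",
    "/vagrant/AMF_CVs/spreadsheets/Common.xlsx/Variables - Air.tsv",
)
print(absolute)

# Guard: make the found path relative to the root before joining.
fixed = join_under_root("spreadsheets", path_relative_to_root("spreadsheets", found))
print(fixed)
```

Converting the spreadsheets directory with os.path.abspath at the start of the script would be another way to get the behaviour that already works.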
Once published:
amf-compliance-checks to Zenodo
AMF_CVs to Zenodo
Old
Git submodule does behave exactly as we would expect. In particular, the main GitHub repo that holds the submodule keeps track of the specific commit point on the submodule timeline and binds permanently to that unless you manage it. To manage it you might need to:
cd submod/
git checkout master # or other point
cd ../ # back to main repo
git add submod # to tell it you want to update commit that is used in submodule
git commit -m 'Updated submodule commit point'
git push
See details here:
Maybe it would be easier for us to avoid using submodules. Not sure at the moment.
To get YAML checks to work they need version numbers
See:
$ diff /vagrant-share/amf-compliance-checker-work-2021/check-data-2021-08-27/yaml_checks/v2.0/AMF_global_attrs.yml /vagrant-share/amf-compliance-checker-work-2021/check-data-2021-08-27/yaml_checks/v2.0/AMF_product_common_global-attributes_land.yml
1c1
< suite_name: global_attrs_checks:2.0
---
> suite_name: product_common_global-attributes_land_checks:2.0
3c3
< description: Check 'global_attrs' in AMF files
---
> description: Check 'product common global-attributes land' in AMF files
They are the same, but both are referenced in the main check files (land, sea, air and trajectory):
$ grep attr yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_{land,sea,air,trajectory}.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_land.yml:- __INCLUDE__: AMF_global_attrs.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_land.yml:- __INCLUDE__: AMF_product_aerosol-backscatter-radial-winds_global-attributes.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_land.yml:- __INCLUDE__: AMF_product_common_global-attributes_land.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_sea.yml:- __INCLUDE__: AMF_global_attrs.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_sea.yml:- __INCLUDE__: AMF_product_aerosol-backscatter-radial-winds_global-attributes.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_sea.yml:- __INCLUDE__: AMF_product_common_global-attributes_sea.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_air.yml:- __INCLUDE__: AMF_global_attrs.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_air.yml:- __INCLUDE__: AMF_product_aerosol-backscatter-radial-winds_global-attributes.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_air.yml:- __INCLUDE__: AMF_product_common_global-attributes_air.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_trajectory.yml:- __INCLUDE__: AMF_global_attrs.yml
yaml_checks/v2.0/AMF_product_aerosol-backscatter-radial-winds_trajectory.yml:- __INCLUDE__: AMF_product_aerosol-backscatter-radial-winds_global-att
We need to work out which one should be favoured.
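To confirm across all products that such pairs differ only in their suite_name and description lines, the comparison done with diff above could be scripted. The sketch below is a rough line-by-line illustration, not a YAML-aware comparison. The function names are hypothetical, and the middle line of each sample is a placeholder because the real line 2 is not shown in the diff output.

```python
def differing_keys(text_a, text_b):
    """Return the leading keys of the lines that differ between two texts."""
    keys = set()
    lines_a, lines_b = text_a.splitlines(), text_b.splitlines()
    for line_a, line_b in zip(lines_a, lines_b):
        if line_a != line_b:
            # Use the text before the first colon as the key name.
            keys.add(line_a.split(":", 1)[0].strip())
    if len(lines_a) != len(lines_b):
        keys.add("<length mismatch>")
    return keys


def same_apart_from(text_a, text_b, ignored=("suite_name", "description")):
    """Return True if the only differing lines start with an ignored key."""
    return differing_keys(text_a, text_b) <= set(ignored)


# Lines 1 and 3 are taken from the diff above; line 2 is a placeholder.
global_attrs = "\n".join([
    "suite_name: global_attrs_checks:2.0",
    "checks: []",
    "description: Check 'global_attrs' in AMF files",
])
common_land = "\n".join([
    "suite_name: product_common_global-attributes_land_checks:2.0",
    "checks: []",
    "description: Check 'product common global-attributes land' in AMF files",
])

print(sorted(differing_keys(global_attrs, common_land)))
print(same_apart_from(global_attrs, common_land))
```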
meaningful stuff
We are dependent on the IOOS compliance checker. It has a number of its own dependencies. We need a standard python environment to install it into.
Tasks:
Installation recipes:
https://drive.google.com/drive/u/1/folders/1GNfifCvctYJgTjUoBjfkFKx9yR_ddMjv
Data Products / v3.0.0 /
    comm & vocab/
        _common
        _vocabularies
    timeSeries/
        <instr sub-type>/
            <product>
    timeSeriesProfile/
        <instr sub-type>/
            <product>
    trajectory/
        <instr sub-type>/
            <product>
** We hope that we can just identify the product spreadsheets and write them to the same flattened structure used in v1 and v2! **
We need to make sure that the spreadsheet scanner can walk the directory structure and download all the spreadsheets that it finds.
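The walk-and-flatten idea can be sketched on a local directory tree. This is an illustration only: it uses os.walk on local files rather than the Google Drive API, the function names are hypothetical, and the flattening rule (<product>/<product>.xlsx) is an assumption about the v1 and v2 layout.

```python
import os
import tempfile


def find_spreadsheets(root):
    """Yield the path of every .xlsx file found anywhere below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".xlsx"):
                yield os.path.join(dirpath, name)


def flattened_name(path):
    """Map a nested spreadsheet path to '<product>/<product>.xlsx' (assumed rule)."""
    filename = os.path.basename(path)
    product = os.path.splitext(filename)[0]
    return os.path.join(product, filename)


# Build a small nested tree shaped like the v3.0.0 layout, then flatten it.
with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "timeSeries", "example-sub-type")
    os.makedirs(nested)
    open(os.path.join(nested, "example-product.xlsx"), "w").close()
    found = [flattened_name(path) for path in find_spreadsheets(root)]

print(found)
```

Because the flattened name depends only on the file name, two products with the same name in different sub-types would collide; a real implementation would need to detect that.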
Bring up to date with current state.
See example repo here, with links to binder for running notebooks in the cloud.
Vocabularies_v2.xlsx is now Vocabularies.xlsx
Instrument Names & Data Products.xlsx
I've currently commented out the raw spreadsheet downloader as I am having difficulty authorising the downloader with the correct credentials file.
Install:
https://github.com/cedadev/compliance-check-lib
Initially, we just want the unit tests to run, as outlined here:
https://github.com/cedadev/compliance-check-lib#testing
The tests use lots of test data inside the repository.
@gapintheclouds it would be great to do more extensive testing of Barb's test data repository - against the current state of the checker. It is bound to show up some issues in the integrity of the checker.
E.g.: aerosol-size-distribution/aerosol-size-distribution.xlsx/Global Attributes - Specific
Put in WARNING for now in code that they are not working yet.
Request BB adds in compliance checking rules in spreadsheets.
I have added a warning line in the output (to stderr) when the parser cannot understand a row (normally because the compliance checking rule is not defined yet). We can use these to retrospectively fix the spreadsheets on the google drive.
I will push to master.
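The warn-and-continue behaviour described above can be sketched as follows. This is a simplified illustration, not the actual parser: parse_rows and KNOWN_RULES are hypothetical names, "Not empty" is an invented rule, and the row contents are examples.

```python
import sys

# Hypothetical rule names; the real set is defined by the spreadsheets.
KNOWN_RULES = {"Exact match in vocabulary", "Not empty"}


def parse_rows(rows, stream=sys.stderr):
    """Return the rows whose rule is known; warn on stream about the rest."""
    parsed = []
    for number, (name, rule) in enumerate(rows, start=1):
        if rule not in KNOWN_RULES:
            # Report the row and carry on so one bad row does not stop the run.
            print(
                "WARNING: could not parse row {} ({!r}): unknown rule {!r}".format(
                    number, name, rule
                ),
                file=stream,
            )
            continue
        parsed.append((name, rule))
    return parsed


# The second row has no compliance checking rule yet, so it triggers a warning.
rows = [("source", "Exact match in vocabulary"), ("comment", "")]
parsed = parse_rows(rows)
print(parsed)
```

Writing the warnings to stderr keeps them separate from any generated output on stdout, so they can be collected and used to fix the spreadsheets later.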
This code shows how to create YAML with nice line breaks and it preserves the order of components based on an OrderedDict in the order they were added:
https://github.com/roocs/roocs-utils/blob/master/roocs_utils/inventory/inventory.py#L21-L44
@gapintheclouds Assuming you have your base environment activated. Try:
git clone https://github.com/ES-DOC/pyessv
export PYESSV_ARCHIVE_HOME=<YOUR_WORKING_DIR>/compliance-check-lib/cc-vocab-cache/pyessv-archive-eg-cvs
Then save this in a script (test-pyessv.py) and see if it runs okay with python test-pyessv.py:
import pyessv
# Authority: ncas.
ncas = pyessv.NCAS
assert isinstance(ncas, pyessv.Authority)
print('ncas/amf/flux-components-variable/air-pressure')
amf = ncas.amf
assert isinstance(amf, pyessv.Scope)
fcv = amf.flux_components_variable
assert isinstance(fcv, pyessv.Collection)
air_pressure = fcv.air_pressure
assert isinstance(air_pressure, pyessv.Term)
assert pyessv.load('ncas:amf:flux-components-variable') == fcv
assert pyessv.load('ncas:amf:flux-components-variable:air-pressure') == air_pressure
It seems to work fine for me.
Updates to spreadsheets (progress):
Three of them still need attention. @barbarabrooks - please can you take a look at those 3.
Update documentation within python scripts with more details on how they work.
Re-run compliance-check-lib tests with newly generated version of PYESSV vocabs. To check nothing big has changed.