wswup / agweather-qaqc Goto Github PK



Visualized QA/QC of weather station data

Home Page: https://wswup.github.io/agweather-qaqc/

License: Apache License 2.0

Python 95.41% TeX 4.59%

evapotranspiration python quality-assurance quality-control weather-data weather-station

agweather-qaqc's Introduction

agweather-qaqc (Weather Data QAQC Script)

agweather-qaqc provides a flexible workflow for the visualization, review, and QAQC of daily weather data. This script is intended to be used as an early step in any analysis that might use daily sources of agricultural weather data, particularly for projects with an interest in reference evapotranspiration (ET) data, or where observational data are considered to be 'truth' when evaluating model predictions. agweather-qaqc is command-line interface driven, and provides reminders, prompts, and recommendations to assist users who may not be overly proficient with Python.

Functionalities include:

Importing data without having to convert it to a standardized format, with unit conversions based on a user-specified configuration file.
Converting multiple input formats from separate sources or networks into a single, uniform format for easier downstream analysis.
Visualizing data before and after processing with interactive plots, as daily time series and as mean monthly averages.
Filtering and removal of data, both manually and automatically, with statistics-based approaches to identify and correct issues such as sensor miscalibration.
Calculation of theoretical clear-sky solar radiation and Thornton-Running solar radiation.
Calculation of grass and alfalfa reference ET according to the American Society of Civil Engineers Standardized reference evapotranspiration equation via the RefET library.
Evaluating station aridity through the visualization of both relative humidity and dew point depression plots.
Optional gap-filling of data using station climatologies, empirical approaches (e.g. Thornton-Running solar), or random sampling.

Documentation

Github Page

Installation

Clone the repository:

git clone https://github.com/WSWUP/agweather-qaqc

Navigate the command line/terminal into the repository root directory:
```
cd path/to/agweather-qaqc
```

Setting up and activating the environment can be done one of three ways:

Conda Environment:

conda env create -f environment.yml

conda activate agweatherqaqc

Pipenv Environment:

pipenv install -r requirements.txt

pipenv shell

PDM Environment:
```
pdm install
```
```
pdm shell
```

Run the script via the file qaqc_single_station.py

python qaqc_single_station.py <OPTIONAL ARGUMENTS>

See the documentation for more information.

agweather-qaqc's People

Contributors

Stargazers

Watchers

Forkers

dgketchum markwbrown gabe-parrish drewf7 amorway mehmetpek dlebauer dostuffthatmatters

agweather-qaqc's Issues

Usability Review

This issue is related to the review process for JOSS: openjournals/joss-reviews#6368

Installation

Make it possible to install without conda

It would be great to support an installation without conda because it adds a fairly heavy dependency that is not necessary. You could simply use Pip (using a requirements.txt file). I personally recommend PDM because it produces a pyproject.toml file that complies with PEP standards. Hence, even though you can use PDM as a developer, users (or a CI environment) can install the project using Pip. Poetry and Pipenv are alternatives to PDM, but they establish their own metadata format, so users/CI environments would also have to install the respective package.

Put pyproject.toml/requirements.txt in root

I also recommend putting environment.yml/requirements.txt/pyproject.toml files in the project's root because users and environment management tools do not have to search for the package metadata.

.gitignore does not include Python specific files

The .gitignore file does not include Python specific files. Hence, when I create a virtual environment inside the project, Git will show that there are 2000+ new files. You can use https://www.toptal.com/developers/gitignore to generate a good .gitignore file (maybe exclude files from python and all major operating systems).

Consider supporting Python 3.12

FYI, the tests also pass with Python 3.12. Since Feature releases 3.X are backwards compatible (with a few very minor exceptions), you could consider supporting any Python version >=3.9,<4.0.

Functionality

Add more output formats for tabular outputs

Save the XLSX output files in more formats (CSV, Parquet, etc.), so that it is easier to look at them without Microsoft Excel installed.

Do not only save HTML files but also static images (PNG/SVG). Makes it easier to look at the results and embed them in slides/documents. Also, since the HTML files fetch the Bokeh JS library, they only work offline if that JS library is present in cache.

Other than that, the plots are nicely done!

Packaging

Just an idea for future releases.

You might consider distributing it as an installable package via PyPI. This way, users could install it using pip install agweather-qaqc and running it inside another project instead of having to manually set up a dedicated directory for it.

If you choose to use PDM or Poetry, both make it quite easy to publish a package to PyPI. Be careful with version tagging then, because you cannot overwrite a version once it is published.

implicit imports

Great package!

For readability, please use explicit imports:

from foo import bar

is better than

from foo import *

because later in the code we see

bar.do_something()

rather than a bare do_something(), which is good because a new user reading your code knows where do_something() came from.
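
A small illustration of the difference, using the standard library `math` module in place of the placeholder `foo`. The function name `hypotenuse` is made up for this sketch and is not taken from the agweather-qaqc code base.

```python
# Explicit import: the module name stays visible at every call site.
import math


def hypotenuse(a: float, b: float) -> float:
    # A reader can see immediately that sqrt comes from the math module.
    return math.sqrt(a * a + b * b)


# A wildcard import (from math import *) would also expose a bare sqrt(),
# but nothing at the call site would say where it came from, and it could
# silently shadow another sqrt defined elsewhere in the module.
print(hypotenuse(3.0, 4.0))
```

The same reasoning applies to project modules: a module-qualified call documents its own origin.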

need XlsxWriter module to write output file

Corrections in Paper Text

This issue is related to the review process for JOSS: openjournals/joss-reviews#6368

Below, you can find some things in the paper that could be corrected. Some of them are rather stylistic:

Higher resolution image files for Figures 1 and 2; maybe as SVG/EPS?
Lines 99-100: "While ease-of-use for a non-technical user was one of the principal goals, the software workflow can be automated with the inclusion of libraries such as PyAutoGUI." Automated as in checking for trends/anomalies visually in an automated way?
Lines 102-109, 116-119, and 125-128 are a repetition of lines 56-98. You could introduce the bullet point list as "In the already described process, the package also performs the following steps: ..." instead of an "all features" list. Since 80-81 does not entirely cover 125-128, you could expand the sentence in 80-81.
Figure 2: I would prefer having some plot titles (more descriptive than "A", "B", and "C").
I would move the sections "State of the Field" and "Research enabled by agweather-qaqc" after the "Statement of Need".
The paper would benefit from a grammar check. I didn't find many, but there are some grammatical errors.

ADDITION: From the repository, it is obvious that you are the main developer and maintainer of the package. Could you dedicate a small section in the paper to the roles of the co-authors?

clarify the roles of the different authors

[Feature Request] include reproducible results from JOSS paper fig 2

Please describe the feature you'd like to see

Regarding openjournals/joss-reviews#6368, JOSS requires 'reproducibility', defined as:

If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.

Please add the example data and script or README to reproduce the analysis in the "paper" folder or subdirectory thereof.

[Feature Request] Add support for sub-daily data

Please describe the feature you'd like to see
Currently agweather-qaqc only functions on daily data, however there would be a benefit to adding support for sub-daily data.

Additional context
Certain correction methods will need adjustment for time series data that is no longer daily (ex. the RH percentile correction will need its suggested parameters updated to reflect the increase in data points for any given year)

As a halfway measure, support could be added for taking in sub-daily data and resampling it to daily, saving the user an extra external step.
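
The halfway measure could look roughly like the sketch below, which groups sub-daily temperature readings by calendar day using only the standard library. The function name `resample_to_daily` and the `(datetime, value)` input layout are assumptions made for illustration; they are not the agweather-qaqc input format.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def resample_to_daily(records):
    """Aggregate (timestamp, temperature) pairs into daily tmax/tmin/tavg.

    `records` is an iterable of (datetime, float) tuples (hypothetical layout).
    """
    by_day = defaultdict(list)
    for timestamp, value in records:
        by_day[timestamp.date()].append(value)
    # One summary per calendar day, in chronological order.
    return {
        day: {"tmax": max(vals), "tmin": min(vals), "tavg": mean(vals)}
        for day, vals in sorted(by_day.items())
    }


hourly = [
    (datetime(2024, 1, 1, 0), 2.0),
    (datetime(2024, 1, 1, 12), 10.0),
    (datetime(2024, 1, 2, 0), 4.0),
]
daily = resample_to_daily(hourly)
```

Here the daily average is the mean of all readings in the day. Whether that or the midpoint of tmax and tmin is appropriate would depend on the station network's conventions.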

Format of output files, column names for corrected and metadata

Hi Christian, I am working on gridwxcomp, a package that creates bias correction ratios between station climate and gridMET data. I am setting the default names of input climate variables based on what PyWeatherQAQC outputs, and I am using the same climate station metadata file format that PyWeatherQAQC uses or produces. In other words, the default assumption is that users first run PyWeatherQAQC before running gridwxcomp, although it is not necessary.

Anyhow, I want to be clear on the formats and column names you use in your output files. Here is the header and hence variable names I am currently using that is assumed to be produced by PyWeatherQAQC in a corrected climate xlsx time series file (on the corrected sheet):

	year	month	day	TAvg (C)	TMax (C)	TMin (C)	TDew (C)	Vapor Pres (kPa)	RHAvg (%)	RHMax (%)	RHMin (%)	Rs (w/m2)	Rs_TR (w/m2)	Rso (w/m2)	Windspeed (m/s)	Precip (mm)	Data_ETr (mm)	Data_ETo (mm)	Calc_ETr (mm)	Calc_ETo (mm)	ws_2m (m/s)

And here is what I have for columns in a station_data.csv metadata file which has info for all the corrected station time series files:

FID	OBJECTID	Id	State	Source	Status	Station	LATDECDEG	LONGDECDEG	Date	Station_ID	Elev_FT	Comments	Location	FileName	Irrigation	Website	Elev_m

Can you let me know what the normal format for these files will be? In particular the climate variable names in the time series files and the fields for lat, long, station, elevation, and file name in the metadata file. Those will be required inputs for gridwxcomp. Also, if you could, show an example row after a standard header for a metadata file, it would be greatly appreciated! Thanks John.

pyWeatherQAQC crashes after reading in raw data

Hi Christian,
I'm having issues getting pyWeatherQAQC to get past reading in the raw data. I'm guessing it's an issue with the data or ini file but can't find what. I sent these files by email. I'm including the packages below as well as the error script. Thanks in advance!
-Charlie

(pyweatherqaqc) C:\pyWeatherQAQC>python qaqc_master.py

System: Starting data correction script.

System: Opening config file: config.ini

System: Raw data successfully read in.
Traceback (most recent call last):
  File "qaqc_master.py", line 39, in <module>
    missing_fill_value, script_mode, generate_bokeh) = input_functions.obtain_data(config_path)
  File "C:\pyWeatherQAQC\qaqc_modules\input_functions.py", line 438, in obtain_data
    data_month = np.array(raw_data[:, month_col].astype('int'))
ValueError: invalid literal for int() with base 10: 'nan'

(pyweatherqaqc) C:\pyWeatherQAQC>conda list

packages in environment at C:\Anaconda3\envs\pyweatherqaqc:

(pyweatherqaqc) C:\pyWeatherQAQC>pip list
Package Version

bokeh 1.0.4
certifi 2019.3.9
configparser 3.7.4
DateTime 4.3
Jinja2 2.10
MarkupSafe 1.1.1
numpy 1.16.2
packaging 19.0
pandas 0.24.2
Pillow 5.4.1
pip 19.0.3
pyparsing 2.3.1
python-dateutil 2.8.0
pytz 2018.9
PyYAML 5.1
refet 0.3.10
setuptools 40.8.0
six 1.12.0
tornado 6.0.2
wheel 0.33.1
wincertstore 0.2
XlsxWriter 1.1.5
zope.interface 4.6.0
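
The ValueError in the traceback above is the message Python produces when the string 'nan' is converted to an integer. That suggests a blank or non-numeric cell in the month column, though this is an assumption because the data file is not shown. A minimal sketch of how such rows could be located before conversion; `parse_month` and `raw_months` are hypothetical names, not part of pyWeatherQAQC.

```python
def parse_month(value):
    """Return the month as an int, or None when the cell is missing."""
    text = str(value).strip()
    if text == "" or text.lower() == "nan":
        return None
    # float() first so that a cell such as '4.0' is also accepted.
    return int(float(text))


# Hypothetical month column with one empty cell read in as 'nan'.
raw_months = ["1", "2", "nan", "4"]

try:
    [int(m) for m in raw_months]  # fails the same way on the 'nan' entry
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: 'nan'

# Row indices whose month could not be parsed; these are the rows to inspect.
bad_rows = [i for i, m in enumerate(raw_months) if parse_month(m) is None]
print(bad_rows)
```

Checking the reported row numbers against the raw file (header and footer line counts included) is usually the quickest way to find the offending cell.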

Add contributor guidelines

Regarding JOSS review openjournals/joss-reviews#6368

JOSS requires

Clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Consider adding CONTRIBUTING.md and CODE_OF_CONDUCT files as well as brief information about contributing to the README

See https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/adding-a-code-of-conduct-to-your-project and https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/setting-guidelines-for-repository-contributors

Add 2m Short Grass Windspeed column to output .csv

Call station windspeed column: "Windspeed_uz_{} (m/s)".format(anemometer_height)

Add 2m logarithmic correction wind speed column using ASCE eq. 33 (short grass)
"Windspeed_u2_sg (m/s)"

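A sketch of the two columns requested above. The adjustment uses the logarithmic wind profile commonly given for a short grass surface, u2 = uz * 4.87 / ln(67.8 * zw - 5.42), assumed here to be the form of the ASCE equation named in the issue. The function name `wind_2m_short_grass` and the sample values are hypothetical.

```python
import math


def wind_2m_short_grass(uz: float, zw: float) -> float:
    """Adjust wind speed uz (m/s) measured at height zw (m) to 2 m.

    Logarithmic profile over short grass:
    u2 = uz * 4.87 / ln(67.8 * zw - 5.42)
    """
    return uz * 4.87 / math.log(67.8 * zw - 5.42)


anemometer_height = 3.0  # metres; hypothetical station value
station_col = "Windspeed_uz_{} (m/s)".format(anemometer_height)
adjusted_col = "Windspeed_u2_sg (m/s)"

# One made-up output row carrying both the measured and the adjusted speed.
row = {station_col: 4.0}
row[adjusted_col] = wind_2m_short_grass(row[station_col], anemometer_height)
```

For an anemometer already at 2 m the factor is very close to 1, so the adjusted column then nearly reproduces the measured one.
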
Optional Comments on Code Quality

This issue is related to the review process for JOSS: openjournals/joss-reviews#6368

JOSS does not directly require some level of code quality - hence the following is independent of the review. Also code quality is often quite subjective. Nevertheless, the higher quality your codebase is, the easier it is to onboard new developers and the less likely they will develop their own tool because yours misses one feature.

Consider renaming WeatherQC._obtain_data to avoid confusion with the input_files._obtain_data function.
The input_files._obtain_data function is almost 300 lines long. Consider breaking it into smaller parts, which also forces you to structure the logic of your code.
Remove the underscores from all the functions in input_files.py because they are used by other modules.
Use a code formatter like Yapf or Black to format your code. It removes visual clutter without you having to do it manually. VS Code has a "format on save" setting.
Consider returning fewer tuples (like in calc_functions.calc_temperature_variables) and relying more on dicts where each item is a named value, i.e. return_value["monthly_k_not"] instead of return_value[3]. When possible, you might split up the functions so that each function calculates one thing.
The same is true for long lists of input variables (like in calc_functions.calc_humidity_variables). These make it very easy to accidentally pass tmin, tavg, tmax, ... instead of tmax, tmin, tavg.
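
The last two points can be sketched together: keyword-only parameters prevent series from being passed in the wrong order, and a dict return value gives each result a name. `summarize_temperature` and its keys are invented for this example; it is not the signature of any function in calc_functions.

```python
def summarize_temperature(*, tmax, tmin, tavg):
    """Return named mean values for three temperature series.

    The bare * makes every parameter keyword-only, so a caller cannot
    swap tmax and tmin by position.
    """
    return {
        "mean_tmax": sum(tmax) / len(tmax),
        "mean_tmin": sum(tmin) / len(tmin),
        "mean_tavg": sum(tavg) / len(tavg),
    }


# Argument order no longer matters because each series is named.
summary = summarize_temperature(
    tmin=[1.0, 3.0],
    tavg=[5.0, 7.0],
    tmax=[9.0, 11.0],
)
print(summary["mean_tmax"])
```

A positional call such as `summarize_temperature([9.0], [1.0], [5.0])` raises a TypeError, which surfaces the mistake at the call site instead of producing wrong numbers.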

I can give you a full review regarding code quality - if you want. But I also don't want to spam you with optional/stylistic comments ^^ Tools for automated code reviews can be very helpful too.

In general, I am a big fan of static typing for production codebases. With static type hints you can use Mypy to check your code for typing errors. However, I understand that it adds complexity to a codebase for developers who have not used it before. There are pros and cons to it, but I found it to save a lot of time because many bugs are caught by Mypy in (CI) tests instead of at runtime.

You can add Yapf/Black/Mypy options to a pyproject.toml file.

Test Setup Review

This issue is related to the review process for JOSS: openjournals/joss-reviews#6368

You already have a suite of PyTest test cases which is very good. All tests passed on my system on the first try.

Use a more established directory structure

I would recommend moving the file test_calculations.py and the directory test_files into a tests directory. Maybe also split up the test file into multiple test files - this is quite stylistic though. This repo follows a widely used project structure: https://github.com/navdeep-G/samplemod

README.md
...
tests/
    test_files/
    test_io.py
    test_conversions.py
    test_calculations.py

Add pytest to CI

You should run the tests using GitHub actions. If you choose to use PDM, you can simply use Pip to install the dependencies inside the CI and don't have to bother with setting up Conda/PDM/Pipenv/Poetry.

Run tests for different python versions (matrix)

The GitHub CI supports running tests for multiple Python versions in parallel

Run the end to end test on your test data in the CI

I.e. running qaqc_single_station.py using the default config. You would have to add an option to skip/do the corrections without a user input via keyboard. You can pipe a stdin to your script using echo 0 | python qaqc_single_station.py.
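
The same piping idea can be driven from Python, for example inside a test. The sketch below assumes the script reads its answers from standard input. `run_with_stdin` is a hypothetical helper, and a one-line stand-in script is used in place of qaqc_single_station.py so that the example runs on its own.

```python
import subprocess
import sys


def run_with_stdin(command, answers):
    """Run `command`, feeding the string `answers` to its standard input."""
    result = subprocess.run(
        command,
        input=answers,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Stand-in for `python qaqc_single_station.py`: reads one prompt answer
# from stdin and echoes it back.
stand_in = [sys.executable, "-c", "print('choice:', input())"]
output = run_with_stdin(stand_in, "0\n")
print(output)
```

Several answers can be supplied as one newline-separated string, which mirrors what `echo` piping does on the command line.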

Development towards an automated process

Greetings,

I thoroughly enjoy this QAQC package and especially the documentation that’s being provided; it makes learning this package far easier and smoother than most other packages. I would like to build an automated process around it for personal weather stations. From my understanding, it seems like pyWeatherQAQC only works through manual inputs. However, if I were content with one specific kind of input (say the default one), would there be any way for me to automate this process? Perhaps I could write a simple text file that lists all the steps/inputs, so that there would be no need for me (or another user) to go through the process of correcting the data every day. Are there any developments in that regard? Perhaps this has already been considered but deemed irrelevant?

Thank you,

Add high level description of docs to readme

I’m conducting review for JOSS and will add comments as issues as I go. openjournals/joss-reviews#6368

README should contain a high level overview of the documentation. See https://joss.readthedocs.io/en/latest/review_criteria.html for details

Add more output formats for image outputs

Add options to export output graphs as PNG/SVG.

This avoids the problem that the HTML files fetch the Bokeh JS library and therefore only work offline if that library is present in the browser cache.

Issues:
Sizing can be unreliable ( https://docs.bokeh.org/en/2.4.2/docs/user_guide/export.html )
Don't have alternative method to identify intervals in QC process, will have to leave most plots as HTML

Latitude Units

Inputting latitude units as decimal degrees raises ValueError:

I am using refet version 0.3.7 and I used a station latitude in decimal degrees in the .ini file.
refet version 0.3.7 is supposed to expect decimal degrees but I am being warned that latitudes must be in radians.
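
Which unit the refet call in question expects is not settled in this report, so the sketch below only shows the conversion and a range check that would catch a mix-up. `latitude_to_radians` is a hypothetical helper, not part of refet or agweather-qaqc.

```python
import math


def latitude_to_radians(lat_degrees: float) -> float:
    """Convert a latitude in decimal degrees to radians, with a sanity check."""
    if not -90.0 <= lat_degrees <= 90.0:
        raise ValueError(
            "latitude must be within [-90, 90] decimal degrees, got {}".format(lat_degrees)
        )
    return math.radians(lat_degrees)


# A valid latitude in radians always lies within +/- pi/2 (about 1.5708),
# so a value such as 39.5 passed where radians are expected is out of range.
lat_rad = latitude_to_radians(39.5)
print(round(lat_rad, 6))
```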

ini input "lines_of_footer" doesn't skip final line(s) of met data

Not very important, but Agrimet data ends with an 'END DATA' row. I attempted to increase 'lines_of_footer' in the ini, but it didn't seem to skip that final line. It's easy to delete that row of data in Excel, but skipping it appears to be what the option was intended for.
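
A sketch of the footer-skipping behaviour the option name implies, written with the standard library only. It is not the agweather-qaqc implementation; `strip_footer` and the sample rows are made up. One possible cause of the reported behaviour, offered as a guess, is a trailing blank line being counted as the footer, so the sketch drops trailing blank lines first.

```python
def strip_footer(lines, lines_of_footer):
    """Return the rows of a text file without its last `lines_of_footer` rows."""
    rows = [line.rstrip("\n") for line in lines]
    # Ignore trailing blank lines so they are not counted as footer rows.
    while rows and not rows[-1].strip():
        rows.pop()
    # Guard the zero case: rows[:-0] would return an empty list.
    if lines_of_footer <= 0:
        return rows
    return rows[:-lines_of_footer]


# Hypothetical file contents ending in an 'END DATA' row and a blank line.
raw = ["DATE,TMAX\n", "01/01/2024,5.0\n", "END DATA\n", "\n"]
cleaned = strip_footer(raw, 1)
print(cleaned)
```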
