
agweather-qaqc's Introduction

agweather-qaqc (Weather Data QAQC Script)

agweather-qaqc provides a flexible workflow for the visualization, review, and QAQC of daily weather data. This script is intended to be used as an early step in any analysis that might use daily sources of agricultural weather data, particularly for projects with an interest in reference evapotranspiration (ET) data, or where observational data are considered to be 'truth' when evaluating model predictions. agweather-qaqc is command-line interface driven, and provides reminders, prompts, and recommendations to assist users who may not be overly proficient with Python.

Functionalities include:

  • Importing data without having to convert it to a standardized format, with unit conversions based on a user-specified configuration file.
  • Converting multiple input formats from separate sources or networks into a single, uniform format for easier downstream analysis.
  • Visualizing data before and after processing with interactive plots, both as daily time series and as monthly averages.
  • Filtering and removal of data, both manually and automatically, with statistics-based approaches to identify and correct issues such as sensor miscalibration.
  • Calculation of theoretical clear-sky solar radiation and Thornton-Running solar radiation.
  • Calculation of grass and alfalfa reference ET according to the American Society of Civil Engineers Standardized reference evapotranspiration equation via the RefET library.
  • Evaluating station aridity through the visualization of both relative humidity and dew point depression plots.
  • Optional gap-filling of data using station climatologies, empirical approaches (e.g. Thornton-Running solar), or random sampling.

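For orientation, the daily form of the ASCE Standardized Reference ET equation that RefET implements can be sketched in plain Python. This is a simplified sketch, not RefET's API or agweather-qaqc's code. It makes the following assumptions:

- Inputs are already in the equation's native units: °C, kPa, MJ m⁻² d⁻¹, wind speed at 2 m in m/s, and elevation in m. Rs in W/m², as in the output files, can be multiplied by 0.0864 to get MJ m⁻² d⁻¹.
- Clear-sky radiation `rso` is supplied as an input.
- Soil heat flux G is set to zero, as the standardized equation does for daily time steps.
- The function name `asce_daily_ref_et` is hypothetical.

```python
import math

# Daily (Cn, Cd) coefficients of the ASCE standardized equation
COEFFS = {"eto": (900.0, 0.34), "etr": (1600.0, 0.38)}
SIGMA = 4.901e-9  # Stefan-Boltzmann constant, MJ K^-4 m^-2 d^-1


def sat_vapor_pressure(t_c: float) -> float:
    """Saturation vapor pressure (kPa) at air temperature t_c (deg C)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def asce_daily_ref_et(tmax, tmin, ea, rs, rso, u2, elev, surface="eto"):
    """Hypothetical helper: daily grass (eto) or alfalfa (etr) reference ET in mm/day."""
    cn, cd = COEFFS[surface]
    tmean = (tmax + tmin) / 2.0

    # Psychrometric constant from elevation-adjusted air pressure (kPa/degC)
    pressure = 101.3 * ((293.0 - 0.0065 * elev) / 293.0) ** 5.26
    gamma = 0.000665 * pressure

    # Slope of the saturation vapor pressure curve at mean air temperature
    delta = 2503.0 * math.exp(17.27 * tmean / (tmean + 237.3)) / (tmean + 237.3) ** 2

    # Mean saturation vapor pressure from Tmax and Tmin
    es = (sat_vapor_pressure(tmax) + sat_vapor_pressure(tmin)) / 2.0

    # Net shortwave (albedo 0.23) minus net longwave radiation
    rns = (1.0 - 0.23) * rs
    fcd = min(max(1.35 * rs / rso - 0.35, 0.05), 1.0)  # cloudiness function, bounded
    tk4 = ((tmax + 273.16) ** 4 + (tmin + 273.16) ** 4) / 2.0
    rnl = SIGMA * fcd * (0.34 - 0.14 * math.sqrt(ea)) * tk4
    rn = rns - rnl  # G = 0 for daily steps

    numerator = 0.408 * delta * rn + gamma * cn / (tmean + 273.0) * u2 * (es - ea)
    denominator = delta + gamma * (1.0 + cd * u2)
    return numerator / denominator
```

The sketch was run with these inputs:

- Tmax 21.5 °C and Tmin 12.3 °C
- ea 1.409 kPa
- Rs 22.07 and Rso 30.90 MJ m⁻² d⁻¹
- u2 2.078 m/s
- elevation 100 m

It returns about 3.9 mm/day for `eto`. The `etr` value is higher because of its larger Cn and Cd coefficients.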
Documentation

Github Page

Installation

  1. Clone the repository:

    git clone https://github.com/WSWUP/agweather-qaqc
    
  2. Navigate the command line/terminal into the repository root directory:

    cd path/to/agweather-qaqc
    
  3. Setting up and activating the environment can be done in one of three ways:

    • Conda Environment:
      conda env create -f environment.yml
      
      conda activate agweatherqaqc
      
    • Pipenv Environment:
      pipenv install -r requirements.txt
      
      pipenv shell
      
    • PDM Environment:
      pdm install
      
      pdm shell
      
  4. Run the script qaqc_single_station.py:

    python qaqc_single_station.py <OPTIONAL ARGUMENTS>
    

See the documentation for more information.

agweather-qaqc's People

Contributors

amorway, cwdunkerly, dlebauer, dostuffthatmatters

agweather-qaqc's Issues

Usability Review

This issue is related to the review process for JOSS: openjournals/joss-reviews#6368

Installation

  • Make it possible to install without conda

It would be great to support an installation without conda, because conda adds a rather heavy dependency that is not necessary. You could simply use Pip (with a requirements.txt file). I personally recommend PDM because it produces a pyproject.toml file that complies with PEP standards. Hence, even though you use PDM as a developer, users (or a CI environment) can install the project using Pip. Poetry and Pipenv are alternatives to PDM, but they establish their own metadata format, so users/CI environments would also have to install the respective package.

  • Put pyproject.toml/requirements.txt in root

I also recommend putting the environment.yml/requirements.txt/pyproject.toml files in the project's root, so that users and environment management tools do not have to search for the package metadata.

  • .gitignore does not include Python specific files

The .gitignore file does not include Python-specific files. Hence, when I create a virtual environment inside the project, Git shows 2000+ new files. You can use https://www.toptal.com/developers/gitignore to generate a good .gitignore file (for example, by selecting Python and all major operating systems).

  • Consider supporting Python 3.12

FYI, the tests also pass with Python 3.12. Since 3.x feature releases are backward compatible (with a few very minor exceptions), you could consider supporting any Python version >=3.9,<4.0.

Functionality

  • Add more output formats for tabular outputs

Save the XLSX output files in additional formats (CSV, Parquet, etc.) so that they are easier to inspect without Microsoft Excel installed.

Save static images (PNG/SVG) in addition to the HTML files. This makes it easier to look at the results and embed them in slides/documents. Also, since the HTML files fetch the Bokeh JS library, they only work offline if that library is already in the cache.

Other than that, the plots are nicely done!

Packaging

Just an idea for future releases.

You might consider distributing it as an installable package via PyPI. This way, users could install it using pip install agweather-qaqc and run it inside another project, instead of having to manually set up a dedicated directory for it.

If you choose to use PDM or Poetry, both make it quite easy to publish a package to PyPI. Be careful with version tagging then, because you cannot overwrite a version once it is published.

implicit imports

Great package!

For readability, please use explicit imports:

from foo import bar

is better than

from foo import *

because later in the code we see

bar.do_something()

rather than a bare do_something(). This is good because a new user reading your code knows where do_something() came from.
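The reviewer's point can be made concrete with standard-library modules. Here `math` and `statistics` stand in for `foo`, and `rms` is a hypothetical function written only for illustration:

```python
# Discouraged: after "from math import *", a reader cannot tell where sqrt() comes from.
# Preferred: the module prefix (math.sqrt) or an explicit name (mean) shows the origin.
import math
from statistics import mean


def rms(values):
    """Root mean square of a sequence of numbers."""
    return math.sqrt(mean(v * v for v in values))
```

Either form also lets linters flag unused or shadowed names, which a wildcard import hides.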

Corrections in Paper Text

This issue is related to the review process for JOSS: openjournals/joss-reviews#6368

Below, you can find some things in the paper that could be corrected. Some of them are rather stylistic:

  • Higher resolution image files for Figures 1 and 2; maybe as SVG/EPS?
  •  Lines 99-100: "While ease-of-use for a non-technical user was one of the principal goals, the software workflow can be automated with the inclusion of libraries such as PyAutoGUI." Automated as in checking for trends/anomalies visually in an automated way?
  • Lines 102-109, 116-119, and 125-128 repeat lines 56-98. You could introduce the bullet point list as "In the already described process, the package also performs the following steps: ..." instead of an "all features" list. Since lines 80-81 do not entirely cover lines 125-128, you could expand the sentence in lines 80-81.
  • Figure 2: I would prefer having some plot titles (more descriptive than "A", "B", and "C").
  • I would move the sections "State of the Field" and "Research enabled by agweather-qaqc" after the "Statement of Need".
  • The paper would benefit from a grammar check. I didn't find many, but there are some grammatical errors.

ADDITION: From the repository, it is obvious that you are the main developer and maintainer of the package. Could you dedicate a small section of the paper to describing the roles of the paper's co-authors?

  • clarify the roles of the different authors

[Feature Request] Add support for sub-daily data

Please describe the feature you'd like to see
Currently, agweather-qaqc only works with daily data; however, adding support for sub-daily data would be beneficial.

Additional context
Certain correction methods will need adjustment for time series data that is no longer daily (e.g., the suggested parameters for the RH percentile correction will need to be updated to reflect the larger number of data points in any given year).

As a halfway measure, support could be added for taking in sub-daily data and resampling it to daily, saving the user an extra external step.
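The halfway measure could look roughly like the sketch below. It is hypothetical (`resample_to_daily` is not part of agweather-qaqc) and makes these assumptions:

- Records arrive as (timestamp, temperature in °C, precipitation in mm) tuples.
- Temperatures are aggregated to daily max/min/mean.
- Precipitation is summed.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean
from typing import Dict, Iterable, Tuple


def resample_to_daily(
    records: Iterable[Tuple[datetime, float, float]],
) -> Dict[date, Dict[str, float]]:
    """Hypothetical helper: aggregate sub-daily (timestamp, temp_c, precip_mm) records to days."""
    by_day = defaultdict(list)
    for timestamp, temp_c, precip_mm in records:
        by_day[timestamp.date()].append((temp_c, precip_mm))

    daily = {}
    for day, rows in sorted(by_day.items()):
        temps = [t for t, _ in rows]
        daily[day] = {
            "tmax": max(temps),
            "tmin": min(temps),
            "tavg": mean(temps),  # mean of observations, not (Tmax + Tmin) / 2
            "precip": sum(p for _, p in rows),  # precipitation is accumulated, not averaged
        }
    return daily
```

A real implementation would likely also need a minimum number of observations per day before accepting a daily value. It would also have to handle time zones, since the assignment of an observation to a day depends on them.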

Format of output files, column names for corrected and metadata

Hi Christian, I am working on gridwxcomp, a package that creates bias correction ratios between station climate data and gridMET data. I am setting the default names of input climate variables based on what PyWeatherQAQC outputs. I am also using the same format for the climate station metadata file that PyWeatherQAQC uses or produces. In other words, the default assumption is that users run PyWeatherQAQC before running gridwxcomp, although this is not necessary.

Anyhow, I want to be clear on the formats and column names you use in your output files. Here is the header (and hence the variable names) that I currently assume PyWeatherQAQC produces on the corrected sheet of a corrected climate xlsx time series file:

  year month day TAvg (C) TMax (C) TMin (C) TDew (C) Vapor Pres (kPa) RHAvg (%) RHMax (%) RHMin (%) Rs (w/m2) Rs_TR (w/m2) Rso (w/m2) Windspeed (m/s) Precip (mm) Data_ETr (mm) Data_ETo (mm) Calc_ETr (mm) Calc_ETo (mm) ws_2m (m/s)

And here are the columns I have in a station_data.csv metadata file, which holds info for all the corrected station time series files:

FID OBJECTID Id State Source Status Station LATDECDEG LONGDECDEG Date Station_ID Elev_FT Comments Location FileName Irrigation Website Elev_m

Can you let me know what the normal format for these files will be? I am particularly interested in the climate variable names in the time series files, and in the fields for latitude, longitude, station, elevation, and file name in the metadata file, since those will be required inputs for gridwxcomp. Also, if you could show an example row below a standard metadata file header, it would be greatly appreciated! Thanks, John.
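If the corrected-sheet columns do follow the `Name (unit)` pattern in the header listed above, a downstream tool such as gridwxcomp could separate variable names from units with a small parser. This is a hypothetical sketch: `split_header` is not part of either package, and it assumes the unit always sits in a single trailing pair of parentheses.

```python
import re
from typing import Optional, Tuple

# "TAvg (C)" -> name "TAvg", unit "C"; the unit is the final parenthesized group
HEADER_PATTERN = re.compile(r"^(?P<name>.+?)\s*\((?P<unit>[^()]+)\)$")


def split_header(column: str) -> Tuple[str, Optional[str]]:
    """Split a 'Name (unit)' column label; labels without a unit return None as the unit."""
    label = column.strip()
    match = HEADER_PATTERN.match(label)
    if match is None:
        return label, None
    return match.group("name"), match.group("unit")
```

For example, `split_header("Vapor Pres (kPa)")` yields `("Vapor Pres", "kPa")`, while `split_header("year")` yields `("year", None)`.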

pyWeatherQAQC crashes after reading in raw data

Hi Christian,
I'm having trouble getting pyWeatherQAQC past the step that reads in the raw data. I'm guessing it's an issue with the data or the ini file, but I can't find it. I sent these files by email. I'm including the installed packages below, as well as the error output. Thanks in advance!
-Charlie

(pyweatherqaqc) C:\pyWeatherQAQC>python qaqc_master.py

System: Starting data correction script.

System: Opening config file: config.ini

System: Raw data successfully read in.
Traceback (most recent call last):
  File "qaqc_master.py", line 39, in <module>
    missing_fill_value, script_mode, generate_bokeh) = input_functions.obtain_data(config_path)
  File "C:\pyWeatherQAQC\qaqc_modules\input_functions.py", line 438, in obtain_data
    data_month = np.array(raw_data[:, month_col].astype('int'))
ValueError: invalid literal for int() with base 10: 'nan'

(pyweatherqaqc) C:\pyWeatherQAQC>conda list

# packages in environment at C:\Anaconda3\envs\pyweatherqaqc:
#
# Name            Version     Build           Channel
anaconda          custom      py36h363777c_0
bokeh             1.0.4       pypi_0          pypi
certifi           2019.3.9    py36_0          conda-forge
configparser      3.7.4       pypi_0          pypi
datetime          4.3         pypi_0          pypi
jinja2            2.10        pypi_0          pypi
markupsafe        1.1.1       pypi_0          pypi
numpy             1.16.2      pypi_0          pypi
packaging         19.0        pypi_0          pypi
pandas            0.24.2      pypi_0          pypi
pillow            5.4.1       pypi_0          pypi
pip               19.0.3      py36_0          conda-forge
pyparsing         2.3.1       pypi_0          pypi
python            3.6.7       he025d50_1004   conda-forge
python-dateutil   2.8.0       pypi_0          pypi
pytz              2018.9      pypi_0          pypi
pyyaml            5.1         pypi_0          pypi
refet             0.3.10      pypi_0          pypi
setuptools        40.8.0      py36_0          conda-forge
six               1.12.0      pypi_0          pypi
tornado           6.0.2       pypi_0          pypi
vc                14          0               conda-forge
vs2015_runtime    14.0.25420  0               conda-forge
wheel             0.33.1      py36_0          conda-forge
wincertstore      0.2         py36_1002       conda-forge
xlsxwriter        1.1.5       py_0            conda-forge
zope-interface    4.6.0       pypi_0          pypi

(pyweatherqaqc) C:\pyWeatherQAQC>pip list
Package           Version
----------------- ---------
bokeh             1.0.4
certifi           2019.3.9
configparser      3.7.4
DateTime          4.3
Jinja2            2.10
MarkupSafe        1.1.1
numpy             1.16.2
packaging         19.0
pandas            0.24.2
Pillow            5.4.1
pip               19.0.3
pyparsing         2.3.1
python-dateutil   2.8.0
pytz              2018.9
PyYAML            5.1
refet             0.3.10
setuptools        40.8.0
six               1.12.0
tornado           6.0.2
wheel             0.33.1
wincertstore      0.2
XlsxWriter        1.1.5
zope.interface    4.6.0
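The last line of the traceback is the error Python raises whenever the string `'nan'` reaches `int()`. This suggests, though only the raw file can confirm it, that at least one row has an empty or non-numeric month value, such as a blank trailing row. A hypothetical pre-check like the one below would report the offending rows instead of crashing. It is written in plain Python rather than agweather-qaqc's own NumPy code, and `parse_int_column` is an illustrative name.

```python
import math


def parse_int_column(values):
    """Hypothetical helper: convert date-part strings to ints and report unparseable rows.

    Returns (parsed, bad_rows); parsed holds None wherever a value could not be converted.
    """
    parsed, bad_rows = [], []
    for row, value in enumerate(values):
        try:
            number = float(value)  # accepts "7", "7.0", and "nan"
        except (TypeError, ValueError):
            number = math.nan  # empty strings, None, or text
        if math.isnan(number) or not number.is_integer():
            parsed.append(None)
            bad_rows.append(row)
        else:
            parsed.append(int(number))
    return parsed, bad_rows
```

Printing `bad_rows` (plus any header offset) points to the lines of the raw file that need attention before the config is blamed.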

Add contributor guidelines

Regarding JOSS review openjournals/joss-reviews#6368

JOSS requires:

Clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Consider adding CONTRIBUTING.md and CODE_OF_CONDUCT.md files, along with brief information about contributing in the README.

See https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/adding-a-code-of-conduct-to-your-project and https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/setting-guidelines-for-repository-contributors

Optional Comments on Code Quality

This issue is related to the review process for JOSS: openjournals/joss-reviews#6368

JOSS does not directly require a particular level of code quality, so the following is independent of the review. Code quality is also often quite subjective. Nevertheless, the higher the quality of your codebase, the easier it is to onboard new developers. It also makes it less likely that others will develop their own tool just because yours lacks one feature.

  • Consider renaming WeatherQC._obtain_data to avoid confusion with the input_files._obtain_data function.
  • The input_files._obtain_data function is almost 300 lines long. Consider breaking it into smaller parts, which also forces you to structure the logic of your code.
  • Remove the leading underscores from the functions in input_files.py, because they are used by other modules.
  • Use a code formatter like Yapf or Black to format your code; it removes visual clutter without you having to do it manually. VS Code has a "format on save" setting.
  • Consider returning fewer tuples (as in calc_functions.calc_temperature_variables) and relying more on dicts where each item is a named value, i.e. return_value["monthly_k_not"] instead of return_value[3]. When possible, you might split up the functions so that each function calculates one thing.
  • The same is true for long lists of input variables (as in calc_functions.calc_humidity_variables). These make it very easy to accidentally pass tmin, tavg, tmax, ... instead of tmax, tmin, tavg.
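Both suggestions can be sketched together with type hints. The names `TemperatureSummary` and `summarize_temperature` are hypothetical and do not correspond to the package's real functions. A frozen dataclass gives named access to results, much like the dict the reviewer proposes, and also lets Mypy check the field names. Keyword-only parameters (after `*`) mean `tmax` and `tmin` must be written out at every call site, so a silent positional swap is no longer possible.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass(frozen=True)
class TemperatureSummary:
    """Named results instead of a positional tuple."""
    tavg: float
    tmax_mean: float
    tmin_mean: float


def summarize_temperature(*, tmax: List[float], tmin: List[float]) -> TemperatureSummary:
    """Hypothetical example: callers must name tmax and tmin explicitly."""
    if len(tmax) != len(tmin):
        raise ValueError("tmax and tmin must have the same length")
    tavg = mean((hi + lo) / 2.0 for hi, lo in zip(tmax, tmin))
    return TemperatureSummary(tavg=tavg, tmax_mean=mean(tmax), tmin_mean=mean(tmin))
```

A call such as `summarize_temperature([20.0], [10.0])` raises `TypeError`, while `summarize_temperature(tmax=[20.0], tmin=[10.0]).tavg` reads unambiguously.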

I can give you a full review of code quality if you want, but I also don't want to spam you with optional/stylistic comments ^^ Tools for automated code reviews can be very helpful, too.

In general, I am a big fan of static typing for production codebases. With static type hints, you can use Mypy to check your code for typing errors. However, I understand that it adds complexity to a codebase for developers who have not used it before. There are pros and cons, but I have found that it saves a lot of time, because Mypy catches many bugs in (CI) tests instead of at runtime.

You can add Yapf/Black/Mypy options to a pyproject.toml file.

Test Setup Review

This issue is related to the review process for JOSS: openjournals/joss-reviews#6368

You already have a suite of PyTest test cases which is very good. All tests passed on my system on the first try.

  • Use a more established directory structure

I would recommend moving the file test_calculations.py and the directory test_files into a tests directory. Maybe also split up the test file into multiple test files - this is quite stylistic though. This repo follows a widely used project structure: https://github.com/navdeep-G/samplemod

README.md
...
tests/
    test_files/
    test_io.py
    test_conversions.py
    test_calculations.py

  • Add pytest to CI

You should run the tests using GitHub actions. If you choose to use PDM, you can simply use Pip to install the dependencies inside the CI and don't have to bother with setting up Conda/PDM/Pipenv/Poetry.

  • Run tests for different python versions (matrix)

GitHub Actions supports running tests for multiple Python versions in parallel.

  • Run the end-to-end test on your test data in the CI

That is, run qaqc_single_station.py using the default config. You would have to add an option to skip or apply the corrections without keyboard input. You can pipe input to your script's stdin using echo 0 | python qaqc_single_station.py.
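The same pipe can be driven from Python, for example inside a pytest test. This is a sketch with a hypothetical helper (`run_with_stdin`) and a default command that assumes the script sits in the current working directory. Whether `0` is the right answer to every prompt depends on the script; the reviewer's `echo 0` only suggests it for the first one.

```python
import subprocess
import sys


def run_with_stdin(answers, command=None):
    """Run a command with newline-separated answers piped to stdin, like `echo 0 | python ...`.

    check=True raises CalledProcessError on a non-zero exit code, failing the CI step.
    """
    if command is None:
        # Assumption: run from the repository root with the current interpreter
        command = [sys.executable, "qaqc_single_station.py"]
    result = subprocess.run(
        command,
        input="\n".join(answers) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

In a test, `run_with_stdin(["0"])` would exercise the default config end to end, and the returned stdout can be checked for the script's completion message.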

Development towards an automated process

Greetings,

I thoroughly enjoy this QAQC package, and especially the documentation that's provided; it makes learning this package far easier and smoother than most other packages. I would like to build an automated process around it for personal weather stations. From my understanding, pyWeatherQAQC only works through manual inputs. However, if I were content with one specific kind of input (say, the default one), would there be any way for me to automate this process? Perhaps I could write a simple text file that lists all the steps/inputs, so that neither I nor another user would need to go through the process of correcting the data every day? Are there any developments in that regard, or has this already been considered and deemed irrelevant?

Thank you,
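The "text file with all the inputs" idea can be sketched in-process, assuming the prompts are read with Python's built-in `input()`. Both helpers (`load_answers`, `run_with_scripted_input`) are hypothetical and not part of pyWeatherQAQC. The answer file must list the responses in exactly the order the prompts appear, which may change between versions.

```python
from pathlib import Path
from unittest import mock


def load_answers(path):
    """Read one prompt answer per line from a plain-text file, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def run_with_scripted_input(func, answers):
    """Call func() while builtins.input() returns the given answers in order.

    If func prompts more times than there are answers, the mock raises
    StopIteration, so a mismatched answer file fails loudly instead of hanging.
    """
    with mock.patch("builtins.input", side_effect=list(answers)):
        return func()
```

Here `func` would be whatever entry point runs the QAQC workflow. Running the script as a separate process and piping the answer file to its stdin is an alternative that needs no code changes.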

Latitude Units

Entering latitude in decimal degrees raises a ValueError:

lat_units

I am using refet version 0.3.7, and I entered the station latitude in decimal degrees in the .ini file.
refet version 0.3.7 is supposed to expect decimal degrees, but I am getting a warning that latitudes must be in radians.
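The unit mismatch matters because the extraterrestrial radiation terms feeding clear-sky radiation and reference ET use trigonometric functions of latitude in radians. The sketch below follows the FAO-56/ASCE daily equations and converts decimal degrees internally. The function names are hypothetical and this is not RefET's interface, so the latitude unit RefET itself expects should be checked against the installed version's documentation.

```python
import math

GSC = 0.0820  # solar constant, MJ m^-2 min^-1


def extraterrestrial_radiation(lat_deg: float, doy: int) -> float:
    """Daily extraterrestrial radiation Ra (MJ m^-2 d^-1); latitude given in decimal degrees."""
    phi = math.radians(lat_deg)  # the trigonometric terms below require radians
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * doy / 365.0)  # inverse relative Earth-Sun distance
    decl = 0.409 * math.sin(2.0 * math.pi * doy / 365.0 - 1.39)  # solar declination (rad)
    # Clamp so polar day/night latitudes do not push acos outside its domain
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(decl)))
    ws = math.acos(x)  # sunset hour angle (rad)
    return (24.0 * 60.0 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(decl)
        + math.cos(phi) * math.cos(decl) * math.sin(ws)
    )


def clear_sky_radiation(ra: float, elev_m: float) -> float:
    """Simplified clear-sky solar radiation Rso = (0.75 + 2e-5 * z) * Ra."""
    return (0.75 + 2e-5 * elev_m) * ra
```

For 20° S on day of year 246, `extraterrestrial_radiation(-20.0, 246)` gives roughly 32.2 MJ m⁻² d⁻¹. Passing a radian value where this function expects degrees (or vice versa) would silently shift the result, which is why an explicit unit check is useful.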
