arm-doe / act
Atmospheric data Community Toolkit - A python based toolkit for exploring and analyzing time series atmospheric datasets
Home Page: https://ARM-DOE.github.io/ACT/
License: Other
What's the standard practice for sharing scripts that use the ACT library? We don't necessarily want to have a full example for display on the website. Is there another way to do this as part of the repo or would we want to handle this separately?
Since Ken has now uploaded a separate routine that checks the code for ARM standards, we really don't need a hard check for the datastream when reading netCDF and CSV files. I suggest removing the warning we get.
Also, pass **kwargs through to the pd.read_csv call in csvfiles.
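A minimal sketch of the idea, assuming a wrapper named read_csv inside csvfiles (the name and signature here are assumptions, not the actual ACT API):

```python
import io

import pandas as pd


def read_csv(filename, **kwargs):
    """Hypothetical csvfiles wrapper: forward any extra keyword
    arguments straight to pandas.read_csv."""
    return pd.read_csv(filename, **kwargs)


# Usage: any pandas.read_csv option (sep, skiprows, ...) now passes
# through unchanged.
df = read_csv(io.StringIO("a;b\n1;2\n"), sep=";")
```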
@AdamTheisen and @kenkehoe were discussing how the DQ Office installs ACT on the ADC production system. We currently release off the GitHub repo with a simple RPM build. This does not track with the official ACT release version number. We will be changing to use pip soon, which should resolve this issue.
I would like to understand our guidelines on how often we create a new ACT release version so I can understand how often we need to create new RPMs and how we can get bug fixes implemented quickly.
I've tried to use the ACT plotting library to plot data that I read with my xarray wrapper routine. To my surprise, the plotting routine expects some hidden object attributes set during the ACT reading process, namely act.file_dates. I don't think we should expect the user to use the ACT reading routine, so we should not require any hidden information.
My current issue is with the xarray object needing arm_ds.act.file_dates which the default or my xarray reader does not set.
If we are going to potentially incorporate some other projects like the AOT API, we should differentiate the get_files.py module as ARM-specific.
The DQ Office has a few functions to work with bit-packed QC. I will look into implementing these into qc_utils. I think we will need to have some discussions on the metadata format standard once in the xarray object, since ARM, CF and others have different standards. I'm leaning towards the current updated CF method. Do we have any examples of other programs' QC after reading (NEON, NWC, ...)?
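For reference, a self-contained sketch of the kind of bit-packed QC helpers described above (the function names are assumptions, not the DQ Office API); ARM QC bits are conventionally 1-based:

```python
import numpy as np


def qc_bit_set(qc_array, bit_num):
    """Return a boolean mask of where a 1-based QC bit is tripped."""
    return (np.asarray(qc_array) & (1 << (bit_num - 1))) > 0


def set_qc_bit(qc_array, index, bit_num):
    """Trip a 1-based QC bit at the given index; returns a copy."""
    qc = np.array(qc_array, copy=True)
    qc[index] |= 1 << (bit_num - 1)
    return qc
```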
I think we need a way to overplot a status onto a time series plot. For example, when a sensor is tripped (e.g. MWR rain flag), when rain is detected by another instrument (e.g. MET TBRG), or when some state is happening with the data (e.g. a fan running, indicated by a variable being over a threshold).
My plan is to add a bar or something like that to an existing plot. But instead of adding this feature to the existing plot definition, I'm thinking of a new method to add to an existing plot. This may be better for cases when we want to add information from a different instrument (object) without adding to the primary object used for plotting.
You can assign this issue to me.
So I was converting over Py-ART's docs and ran into something from ACT early on. I remember we had the docs set so that parameters had a space on both sides of the colon before the parameter type, which is in line with the doc styles of major packages. When it stopped working, the spaces were removed. It turned out a Sphinx update had broken numpydoc:
readthedocs/sphinx_rtd_theme#766 (comment)
The fix mentioned there will repair the RTD theme, but it then centers all the parameter tables, which looks ugly in my opinion.
A way around all of this, which looks cleaner both standards-wise and in the documentation itself, is to drop numpydoc, use the Sphinx napoleon extension, and revert to having the space before and after the colon in the docs to keep with standards. I can do a pull request soon if you all wish.
Currently, the discovery module takes a user-specified username and token as strings, but if we want to do unit testing on this module, having these hard-coded in the unit test is troublesome. A more secure way to log into the ARM Archive for testing would let us run this unit test safely.
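One common approach is to read the credentials from environment variables in the test setup, so they never appear in the test source; a minimal sketch (the variable names are assumptions):

```python
import os


def get_test_credentials():
    """Pull ARM Archive credentials from environment variables
    (names are assumptions) instead of hard-coding them in a test."""
    username = os.environ.get("ARM_USERNAME")
    token = os.environ.get("ARM_TOKEN")
    if username is None or token is None:
        return None  # lets a unit test skip gracefully when unset
    return username, token
```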
In addition, due to the size of 2D datasets such as ceilometer data, I would like to add a unit test for time-range plots, but I need a way to download the ceilometer data, either from the ARM Archive or, for now, from a temporary location, to unit test 2D plotting.
Like what we have in the Py-ART GateFilter, we should have a way to filter out values when viewing/processing data. I see an ACT "gatefilter" in the future, but perhaps we can call it a DataFilter object.
File "/home/kehoe/dev/dq/lib/python/ACT/act/plotting/plot.py", line 365, in set_yrng
self.axes[subplot_index].set_ylim(yrng)
File "/apps/base/python3.6/lib/python3.6/site-packages/matplotlib/axes/_base.py", line 3616, in set_ylim
bottom = self._validate_converted_limits(bottom, self.convert_yunits)
File "/apps/base/python3.6/lib/python3.6/site-packages/matplotlib/axes/_base.py", line 3139, in _validate_converted_limits
raise ValueError("Axis limits cannot be NaN or Inf")
ValueError: Axis limits cannot be NaN or Inf
We need to catch when all the data are set to NaN and set the y limits to something like [0, 1].
I can fix this if needed, let me know.
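A minimal sketch of the proposed guard (the function name is hypothetical, not ACT's set_yrng itself):

```python
import numpy as np


def safe_yrng(data, default=(0, 1)):
    """Return finite y-axis limits for `data`, falling back to a
    default range when every value is NaN or Inf."""
    data = np.asarray(data, dtype=float)
    finite = data[np.isfinite(data)]
    if finite.size == 0:
        return default
    return float(finite.min()), float(finite.max())
```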
Currently, ACT has the following dependencies:
as well as the packages these depend on such as scipy and pandas. We should agree about what dependencies ACT should have.
It seems that the read_netcdf function in act.io.armfiles causes a segfault when reading in radar data, perhaps because of the large file sizes. The specific case that causes the segfault is reading in the following list of files:
['/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.000013.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.010019.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.020010.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.030009.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.040011.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.050013.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.060019.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.070007.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.080020.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.090004.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.100010.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.110016.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.120011.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.130013.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.140004.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.150010.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.160019.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.170014.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.180013.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.190008.nc']
The function call looks like:
read_netcdf(files, return_None=True, mask_and_scale=False, data_vars='minimal')
where data_vars is simply the list of variables.
I think we need to pay more attention to the way we load ACT modules. Currently it takes about 8 seconds to load ACT. Also it would be better to not load modules that are not used. For example the cartopy module is loaded for any plot created even if it is not used. I think we should put that module import under the GeographicPlotDisplay class so it is only imported when that class is loaded.
Not sure how this is typically done so if someone has suggestions I'm happy to implement.
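One common pattern is a small lazy-import helper that defers the import until first use; the sketch below uses the stdlib json module as a stand-in for cartopy so it runs anywhere:

```python
import importlib


def lazy_import(name):
    """Return a zero-argument callable that imports `name` on first
    call and caches it, so module load time pays no import cost."""
    module = None

    def get():
        nonlocal module
        if module is None:
            module = importlib.import_module(name)
        return module

    return get


# Stand-in for cartopy; nothing is imported until get_mod() is called.
get_mod = lazy_import("json")
```

Inside GeographicPlotDisplay, the same effect can be had by simply placing `import cartopy.crs as ccrs` inside the class's `__init__` rather than at module top level.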
Some work was started in this area, but I don't think it was finished. This is to document that we need to finish it.
I don't see any of the new QC features documented in the ACT docs.
https://anl-digr.github.io/ACT/
@rcjackson @zssherman it looks like @kenkehoe has things documented fairly well. Is there something we need to do to get it to show up?
We need to determine what library would be best for plotting up maps with plane/ship tracks overlaid.
Some older data might not be compatible with xarray. sgpmmcrmomC1.b1.20041107 data yielded this error:
xarray.core.variable.MissingDimensionsError: 'heights' has more than 1-dimension and the same name as one of its dimensions ('mode', 'heights'). xarray disallows such variables because they conflict with the coordinates used to label dimensions.
Is there a way around this?
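One possible workaround is to skip the offending variable when opening the file and re-derive it afterwards if needed; a sketch (the helper name is hypothetical):

```python
import xarray as xr


def open_skipping(filename, bad_vars=("heights",)):
    """Open a legacy file while skipping variables whose name collides
    with one of their own dimensions, which xarray disallows."""
    return xr.open_dataset(filename, drop_variables=list(bad_vars))
```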
Alyssa had mentioned that it would be good to have the plots made with ACT use a colorblind-friendly palette. This is a great idea and it should be incorporated.
While using GeographicPlotDisplay with other cartopy projections, the map extent changes or no data appears.
The default PlateCarree projection works fine, but with a Mercator projection the data shows up while the map extent is changed.
With a LambertConformal projection the extent appears to be off, since none of the features that were supposed to be plotted appear.
The clean.py script is very ARM-centric, so I'm wondering if it would make sense to include it in the armfiles.py file instead of keeping it stand-alone. Or, as another option, since it is very QC-centric, it could be moved into the qc directory and titled appropriately if it is mainly for ARM QC. @kenkehoe thoughts?
A lot of the initial framework is now developed, so it seems like a good time to discuss what else we envision adding to this repo in the near and far term.
Right now, the day/night background is automatically displayed if the variable is a 1d time series and there's no "ydata". Should we remove this and leave up to users to display background?
if ydata is None:
    self.day_night_background(subplot_index)
ax.plot(xdata, data, '.', color=line_color)
...
This is to track what QC tests need to be added. I added all the numpy masked array options and two more, persistence and single variable comparison.
Another issue I need to work on is a cleaner way to plot data from multiple objects in one display. Right now you have to merge objects, but it would be nicer if the display object natively supported displaying data from more than one object at a time, so the user does not have to create a new merged object and eat up memory and resources.
While we have figsize in an example, I don't think it is actually being used in the plotting script. Changing figsize in the example below does not actually change the plot size:
display2 = act.plotting.TimeSeriesDisplay(new, figsize=(8, 5))
display2.plot('backscatter')
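The likely fix is to actually hand the stored figsize to matplotlib when the figure is created; a minimal sketch under that assumption (the class here is a stand-in, not ACT's TimeSeriesDisplay):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt


class DisplaySketch:
    """Stand-in display: pass figsize through to plt.subplots
    instead of silently dropping it."""

    def __init__(self, figsize=(10, 8)):
        self.fig, self.ax = plt.subplots(figsize=figsize)


display2 = DisplaySketch(figsize=(8, 5))
```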
It looks like the AUTHOR and LICENSE files from my initial build using the scientific cookie cutter were pulled over. We should update these accordingly.
@rcjackson @kenkehoe I was looking at adding a feature for working with a secondary y-axis on the timeseries plots, but it is a little more in-depth than I thought, mainly due to the set_yrng function. We rely on the subplot indices to set a lot of things, and they currently do not include anything about a secondary y-axis. I could put in a workaround for now that I think will work, but it's not ideal long-term. Could we adjust the subplot indices at all, or are there any other options you can think of?
The day/night plotting background has issues with astral when calculating sunrise/sunset times: it throws an error when there is no sun during the day. We can fix that with a try/except.
The other issue is that if a user catches the exception, the background for polar night comes out as all day. I think this is an issue with how things are plotted.
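A hedged sketch of the try/except idea: astral raises an exception when the sun never rises or sets (astral 2.x raises ValueError; older versions raise their own AstralError, so adjust for the installed version). The helper and the fake location in the test are illustrative, not ACT code:

```python
def sun_times_or_none(location, date):
    """Return (sunrise, sunset) from an astral-like location object,
    or None during polar night/day instead of crashing.
    `location` stands in for an astral location (assumption)."""
    try:
        s = location.sun(date=date)
        return s["sunrise"], s["sunset"]
    except ValueError:
        # astral signals "sun never reaches horizon" via an exception;
        # treat that as "no daytime" so the caller can skip the shading
        return None
```

A None return then lets the plotting code shade the whole day as night (or day) explicitly, which should also address the polar-night-as-all-day problem.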
Title says it all
Initially ACT would add some attributes to the Xarray Dataset that are used by the plotting functions. Some of them include file_dates, file_times, and datastream. We changed to adding them as Dataset global attributes so others who don't use our reader can fix the missing information when needed.
But the part I'm not a fan of is adding these without any indication that they are part of ACT, not part of the data file. I suggest prepending "__" to any internal information added to the Dataset so we know what it is and can easily remove it if needed. The current variables I see (file_dates, file_times, arm_standards_flag and datastream) should be changed to __file_dates, __file_times, __arm_standards_flag and __datastream.
This is also causing issues with the variables that do exist in the data file like "datastream". We have some code that fixes issues when processing older ARM data that are missing global attributes, particularly datastream. Since ACT is adding that global attribute our code thinks things are OK and we have other issues.
Here is an example of a Dataset's global attributes:
...
serial_number: P2710385
qc_standards_version: 1.0
qc_method: Standard Mentor QC
qc_comment: The QC field values are a bit packed represen...
zeb_platform: enasondewnpnC1.b1
history: created by user dsmgr on machine garnet at 20...
file_dates: ['20191206']
file_times: ['113100']
datastream: act_datastream
arm_standards_flag: ARMStandardsFlag.OK
__source_files: ['enasondewnpnC1.b1.20191206.113100.cdf']
Everything after history: is added by ACT or DQ Office code. We add __source_files.
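The proposed rename could look like the sketch below (the helper name is hypothetical; the key list matches the attributes named above):

```python
def tag_act_attrs(attrs, act_keys=("file_dates", "file_times",
                                   "datastream", "arm_standards_flag")):
    """Sketch of the proposed rename: prefix ACT-added global
    attributes with '__' so they cannot be confused with metadata
    that came from the data file itself."""
    return {("__" + k if k in act_keys else k): v
            for k, v in attrs.items()}
```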
There is a need to differentiate between retrievals in ACT that have been vetted through peer-reviewed publications and those that have not. I propose we add an additional folder under retrievals for evaluation scripts. Would calling it "eval" or "exp" for experimental work be appropriate? Other options?
@scollis @kenkehoe @zssherman @rcjackson Thoughts?
It may be useful to have a way to calculate stability indices for the BBSS data. Perhaps a way to output a table with the plot or a separate file.
In the docs, it says to clone from the AdamTheisen repository, but it's actually better to clone from the ANL-DIGR repository. In the end, we'll fix this whole issue with pip and conda packaging, but this is a reminder for me to change that part of the docs tomorrow.
We need to check if there is location data in the file before calling day_night_background. Right now it crashes if plotting the background with no location data, but we can just make it not do the background if there is no location information.
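The guard could be as simple as the sketch below (the 'lat'/'lon' variable names and the helper are assumptions; the fakes in the test stand in for a Dataset and a Display):

```python
def maybe_day_night_background(ds, display, subplot_index=(0,)):
    """Draw the day/night background only when the dataset actually
    carries location data; otherwise skip it silently."""
    if "lat" in ds.variables and "lon" in ds.variables:
        display.day_night_background(subplot_index)
        return True
    return False
```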
I am putting in a feature request to have a way to set an absolute limit when making a time series plot so a single very large or small value will not make the plot unusable. I have some ideas on how to implement.
I'll try to get to this soon.
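One way to implement it is to NaN out values beyond the absolute limit before plotting; a sketch (the function and parameter names are assumptions):

```python
import numpy as np


def apply_abs_limit(data, abs_limit):
    """Replace values beyond an absolute limit with NaN so a single
    extreme outlier cannot stretch the y-axis of a time series plot."""
    data = np.asarray(data, dtype=float)
    return np.where(np.abs(data) > abs_limit, np.nan, data)
```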
It looks like the way that we are currently doing the documentation is not translating well into the sphinx generated docs.
I.e.
obj : Xarray Dataset Object
creates this:
objXarray Dataset Object
There is no spacing between the variable name and the type. Are we doing this the correct way @rcjackson @zssherman ?
In addition to a TimeSeriesDisplay, we need to have a WindRoseDisplay and a SkewTDisplay object. This needs to be done by the ARM-ASR PI meeting as a demonstration that shows that Paytsar is able to use it for her data.
For Skew-T plots, I am thinking of introducing metpy as a dependency since it does these really well.
Adam mentioned the need to allow data to be in a more general format for this repo to be broader to the open source community. A quick search for how to read CSV data into xarray didn't show any results, but it does look like pandas can read CSV. Can we create a module or example using pandas to show how the CSV data will be read in with pandas and then converted to xarray with DataFrame.to_xarray()?
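A minimal sketch of that pandas-to-xarray path (the column names and CSV content are made up for illustration):

```python
import io

import pandas as pd

# pandas reads the CSV; DataFrame.to_xarray() converts it to an
# xarray Dataset, with the index becoming the dimension coordinate.
csv_text = "time,temp\n2019-01-01,5.0\n2019-01-02,6.5\n"
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])
ds = df.set_index("time").to_xarray()
```

Setting the time column as the index first gives the resulting Dataset a proper time dimension rather than a bare integer index.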
The title for the ACT repo is missing 'data' and should read Atmospheric data Community Toolkit
When plotting multiple subplots using act.plotting.TimeSeriesDisplay.plot_barbs_from_u_v and adding a colormap to the subplots, all of the colorbars end up being plotted on the last subplot rather than their respective subplots:
I didn't notice anything in the code that I thought would be causing this. Any ideas?
It would be easier for the end user if the examples in the documentation used specific file names rather than wildcards.
We need to find a way to plot out high-res 2D datasets faster. The MPL data takes a ridiculous amount of time to plot up 3 plots. Maybe we need to play around with image plotting of the data as well instead of pcolormesh and see how that works for cases like this.
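For regularly gridded 2-D data, imshow rasterizes a single image instead of drawing one polygon per cell like pcolormesh, so it is typically much faster; a sketch (the extents and data shape are illustrative only):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

# Fake high-res 2-D field standing in for MPL backscatter data.
data = np.random.rand(200, 500)

fig, ax = plt.subplots()
# extent maps the image onto data coordinates (here: hours x km).
im = ax.imshow(data, aspect="auto", origin="lower",
               extent=[0, 24, 0, 10])
```

The trade-off is that imshow assumes uniform grid spacing; pcolormesh is still needed when the time or height axis is irregular.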
Hi, I just installed act from conda-forge and this error came up on import:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
~/.conda/envs/cmac_env/lib/python3.6/site-packages/metpy/_version.py in get_version()
13 try:
---> 14 from setuptools_scm import get_version
15 return get_version(root='..', relative_to=__file__,
ModuleNotFoundError: No module named 'setuptools_scm'
During handling of the above exception, another exception occurred:
DistributionNotFound Traceback (most recent call last)
<ipython-input-11-d20e7d291629> in <module>
31 import matplotlib.ticker as mt
32 import matplotlib.font_manager as fm
---> 33 import act
34 get_ipython().run_line_magic('matplotlib', 'inline')
~/.conda/envs/cmac_env/lib/python3.6/site-packages/act/__init__.py in <module>
1 from . import io
----> 2 from . import plotting
3 from . import corrections
4 from . import utils
5 from . import tests
~/.conda/envs/cmac_env/lib/python3.6/site-packages/act/plotting/__init__.py in <module>
19 from .ContourDisplay import ContourDisplay
20 from .WindRoseDisplay import WindRoseDisplay
---> 21 from .SkewTDisplay import SkewTDisplay
22 from .XSectionDisplay import XSectionDisplay
23 from .GeoDisplay import GeographicPlotDisplay
~/.conda/envs/cmac_env/lib/python3.6/site-packages/act/plotting/SkewTDisplay.py in <module>
12
13 try:
---> 14 import metpy.calc as mpcalc
15 METPY_AVAILABLE = True
16 except ImportError:
~/.conda/envs/cmac_env/lib/python3.6/site-packages/metpy/__init__.py in <module>
34 from ._version import get_version # noqa: E402
35 from .xarray import * # noqa: F401, F403
---> 36 __version__ = get_version()
37 del get_version
~/.conda/envs/cmac_env/lib/python3.6/site-packages/metpy/_version.py in get_version()
17 except (ImportError, LookupError):
18 from pkg_resources import get_distribution
---> 19 return get_distribution(__package__).version
~/.conda/envs/cmac_env/lib/python3.6/site-packages/pkg_resources/__init__.py in get_distribution(dist)
480 dist = Requirement.parse(dist)
481 if isinstance(dist, Requirement):
--> 482 dist = get_provider(dist)
483 if not isinstance(dist, Distribution):
484 raise TypeError("Expected string, Requirement, or Distribution", dist)
~/.conda/envs/cmac_env/lib/python3.6/site-packages/pkg_resources/__init__.py in get_provider(moduleOrReq)
356 """Return an IResourceProvider for the named module or requirement"""
357 if isinstance(moduleOrReq, Requirement):
--> 358 return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
359 try:
360 module = sys.modules[moduleOrReq]
~/.conda/envs/cmac_env/lib/python3.6/site-packages/pkg_resources/__init__.py in require(self, *requirements)
899 included, even if they were already activated in this working set.
900 """
--> 901 needed = self.resolve(parse_requirements(requirements))
902
903 for dist in needed:
~/.conda/envs/cmac_env/lib/python3.6/site-packages/pkg_resources/__init__.py in resolve(self, requirements, env, installer, replace_conflicting, extras)
785 if dist is None:
786 requirers = required_by.get(req, None)
--> 787 raise DistributionNotFound(req, requirers)
788 to_activate.append(dist)
789 if dist not in req:
DistributionNotFound: The 'appdirs' distribution was not found and is required by pooch
I would like to change the base behavior of io.read_netcdf() when a file is not found: catch the FileNotFoundError and just return None. It makes more logical sense to me to use the reading function to also check whether the file exists. We do a lot of things depending on the availability of files, and having to wrap everything in a try seems excessive. I would suggest adding a verbose option to make a print statement optional when the file is not found, but not make that the default. If this is OK I'll make the update.
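A sketch of the proposed semantics; here `reader` stands in for the underlying open call inside act.io.armfiles.read_netcdf (the helper and keyword names are assumptions):

```python
def read_or_none(reader, filename, return_none=True, verbose=False):
    """Call `reader(filename)`; on FileNotFoundError return None
    instead of raising, optionally printing a notice."""
    try:
        return reader(filename)
    except FileNotFoundError:
        if not return_none:
            raise
        if verbose:
            print("File not found:", filename)
        return None
```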