arm-doe / act
Atmospheric data Community Toolkit - A python based toolkit for exploring and analyzing time series atmospheric datasets
Home Page: https://ARM-DOE.github.io/ACT/
License: Other
What's the standard practice for sharing scripts that use the ACT library? We don't necessarily want to have a full example for display on the website. Is there another way to do this as part of the repo or would we want to handle this separately?
Since Ken has now uploaded a separate routine that checks the code for ARM standards, we really don't need a hard check for the datastream when reading netCDF and CSV files. I suggest removing the warning we get.
Also, pass **kwargs through to the pd.read_csv call in csvfiles.
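A minimal sketch of the idea, assuming a wrapper named read_csv inside csvfiles (the name and signature here are assumptions, not the actual ACT API):

```python
import io

import pandas as pd


def read_csv(filename, **kwargs):
    """Hypothetical csvfiles wrapper: forward any extra keyword
    arguments straight to pandas.read_csv."""
    return pd.read_csv(filename, **kwargs)


# Usage: any pandas.read_csv option (sep, skiprows, ...) now passes
# through unchanged.
df = read_csv(io.StringIO("a;b\n1;2\n"), sep=";")
```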
@AdamTheisen and @kenkehoe were discussing how the DQ Office installs ACT on the ADC production system. We currently release off the GitHub repo with a simple RPM build. This does not track with the official ACT release version number. We will be changing to use pip soon, which should resolve this issue.
I would like to understand our guidelines on how often we create a new ACT release version so I can understand how often we need to create new RPMs and how we can get bug fixes implemented quickly.
I've tried to use the ACT plotting library to plot data that I read with my xarray wrapper routine. To my surprise, the plotting routine expects some hidden object attributes set during the ACT reading process, namely act.file_dates. I don't think we should expect the user to use the ACT reading routine, so we should not require any hidden information.
My current issue is with the xarray object needing arm_ds.act.file_dates which the default or my xarray reader does not set.
If we are going to potentially incorporate some other projects like the AOT API, we should differentiate the get_files.py module as ARM-specific.
The DQ Office has a few functions to work with bit-packed QC. I will look into implementing these into qc_utils. I think we will need to have some discussions on the metadata format standard once in the xarray object, since ARM, CF and others have different standards. I'm leaning towards the current updated CF method. Do we have any examples of other programs' QC after reading (NEON, NWC, ...)?
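For reference, a self-contained sketch of the kind of bit-packed QC helpers described above (the function names are assumptions, not the DQ Office API); ARM QC bits are conventionally 1-based:

```python
import numpy as np


def qc_bit_set(qc_array, bit_num):
    """Return a boolean mask of where a 1-based QC bit is tripped."""
    return (np.asarray(qc_array) & (1 << (bit_num - 1))) > 0


def set_qc_bit(qc_array, index, bit_num):
    """Trip a 1-based QC bit at the given index; returns a copy."""
    qc = np.array(qc_array, copy=True)
    qc[index] |= 1 << (bit_num - 1)
    return qc
```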
I think we need a way to overplot a status onto a time series plot. For example, when a sensor is tripped (e.g. MWR rain flag), when rain is detected by another instrument (e.g. MET TBRG), or when some state is happening with the data (e.g. a fan running, indicated by a variable being over a threshold).
My plan is to add a bar or something like that to an existing plot. But instead of adding this feature to the existing plot definition, I'm thinking of a new method to add to an existing plot. This may be better for cases when we want to add information from a different instrument (object) without adding to the primary object used for plotting.
You can assign this issue to me.
So I was converting over Py-ART's docs and ran into something from ACT early on. I remember we had the docs set so that parameters had a space on both sides of the colon before the parameter type, which is in line with the doc styles of major packages. When it stopped working, the spaces were removed. It turned out a Sphinx update had broken numpydoc:
readthedocs/sphinx_rtd_theme#766 (comment)
The fix mentioned there will repair the RTD theme, but it then centers all the parameter tables, which looks ugly in my opinion.
A way around all of this, which looks cleaner both standards-wise and in the documentation itself, is to drop numpydoc, use the Sphinx napoleon extension, and revert to having the space before and after the colon in the docs to keep with standards. I can do a pull request soon if you all wish.
Currently, the discovery module takes a user-specified username and token as strings, but if we want to do unit testing on this module, having these hard-coded in the unit test is troublesome. A more secure way to log into the ARM Archive for testing would let us run this unit test safely.
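One common approach is to read the credentials from environment variables in the test setup, so they never appear in the test source; a minimal sketch (the variable names are assumptions):

```python
import os


def get_test_credentials():
    """Pull ARM Archive credentials from environment variables
    (names are assumptions) instead of hard-coding them in a test."""
    username = os.environ.get("ARM_USERNAME")
    token = os.environ.get("ARM_TOKEN")
    if username is None or token is None:
        return None  # lets a unit test skip gracefully when unset
    return username, token
```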
In addition, due to the size of 2D datasets such as ceilometer data, I would like to add a unit test for time-range plots, but I need a way to download the ceilometer data, either from the ARM Archive or, for now, from a temporary location, to unit test 2D plotting.
Like what we have in the Py-ART GateFilter, we should have a way to filter out values when viewing/processing data. I see an ACT "gatefilter" in the future, but perhaps we can call it a DataFilter object.
File "/home/kehoe/dev/dq/lib/python/ACT/act/plotting/plot.py", line 365, in set_yrng
self.axes[subplot_index].set_ylim(yrng)
File "/apps/base/python3.6/lib/python3.6/site-packages/matplotlib/axes/_base.py", line 3616, in set_ylim
bottom = self._validate_converted_limits(bottom, self.convert_yunits)
File "/apps/base/python3.6/lib/python3.6/site-packages/matplotlib/axes/_base.py", line 3139, in _validate_converted_limits
raise ValueError("Axis limits cannot be NaN or Inf")
ValueError: Axis limits cannot be NaN or Inf
We need to catch when all the data are set to NaN and set the y limits to something like [0, 1].
I can fix this if needed, let me know.
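A minimal sketch of the proposed guard (the function name is hypothetical, not ACT's set_yrng itself):

```python
import numpy as np


def safe_yrng(data, default=(0, 1)):
    """Return finite y-axis limits for `data`, falling back to a
    default range when every value is NaN or Inf."""
    data = np.asarray(data, dtype=float)
    finite = data[np.isfinite(data)]
    if finite.size == 0:
        return default
    return float(finite.min()), float(finite.max())
```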
Currently, ACT has the following dependencies:
as well as the packages these depend on such as scipy and pandas. We should agree about what dependencies ACT should have.
It seems that the read_netcdf function in act.io.armfiles causes a segfault when reading in radar data, perhaps because of the large file sizes. The specific case that causes the segfault is reading in the following list of files:
['/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.000013.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.010019.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.020010.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.030009.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.040011.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.050013.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.060019.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.070007.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.080020.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.090004.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.100010.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.110016.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.120011.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.130013.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.140004.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.150010.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.160019.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.170014.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.180013.nc', '/data/datastream/sgp/sgpkazrcfrgeC1.a1/sgpkazrcfrgeC1.a1.20190225.190008.nc']
The function call looks like:
read_netcdf(files, return_None=True, mask_and_scale=False, data_vars='minimal')
where data_vars is simply the list of variables.
I think we need to pay more attention to the way we load ACT modules. Currently it takes about 8 seconds to load ACT. Also it would be better to not load modules that are not used. For example the cartopy module is loaded for any plot created even if it is not used. I think we should put that module import under the GeographicPlotDisplay class so it is only imported when that class is loaded.
Not sure how this is typically done so if someone has suggestions I'm happy to implement.
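One common pattern is a small lazy-import helper that defers the import until first use; the sketch below uses the stdlib json module as a stand-in for cartopy so it runs anywhere:

```python
import importlib


def lazy_import(name):
    """Return a zero-argument callable that imports `name` on first
    call and caches it, so module load time pays no import cost."""
    module = None

    def get():
        nonlocal module
        if module is None:
            module = importlib.import_module(name)
        return module

    return get


# Stand-in for cartopy; nothing is imported until get_mod() is called.
get_mod = lazy_import("json")
```

Inside GeographicPlotDisplay, the same effect can be had by simply placing `import cartopy.crs as ccrs` inside the class's `__init__` rather than at module top level.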
Some work was started in this area, but I don't think it was finished. This is to document that we need to finish it.
I don't see any of the new QC features documented in the ACT docs.
https://anl-digr.github.io/ACT/
@rcjackson @zssherman it looks like @kenkehoe has things documented fairly well. Is there something we need to do to get it to show up?
We need to determine what library would be best for plotting up maps with plane/ship tracks overlaid.
Some older data might not be compatible with xarray. sgpmmcrmomC1.b1.20041107 data yielded this error:
xarray.core.variable.MissingDimensionsError: 'heights' has more than 1-dimension and the same name as one of its dimensions ('mode', 'heights'). xarray disallows such variables because they conflict with the coordinates used to label dimensions.
Is there a way around this?
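One possible workaround is to skip the offending variable when opening the file and re-derive it afterwards if needed; a sketch (the helper name is hypothetical):

```python
import xarray as xr


def open_skipping(filename, bad_vars=("heights",)):
    """Open a legacy file while skipping variables whose name collides
    with one of their own dimensions, which xarray disallows."""
    return xr.open_dataset(filename, drop_variables=list(bad_vars))
```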
Alyssa had mentioned that it would be good to have the plots made with ACT use a colorblind-friendly palette. This is a great idea and it should be incorporated.
While using GeographicPlotDisplay with other cartopy projections, the map extent changes or no data appears.
The default PlateCarree projection works fine, but with a Mercator projection the data shows up while the map extent is changed.
With a LambertConformal projection the extent appears to be off, since none of the features that were supposed to be plotted appear.
The clean.py script is very ARM-centric, so I'm wondering if it would make sense to include it in the armfiles.py file instead of keeping it stand-alone. Or, as another option, since it is very QC-centric, it could be moved into the qc directory and titled appropriately if it is mainly for ARM QC. @kenkehoe thoughts?
A lot of the initial framework is now developed, so it seems like a good time to discuss what else we envision adding to this repo in the near and far term.
Right now, the day/night background is automatically displayed if the variable is a 1d time series and there's no "ydata". Should we remove this and leave up to users to display background?
if ydata is None:
    self.day_night_background(subplot_index)
ax.plot(xdata, data, '.', color=line_color)
...
This is to track what QC tests need to be added. I added all the numpy masked array options and two more, persistence and single variable comparison.
Another issue I need to work on is a cleaner way to plot data from multiple objects in one display. Right now you have to merge objects, but it would be nicer if the display object natively supported displaying data from more than one object at a time, so the user does not have to create a new merged object and eat up memory and resources.
While we have figsize in an example, I don't think it is actually being used in the plotting script. Changing figsize in the example below does not actually change the plot size:
display2 = act.plotting.TimeSeriesDisplay(new, figsize=(8, 5))
display2.plot('backscatter')
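The likely fix is to actually hand the stored figsize to matplotlib when the figure is created; a minimal sketch under that assumption (the class here is a stand-in, not ACT's TimeSeriesDisplay):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt


class DisplaySketch:
    """Stand-in display: pass figsize through to plt.subplots
    instead of silently dropping it."""

    def __init__(self, figsize=(10, 8)):
        self.fig, self.ax = plt.subplots(figsize=figsize)


display2 = DisplaySketch(figsize=(8, 5))
```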
It looks like the AUTHOR and LICENSE files from my initial build using the scientific cookie cutter were pulled over. We should update these accordingly.
@rcjackson @kenkehoe I was looking at adding a feature for working with a secondary y-axis on the timeseries plots, but it is a little more in-depth than I thought, mainly due to the set_yrng function. We rely on the subplot indices to set a lot of things, and they currently do not include anything about a secondary y-axis. I could put in a workaround for now that I think will work, but it's not ideal long-term. Could we adjust the subplot indices at all, or are there any other options you can think of?
The day/night plotting background has issues with astral when calculating sunrise/sunset times: it throws an error when there is no sun during the day. We can fix that with a try/except.
The other issue is that if a user catches the exception, the background for polar night comes out as all day. I think this is an issue with how things are plotted.
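A hedged sketch of the try/except idea: astral raises an exception when the sun never rises or sets (astral 2.x raises ValueError; older versions raise their own AstralError, so adjust for the installed version). The helper and the fake location in the test are illustrative, not ACT code:

```python
def sun_times_or_none(location, date):
    """Return (sunrise, sunset) from an astral-like location object,
    or None during polar night/day instead of crashing.
    `location` stands in for an astral location (assumption)."""
    try:
        s = location.sun(date=date)
        return s["sunrise"], s["sunset"]
    except ValueError:
        # astral signals "sun never reaches horizon" via an exception;
        # treat that as "no daytime" so the caller can skip the shading
        return None
```

A None return then lets the plotting code shade the whole day as night (or day) explicitly, which should also address the polar-night-as-all-day problem.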
Title says it all
Initially ACT would add some attributes to the Xarray Dataset that are used by the plotting functions. Some of them include file_dates, file_times, and datastream. We changed to adding them as Dataset global attributes so others who don't use our reader can fix the missing information when needed.
But the part I'm not a fan of is adding these without any indication that they are part of ACT, not part of the data file. I suggest prepending "__" to any internal information added to the Dataset so we know what it is and can easily remove it if needed. The current variables I see (file_dates, file_times, arm_standards_flag and datastream) should be changed to __file_dates, __file_times, __arm_standards_flag and __datastream.
This is also causing issues with the variables that do exist in the data file like "datastream". We have some code that fixes issues when processing older ARM data that are missing global attributes, particularly datastream. Since ACT is adding that global attribute our code thinks things are OK and we have other issues.
Here is an example of a Dataset's global attributes:
...
serial_number: P2710385
qc_standards_version: 1.0
qc_method: Standard Mentor QC
qc_comment: The QC field values are a bit packed represen...
zeb_platform: enasondewnpnC1.b1
history: created by user dsmgr on machine garnet at 20...
file_dates: ['20191206']
file_times: ['113100']
datastream: act_datastream
arm_standards_flag: ARMStandardsFlag.OK
__source_files: ['enasondewnpnC1.b1.20191206.113100.cdf']
Everything after history: is added by ACT or DQ Office code. We add __source_files.
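The proposed rename could look like the sketch below (the helper name is hypothetical; the key list matches the attributes named above):

```python
def tag_act_attrs(attrs, act_keys=("file_dates", "file_times",
                                   "datastream", "arm_standards_flag")):
    """Sketch of the proposed rename: prefix ACT-added global
    attributes with '__' so they cannot be confused with metadata
    that came from the data file itself."""
    return {("__" + k if k in act_keys else k): v
            for k, v in attrs.items()}
```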
There is a need to differentiate between retrievals in ACT that have been vetted through peer-reviewed publications and those that have not. I propose we add an additional folder under retrievals for evaluation scripts. Would calling it "eval" or "exp" for experimental work be appropriate? Other options?
@scollis @kenkehoe @zssherman @rcjackson Thoughts?
It may be useful to have a way to calculate stability indices for the BBSS data. Perhaps a way to output a table with the plot or a separate file.
In the docs, it says to clone from the AdamTheisen repository, but it's actually better to clone from the ANL-DIGR repository. In the end, we'll fix this whole issue with pip and conda packaging, but this is a reminder for me to change that part of the docs tomorrow.
We need to check if there is location data in the file before calling day_night_background. Right now it crashes if plotting the background with no location data, but we can just make it not do the background if there is no location information.
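The guard could be as simple as the sketch below (the 'lat'/'lon' variable names and the helper are assumptions; the fakes in the test stand in for a Dataset and a Display):

```python
def maybe_day_night_background(ds, display, subplot_index=(0,)):
    """Draw the day/night background only when the dataset actually
    carries location data; otherwise skip it silently."""
    if "lat" in ds.variables and "lon" in ds.variables:
        display.day_night_background(subplot_index)
        return True
    return False
```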
I am putting in a feature request to have a way to set an absolute limit when making a time series plot so a single very large or small value will not make the plot unusable. I have some ideas on how to implement.
I'll try to get to this soon.
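One way to implement it is to NaN out values beyond the absolute limit before plotting; a sketch (the function and parameter names are assumptions):

```python
import numpy as np


def apply_abs_limit(data, abs_limit):
    """Replace values beyond an absolute limit with NaN so a single
    extreme outlier cannot stretch the y-axis of a time series plot."""
    data = np.asarray(data, dtype=float)
    return np.where(np.abs(data) > abs_limit, np.nan, data)
```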
It looks like the way that we are currently doing the documentation is not translating well into the sphinx generated docs.
I.e.
obj : Xarray Dataset Object
creates this:
objXarray Dataset Object
There is no spacing between the variable name and the type. Are we doing this the correct way @rcjackson @zssherman ?
In addition to a TimeSeriesDisplay, we need to have a WindRoseDisplay and a SkewTDisplay object. This needs to be done by the ARM-ASR PI meeting as a demonstration that shows that Paytsar is able to use it for her data.
For Skew-T plots, I am thinking of introducing metpy as a dependency since it does these really well.
Adam mentioned the need to allow data to be in a more general format for this repo to be broader to the open source community. A quick search for how to read CSV data into xarray didn't show any results, but it does look like pandas can read CSV. Can we create a module or example using pandas to show how the CSV data will be read in with pandas and then converted to xarray with DataFrame.to_xarray()?
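A minimal sketch of that pandas-to-xarray path (the column names and CSV content are made up for illustration):

```python
import io

import pandas as pd

# pandas reads the CSV; DataFrame.to_xarray() converts it to an
# xarray Dataset, with the index becoming the dimension coordinate.
csv_text = "time,temp\n2019-01-01,5.0\n2019-01-02,6.5\n"
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])
ds = df.set_index("time").to_xarray()
```

Setting the time column as the index first gives the resulting Dataset a proper time dimension rather than a bare integer index.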
The title for the ACT repo is missing 'data' and should read Atmospheric data Community Toolkit
When plotting multiple subplots using act.plotting.TimeSeriesDisplay.plot_barbs_from_u_v and adding a colormap to the subplots, all of the colorbars end up being plotted on the last subplot rather than their respective subplots:
I didn't notice anything in the code that I thought would be causing this. Any ideas?
It would be easier for the end user if the examples in the documentation used specific file names rather than wildcards.
We need to find a way to plot out high-res 2D datasets faster. The MPL data takes a ridiculous amount of time to plot up 3 plots. Maybe we need to play around with image plotting of the data as well instead of pcolormesh and see how that works for cases like this.
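For regularly gridded 2-D data, imshow rasterizes a single image instead of drawing one polygon per cell like pcolormesh, so it is typically much faster; a sketch (the extents and data shape are illustrative only):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

# Fake high-res 2-D field standing in for MPL backscatter data.
data = np.random.rand(200, 500)

fig, ax = plt.subplots()
# extent maps the image onto data coordinates (here: hours x km).
im = ax.imshow(data, aspect="auto", origin="lower",
               extent=[0, 24, 0, 10])
```

The trade-off is that imshow assumes uniform grid spacing; pcolormesh is still needed when the time or height axis is irregular.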
Hi, I just installed act from conda-forge and this error came up on import:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
~/.conda/envs/cmac_env/lib/python3.6/site-packages/metpy/_version.py in get_version()
13 try:
---> 14 from setuptools_scm import get_version
15 return get_version(root='..', relative_to=__file__,
ModuleNotFoundError: No module named 'setuptools_scm'
During handling of the above exception, another exception occurred:
DistributionNotFound Traceback (most recent call last)
<ipython-input-11-d20e7d291629> in <module>
31 import matplotlib.ticker as mt
32 import matplotlib.font_manager as fm
---> 33 import act
34 get_ipython().run_line_magic('matplotlib', 'inline')
~/.conda/envs/cmac_env/lib/python3.6/site-packages/act/__init__.py in <module>
1 from . import io
----> 2 from . import plotting
3 from . import corrections
4 from . import utils
5 from . import tests
~/.conda/envs/cmac_env/lib/python3.6/site-packages/act/plotting/__init__.py in <module>
19 from .ContourDisplay import ContourDisplay
20 from .WindRoseDisplay import WindRoseDisplay
---> 21 from .SkewTDisplay import SkewTDisplay
22 from .XSectionDisplay import XSectionDisplay
23 from .GeoDisplay import GeographicPlotDisplay
~/.conda/envs/cmac_env/lib/python3.6/site-packages/act/plotting/SkewTDisplay.py in <module>
12
13 try:
---> 14 import metpy.calc as mpcalc
15 METPY_AVAILABLE = True
16 except ImportError:
~/.conda/envs/cmac_env/lib/python3.6/site-packages/metpy/__init__.py in <module>
34 from ._version import get_version # noqa: E402
35 from .xarray import * # noqa: F401, F403
---> 36 __version__ = get_version()
37 del get_version
~/.conda/envs/cmac_env/lib/python3.6/site-packages/metpy/_version.py in get_version()
17 except (ImportError, LookupError):
18 from pkg_resources import get_distribution
---> 19 return get_distribution(__package__).version
~/.conda/envs/cmac_env/lib/python3.6/site-packages/pkg_resources/__init__.py in get_distribution(dist)
480 dist = Requirement.parse(dist)
481 if isinstance(dist, Requirement):
--> 482 dist = get_provider(dist)
483 if not isinstance(dist, Distribution):
484 raise TypeError("Expected string, Requirement, or Distribution", dist)
~/.conda/envs/cmac_env/lib/python3.6/site-packages/pkg_resources/__init__.py in get_provider(moduleOrReq)
356 """Return an IResourceProvider for the named module or requirement"""
357 if isinstance(moduleOrReq, Requirement):
--> 358 return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
359 try:
360 module = sys.modules[moduleOrReq]
~/.conda/envs/cmac_env/lib/python3.6/site-packages/pkg_resources/__init__.py in require(self, *requirements)
899 included, even if they were already activated in this working set.
900 """
--> 901 needed = self.resolve(parse_requirements(requirements))
902
903 for dist in needed:
~/.conda/envs/cmac_env/lib/python3.6/site-packages/pkg_resources/__init__.py in resolve(self, requirements, env, installer, replace_conflicting, extras)
785 if dist is None:
786 requirers = required_by.get(req, None)
--> 787 raise DistributionNotFound(req, requirers)
788 to_activate.append(dist)
789 if dist not in req:
DistributionNotFound: The 'appdirs' distribution was not found and is required by pooch
I would like to change the base behavior of io.read_netcdf() when a file is not found: catch the FileNotFoundError and just return None. It makes more logical sense to me to use the reading function to also check whether the file exists. We do a lot of things depending on the availability of files, and having to wrap everything in a try seems excessive. I would suggest adding a verbose option to make a print statement optional when the file is not found, but not make that the default. If this is OK I'll make the update.
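A sketch of the proposed semantics; here `reader` stands in for the underlying open call inside act.io.armfiles.read_netcdf (the helper and keyword names are assumptions):

```python
def read_or_none(reader, filename, return_none=True, verbose=False):
    """Call `reader(filename)`; on FileNotFoundError return None
    instead of raising, optionally printing a notice."""
    try:
        return reader(filename)
    except FileNotFoundError:
        if not return_none:
            raise
        if verbose:
            print("File not found:", filename)
        return None
```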