
mpas-analysis's Issues

Iteration of sea-ice observational time series appears incorrect

When plotting sea-ice aggregate area and volume time series, we also plot the climatological cycle of observational values. This cycle is repeated to cover the full extent of the model time series.
Something must be slightly off when setting the repeated cycle for the observational data, as the model and obs time series appear shifted by quite a bit after many cycles. For example, here is a plot of SH sea-ice area after about 200 years of one of the ACME beta0 simulations:

[figure: iceareacellnh, 20161117.beta0.A_WCYCL1850S.ne30_oEC_ICG.edison]
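For reference, here is a minimal sketch (the function and variable names are hypothetical, not the actual code in question) of how the repeated observational cycle might be constructed; a small mismatch between the cycle length used for the obs time axis and the model calendar would produce exactly this kind of growing shift:

import numpy as np

def tile_monthly_climatology(monthlyClim, nYears, startYear):
    # Repeat a 12-month observational climatology (Jan..Dec) over nYears
    # model years, with mid-month time coordinates in decimal years.
    values = np.tile(monthlyClim, nYears)
    months = np.arange(12 * nYears)
    times = startYear + (months + 0.5) / 12.0
    # Note: building this axis with a slightly wrong cycle length (e.g. a
    # 365.25-day year against the model's 365-day calendar) would drift by
    # about a quarter day per year, i.e. roughly 50 days after 200 years.
    return times, values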

CI fails with latest version of xarray

CI has started to fail despite no changes to MPAS-Analysis itself. The error appears to be due to an update to xarray. Here is the result of pytest:

$ pytest
============================= test session starts ==============================
platform linux2 -- Python 2.7.13, pytest-3.0.5, py-1.4.32, pluggy-0.4.0
rootdir: /home/xylar/code/mpas-work/analysis/mpas_analysis_repo, inifile: 
collected 17 items 

mpas_analysis/test/test_date.py .
mpas_analysis/test/test_interpolate.py ....
mpas_analysis/test/test_io_utility.py .
mpas_analysis/test/test_mpas_config_parser.py .
mpas_analysis/test/test_mpas_xarray.py ....F..
mpas_analysis/test/test_namelist_streams_interface.py ...

=================================== FAILURES ===================================
__________________________ TestNamelist.test_selvals ___________________________

self = <mpas_analysis.test.test_mpas_xarray.TestNamelist testMethod=test_selvals>

    def test_selvals(self):
        fileName = str(self.datadir.join('example_jan.nc'))
        timestr = 'time_avg_daysSinceStartOfSim'
        varList = \
            ['time_avg_avgValueWithinOceanLayerRegion_avgLayerTemperature',
             'refBottomDepth']
    
        selvals = {'nVertLevels': 0}
        ds = xr.open_mfdataset(
            fileName,
            preprocess=lambda x: mpas_xarray.preprocess_mpas(x,
                                                             timestr=timestr,
                                                             onlyvars=varList,
                                                             selvals=selvals,
                                                             yearoffset=1850))
        self.assertEqual(ds.data_vars.keys(), varList)
        self.assertEqual(ds[varList[0]].shape, (1, 7))
>       self.assertEqual(ds['nVertLevels'].shape, ())

/home/xylar/code/mpas-work/analysis/mpas_analysis_repo/mpas_analysis/test/test_mpas_xarray.py:136: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/home/xylar/anaconda2/envs/mpas_analysis/lib/python2.7/site-packages/xarray/core/dataset.py:698: in __getitem__
    return self._construct_dataarray(key)
/home/xylar/anaconda2/envs/mpas_analysis/lib/python2.7/site-packages/xarray/core/dataset.py:642: in _construct_dataarray
    self._variables, name, self._level_coords, self.dims)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

variables = OrderedDict([(u'time_avg_avgValueWithinOceanLayerRegion_avgLayerTemperature', ...(Time: 1)>
array(['1855-01-13T12:24:13.593599000Z'], dtype='datetime64[ns]'))])
key = 'nVertLevels', level_vars = OrderedDict()
dim_sizes = Frozen(SortedKeysDict({u'nOceanRegionsTmp': 7, u'Time': 1}))

    def _get_virtual_variable(variables, key, level_vars=None, dim_sizes=None):
        """Get a virtual variable (e.g., 'time.year' or a MultiIndex level)
        from a dict of xarray.Variable objects (if possible)
        """
        if level_vars is None:
            level_vars = {}
        if dim_sizes is None:
            dim_sizes = {}
    
        if key in dim_sizes:
            data = pd.Index(range(dim_sizes[key]), name=key)
            variable = IndexVariable((key,), data)
            return key, key, variable
    
        if not isinstance(key, basestring):
            raise KeyError(key)
    
        split_key = key.split('.', 1)
        if len(split_key) == 2:
            ref_name, var_name = split_key
        elif len(split_key) == 1:
            ref_name, var_name = key, None
        else:
            raise KeyError(key)
    
        if ref_name in level_vars:
            dim_var = variables[level_vars[ref_name]]
            ref_var = dim_var.to_index_variable().get_level_variable(ref_name)
        else:
>           ref_var = variables[ref_name]
E           KeyError: 'nVertLevels'

/home/xylar/anaconda2/envs/mpas_analysis/lib/python2.7/site-packages/xarray/core/dataset.py:71: KeyError
===================== 1 failed, 16 passed in 0.87 seconds ======================

Open source

All workflows and analysis scripts are required to be version controlled in an open repository.

python 3 not supported

At present, MPAS-Analysis is not working on python 3.5. This issue is meant to be a dialog about which versions of python and which packages we wish to support. See also #40, #42

Partial list of needed changes:

  • change from %-formatting of strings to .format(...) (see the example below)
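For example, a change of this kind (illustrative values only):

field, year1, year2 = 'iceAreaCell', 1, 20

# old %-style formatting:
message = 'Plotting %s for years %04d-%04d' % (field, year1, year2)

# equivalent .format(...) call, which works the same on python 2 and 3:
message = 'Plotting {} for years {:04d}-{:04d}'.format(field, year1, year2)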

Addition of automatic continuous integration (CI)

Now that there are several pytest tests, use of automatic continuous integration will be helpful to automate testing, reduce human testing time, and help ensure that specific design decisions are maintained within the repository.

This issue's purpose is to socialize this idea to determine if it is still too premature to implement automatic CI.

Standardized file formats

The file formats used within the analysis framework should adhere to well-defined standards. For example, they should be formats easily accessible by users of the analysis framework (e.g. netCDF, ascii, or similar).
An additional consideration is that any example datasets intended to allow one to play with the analysis framework need to be small in file size (e.g. < 1 GB of total example data).

Sea ice concentration colorbar label should be 'fraction'

Currently, the plots for sea ice concentration have values that range from 0.0 to 1.0, but the label on the colorbar is "%". The label should be changed to "fraction" (or the values changed to 0-100).

Attached screenshot shows an example.
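A minimal matplotlib sketch of the proposed fix (the field and file names here are hypothetical):

import numpy as np
import matplotlib.pyplot as plt

# hypothetical sea-ice concentration field with values in 0.0-1.0
concentration = np.random.rand(50, 100)

fig, ax = plt.subplots()
im = ax.pcolormesh(concentration, vmin=0.0, vmax=1.0)
cbar = fig.colorbar(im, ax=ax)
cbar.set_label('fraction')  # rather than '%', to match the 0.0-1.0 values
fig.savefig('seaice_concentration_example.png')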

Analysis system creates directories in my home directory

When running the analysis system in the tutorial, I noticed that the system creates three directories in my home directory at /turquoise/usr/projects/climate/akt.
It created these directories:
20161117.beta0.A_WCYCL1850S.ne30_oEC_ICG.edison.test.pp
coupled_diagnostics_20161117.beta0.A_WCYCL1850S.ne30_oEC_ICG.edison-obs
coupled_diagnostics_20161117.beta0.A_WCYCL1850S.ne30_oEC_ICG.edison-obs.logs

This is very obnoxious behaviour since it will clutter my home directory.

Analysis documentation required (with examples)

Each analysis notebook should be configured to work with a dataset (in "example-datasets") to produce a desired analysis, with output produced in the form of saved figure(s). The running of the notebooks on the example datasets should be specified in a python file in “example-scripts”. Resultant saved figures are to be included in a CORE-level README.md (or perhaps grouped by analysis type for finer granularity) to provide a visual example of the analysis capability.

  • All analysis functions must have an appropriate docstring.
  • Provide visual examples of MPAS-Analysis capabilities in README.md

Default config.analysis has un-useful default values

The default values in config.analysis are not especially useful for someone trying to start using the system. The values are specific to one particular system, and it's unclear what they should be set to for a target machine. Does it make sense to make a new directory, say config_templates, and then a set of default config files for the machines we use:
config.lanl_ice
config.lanl_ice_seaice_standalone
config.edison
etc.?

Mechanism to access data sets for comparison

Many analysis tools will use datasets (e.g. observations or model results) for comparison with MPAS results. A mechanism should exist for downloading these datasets automatically from data archives (public or private). When possible (i.e. when these datasets are publicly available in the required format), the analysis should acquire these datasets from public sources to ensure the analysis can be used by the broadest possible community.

Observational data should be curated into datasets, potentially stored in a separate repository (to keep analysis script repositories lightweight). For example, we could keep them in a centralized location from which needed data is downloaded and cached locally.

  • Keep track of hash sums to verify that files are the same (see the sketch below).
  • This resource should be publicly available (could be https://git-lfs.github.com/ or http://data.kitware.com/ or something else).
  • Only pull in necessary data / provide soft links to data where possible.
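A minimal sketch of such a download-verify-cache mechanism (the function name and details are assumptions):

import hashlib
import os
try:
    from urllib.request import urlretrieve  # python 3
except ImportError:
    from urllib import urlretrieve           # python 2

def fetch_dataset(url, localPath, expectedSha256):
    # Download a dataset only if it is not already cached locally, then
    # verify the hash so we know the file is the one we expect.
    if not os.path.exists(localPath):
        urlretrieve(url, localPath)
    sha = hashlib.sha256()
    with open(localPath, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            sha.update(chunk)
    if sha.hexdigest() != expectedSha256:
        raise ValueError('hash mismatch for {}'.format(localPath))
    return localPath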

Make sure analysis scripts do not rely on mesh info present in input files

Pretty soon we will default to having no mesh information in output MPAS files. Therefore, we need to make sure that none of our analysis scripts rely on having mesh variables in model files.

A related issue is to make sure that the meshfile (initial condition or restart file) has all the info we need for running MPAS analysis scripts (see for example depth-related fields that are necessary for the OHC calculation).

Installable via a single command

To facilitate developers easily using / contributing to the analysis framework, the framework functionality should be able to be installed via a single command.

Selection of specific input files in xarray

At the moment, when loading multiple files with xarray, the input string cannot be a list or include wildcard patterns of the kind, say, [0-3].
This is limiting when we want to load a reduced dataset, e.g., when the goal is to compute climatologies over a certain range of years.

Say, for example, that we want to compute the seasonal averages DJF, MAM, JJA and SON over years 31-40. We would want to load the input files
year3[0-9], year40
but instead xarray only allows something like year3?. (A possible workaround is sketched below.)
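If open_mfdataset is given an explicit sequence of paths (which current xarray versions accept), one workaround is to expand the year range ourselves with glob; the file-name pattern below is hypothetical:

import glob
import xarray as xr

# build an explicit list of monthly files for years 31-40
fileNames = []
for year in range(31, 41):
    fileNames.extend(sorted(glob.glob(
        'timeSeriesStatsMonthly.{:04d}-*.nc'.format(year))))

# open_mfdataset accepts an explicit list of paths
ds = xr.open_mfdataset(fileNames)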

Analysis linked to publicly available observational datasets

Observational data should be curated into datasets, potentially stored in a separate repository (to keep analysis script repositories lightweight). For example, we could keep them in a centralized location from which needed data is downloaded and cached locally.

  • Keep track of hash sums to verify that files are the same (see the download-and-verify sketch under "Mechanism to access data sets for comparison" above).
  • This resource should be publicly available (could be https://git-lfs.github.com/ or http://data.kitware.com/ or something else).
  • Only pull in necessary data / provide soft links to data where possible.

Provide user-friendly examples of analysis tools

To facilitate contributions to the analysis framework from developers with less python experience, well-documented examples of analysis tools, including required inputs and outputs and adhering to the language standards, should be provided.

Timeseries data go from Jan of yr1 to Jan of yr2+1, instead of Dec of yr2

  1. When specifying a yr1 and a yr2 for the timeseries plots, the data loaded go through Jan of yr2+1, instead of Dec of yr2.

  2. As an aside, a similar thing happens for climo_yr1 and climo_yr2 (data goes from Dec of climo_yr1-1 through Jan of climo_yr2+2), but perhaps this was done on purpose to compute proper seasonal climatologies for DJF? We should look into this too.

Analysis system chokes in sea ice timeseries for 2000 date data

The analysis system cannot handle the following setting:
[time]
climo_yr1 = 1960
climo_yr2 = 1961
yr_offset = 0
timeseries_yr1 = 1960
timeseries_yr2 = 1961

I get the following error:
[akt@wf-fe1 MPAS-Analysis]$ python run_analysis.py -f config.analysis

Plotting sea-ice area and volume time series...
Reading files /net/scratch2/akt/MPAS/rundirs/rundir_QU60km_polar/analysis_members/timeSeriesStatsMonthly.1959-12.nc through /net/scratch2/akt/MPAS/rundirs/rundir_QU60km_polar/analysis_members/timeSeriesStatsMonthly.1962-01.nc
Load sea-ice data...
Compute NH and SH time series of iceAreaCell...
Make plots...
Traceback (most recent call last):
  File "run_analysis.py", line 182, in <module>
    analysis(config)
  File "run_analysis.py", line 154, in analysis
    variableMap=seaIceVariableMap)
  File "/turquoise/usr/projects/climate/akt/MPAS/analysis/MPAS-Analysis/mpas_analysis/sea_ice/timeseries.py", line 219, in seaice_timeseries
    preprocess=lambda x: preprocess_mpas(x,
  File "/usr/projects/climate/SHARED_CLIMATE/anaconda_envs/default-2.7/lib/python2.7/site-packages/xarray/backends/api.py", line 306, in open_mfdataset
    datasets = [preprocess(ds) for ds in datasets]
  File "/turquoise/usr/projects/climate/akt/MPAS/analysis/MPAS-Analysis/mpas_analysis/sea_ice/timeseries.py", line 220, in <lambda>
    yearoffset=yr_offset))
  File "/turquoise/usr/projects/climate/akt/MPAS/analysis/MPAS-Analysis/mpas_analysis/shared/mpas_xarray/mpas_xarray.py", line 317, in preprocess_mpas
    assert_valid_datetimes(datetimes, yearoffset)
  File "/turquoise/usr/projects/climate/akt/MPAS/analysis/MPAS-Analysis/mpas_analysis/shared/mpas_xarray/mpas_xarray.py", line 88, in assert_valid_datetimes
    ' must be large enough to ensure datetimes larger than year 1678'
AssertionError: ERROR: yearoffset=0 must be large enough to ensure datetimes larger than year 1678
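For context, the assertion exists because xarray stores times as numpy datetime64[ns], which can only represent dates between roughly 1678 and 2262; for example:

import pandas as pd

# the representable range of datetime64[ns] values:
print(pd.Timestamp.min)  # 1677-09-21 00:12:43.145225
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807

def check_year_offset(first_year, yearoffset):
    # hypothetical guard mirroring what assert_valid_datetimes checks
    if first_year + yearoffset <= pd.Timestamp.min.year:
        raise ValueError('yearoffset={} is too small: dates fall outside '
                         'the representable range (~1678-2262)'.format(yearoffset))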

Fully run-time configurable analysis capabilities

A user of the analysis framework should not be required to edit python code to perform the analysis they are looking for. Additionally, to the extent possible, analysis should be configurable via some external capability (e.g., editing plot labels or color bars). There should be only one user-editable configuration file.

A simple mechanism should exist (e.g. modification of a small number of config options or specification of a small number of command-line arguments) by which a user can either run the full analysis suite or select and run a single analysis tool. This requirement seeks to avoid the user having to “switch off” each analysis tool that she does not want to use (as is currently the state in the ACME analysis repository).

Ability to purge old analysis results

All products of the analysis framework must be easily removed to ensure there is no mix of new and old analysis products, or products representing multiple model runs.

Provenance stored in user-viewable output files

The analysis framework is required to be reproducible, and fully descriptive. To this end, a mechanism is required for documenting as much as possible about a given run through the analysis framework. This could include information such as the git version of the analysis framework, information about all of the input files, commands run, etc. The analysis framework should provenance all the data sources (including model hashes and configurations as well as observational data sources) and operations performed on this data that are needed to fully reproduce the produced analysis, independent of machine and user. Ultimately, analysis package operations could be documented in a history file that is used by the analysis package to fully replicate the analysis.

Summary of key analysis tasks

Key tasks to be completed include:

  • Add namelist and streams file interfaces (#27)
  • Ingest namelist and streams file data, e.g., don't hard code paths to model output but instead get path from streams file (#20 and #38)
  • Support time-series analysis over a subset of the output data (#45)
  • Generalize use of variable names to support multiple versions of ACME (e.g., account for changing of variable names under different versions) (#20)
  • Update submodule in PreAndPostProcessingScripts following support for reading namelist and streams files (#28)
  • mpas_xarray needs to support different approaches to specification of time output for AMs (#33, #38, others?)
  • Add a check on whether the needed AM (analysis member) for the analysis was turned on. For example, we will always need timeSeriesStats, but of course we'll want specific AMs for metrics such as the OHC, MOC, etc. (#58)
  • Deal with the problem of too many open files for xarray (#49)
  • See if we can generate an html interface for better sharing/visualization of MPAS standalone results
  • Make sure we support changes in MPAS-SI timeSeriesStats before they are merged in ACME
  • Add MHT script
  • Add MOC script (for offline calculation only for the moment)
  • Make sure analysis scripts do not rely on mesh info present in input files (#30)

Lower priority for now:

  • Generalize interpolation module (to include ncremap and possibly another method)
  • Add compute-climatologies module
  • Generalize modelvsobs for ocean and sea-ice (#31)
  • Do not re-compute climatologies or do not interpolate if these actions have already been done (files exist)
  • Perhaps at some point we should save time series to file (especially at high-res and when using many output files)
  • Add capability to compare time series with results from a different simulation
  • Improvement of documentation (#3, #9, #11)
  • General clean up (#18, #43, #19, #30)
  • Add changes when short-term archiving will be adopted in ACME

Analysis framework should be installable via pip with analysis callable as functions

Framework functionality should be easily installed via pip so that analysis can be conducted in an ipython or python session with minimal additional effort, e.g.,

from analysismpas import compute_analysis

compute_analysis(inputdir=<location of ACME run>, outputdir=<location for output from analysis>)

This assumes that ipython notebooks have been sufficiently abstracted into modular python functions.

Additionally, the framework needs to eventually be installed via conda.
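For illustration, a minimal setup.py sketch (the package name, version and dependency list are assumptions, not the project's actual metadata):

# setup.py
from setuptools import setup, find_packages

setup(
    name='mpas_analysis',
    version='0.1',
    packages=find_packages(),
    install_requires=['numpy', 'xarray', 'matplotlib'],
)

The package could then be installed with pip install . and imported as above.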

Support time-series analysis over a subset of the output data

Currently, the code will only work as expected if the config option timeseries_yr1 = 1 and timeseries_yr2 is set to something beyond the end of the current run (e.g. 9999), which are the default settings. If timeseries_yr1 > 1, the code crashes, likely because mpas_xarray expects the time series to start from year 1 (it isn't currently making use of timeseries_yr1). If timeseries_yr1 = 1 and timeseries_yr2 is set to be less than the total duration of the run, the timeseries analysis may be performed beyond the desired end time. This is because we add one file with a date stamp beyond the final date to be sure we're not missing any important output.

The time axis of the xarray data sets should be sliced to be consistent with these two config options in each timeseries analysis function, e.g. as sketched below.
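A minimal sketch of that slicing (assuming a datetime64 'Time' coordinate produced by preprocess_mpas; the function name is hypothetical):

def subset_time_series(ds, timeseries_yr1, timeseries_yr2, yr_offset):
    # Slice the dataset's Time axis to the configured year range, assuming
    # preprocess_mpas has already converted Time to datetime64 values
    # offset by yr_offset.
    start = '{:04d}-01-01'.format(yr_offset + timeseries_yr1)
    end = '{:04d}-12-31'.format(yr_offset + timeseries_yr2)
    return ds.sel(Time=slice(start, end))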

Design Document: Generalized Horizontal Interpolation

Title: Generalized Nearest-neighbor and First-order Interpolation in MPAS-Analysis
Xylar Asay-Davis
date: 01-19-2017

Summary

Many types of analysis in MPAS will require fields that are interpolated from MPAS grids to arbitrary points. Currently, nearest-neighbor interpolation is supported in MPAS-Analysis, but only for points on a uniformly spaced lat-lon grid. Building on the existing approach, this design document describes methods for performing both nearest-neighbor and first-order (linear or similar) interpolation from MPAS grids to arbitrary points.

Requirements

Requirement: Support for arbitrary output interpolation points
Date last modified: 2017/01/19
Contributors: Xylar Asay-Davis

The calling code should be able to supply any desired interpolation points, not just a regular latitude-longitude grid.

Requirement: Cartesian input mesh
Date last modified: 2017/01/19
Contributors: Xylar Asay-Davis

The source mesh should be described in Cartesian, rather than lon-lat, space. This is more consistent with how MPAS meshes are constructed (e.g. they have Voronoi cells only in Cartesian coordinates, not in lon-lat coordinates) and also is more conducive to supporting both spherical and planar meshes.

Requirement: First-order interpolation
Date last modified: 2017/01/19
Contributors: Xylar Asay-Davis

The option to interpolate smoothly (e.g. linearly or with barycentric coordinates) between cell-centered values should be added. The calling code should be able to select either nearest-neighbor or first-order interpolation with a flag during the initialization step, and from then on should not need to keep track of which method is being used.

Requirement: Interpolation should handle periodic boundaries
Date last modified: 2017/01/19
Contributors: Xylar Asay-Davis

Interpolation code should be aware of periodic boundaries and should perform interpolation that is consistent with those boundaries.

Consideration: Support caching results from any costly, one-time geometric computations
Date last modified: 2017/01/20
Contributors: Xylar Asay-Davis

For many potential algorithms used to perform interpolation, there is likely to be a relatively costly step of computing fields such as indices into input data fields and interpolation weights that 1) only need to be computed once for a given input mesh and set of output points and 2) are independent of the data in the field being interpolated. If this data were cached, it could mean that rerunning the analysis (which might be very desirable, e.g., while monitoring the progress of a run) would be much cheaper than the initial run.

Algorithmic Formulations

Design solution: Support for arbitrary output interpolation points
Date last modified: 2017/01/19
Contributors: Xylar Asay-Davis

The calling code will be required to supply the interpolation points in Cartesian space instead of supplying lon-lat bounds and steps. This will allow greater generality. A helper function will be provided that can produce a regularly spaced lon-lat grid. Another helper function (lon_lat_to_cartesian) is already available for converting from lon-lat to Cartesian space. Note: code using this helper function should use the spherical radius from an MPAS restart or input file instead of the default value.
The KD tree method currently used for nearest-neighbor interpolation does not care about how the interpolation points are spaced and thus will remain largely unchanged.
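A minimal sketch of this approach with scipy's cKDTree (the helper shown is an illustration, not the project's actual lon_lat_to_cartesian):

import numpy as np
from scipy.spatial import cKDTree

def lon_lat_to_cartesian(lon, lat, radius=6371.0e3):
    # Convert lon/lat (in radians) to Cartesian points on the sphere.  The
    # default radius here is an assumption; calling code should use the
    # sphere radius from an MPAS restart or input file instead.
    return np.stack([radius * np.cos(lat) * np.cos(lon),
                     radius * np.cos(lat) * np.sin(lon),
                     radius * np.sin(lat)], axis=-1)

# build the tree once from cell-center points, then query arbitrary
# output points for their nearest-neighbor cell indices
cellPoints = lon_lat_to_cartesian(np.radians([0., 90., 180.]),
                                  np.radians([0., 30., -30.]))
outputPoints = lon_lat_to_cartesian(np.radians([10.]), np.radians([5.]))
tree = cKDTree(cellPoints)
distances, nearestCell = tree.query(outputPoints)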

Design solution: Cartesian input mesh
Date last modified: 2017/01/19
Contributors: Xylar Asay-Davis

Currently, lonCell and latCell are passed as input arguments. For efficiency, it would make more sense to use xCell, yCell, zCell rather than lonCell and latCell. (Currently, fields equivalent to xCell, yCell and zCell are being computed from lonCell and latCell in the nearest neighbor code.) Moreover, to support more generality in the interpolation algorithm (see next design solution), it may be best to pass an xarray data set containing xCell, yCell and zCell but also other mesh variables needed for linear interpolation.

Design solution: First-order interpolation
Date last modified: 2017/01/19
Contributors: Xylar Asay-Davis

Linear interpolation will require locations of MPAS mesh cells, vertices and a list of cellsOnVertex. Therefore, it seems simplest if the routine for initializing interpolation takes an MPAS mesh data set as an input argument, from which it can read whichever geometric information is needed for interpolation.
The output of the initialization routine (for either linear or nearest-neighbor interpolation) will be two 2D arrays: MPAS cell indices and the corresponding interpolation weights. The first dimension in each array corresponds to the output interpolation points. The second is over input cells neighboring each output point (either a single entry for nearest-neighbor or vertexDegree entries for linear).
The algorithm for linear interpolation will be as follows:

  1. Store the shape of the array of output points.
  2. Use a KD tree method (the same currently used to find nearest neighboring cells) to find the nearest neighboring vertex to each output point.
  3. For each output point at (xp, yp, zp), compute the interpolation weight for each neighboring cell from barycentric weights (e.g. http://answers.unity3d.com/questions/383804/calculate-uv-coordinates-of-3d-point-on-plane-of-m.html). If only 2 neighboring cells are available, linear interpolation based on the distance between the cells should be used instead. If only one neighboring cell exists, the weight of that cell is (obviously) 1. Note: if we want to support vertexDegree = 4, we would need to either use Wachspress coordinates or divide each quadrilateral into 2 triangles.
  4. Store the indices and weights for interpolating arbitrarily many fields.
  5. For each field, interpolation is as simple as indexing the field at the given indices, multiplying by the weights, summing over neighbors, and reshaping the result to conform to the shape stored in step 1.
It will likely be desirable to create an Interpolator class that keeps track of the indices, weights and output shape so these don't need to be stored or passed in later as input arguments. This approach is also consistent with how scipy handles interpolation (https://docs.scipy.org/doc/scipy-0.18.1/reference/interpolate.html)
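A minimal sketch of the apply step (steps 4-5 above) wrapped in such an Interpolator class, assuming the initialization routine has already produced the index and weight arrays:

import numpy as np

class Interpolator(object):
    # Minimal sketch of the proposed class: it stores the cell indices,
    # weights and output shape computed once at initialization, then
    # applies them to arbitrarily many fields.
    def __init__(self, cellIndices, weights, outputShape):
        # cellIndices, weights: 2D arrays (nOutputPoints, nNeighbors), where
        # nNeighbors is 1 for nearest-neighbor or vertexDegree for linear
        self.cellIndices = cellIndices
        self.weights = weights
        self.outputShape = outputShape

    def interpolate(self, field):
        # index the cell-centered field, weight, sum over neighbors, and
        # reshape to the stored shape of the output points
        result = (field[self.cellIndices] * self.weights).sum(axis=1)
        return result.reshape(self.outputShape)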

Design solution: Interpolation should handle periodic boundaries
Date last modified: 2017/01/19
Contributors: Xylar Asay-Davis

My first instinct is that we typically use smaller meshes when we use periodic boundaries, meaning it might be feasible to just make periodic copies of the mesh variables. This is what I have typically done to support periodic mesh computations in POP. For now, this is largely just a placeholder until I can give this some more thought.
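A minimal sketch of that placeholder idea for a planar mesh periodic in x (the function name is hypothetical):

import numpy as np

def periodic_copies(xCell, yCell, xPeriod):
    # Tile the cell coordinates across an x-periodic boundary so that a
    # KD tree built on the copies finds neighbors consistent with the
    # periodicity; indices into the copies map back to the original cells.
    offsets = [-xPeriod, 0.0, xPeriod]
    x = np.concatenate([xCell + dx for dx in offsets])
    y = np.tile(yCell, len(offsets))
    originalCell = np.tile(np.arange(len(xCell)), len(offsets))
    return x, y, originalCell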

Design and Implementation

Implementation: short-description-of-implementation-here
Date last modified: 2011/01/05
Contributors: (add your name to this list if it does not appear)

This section should detail the plan for implementing the design solution for requirement XXX. In general, this section is software-centric with a focus on software implementation. Pseudo code is appropriate in this section. Links to actual source code are appropriate. Project management items, such as svn branches, timelines and staffing are also appropriate. How do we typeset pseudo code?

Testing

Testing and Validation: short-description-of-testing-here
Date last modified: 2011/01/05
Contributors: (add your name to this list if it does not appear)

How will XXX be tested? I.e., how will we know when we have met requirement XXX? Will these unit tests be included in ongoing testing going forward?

For those with access to the ACME Confluence pages, see the following for more discussion:
https://acme-climate.atlassian.net/wiki/display/OCNICE/Analysis+-+Generalized+Nearest-neighbor+and+Linear+Interpolation

Design Document: Config File Reorganization

Config File Reorganization
Xylar Asay-Davis
date: 01-29-2017

Summary

This document describes various efforts to clean up the structure of the MPAS-Analysis config file. The idea is to create a template config file that will replace `config.analysis` as well as a number of example config files designed to make use of various MPAS and ACME runs on various machines. The reorganization should make the analysis easier for users to modify and run.

Requirements

Requirement: a simple way of turning on and off individual analysis modules
Date last modified: 2017/01/29
Contributors: Xylar Asay-Davis

There should be a simple, intuitive method for turning on and off individual analysis modules (e.g. ocean/ohc_timeseries). This should replace the current approach of having a boolean generate flag for each analysis module in a separate config section. Preferably, there should be an equivalent method for turning on and off analysis modules from the command line that overrides that in the config file.

Requirement: there should be a simplified template for config files
Date last modified: 2017/01/29
Contributors: Xylar Asay-Davis

The current example config file is specific to a run on Edison and should be made into a general template. Simplifications should be made to the template so that it can more easily and intuitively be modified for several analyses. Example config files should also be added for analyzing several existing runs on several different machines.

Requirement: removal of ACME specific config options
Date last modified: 2017/01/29
Contributors: Xylar Asay-Davis

To the extent possible, ACME-specific config options such as casename and ref_casename_v0 should be eliminated in favor of something more general.

Requirement: optional reference dates for each model run or set of observations
Date last modified: 2017/01/29
Contributors: Xylar Asay-Davis

Currently, there is an option yr_offset that is intended to apply to all dates and observations. This option should be removed and possibly be replaced by an input and output reference date for each model run and observational data set. Discussion is needed if these reference dates are actually needed once we find a more generalized calendar (e.g. netcdftime).

Design and Implementation

Implementation: a simple way of turning on and off individual analysis modules
Date last modified: 2017/01/29
Contributors: Xylar Asay-Davis

The following comment describes the planned implementation in the config file.

# a list of which analyses to generate.  Valid names are:
#   'ohc_timeseries', 'sst_timeseries', 'sst_modelvsobs', 'sss_modelvsobs',
#   'mld_modelvsobs', 'seaice_timeseries', 'seaice_modelvsobs'
# the following shortcuts exist:
#   'all' -- all analyses will be run
#   'all_timeseries' -- all time-series analyses will be run
#   'all_modelvsobs' -- all analyses comparing model climatologies with
#                       observations will be run
#   'all_ocean' -- all ocean analyses will be run
#   'all_seaice' -- all sea-ice analyses will be run
#   'no_ohc_timeseries' -- skip the 'ohc_timeseries' (and similarly with the other analyses).
#   'no_ocean', 'no_timeseries', etc. -- in analogy to 'all_*', skip the given category of analysis
generate = ['all']

Where there are conflicts between items in the generate list, successive items will override earlier items. For example, generate = ['all', 'no_ohc_timeseries'] will generate all analyses except ohc_timeseries. As another example, generate = ['all', 'no_ocean', 'all_timeseries'] would generate all diagnostics except those comparing ocean model results with observations (and previous model results). (Note that a more efficient and intuitive way to do the same would be generate = ['all_seaice', 'all_timeseries'].)
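A minimal sketch of how this override logic might be resolved (the category mapping is an assumption for illustration):

def analyses_to_generate(generate, allAnalyses):
    # Resolve a generate list into the set of analyses to run, with later
    # items overriding earlier ones.  allAnalyses is a hypothetical mapping
    # from each analysis name to its categories, e.g.
    # {'ohc_timeseries': ('ocean', 'timeseries'),
    #  'seaice_modelvsobs': ('seaice', 'modelvsobs'), ...}
    selected = set()
    for item in generate:
        if item == 'all':
            selected |= set(allAnalyses)
        elif item.startswith('all_'):
            category = item[len('all_'):]
            selected |= set(name for name, cats in allAnalyses.items()
                            if category in cats)
        elif item.startswith('no_'):
            key = item[len('no_'):]
            if key in allAnalyses:
                selected.discard(key)
            else:
                selected -= set(name for name, cats in allAnalyses.items()
                                if key in cats)
        else:
            selected.add(item)
    return selected

With that mapping, analyses_to_generate(['all', 'no_ohc_timeseries'], allAnalyses) returns every analysis except ohc_timeseries, matching the semantics described above.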

An analogous approach will also be added at the command line, for example:

./run_analysis.py config.analysis --generate all,no_ocean,all_timeseries

(I am open to other syntax.) If the --generate flag is used on the command line, it will replace the generate option in the config file.

As an aside, I note that it is not clear if future analysis modules will fit neatly into categories like "time series" and "model vs. observations", and these categories are not meant to be all-encompassing.

Implementation: there should be a simplified template for config files
Date last modified: 2017/01/29
Contributors: Xylar Asay-Davis

Such a template has been implemented in #86. A subdirectory configs will be added with several examples from runs on LANL IC and on Edison at NERSC. Other examples can be added as appropriate and useful.

Implementation: removal of ACME specific config options
Date last modified: 2017/01/29
Contributors: Xylar Asay-Davis

This item needs some discussion. In #86, I have moved the casename and ref_casename_v0 options to an ACME section. These names are used for file names and figure titles (essentially legends). It would be useful to discuss what the relevant equivalents would be for standalone MPAS runs.

Implementation: optional reference dates for each model run or set of observations
Date last modified: 2017/01/29
Contributors: Xylar Asay-Davis

My current proposed solution in #86 is to remove yr_offset in favor of input_ref_date and analysis_ref_date for each set of model runs and observations. Note that input_ref_date for the current run would be read in from a restart file (simulationStartTime) rather than being given by the config file. As I said above, discussion is needed if these reference dates are actually needed once we find a more generalized calendar (e.g. netcdftime).

PR #86 also suggests that the bounds of time-series and climatology analyses be given by dates used in the analysis itself, not the input data (e.g. 1855 to 1859, not 6 to 10). Again, these concepts may no longer be necessary once we switch to a more flexible calendar.

Testing

Testing and Validation: a simple way of turning on and off individual analysis modules
Date last modified: 2017/01/29
Contributors: Xylar Asay-Davis

This will be difficult to test with CI since we currently don't have CI that can perform tests with run_analysis.py.

Instead, I will manually test a variety of combinations of generate lists (both in the config file and on the command line). I will list the tests I perform in #86.

Testing and Validation: there should be a simplified template for config files
Date last modified: 2017/01/29
Contributors: Xylar Asay-Davis

There is not a way to test the template in the usual sense. Instead, the test will be asking other developers and users to adapt the template for new runs to make sure it is intuitive.

Testing and Validation: removal of ACME specific config options
Date last modified: 2017/01/29
Contributors: Xylar Asay-Davis

The analysis will be tested on both MPAS and ACME runs to make sure the resulting image files have useful file names and titles (legends).

Testing and Validation: optional reference dates for each model run or set of observations
Date last modified: 2017/01/29
Contributors: Xylar Asay-Davis

Tests will be made with MPAS runs that start at both 0001 and a calendar date such as 1959. In both cases, the analysis should work correctly and be able to be compared with observations using real calendar dates within the analysis. (The details of these tests depend somewhat on the implementation, which is subject to further discussion.)

Document configuration options used for analysis runtime

A config file that can fully reproduce an analysis should be output with MPAS-Analysis results to provide reproducibility, i.e., MPAS-Analysis can be run with this config file to regenerate the results.

This is needed because there are already a few config options that are generated in the code and may not be present in the user's config file. Moreover, default values may change between MPAS-Analysis versions, so writing out the resolved config ensures that the precise parameters used for an analysis run are recorded.
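A minimal sketch (the function name is hypothetical):

try:
    from configparser import ConfigParser   # python 3
except ImportError:
    from ConfigParser import ConfigParser   # python 2

def write_resolved_config(config, outFileName):
    # Write the fully resolved configuration (including options that were
    # filled in programmatically at runtime) next to the analysis output,
    # so the exact run can be reproduced from this file alone.
    with open(outFileName, 'w') as configFile:
        config.write(configFile)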

Update submodule in PreAndPostProcessingScripts following support for reading namelist and streams files

Following completion of recent improvements to generalize usage of MPAS-Analysis we will need to update the submodule for inclusion in ACME (https://github.com/ACME-Climate/PreAndPostProcessingScripts). The goal of this issue is to document PRs that we want to make sure are included as part of this update process.

A partial list of improvements (issues / PRs) to be included:

  • #26 (generalized run analysis)
  • #27 (namelist / streams file interfaces)
  • #38 (read configuration from namelist files and stream file names from streams file)
  • #45 (support time-series analysis over a subset of the output data)

Modular Design

The analysis framework should be designed and implemented in as modular a way as possible to improve the ability for other analyses to re-use portions of previous analyses.

All "scripts" must work without user modification

All scripts must work without user input (e.g., on example data) or for specific user configurations, reporting errors if they occur. This issue requires at least the following:

  • some small output, e.g., from a GMPAS-240 run, in order to have appropriate data sets for this to work "out of the box" (to resolve #3)
  • a configuration scheme that allows scripts to be run in batch mode (#6) following modification of a configuration file (namelist?)

Following this issue, only "working" code should be committed to the repository.

develop branch

I went ahead and created a develop branch and made it the default for this repo. However, this deserves discussion and buy-in from everyone involved in this repo. This issue page is meant to allow us to discuss any cons and to make sure everyone knows what changes were made and why.

Provenancing of xarray output files

Files saved from xarray via the to_netcdf method should retain information about how they were generated. This includes at a minimum the machine, the script and arguments called, the user, and the date generated, although there are probably other important criteria too (e.g., the git hash). This issue should be used to discuss relevant criteria prior to an implementation of a wrapped to_netcdf function.

See also pydata/xarray#826

Moved from pwolfram/mpas_xarray#5
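As a starting point for discussion, a sketch of what a wrapped to_netcdf might stamp into the file (the attribute names are assumptions):

import datetime
import getpass
import platform
import subprocess
import sys

def to_netcdf_with_provenance(ds, fileName):
    # Record how the file was generated in its global attributes, then
    # hand off to xarray's normal to_netcdf.
    try:
        gitHash = subprocess.check_output(
            ['git', 'rev-parse', 'HEAD']).strip().decode('utf-8')
    except Exception:
        gitHash = 'unknown'
    ds.attrs['history'] = '{}: {}'.format(
        datetime.datetime.now().isoformat(), ' '.join(sys.argv))
    ds.attrs['user'] = getpass.getuser()
    ds.attrs['host'] = platform.node()
    ds.attrs['git_hash'] = gitHash
    ds.to_netcdf(fileName)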

Supports ingesting arbitrary MPAS output files (in general, input info from namelist and streams files)

The MPAS analysis framework is expected to support stand-alone MPAS workflows, as well as ACME workflows. As such, the analysis framework should be capable of ingesting MPAS output, regardless of how it was generated.

NOTE: This could imply that an input to the analysis framework should be the namelist and streams file used to generate the output, which can then be parsed to easily understand what output was generated, and where it should exist.
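For illustration, a minimal sketch of pulling a stream's output location from a streams file, assuming the usual <streams><stream name="..." filename_template="..."/></streams> XML layout (the function name is hypothetical):

import xml.etree.ElementTree as ET

def stream_filename_template(streamsFileName, streamName):
    # Look up where a named stream writes its output, instead of
    # hard-coding paths to model output.
    root = ET.parse(streamsFileName).getroot()
    for stream in root:
        if stream.get('name') == streamName:
            return stream.get('filename_template')
    raise ValueError('stream {} not found'.format(streamName))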

Robust exception handling

The analysis framework should be fault-tolerant to a variety of error situations, for example when a data set or input file is missing or when an analysis is misconfigured. Individual processes should be allowed to "error out" without halting the system.
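A minimal sketch of the intended behavior (the function and names are hypothetical):

import logging
import traceback

def run_analyses(analyses, config):
    # Run each analysis in its own try/except so that one failure (missing
    # data set, misconfiguration, ...) is reported but does not halt the
    # remaining analyses.
    failed = []
    for analysis in analyses:
        try:
            analysis(config)
        except Exception:
            logging.error('analysis %s failed:\n%s', analysis.__name__,
                          traceback.format_exc())
            failed.append(analysis.__name__)
    return failed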

Date class only supports 365 day calendar

Currently the Date class used to parse MPAS date strings assumes that the calendar is the 365-day calendar with no leap years. This needs to be generalized (perhaps using netcdftime).
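For illustration, a tiny example using the cftime package (the successor to netcdftime), whose calendar-aware datetimes have no leap days on the 'noleap' calendar and none of the year-range limits of datetime64[ns]:

import datetime
import cftime

# adding a day to 28 February lands on 1 March on the 365-day calendar
date = cftime.DatetimeNoLeap(1855, 2, 28)
print(date + datetime.timedelta(days=1))  # 1855-03-01 00:00:00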
