
cosima-cookbook's Introduction




cosima-cookbook package

This repository hosts cosima_cookbook, a Python package for managing a database of ocean model output and loading that output via xarray.

⚠️ The cosima_cookbook Python package is deprecated and no longer being developed! ⚠️

Use the ACCESS-NRI Intake catalog instead.

What now? Where should I go?

We refer users to the COSIMA Cookbook repository, where they will find tutorials and 'recipes' (that is, examples) of various analyses that one can do using ocean-sea ice model output.


cosima-cookbook's People

Contributors

aekiss, aidanheerdegen, andyhogganu, angus-g, dougiesquire, edoddridge, jmunroe, josuemtzmo, micaeljtoliveira, navidcy, paulspence, rbeucher


cosima-cookbook's Issues

Running notebooks on Raijin?

I know that some time ago, James Munroe had ideas about how we could run notebooks, or perhaps just python code, on Raijin using cookbook functionality.

I think that he would start an interactive job, allocate the right amount of memory, use dask to set up workers and then run his job there.
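One way this could be automated is with dask-jobqueue, which submits the worker jobs to PBS for you. This is only a sketch of the idea, not existing cookbook functionality, and the queue and resource values are placeholders:

# A sketch using dask-jobqueue (an assumption -- not existing cookbook
# code): each scale() call submits a PBS job that runs dask workers, so
# the heavy computation happens on compute nodes rather than on the VDI.
from dask.distributed import Client
from dask_jobqueue import PBSCluster

cluster = PBSCluster(queue='express', cores=16, memory='32GB',
                     walltime='02:00:00')
cluster.scale(2)          # ask PBS for two worker jobs
client = Client(cluster)  # notebook code now runs on those workers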

If we could implement this in the cookbook, it would allow us to do more of the big, or memory intensive, calculations away from the VDI, noting that each individual can only log onto a single VDI node at any time.

I'm not sure how it could be done, but it would be worth looking into at some stage.

Loading ice data

@aekiss & I have been playing with one of his ice scripts, trying to load some data. Basically, we have lost the capacity to load the whole ice dataset, and it's not clear why. The error can be easily replicated:

tmp = cc.get_nc_variable('01deg_jra55v13_iaf', 'iceh.????-??.nc', 'aice_m', time_units=None, n=None)

This is met with a dead kernel after two or so minutes. Note that with n=-200 it works fine; n=-300 fails. There are only 396 files. Starting a dask-scheduler doesn't help, as it doesn't even seem to get to anything that can use all the cores. So I don't think this is a memory issue, but I don't know what it is.

Options I can think of are:

  • a single file that is causing a problem?
  • just too many files?

Any other ideas, or ways to get more information out of the system?
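One way to test the single-bad-file theory is to open each file on its own and force a small read; anything that raises or hangs identifies the culprit. A rough sketch (the directory layout in the glob is an assumption):

import glob
import xarray as xr

# Open each ice file individually and force a small read; a file that
# raises (or hangs) here is the likely culprit.  The glob pattern is an
# assumption about where the 01deg_jra55v13_iaf ice output lives.
files = sorted(glob.glob('/g/data3/hh5/tmp/cosima/access-om2-01/'
                         '01deg_jra55v13_iaf/output*/ice/OUTPUT/iceh.????-??.nc'))
for f in files:
    try:
        with xr.open_dataset(f) as ds:
            ds['aice_m'].isel(time=0).load()
    except Exception as err:
        print(f, '->', err)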

jupyter widget plots

I am seeing lots of error messages about jupyter widgets in the current plotting routine - something like the following before each plot:

Failed to display Jupyter Widget of type HBox.
If you're reading this message in Jupyter Notebook or JupyterLab, it may mean that the widgets JavaScript is still loading. If this message persists, it likely means that the widgets JavaScript library is either not installed or not enabled. See the Jupyter Widgets Documentation for setup instructions.
If you're reading this message in another notebook frontend (for example, a static rendering on GitHub or NBViewer), it may mean that your frontend doesn't currently support widgets.
Failed to display Jupyter Widget of type HBox.
If you're reading this message in Jupyter Notebook or JupyterLab, it may mean that the widgets JavaScript is still loading. If this message persists, it likely means that the widgets JavaScript library is either not installed or not enabled. See the Jupyter Widgets Documentation for setup instructions.
If you're reading this message in another notebook frontend (for example, a static rendering on GitHub or NBViewer), it may mean that your frontend doesn't currently support widgets.

Are others getting this?
Is this due to the tqdm_notebook additions in plotting functions?

Datetime indices don't match between old and new versions of ACCESS-OM2

Since the update to yatm and libaccessom2 the dates in the data files are no longer consistent.

With the older data files you could access them through the cookbook like so:

expt = '025deg_jra55v13_ryf8485_KDS75'
variable = 'ke_tot'
darray = cc.get_nc_variable(expt,
                            'ocean_scalar.nc',
                            variable,
                            time_units='days since 1900-01-01')

In the data file the time units are actually days since 0001-01-01 00:00:00, and the time values matched that origin: dates started in year 0001. By specifying time_units='days since 1900-01-01' the time axis is remapped to start at a time that is within the limits of pandas timestamps.

The new data files have their time axis specified as days since 0001-01-01 00:00:00 but the values in the time dimension in the first file are 693150.5, which means they start at year 1900. The workaround for this is to not specify time_units in the call to get_nc_variable (or set it to None). Then no date/time shift is performed and it works fine.

The issue comes with convenience routines like cc.plots.annual_scalar, which take multiple experiments as arguments and try to plot them together.

Some possible solutions:

  1. Move all time series to an arbitrary date and then say they are “years since start of run”
  2. Check for origin and apply shift only to those that need it
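A rough sketch of option 2, assuming the files are opened with decode_times=False so the raw offsets and units string are visible (this is not the cookbook's current code):

import xarray as xr

def maybe_shift_origin(ds, target_units='days since 1900-01-01'):
    """Apply the 1900 re-anchoring only to files that genuinely start
    near the stated origin (old-style output); leave files whose raw
    values already encode year-1900 dates (new-style output) alone."""
    raw = ds['time']
    if float(raw[0]) < 366.0:        # values start within a year of the origin
        raw.attrs['units'] = target_units
    return xr.decode_cf(ds)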

There is an additional hurdle/wrinkle: xarray now supports arbitrary dates and unusual calendars by using the cftime library and creating a CFTimeIndex.

http://xarray.pydata.org/en/stable/time-series.html#non-standard-calendars-and-dates-outside-the-timestamp-valid-range

This needs to be explicitly "turned on" with an xarray option (until v0.11 is released) and there is currently no support for resampling, which is how the annual mean is calculated internally in cc.plots.annual_scalar:

https://github.com/OceansAus/cosima-cookbook/blob/master/cosima_cookbook/diagnostics/simple.py#L20

get_nc_variable misses some ocean_scalar files??

I've noticed that our vanilla diagnostics of ocean metrics sometimes have discontinuities. For example, the following code:

cc.get_nc_variable('025deg_jra55v13_ryf8485_KDS50', 'ocean_scalar.nc',
                   'temp_global_ave', n=25, time_units='days since 1900-01-01').plot()

gives the following plot:
[plot attached to the original issue]

I had always assumed that there were a few missing data files, since the error is repeatable and occurs for all variables -- but all files are present, and when viewed through ncview it looks fine:
[screenshot attached to the original issue]
It also seems to work fine if we import with xr.open_mfdataset.

So, what is stopping get_nc_variable from finding all the files? In this case it seems to be .../cosima/access-om2-025/025deg_jra55v13_ryf8485_KDS50/output023/ocean/ocean_scalar.nc that is being missed.

Any ideas?
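One way to narrow this down is to compare what the index database holds for this experiment against what is on disk. A sketch (the database and data paths are assumptions based on the layout quoted above):

import glob
import sqlite3

# Compare ocean_scalar.nc entries in the index against the files on disk.
db = '/g/data3/hh5/tmp/cosima/cosima-cookbook/cosima-cookbook.db'
on_disk = set(glob.glob('/g/data3/hh5/tmp/cosima/access-om2-025/'
                        '025deg_jra55v13_ryf8485_KDS50/output*/ocean/ocean_scalar.nc'))
conn = sqlite3.connect(db)
indexed = {row[0] for row in conn.execute(
    "SELECT DISTINCT ncfile FROM ncfiles "
    "WHERE experiment = '025deg_jra55v13_ryf8485_KDS50' "
    "AND basename = 'ocean_scalar.nc'")}
conn.close()
print('on disk but not in the index:', sorted(on_disk - indexed))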

Index querying

The get_ncfiles, get_variables and get_nc_variable functions do not take configuration as an argument. If there are identically named experiments in multiple configurations they will all be returned.

This might not be what is desired, and I don't think it is the intent of these functions.

Happy to be told I am wrong.

Too many open files error in get_nc_variable

cc.get_nc_variable('1deg_jra55v13_iaf_spinup1_A', 'iceh.????-??.nc', 'vicen_m')

gives

OSError: [Errno 24] Too many open files

This dies in open_dataset after doing 963 of 3600 files.
This is trying to combine 3600 files so that may be asking a bit much. But it's unclear to me why open_dataset can't open and close each file in turn.
So this is probably a limitation/bug in open_dataset but we could possibly work around it by sending size-limited batches to open_dataset and then combining the resulting datasets.
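A rough sketch of that batching workaround (assuming a reasonably recent xarray that accepts combine='by_coords'; not tested against the cookbook itself):

import glob
import xarray as xr

def open_in_batches(pattern, batch_size=500, **kwargs):
    """Open files in size-limited groups so at most batch_size files are
    held open at once, then concatenate the pieces along time."""
    files = sorted(glob.glob(pattern))
    pieces = [xr.open_mfdataset(files[i:i + batch_size],
                                combine='by_coords', **kwargs)
              for i in range(0, len(files), batch_size)]
    return xr.concat(pieces, dim='time')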

edits to kinetic energy documentation

I was reading the documentation at the top of the file

cosima-cookbook/diagnostics/kinetic_energy.ipynb

I suggest the following edits to the "theory" section.

1/ For a hydrostatic ocean like MOM5, the relevant kinetic energy per mass is 0.5*(u^2 + v^2). The vertical velocity component, w, does not appear in the mechanical energy budget. It is very much subdominant, but more fundamentally, it simply does not appear in the mechanical energy budget for a hydrostatic ocean. So there should be no w in the documentation.

2/ It is stated that the kinetic energy per mass is 0.5rho(u^2 + v^2 + w^2). In fact, the kinetic energy per mass has no rho factor (and the w should be dropped for hydrostatic).
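For clarity, the corrected expression implied by points 1 and 2 is

\mathrm{KE\ per\ unit\ mass} = \tfrac{1}{2}\left(u^{2} + v^{2}\right)

with no density factor and no vertical-velocity contribution.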

Steve

Importing expts is broken

Seems like a lot of the diagnostic notebooks have a line like this:

from cosima_cookbook import build_index, expts

This import is now broken, as expts can no longer be found.

package cosima-cookbook

The cookbook needs to be packaged so that the right dependencies are in place.

  • Make a list of cookbook dependencies
  • Continue to flesh out setup.py (a minimal sketch is below)
  • Push to PyPI or conda-forge (?)
  • Add documentation on how to get the cookbook up and running
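A minimal setup.py sketch; the version and dependency list are assumptions rather than the package's actual metadata:

# setup.py -- a minimal packaging sketch only.
from setuptools import setup, find_packages

setup(
    name='cosima-cookbook',
    version='0.1.0',                  # placeholder
    description='Manage a database of ocean model output and load it via xarray',
    packages=find_packages(),
    install_requires=[                # assumed dependency list
        'xarray',
        'dask[distributed]',
        'netCDF4',
        'sqlalchemy',
        'joblib',
        'tqdm',
    ],
)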

Caching things in /tmp/joblib

I encountered a new problem today when running some simple cookbook scripts, which boiled down to this:

/g/data3/hh5/public/apps/miniconda3/envs/analysis3/lib/python3.6/os.py in makedirs(name, mode, exist_ok)
218 return
219 try:
--> 220 mkdir(name, mode)
221 except OSError:
222 # Cannot rely on checking for EEXIST, since the operating system

PermissionError: [Errno 13] Permission denied: '/tmp/joblib/cosima_cookbook'

The issue here was that someone else (presumably using the same VDI node as me) had commandeered /tmp/joblib on that node:

drwxr-xr-x 2 cc7576 v14 4096 Mar 6 13:40 joblib

meaning that I couldn't write there.

I got around it by setting a different directory for my cache:

cachedir='/tmp/amh157'

but clearly this isn't ideal in the long run.

Any ideas for how we should approach this?
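One possible approach is to key the cache directory on the username, so two users on the same node never collide. A sketch (not the cookbook's current behaviour):

import getpass
import os
from joblib import Memory

# Per-user cache directory, e.g. /tmp/joblib_<username>, so concurrent
# users on the same VDI node never fight over /tmp/joblib.
cachedir = os.path.join('/tmp', 'joblib_' + getpass.getuser())
os.makedirs(cachedir, exist_ok=True)
memory = Memory(cachedir, verbose=0)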

Ertel potential vorticity

Does anyone have a Python kernel to compute the Ertel PV in a Boussinesq fluid?

Q = \left( f \hat{\mathbf{z}} + \nabla \wedge \mathbf{v} \right) \cdot \nabla b

where

\nabla b = \frac{g}{\rho_0} \left( \alpha \nabla \theta - \beta \nabla S \right)

get_nc_variable/build_index calendar issue

I've encountered an issue using get_nc_variable in combination with build_index when loading variables using an explicit path to an experiment directory. Loading variables in this way seems to omit calendar information, so leap years and month lengths are muddled up.

This is what I'm currently doing:

import cosima_cookbook as cc
exp_dir = '/g/data/v45/rm2389/Freshwater_Experiments/025deg_tests/rcp45/'
cc.build_index(exp_dir_list=exp_dir)
runoff = cc.get_nc_variable(exp_dir, 'ocean_month.nc', 'runoff')

The variable loads fine, but the time information is incorrect. If I specify time_units=None in get_nc_variable the dates are closer to the dates in the time_stamp.out file, but with an offset that seems to match a leap-year issue. The netCDF file itself specifies a no-leap calendar,

 double time(time) ;
                time:long_name = "time" ;
                time:units = "days since 0001-01-01 00:00:00" ;
                time:cartesian_axis = "T" ;
                time:calendar_type = "NOLEAP" ;
                time:calendar = "NOLEAP" ;
                time:bounds = "time_bounds" ;

but I think this isn't being included in the get_nc_variable step.

The issue doesn't occur when I load variables from experiments in the central cosima directory (i.e. just specifying an experiment name, not a path within which I've built an index) that have the same calendar information as above.
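A quick cross-check outside the cookbook is to open one file directly, so xarray's own decoding honours the NOLEAP calendar attribute shown above. This is only a sketch: the glob assumes the usual outputNNN directory layout under exp_dir, and a recent xarray (>= 0.11) that decodes non-standard calendars into cftime dates:

import glob
import xarray as xr

# Open the first ocean_month.nc under exp_dir and let xarray decode the
# NOLEAP calendar itself; the dates printed should be no-leap (cftime).
f = sorted(glob.glob(exp_dir + 'output*/**/ocean_month.nc', recursive=True))[0]
with xr.open_dataset(f, decode_times=True) as ds:
    print(ds.time.values[:3])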

Include time_bounds metadata in DataArrays

It would be helpful to include time_bounds metadata in the DataArrays returned by get_nc_variable and getvar so we know exactly what time period has been averaged over.

Should just be a matter of storing time_bounds in the DB and doing something like ds.attrs.update({'time_bounds': whatever}) on the DataArray returned by get_nc_variable and getvar?

"Memory" issues

In some cases the cookbook gives us random errors when accessing large datasets. A case in point is this script:

https://github.com/OceansAus/cosima-cookbook/blob/master/ContributedExamples/KDS75_Ice_Concentration.ipynb

If you try to run it, then you will find that this line works:

pert_i=cc.get_nc_variable('kds75_cp','ice_month.nc','CN',
                    chunks={'time':None,'ct':5,'xt':100,'yt':100},
                    time_units='days since 1860-01-01',n=50)

but when you use n=60 the kernel dies.

Paul & I have always thought this was memory, but the errors I get in the server log seem to imply something went wrong with "tornado", whatever that is.

Advice appreciated.

Remove dependency on v45

Currently users need to be in v45 to use the cookbook.

Should we change
https://github.com/OceansAus/cosima-cookbook/blob/master/cosima_cookbook/netcdf_index.py#L32
from
cosima_cookbook_dir = '/g/data1/v45/cosima-cookbook'
to something like
cosima_cookbook_dir = '/g/data3/hh5/tmp/cosima/cosima-cookbook'
so users don't need to be in v45?

Also move the existing DB and create a symlink back to the old location for backward compatibility:

mkdir -p /g/data3/hh5/tmp/cosima/cosima-cookbook
mv /g/data1/v45/cosima-cookbook/cosima-cookbook.db /g/data3/hh5/tmp/cosima/cosima-cookbook/cosima-cookbook.db
ln -s /g/data3/hh5/tmp/cosima/cosima-cookbook/cosima-cookbook.db /g/data1/v45/cosima-cookbook/cosima-cookbook.db

Error messages from loading files

I'm getting lots of database-related errors when loading data. They don't seem to cause problems, except for obscuring results and worrying the naive user. An example is below. It would be good to know how to clean these up.

ERROR:sqlalchemy.pool.StaticPool:Exception during reset or similar
Traceback (most recent call last):
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-18.10/lib/python3.6/site-packages/sqlalchemy/pool.py", line 709, in _finalize_fairy
    fairy._reset(pool)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-18.10/lib/python3.6/site-packages/sqlalchemy/pool.py", line 880, in _reset
    pool._dialect.do_rollback(self)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-18.10/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 459, in do_rollback
    dbapi_connection.rollback()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 139787282700032 and this is thread id 139785390315264.
ERROR:sqlalchemy.pool.StaticPool:Exception closing connection <sqlite3.Connection object at 0x7f224a41ad50>
Traceback (most recent call last):
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-18.10/lib/python3.6/site-packages/sqlalchemy/pool.py", line 709, in _finalize_fairy
    fairy._reset(pool)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-18.10/lib/python3.6/site-packages/sqlalchemy/pool.py", line 880, in _reset
    pool._dialect.do_rollback(self)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-18.10/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 459, in do_rollback
    dbapi_connection.rollback()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 139787282700032 and this is thread id 139785390315264.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-18.10/lib/python3.6/site-packages/sqlalchemy/pool.py", line 314, in _close_connection
    self._dialect.do_close(connection)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-18.10/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 465, in do_close
    dbapi_connection.close()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 139787282700032 and this is thread id 139785390315264.

Allow searching across multiple index databases

Following on from #82 (comment), it could be good to allow users to create their own index database so they can work with their own data without needing write access to a centralised index.
This would also allow users to keep their data private, which may or may not be a good thing.

support for data discovery

I think it would be helpful to enhance the database to better support data discovery, e.g. to allow implementation of a cc.discover(pattern) function that would take a pattern and do a GLOB and/or LIKE SQL query to return any (variable, experiment, ncfile, longname, units) tuples that match (which could then be used to call get_nc_variable). It might also be handy to have optional experiment and ncfile arguments to restrict the search.

It would be most useful if this would also search longnames so that a user can discover what variable names are used for particular physical quantities. I guess for compactness this would be best stored as a separate table that just contains all the unique variable names and their associated longname, which would need to be updated by build_index.
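A rough sketch of what cc.discover could look like against the current schema (a hypothetical function; matching long names would first require build_index to store them, as noted above):

import sqlite3

def discover(pattern,
             db='/g/data3/hh5/tmp/cosima/cosima-cookbook/cosima-cookbook.db'):
    """Return (variable, experiment, ncfile) tuples whose variable name
    matches a GLOB pattern, e.g. discover('*temp*')."""
    conn = sqlite3.connect(db)
    rows = conn.execute(
        "SELECT DISTINCT variable, experiment, ncfile FROM ncfiles "
        "WHERE variable GLOB ?", (pattern,)).fetchall()
    conn.close()
    return rows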

Database has some entries without chunking, which causes errors

It appears there are some netCDF files without chunking that have been slurped into the database, which causes errors like this in many codes:

/short/v45/aph502/cosima-cookbook/cosima_cookbook/netcdf_index.py in get_nc_variable(expt, ncfile, variable, chunks, n, op, time_units)
    162 
    163     print ('chunking info', dimensions, chunking)
--> 164     default_chunks = dict(zip(dimensions, chunking))
    165 
    166     if chunks is not None:

TypeError: zip argument #2 must support iteration
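A possible guard, sketched here rather than taken from a committed fix, is to fall back to no default chunking when the database recorded chunking as None:

def default_chunks_from_db(dimensions, chunking):
    """Tolerate unchunked (contiguous) netCDF files whose chunking was
    stored as None in the database."""
    if chunking is None:
        return {}
    return dict(zip(dimensions, chunking))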

Speed up database querying

It is possible that the database will be overhauled as this project progresses, but in the meantime we can gain significant improvements by indexing on key columns in the database. From my own testing, an easy fix is: CREATE INDEX experiment_id ON ncfiles (experiment);.

Before index creation, a sample query:

sqlite> select distinct ncfile, dimensions, chunking from ncfiles
  WHERE experiment = '1deg_jra55v13_iaf_spinup1_A'
    AND (basename_pattern = 'ocean.nc' OR basename GLOB 'ocean.nc')
    AND variable in ('u')
  ORDER BY ncfile;
...
CPU Time: user 9.605138 sys 18.436621

Index creation on the full database, post-creation:

sqlite> create index experiment_id on ncfiles (experiment);
CPU Time: user 84.730643 sys 139.632335

Same query, after index creation:

sqlite> select distinct ncfile, dimensions, chunking from ncfiles
  WHERE experiment = '1deg_jra55v13_iaf_spinup1_A'
    AND (basename_pattern = 'ocean.nc' OR basename GLOB 'ocean.nc')
    AND variable in ('u')
  ORDER BY ncfile;
...
CPU Time: user 0.168388 sys 0.589691

This is a few orders of magnitude faster. Obviously this isn't the most rigorous of tests (filesystem caching, etc.). One thing I'm not sure about is the effect on build_index.

Multiple Jupyter users on the same VDI node

If there is already a user running Jupyter on a particular VDI node, then a second user will need to use a different port for Jupyter. Since jupyter_vdi.py is hard-coded to expect port 8888 to be used, it will not work.

Todo: jupyter_vdi.py needs to actually read the port that Jupyter is using instead of assuming :8888.
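A small sketch of how jupyter_vdi.py might recover the port (a hypothetical helper; it assumes the script already captures the server's startup output):

import re

def parse_notebook_port(startup_line, default=8888):
    """Extract the port from the 'running at: http://host:PORT/...' line
    that the Jupyter server prints, instead of assuming 8888."""
    match = re.search(r'https?://[^/\s]+:(\d+)/', startup_line)
    return int(match.group(1)) if match else default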

More automatic handling of dask

Until recently, I have always used cc.start_cluster() to start up multiple cores. But @angus-g 's recent work has shown we can do a better job by starting a scheduler using the following protocol:

  • In a terminal on VDI (either over VNC or through SSH and inside screen/tmux), run: dask-scheduler
    This should output the scheduler address, like tcp://10.0.64.24:8786.
  • Now, in another terminal (ensuring that the default conda module has cosima_cookbook installed, as all workers will need access to that), run: dask-worker tcp://10.0.64.24:8786 --memory-limit 4e9 --nprocs 6 --nthreads 1 --local-directory /local/g40/amh157
  • Then, make sure the following cell matches the scheduler address:
client = Client('tcp://10.0.64.24:8786', local_dir='/local/g40/amh157')

I have implemented this in a lot of the access-om2 report notebooks, but it is clunky and requires a bit of intervention. For example, whenever I get allocated a different node I have to change the tcp address, and others will need to modify the local directory if they want to run it.

The ideal solution here is that we can write a cookbook function which can do this for us, and takes arguments such as memory-limit and nprocs. Is this possible? It would effectively be a replacement for start_cluster(), to be easily deployed to all.
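A sketch of what such a function could look like, using distributed's LocalCluster so nobody has to copy scheduler addresses around (parameter names and defaults are assumptions, not existing cookbook code):

from dask.distributed import Client, LocalCluster

def start_cluster(n_workers=6, memory_limit='4GB', local_directory=None):
    """Start a scheduler and workers in one call and return a Client,
    as a drop-in replacement for the manual dask-scheduler/dask-worker
    protocol described above."""
    cluster = LocalCluster(n_workers=n_workers,
                           threads_per_worker=1,
                           memory_limit=memory_limit,
                           local_directory=local_directory)
    return Client(cluster)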

need a function to remove non-existent files from database

Here's something Clothilde discovered at the tutorial yesterday:

import cosima_cookbook as cc
cc.build_index()
cc.get_nc_variable('mom-sis_jra-ryf','ocean_month.nc','eta_t')

yields

FileNotFoundError: [Errno 2] No such file or directory: b'/g/data3/hh5/tmp/cosima/mom-sis/mom-sis_jra-ryf/output021/ocean_month.nc'

The directory /g/data3/hh5/tmp/cosima/mom-sis/mom-sis_jra-ryf/output021 doesn't exist but is indexed in the database. Apparently somebody removed the directory but it has stayed in the database because build_index() only looks for unindexed run directories to add to the database.

This scenario is probably rare, but it would nevertheless be good to have a clean_index function to remove nonexistent runs from the database. Currently the only way to fix it is to trash the DB and rebuild from scratch, which becomes prohibitively slow as the amount of data increases.
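A sketch of what clean_index could do against the current schema (a hypothetical function, not existing cookbook code):

import os
import sqlite3

def clean_index(db='/g/data3/hh5/tmp/cosima/cosima-cookbook/cosima-cookbook.db'):
    """Delete database rows whose netCDF file no longer exists on disk,
    so removed output directories don't require a full rebuild."""
    conn = sqlite3.connect(db)
    paths = [p for (p,) in conn.execute('SELECT DISTINCT ncfile FROM ncfiles')]
    missing = [p for p in paths if not os.path.exists(p)]
    conn.executemany('DELETE FROM ncfiles WHERE ncfile = ?',
                     [(p,) for p in missing])
    conn.commit()
    conn.close()
    return missing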

better default paths for memory.cache

By default, memory.cache uses the local tmp directory. Investigate making cached results persist even when the node you are running on changes. This is especially of concern with the different VDI nodes -- you cannot control which node you are given.

We can also investigate sharing the cache between users.

memory issues

As we already know, there are some memory issues when calculating some quantities from the 0.1° model. Here is a good example:

https://github.com/OceansAus/ACCESS-OM2-1-025-010deg-report/blob/master/figures/overturning_circulation/GlobalOverturningStreamfunction.ipynb

Note that the whole dataset can be loaded and processed for time series but not for zonal averages. Is there a smarter way to do this?

A possible solution is annual averaging of 3D files for 0.1° ... this is in train, but it would be nice to be able to handle without that.

Index improvements

Currently the index DB is 6.1GB, which is large.

The DB consists of one large table:

CREATE TABLE ncfiles (
        id INTEGER NOT NULL, ncfile TEXT, rootdir TEXT, configuration TEXT, experiment TEXT, run TEXT, basename TEXT, basename_pattern TEXT, variable TEXT, dimensions TEXT, chunking TEXT, 
        PRIMARY KEY (id)
);

The first entry is:

1|/g/data3/hh5/tmp/cosima/mom01v5/KDS75/output205/rregionoceankerg__0054_276.nc|/g/data3/hh5/tmp/cosima|mom01v5|KDS75|output205|rregionoceankerg__0054_276.nc|rregionoceankerg__\d+_\d+.nc|xt_ocean_sub01|('xt_ocean_sub01',)|[3600]

Possible improvements:

  1. Put rootdir in a separate table
  2. Don't save ncfile (the full path), since everything required to generate it is available from rootdir, configuration, experiment, run, and basename
  3. Ignore dimension (coordinate) variables

Errors in plotting function

I've been using routines like

cc.plots.annual_scalar(esub, 'salt_global_ave')

for some of my plotting. These work very nicely and are fast, but they also come with some strange behaviour. In some cases they don't update one variable with data from new runs, but do update another variable. I also had a few cases where the output from the ipywidget plot differed from the cc.plots.annual_scalar output. Unfortunately I can't reproduce any of these problems right now, but will post examples when I can.

allow indexing of files that aren't in hh5

Following on from #83, it may be good to add functions to get/set a list of paths to index so that files that aren't in hh5 can also be indexed (in a separate index database).

This might save a lot of data copying (though files must be on /g/data* to be visible to VDI, and VDI can't see /short).

It also avoids everyone necessarily having access to everyone else's data (which undermines the data-sharing ethos but also makes the cookbook scalable to more users).

offset interacts with time_units in get_nc_variable

The effect of time_units in get_nc_variable depends on whether offset=None, which can be confusing.

I've tried to document this behaviour as best I can. Not sure whether it's worth a code change to make the behaviour more intuitive.

Error using database

I'm getting some weird errors, which don't seem to affect the output of a script, but are still annoying:

ERROR:sqlalchemy.pool.StaticPool:Exception during reset or similar
Traceback (most recent call last):
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-18.07/lib/python3.6/site-packages/sqlalchemy/pool.py", line 709, in _finalize_fairy
    fairy._reset(pool)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-18.07/lib/python3.6/site-packages/sqlalchemy/pool.py", line 880, in _reset
    pool._dialect.do_rollback(self)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-18.07/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 459, in do_rollback
    dbapi_connection.rollback()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 139635644978944 and this is thread id 139634064307968.

The best example is here:

https://github.com/OceansAus/ACCESS-OM2-1-025-010deg-report/blob/master/figures/strait_transports/strait_transports.ipynb

Any ideas on what I am doing wrong?

Proposal: split repository to infrastructure and examples

Hi all, we've mentioned this a few times before so I thought I'd outline it properly. Essentially, the repository used for examples will by its nature grow in size significantly, even if infrastructure isn't changed. A candidate workflow is to split the repository into two: a dedicated infrastructure/backend repository; and an examples repository. Developers can clone and work on the infrastructure repository as they wish, but users will simply be able to import a conda module (e.g. part of the curated environment on NCI).

The examples can be viewed primarily online, through either GitHub or Read the Docs. This allows users to see which techniques can be employed for analysis, without first cloning the repository to their local system. Contribution to this repository can be curated so that it serves as a reference.

I'll put down a few tasks that I think need to be done as part of this split, but I'd be happy to take feedback on any part of the proposal.

  • Cut out infrastructure/backend to a separate repository (this saves including the examples in the git history which would increase the size)
  • Document workflow for users
  • "Cookbookify" contributed examples and focus this repository/RTD on examples
  • Create conda/pypi packages for the infrastructure

Update instructions for users

The current instructions and readthedocs site are quite old. We need to update these to outline the simplest way to access the cookbook.
