
floater's Introduction

Floater


Transcode MITgcm float output into NetCDF or pandas (HDF5) format.

Transcoding is done via the floater_convert script, which is installed with the package.

$ floater_convert
usage: floater_convert [-h] [--float_file_prefix PREFIX] [--float_buf_dim N]
                       [--progress] [--input_dir DIR] [--output_format FMT]
                       [--keep_fields FIELDS] [--ref_time RT] [--pkl_path PP]
                       [--output_dir OD] [--output_prefix OP]
                       output_file
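
For example, a minimal invocation might look like the following (the input directory and output name are illustrative; only flags listed in the usage above are used):

$ floater_convert --input_dir ./float_output --output_format netcdf float_trajectories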

The package also includes generators and analysis tools for Lagrangian trajectories.

floater's Issues

MITgcm error when reading in float file

I am encountering an error when running the MITgcm that appears in the .e file for the job I submitted to my cluster (not in STDERR or STDOUT):

In source file mdsio_rd_rec_rl.f, at line number 1824
 at line number 1824
 File name = flt_CM2p6_global_hex.bin    unformatted, direct access   record = 37172161
 In source file mdsio_rd_rec_rl.f, at line number 1824
PGFIO-F-253/unformatted read/unit=9/attempt to read non-existent record (direct access).
 File name = flt_CM2p6_global_hex.bin    unformatted

The relevant section with line 1824 is shown below:

     IF ( debugLevel.GE.debLevC ) THEN
       WRITE(msgBuf,'(A,I9,2x,I9)')
    &      ' MDS_RD_REC_RL: iRec,Dim = ', iRec, nArr
       CALL PRINT_MESSAGE( msgBuf, standardMessageUnit,
    &                      SQUEEZE_RIGHT , myThid )
     ENDIF

     IF ( fPrec.EQ.precFloat32 ) THEN
       READ( dUnit, rec=iRec ) r4Buf <--------------------------------- LINE 1824 
       DO k=1,nArr
         arr(k) = r4Buf(k)
       ENDDO
     ELSEIF ( fPrec.EQ.precFloat64 ) THEN
       READ( dUnit, rec=iRec ) r8Buf
       DO k=1,nArr
         arr(k) = r8Buf(k)
       ENDDO
     ELSE
       WRITE(msgBuf,'(A,I9)')
    &        ' MDS_RD_REC_RL: illegal value for fPrec=',fPrec
       CALL PRINT_ERROR( msgBuf, myThid )
       STOP 'ABNORMAL END: S/R MDS_RD_REC_RL'
     ENDIF

     RETURN
     END

The float file I am attempting to use was generated with

fs = generators.FloatSet((-280,80), (-81,65.0), dx=.03125, dy=.03125)
model_grid = {'lon': mask_lon, 'lat': mask_lat, 'land_mask': mask_bath}
fs.to_mitgcm_format("flt_CM2p6_global_hex", model_grid=model_grid)

From visual inspection, I am fairly confident that the hexagonal tiling step is done correctly. However, given the recent changes to to_mitgcm_format(), it is possible that the output formatting was broken. To test this, I will try to rebuild, with the new code, a float file that I know previously worked fine with the MITgcm.

compound dtypes issue

cc: @rabernat, @nathanieltarshish

I tried to use floater_convert on a MITgcm test run with rectangular mesh float initial positions, and got the following error:

NotImplementedError                       Traceback (most recent call last)
<ipython-input-7-813257f7c962> in <module>()
----> 1 dfc = df.compute()
      2 dfc.head()

/u/6/c/cz2397/miniconda3/envs/rclv/lib/python3.6/site-packages/dask/base.py in compute(self, **kwargs)
     93             Extra keywords to forward to the scheduler ``get`` function.
     94         """
---> 95         (result,) = compute(self, traverse=False, **kwargs)
     96         return result
     97 

/u/6/c/cz2397/miniconda3/envs/rclv/lib/python3.6/site-packages/dask/base.py in compute(*args, **kwargs)
    200     dsk = collections_to_dsk(variables, optimize_graph, **kwargs)
    201     keys = [var._keys() for var in variables]
--> 202     results = get(dsk, keys, **kwargs)
    203 
    204     results_iter = iter(results)

/u/6/c/cz2397/miniconda3/envs/rclv/lib/python3.6/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, **kwargs)
     74     results = get_async(pool.apply_async, len(pool._pool), dsk, result,
     75                         cache=cache, get_id=_thread_get_id,
---> 76                         **kwargs)
     77 
     78     # Cleanup pools associated to dead threads

/u/6/c/cz2397/miniconda3/envs/rclv/lib/python3.6/site-packages/dask/async.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, raise_on_exception, rerun_exceptions_locally, callbacks, dumps, loads, **kwargs)
    498                     _execute_task(task, data)  # Re-execute locally
    499                 else:
--> 500                     raise(remote_exception(res, tb))
    501             state['cache'][key] = res
    502             finish_task(dsk, key, state, results, keyorder.get)

NotImplementedError: compound dtypes are not implementedin the Series constructor

It seems that the compound dtypes specified in the variable float_dtypes caused this issue. I think we could fix it by removing the dtype option from df = dd.read_csv(input_path, names=float_columns, dtype=float_dtypes, header=None) and setting the column dtypes afterwards.
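
A rough sketch of that fix (the path, column names, and dtypes below are placeholders, not the actual values of float_columns / float_dtypes):

import dask.dataframe as dd

# hypothetical stand-ins for the real input path, float_columns, and float_dtypes
input_path = 'float_trajectories.*.csv'
float_columns = ['npart', 'time', 'x', 'y', 'z']
column_dtypes = {'npart': 'i4', 'time': 'f8', 'x': 'f4', 'y': 'f4', 'z': 'f4'}

# read without the compound dtype, then assign plain per-column dtypes afterwards
df = dd.read_csv(input_path, names=float_columns, header=None)
df = df.astype(column_dtypes)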

It is strange that this error did not appear in test runs with hexagonal mesh float initial positions. Also, our tests failed to catch it.

deploy floaters in 2D x-z plane

@rabernat Hi, I am trying to use this package in my work.

I've set up a MITgcm run to simulate internal waves in a 2D x-z plane, with only one grid cell in the y direction. I need to deploy floats at different depths with a specified vertical interval dz.

I tried to use the FloatSet:

flts = FloatSet(xlim=[0, X], ylim=[0, Y], dx=1000, dy=100, zvect=zdef)
flts.to_mitgcm_format('./flt_init.dat', iup=-1, read_binary_prec=32)

However, FloatSet seems to generate a horizontal 2D deployment that is simply repeated for every entry in zvect. It does not have keyword arguments like zlim or dz.

So how can I deploy a 2D float set in the x-z plane with a given dz? I may try to modify the code, but I am not sure exactly how to push my version back to this repo (I'm new to git).
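
In the meantime, one possible workaround under the current API might be to collapse the horizontal mesh to a single row in y and let the repetition over zvect supply the vertical levels. A rough sketch (all numbers are illustrative, and it assumes FloatSet places one float per dx-by-dy cell):

import numpy as np
from floater.generators import FloatSet

dx, dy, dz = 1000.0, 100.0, 10.0
X = 50000.0                                  # illustrative domain length in x
zdef = -np.arange(dz/2, 500.0, dz)           # cell-centre depths with spacing dz

# ylim spans exactly one dy, so the horizontal mesh has a single row in y;
# that row is then repeated at every depth in zvect, giving an x-z plane of floats
flts = FloatSet(xlim=[0, X], ylim=[0, dy], dx=dx, dy=dy, zvect=zdef)
flts.to_mitgcm_format('./flt_init.dat', iup=-1, read_binary_prec=32)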

errors with floater_convert

cc: @geosciz, @nathanieltarshish

I just tried running floater_convert with the new options on yeti with my Pacific sector data.

$ floater_convert --output_format netcdf --ref_time 1993-01-01 --step_time 86400 float_trajectories

I got this error:

Traceback (most recent call last):
  File "/u/6/r/ra2697/.conda/envs/xmitgcm_env/bin/floater_convert", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/hmt/sirius1/prv0/u/6/r/ra2697/floater/scripts/floater_convert", line 76, in <module>
    output_prefix=args.output_prefix)
  File "/hmt/sirius1/prv0/u/6/r/ra2697/floater/floater/utils.py", line 264, in floats_to_netcdf
    step_num = int(dfcs.time.values[0])//step_time
TypeError: unsupported operand type(s) for //: 'int' and 'str'

What is going on?
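
For what it's worth, the TypeError can be reproduced by floor-dividing an int by a string, which suggests --step_time is reaching floats_to_netcdf as the string '86400' rather than an int (an assumption; the names below are illustrative):

# minimal reproduction of the error above
step_time = '86400'                 # as argparse would deliver it without type=int
raw_time = 86400
try:
    step_num = int(raw_time) // step_time
except TypeError as err:
    print(err)                      # unsupported operand type(s) for //: 'int' and 'str'

# hedged fix: cast before dividing (or pass type=int to add_argument)
step_num = int(raw_time) // int(step_time)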

Question: RCLV search with periodic domains and masked regions

With a global study of RCLVs in mind, I am wondering how compatible hexgrid.find_convex_regions is with domains that are periodic (across the East/West boundary) and that also contain masked regions. If it is not currently compatible, how should we build out these features?
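
One possible building block for the periodic case, offered only as an idea (not part of floater's current API), is to wrap a margin of longitude columns onto the field before the region search and fold the detected indices back with a modulo:

import numpy as np

field = np.random.rand(50, 100)          # toy (lat, lon) field, e.g. an LAVD map
pad = 10                                 # margin wide enough to contain any single region
wrapped = np.concatenate([field, field[:, :pad]], axis=1)
# ... run the convex-region search on `wrapped`, then take the longitude
# indices of any detected region modulo field.shape[1]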

allow FloatSet to accept a land mask

We would like to do something like

fs = generators.FloatSet((140,150), (45,55), dx=0.25, dy=0.25)
fs.get_hexmesh(mask=some_mask)

This would blank out the regions where there is land and reduce memory usage, especially for global simulations.
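
A minimal sketch of what the masking step could look like, assuming the mask is defined on the same lon/lat grid used to place the floats (all names and values below are illustrative):

import numpy as np

x = np.arange(140.125, 150, 0.25)            # float longitudes
y = np.arange(45.125, 55, 0.25)              # float latitudes
xx, yy = np.meshgrid(x, y)

some_mask = np.ones(xx.shape, dtype=bool)    # True over ocean, False over land
some_mask[:, :4] = False                     # pretend the western edge is land

# keep only the ocean floats; land positions are never generated or stored
float_x, float_y = xx[some_mask], yy[some_mask]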

messed up time variable with floater_convert

I did the following on habanero

floater_convert --input_dir /rigel/ocp/users/as4479/MITgcm/llc_agulhas/3dfloats_hourly --progress --output_format netcdf --output_dir /rigel/ocp/users/ra2697/float_visualization --step_time 25 --ref_time 2011-01-01 float_trajectories

This converts @anirban89's output data into netcdf. It works fine if I don't specify step_time or ref_time. But if I do, the time variable is just zero in all the netcdf files.

What could be going on here?

cc @geosciz

fixing a hexagonal tiling bug

The recent PR #23 introduced a slight bug in the get_hexmesh function: it effectively called _subset_floats_from_mask twice, because it generated the uniform mesh with get_rectmesh instead of xx, yy = np.meshgrid(self.x, self.y). I corrected this, along with a few other cosmetic changes, on the fix_get_hexmesh branch of my fork and will submit a PR.
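
An illustrative, standalone version of the corrected logic (not the actual floater source; the hexagonal stagger below is a stand-in for whatever get_hexmesh really does):

import numpy as np

def get_hexmesh_sketch(x, y, dx, mask=None):
    # build the uniform mesh directly with meshgrid rather than via get_rectmesh,
    # so any mask subsetting happens only once, here
    xx, yy = np.meshgrid(x, y)
    xx[1::2] += dx / 2            # hypothetical stagger of alternate rows
    if mask is not None:
        xx, yy = xx[mask], yy[mask]
    return xx, yy

xm, ym = get_hexmesh_sketch(np.arange(5.0), np.arange(4.0), dx=1.0)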

installation of floater

Dear all,
As explained in #18, I'm trying to install floater but I am running into some problems. Could you help me?

I use conda with Python 2.7.
I installed floater this way:
sudo su
/softs/anaconda2/bin/conda create -n envmitgcm python=2.7.13 anaconda
source /softs/anaconda2/bin/activate envmitgcm
/softs/anaconda2/bin/conda install -n envmitgcm xarray
/softs/anaconda2/bin/conda install -n envmitgcm netcdf4
cd /home/mazoyer/Documents/sources/pyMITGM/floater
ipython setup.py install

Then, when I try to use floater_convert with the MITgcm example outputs, nothing happens.
/home/mazoyer/Documents/sources/pyMITGM/floater/build/scripts-2.7/floater_convert --output_format netcdf --ref_time 2013-01-01 --step_time 13 float_trajectories --progress

If I ask for the pandas output format, an h5 file is created.

Do you have an idea about what could be the problem with my installation?
Thanks a lot,
Camille

common Lagrangian data structure

I have been wondering whether we need a common Lagrangian data structure, analogous to xarray for coordinate-aware n-dimensional datasets, to describe large numbers of Lagrangian particles. These data generally consist of a time series of positions, plus associated data along each Lagrangian track. Examples are the simulated Lagrangian trajectories here, the GDP drifter dataset, the Argo float dataset, as well as quasi-Lagrangian datasets such as tropical cyclone best tracks and mesoscale eddies.

As far as I know, a pandas.DataFrame is typically used to represent such data, with at least three columns: time, x_pos, and y_pos. This is efficient and clear. However, we sometimes need to attach extra information to the dataframe, such as ID, name, type, status, etc. So I think we could design a common Lagrangian data structure with which all of these (quasi-)Lagrangian data and their associated datasets can be described, accessed, stored, and manipulated efficiently.

A first sketch would be to define a Particle class with ID, name, and records as its fields, where records is a pandas.DataFrame that stores the Lagrangian data. By overriding some of Particle's operators, we can make a Particle almost as simple to use as a pandas.DataFrame. Through subclassing, we can further define Drifter, Float, and TropicalCyclone classes tailored to each case.
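
A minimal sketch of that idea (everything here is illustrative, assuming only that records is a pandas.DataFrame with time, x_pos, and y_pos columns):

import pandas as pd

class Particle:
    def __init__(self, pid, name, records):
        self.id = pid                  # particle ID
        self.name = name
        self.records = records         # pandas.DataFrame of the Lagrangian track

    def __getitem__(self, key):
        # delegate indexing to the underlying DataFrame so Particle feels like one
        return self.records[key]

    def __len__(self):
        return len(self.records)

class Drifter(Particle):
    """Specialization for surface drifters (e.g. the GDP dataset)."""
    pass

records = pd.DataFrame({'time': [0, 1], 'x_pos': [140.0, 140.1], 'y_pos': [45.0, 45.05]})
d = Drifter(pid=1, name='example', records=records)
print(len(d), d['x_pos'].iloc[-1])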

Do you guys have any comment on this?

Compatibility with outputs from other Lagrangian tools

Hello!

First of all, thank you for making this tool public! After reading a few papers that use the LAVD to detect and track RCLVs (mainly Abernathey and Haller, 2018), I became very interested in trying to apply these methods to my own case study. If I run a Lagrangian simulation using ROMS model output and OceanParcels for particle advection, is it then possible to run floater on the result?

Cheers,
Cláudio

FloatSet object is sensitive to numpy version

A recent update of the numpy package to version 1.13.0 changes the behavior of generators.py. Here is an example.

import numpy as np
from floater.generators import FloatSet

lon = np.arange(0, 9, dtype=np.float32)
lat = np.arange(-4, 5, dtype=np.float32)
land_mask = np.full(81, True, dtype=bool)
land_mask.shape = (len(lat), len(lon))
land_mask[:,0:2] = False
land_mask
array([[False, False,  True,  True,  True,  True,  True,  True,  True],
       [False, False,  True,  True,  True,  True,  True,  True,  True],
       [False, False,  True,  True,  True,  True,  True,  True,  True],
       [False, False,  True,  True,  True,  True,  True,  True,  True],
       [False, False,  True,  True,  True,  True,  True,  True,  True],
       [False, False,  True,  True,  True,  True,  True,  True,  True],
       [False, False,  True,  True,  True,  True,  True,  True,  True],
       [False, False,  True,  True,  True,  True,  True,  True,  True],
       [False, False,  True,  True,  True,  True,  True,  True,  True]], dtype=bool)
model_grid = {'lon': lon, 'lat': lat, 'land_mask': land_mask}
fs = FloatSet(xlim=(0, 9), ylim=(-4, 5), model_grid=model_grid)
fs.get_rectmesh()
np.__version__
'1.12.0'
fs.ocean_bools
array([False, False,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False, False,  True,  True,  True,  True,  True,  True,  True,
       False, False,  True,  True,  True,  True,  True,  True,  True], dtype=bool)
np.__version__
'1.13.0'
fs.ocean_bools
array([False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True], dtype=bool)

cc: @rabernat, @nathanieltarshish

Refactor workflow for pre/post processing

Right now we have one code path for preprocessing float input data (via generators.py) and a different one for postprocessing (via input.py). But the postprocessing scripts (example) always end up having to manually re-create the FloatSet object that was used to generate the inputs. This seems inefficient and error-prone.

It would be nice to have a more end-to-end workflow. I don't know exactly how this would look, but I am opening this issue so we can discuss it.
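
One possible shape for this, sketched only as a starting point for discussion (the pickle filename convention is made up; note that floater_convert already exposes a --pkl_path option):

import pickle
from floater import generators

# at generation time, save the FloatSet next to the MITgcm input file ...
fs = generators.FloatSet((140, 150), (45, 55), dx=0.25, dy=0.25)
fs.to_mitgcm_format('flt_init.bin')
with open('flt_init.floatset.pkl', 'wb') as f:
    pickle.dump(fs, f)

# ... and at postprocessing time, load it back instead of re-creating it by hand
with open('flt_init.floatset.pkl', 'rb') as f:
    fs_post = pickle.load(f)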

cc: @anirban89 @nathanieltarshish

use bisection search for outermost contour

A reviewer of @nathanieltarshish's recent paper rightly criticized our contour-search method for using a fixed step size. I have always been a bit worried about the sensitivity to this parameter. We make it small in order to avoid false negatives, but that makes the method very expensive.

Instead, we should use a more intelligent bisection method.

The relevant section of the code is here:
https://github.com/rabernat/floater/blob/master/floater/rclv.py#L322-L371
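
A bare-bones sketch of what such a bisection could look like (illustrative only; is_valid stands in for whatever test rclv.py applies to the contour at a given level):

def bisect_outermost_level(is_valid, lo, hi, tol=1e-3):
    # assumes is_valid(lo) is True and is_valid(hi) is False;
    # returns the largest level (within tol) that still passes the test
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_valid(mid):
            lo = mid
        else:
            hi = mid
    return lo

# toy usage: levels below 0.7 are "valid"
print(bisect_outermost_level(lambda c: c < 0.7, 0.0, 1.0))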

cc @anirban89, @geosciz

bug in contour_ji_to_geo

If I do this

contour_latlon = rclv.contour_ji_to_geo(contour, ds.x0, ds.y0)

I get the following error:

NameErrorTraceback (most recent call last)
<ipython-input-81-ad88f7f185ba> in <module>()
      1 for ji, contour, area in rclvs:
----> 2     contour_latlon = rclv.contour_ji_to_geo(contour, ds.x0, ds.y0)
      3     plt.plot(*contour.T)

/home/rpa/floater/floater/rclv.py in contour_ji_to_geo(contour_ji, lon, lat)
    546     dlat = abs(abs(lat[1]) - abs(lat[0]))
    547 
--> 548     j,i  = countour_ji.T
    549 
    550     x = lon[0] + dlon*i

NameError: global name 'countour_ji' is not defined
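
From the traceback this looks like a simple misspelling in rclv.py: the argument is named contour_ji, but the body references countour_ji. A corrected version would look roughly like this (the lines not visible in the traceback are guesses):

import numpy as np

def contour_ji_to_geo(contour_ji, lon, lat):
    dlon = abs(lon[1] - lon[0])            # not shown in the traceback; assumed
    dlat = abs(abs(lat[1]) - abs(lat[0]))
    j, i = contour_ji.T                    # was: countour_ji.T -> NameError
    x = lon[0] + dlon * i
    y = lat[0] + dlat * j                  # assumed, by analogy with the x line
    return np.stack([x, y], axis=-1)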

Correcting float longitudes generated by `get_oceancoords`

Looking over code I used to visualize with datashader reminded me of a problem that I had totally forgotten to fix in generators.py. Whoops!

Currently, get_oceancoords() returns the longitudes of the floats in the interval (0, 360) (due to the trig functions used when mapping to xyz for the cKDTree). For my data and specific FloatSet, I am using a longitude domain of (-280, 80). So when I was visualizing the data, I realized that I had to remap the longitudes according to

#float data is between 0 and 360, and has to be remapped to (-280, 80)
float_x[float_x > 80] -= 360

In the generic case, the simple fix is

float_x[float_x > fs.xlim[1]] -= 360

since the only way floats can escape the longitude domain is due to this problem. I meant to include this in PR #14, but forgot to. Will make this change and submit a PR.

I believe this issue is independent of #20. Having out-of-domain floats has not generated that error in the past, e.g. when I ran the CM2p6 grid (xlim = (-280, 80)) with your verification float file (xlim = (180, 230)).
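
As a quick sanity check of the generic remap with a toy longitude array (illustrative values only):

import numpy as np

xlim = (-280.0, 80.0)
float_x = np.array([10.0, 90.0, 350.0])     # longitudes as returned in [0, 360)
float_x[float_x > xlim[1]] -= 360.0
print(float_x)                               # [  10. -270.  -10.]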

update readme and documentation

The readme and documentation for floater are out of date with how the package is currently used. We should write a bit more about what floater is actually good for.
