
floater's Introduction

Floater


Transcode MITgcm float output into NetCDF or pandas (HDF5) format.

Transcoding is done via the floater_convert script, which is installed with the package.

$ floater_convert
usage: floater_convert [-h] [--float_file_prefix PREFIX] [--float_buf_dim N]
                       [--progress] [--input_dir DIR] [--output_format FMT]
                       [--keep_fields FIELDS] [--ref_time RT] [--pkl_path PP]
                       [--output_dir OD] [--output_prefix OP]
                       output_file
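
For example, a minimal invocation might look like the following (the input directory and output name are illustrative; only flags listed in the usage above are used):

$ floater_convert --input_dir ./float_output --output_format netcdf float_trajectories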

The package also includes generators and analysis tools for Lagrangian trajectories.

floater's Issues

MITgcm error when reading in float file

I am encountering an error when running the MITgcm that appears in the .e file for the job I submitted to my cluster (not in STDERR or STDOUT):

In source file mdsio_rd_rec_rl.f, at line number 1824
 at line number 1824
 File name = flt_CM2p6_global_hex.bin    unformatted, direct access   record = 37172161
 In source file mdsio_rd_rec_rl.f, at line number 1824
PGFIO-F-253/unformatted read/unit=9/attempt to read non-existent record (direct access).
 File name = flt_CM2p6_global_hex.bin    unformatted

The relevant section with line 1824 is shown below:

     IF ( debugLevel.GE.debLevC ) THEN
       WRITE(msgBuf,'(A,I9,2x,I9)')
    &      ' MDS_RD_REC_RL: iRec,Dim = ', iRec, nArr
       CALL PRINT_MESSAGE( msgBuf, standardMessageUnit,
    &                      SQUEEZE_RIGHT , myThid )
     ENDIF

     IF ( fPrec.EQ.precFloat32 ) THEN
       READ( dUnit, rec=iRec ) r4Buf <--------------------------------- LINE 1824 
       DO k=1,nArr
         arr(k) = r4Buf(k)
       ENDDO
     ELSEIF ( fPrec.EQ.precFloat64 ) THEN
       READ( dUnit, rec=iRec ) r8Buf
       DO k=1,nArr
         arr(k) = r8Buf(k)
       ENDDO
     ELSE
       WRITE(msgBuf,'(A,I9)')
    &        ' MDS_RD_REC_RL: illegal value for fPrec=',fPrec
       CALL PRINT_ERROR( msgBuf, myThid )
       STOP 'ABNORMAL END: S/R MDS_RD_REC_RL'
     ENDIF

     RETURN
     END

The float file I am attempting to use was generated with

fs = generators.FloatSet((-280,80), (-81,65.0), dx=.03125, dy=.03125)
model_grid = {'lon': mask_lon, 'lat': mask_lat, 'land_mask': mask_bath}
fs.to_mitgcm_format("flt_CM2p6_global_hex", model_grid=model_grid)

From visual inspection, I am fairly confident that the hexagonal tiling step is done correctly. However, given the recent changes to to_mitgcm_format(), it is possible that the output formatting was broken. To test this, I will try to rebuild, with the new code, a float file that I know previously worked fine with the MITgcm.

compound dtypes issue

cc: @rabernat, @nathanieltarshish

I tried to use floater_convert on a MITgcm test run with rectangular mesh float initial positions, and got the following error:

NotImplementedError                       Traceback (most recent call last)
<ipython-input-7-813257f7c962> in <module>()
----> 1 dfc = df.compute()
      2 dfc.head()

/u/6/c/cz2397/miniconda3/envs/rclv/lib/python3.6/site-packages/dask/base.py in compute(self, **kwargs)
     93             Extra keywords to forward to the scheduler ``get`` function.
     94         """
---> 95         (result,) = compute(self, traverse=False, **kwargs)
     96         return result
     97 

/u/6/c/cz2397/miniconda3/envs/rclv/lib/python3.6/site-packages/dask/base.py in compute(*args, **kwargs)
    200     dsk = collections_to_dsk(variables, optimize_graph, **kwargs)
    201     keys = [var._keys() for var in variables]
--> 202     results = get(dsk, keys, **kwargs)
    203 
    204     results_iter = iter(results)

/u/6/c/cz2397/miniconda3/envs/rclv/lib/python3.6/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, **kwargs)
     74     results = get_async(pool.apply_async, len(pool._pool), dsk, result,
     75                         cache=cache, get_id=_thread_get_id,
---> 76                         **kwargs)
     77 
     78     # Cleanup pools associated to dead threads

/u/6/c/cz2397/miniconda3/envs/rclv/lib/python3.6/site-packages/dask/async.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, raise_on_exception, rerun_exceptions_locally, callbacks, dumps, loads, **kwargs)
    498                     _execute_task(task, data)  # Re-execute locally
    499                 else:
--> 500                     raise(remote_exception(res, tb))
    501             state['cache'][key] = res
    502             finish_task(dsk, key, state, results, keyorder.get)

NotImplementedError: compound dtypes are not implementedin the Series constructor

It seems that the compound dtypes specified in the variable float_dtypes caused this issue. I think we could fix it by removing the dtype option from df = dd.read_csv(input_path, names=float_columns, dtype=float_dtypes, header=None) and setting the column dtypes afterwards.
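
A rough sketch of that fix (the path, column names, and dtypes below are placeholders, not the actual values of float_columns / float_dtypes):

import dask.dataframe as dd

# hypothetical stand-ins for the real input path, float_columns, and float_dtypes
input_path = 'float_trajectories.*.csv'
float_columns = ['npart', 'time', 'x', 'y', 'z']
column_dtypes = {'npart': 'i4', 'time': 'f8', 'x': 'f4', 'y': 'f4', 'z': 'f4'}

# read without the compound dtype, then assign plain per-column dtypes afterwards
df = dd.read_csv(input_path, names=float_columns, header=None)
df = df.astype(column_dtypes)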

It is strange that this error did not appear in test runs with hexagonal mesh float initial positions. Also, our tests failed to catch it.

deploy floaters in 2D x-z plane

@rabernat Hi, I am trying to use this package in my work.

I've set up a MITgcm run to simulate internal waves in a 2D x-z plane, with only one grid cell in the y direction. I need to deploy floats at different depths with a specified vertical interval dz.

I tried to use the FloatSet:

flts = FloatSet(xlim=[0, X], ylim=[0, Y], dx=1000, dy=100, zvect=zdef)
flts.to_mitgcm_format('./flt_init.dat', iup=-1, read_binary_prec=32)

However, FloatSet seems to generate a horizontal 2D deployment that is simply repeated for every entry in zvect. It does not have keyword arguments like zlim or dz.

So how can I deploy a 2D float set in the x-z plane with a given dz? I may try to modify the code, but I am not sure exactly how to push my version back to this repo (I'm new to git).
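
In the meantime, one possible workaround under the current API might be to collapse the horizontal mesh to a single row in y and let the repetition over zvect supply the vertical levels. A rough sketch (all numbers are illustrative, and it assumes FloatSet places one float per dx-by-dy cell):

import numpy as np
from floater.generators import FloatSet

dx, dy, dz = 1000.0, 100.0, 10.0
X = 50000.0                                  # illustrative domain length in x
zdef = -np.arange(dz/2, 500.0, dz)           # cell-centre depths with spacing dz

# ylim spans exactly one dy, so the horizontal mesh has a single row in y;
# that row is then repeated at every depth in zvect, giving an x-z plane of floats
flts = FloatSet(xlim=[0, X], ylim=[0, dy], dx=dx, dy=dy, zvect=zdef)
flts.to_mitgcm_format('./flt_init.dat', iup=-1, read_binary_prec=32)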

errors with floater_convert

cc: @geosciz, @nathanieltarshish

I just tried running floater_convert with the new options on yeti with my Pacific sector data.

$ floater_convert --output_format netcdf --ref_time 1993-01-01 --step_time 86400 float_trajectories

I got this error:

Traceback (most recent call last):
  File "/u/6/r/ra2697/.conda/envs/xmitgcm_env/bin/floater_convert", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/hmt/sirius1/prv0/u/6/r/ra2697/floater/scripts/floater_convert", line 76, in <module>
    output_prefix=args.output_prefix)
  File "/hmt/sirius1/prv0/u/6/r/ra2697/floater/floater/utils.py", line 264, in floats_to_netcdf
    step_num = int(dfcs.time.values[0])//step_time
TypeError: unsupported operand type(s) for //: 'int' and 'str'

What is going on?
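
For what it's worth, the TypeError can be reproduced by floor-dividing an int by a string, which suggests --step_time is reaching floats_to_netcdf as the string '86400' rather than an int (an assumption; the names below are illustrative):

# minimal reproduction of the error above
step_time = '86400'                 # as argparse would deliver it without type=int
raw_time = 86400
try:
    step_num = int(raw_time) // step_time
except TypeError as err:
    print(err)                      # unsupported operand type(s) for //: 'int' and 'str'

# hedged fix: cast before dividing (or pass type=int to add_argument)
step_num = int(raw_time) // int(step_time)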

Question: RCLV search with periodic domains and masked regions

With a global study of RCLVs in mind, I am wondering how compatible hexgrid.find_convex_regions is with domains that are periodic (across the East/West boundary) and that also contain masked regions. If it is not currently compatible, how should we build out these features?
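
One possible building block for the periodic case, offered only as an idea (not part of floater's current API), is to wrap a margin of longitude columns onto the field before the region search and fold the detected indices back with a modulo:

import numpy as np

field = np.random.rand(50, 100)          # toy (lat, lon) field, e.g. an LAVD map
pad = 10                                 # margin wide enough to contain any single region
wrapped = np.concatenate([field, field[:, :pad]], axis=1)
# ... run the convex-region search on `wrapped`, then take the longitude
# indices of any detected region modulo field.shape[1]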

allow FloatSet to accept a land mask

We would like to do something like

fs = generators.FloatSet((140,150), (45,55), dx=0.25, dy=0.25)
fs.get_hexmesh(mask=some_mask)

This would blank out the regions where there is land and reduce memory usage, especially for global simulations.
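
A minimal sketch of what the masking step could look like, assuming the mask is defined on the same lon/lat grid used to place the floats (all names and values below are illustrative):

import numpy as np

x = np.arange(140.125, 150, 0.25)            # float longitudes
y = np.arange(45.125, 55, 0.25)              # float latitudes
xx, yy = np.meshgrid(x, y)

some_mask = np.ones(xx.shape, dtype=bool)    # True over ocean, False over land
some_mask[:, :4] = False                     # pretend the western edge is land

# keep only the ocean floats; land positions are never generated or stored
float_x, float_y = xx[some_mask], yy[some_mask]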

messed up time variable with floater_convert

I did the following on habanero

floater_convert --input_dir /rigel/ocp/users/as4479/MITgcm/llc_agulhas/3dfloats_hourly --progress --output_format netcdf --output_dir /rigel/ocp/users/ra2697/float_visualization --step_time 25 --ref_time 2011-01-01 float_trajectories

This converts @anirban89's output data into netcdf. It works fine if I don't specify step_time or ref_time. But if I do, the time variable is just zero in all the netcdf files.

What could be going on here?

cc @geosciz

fixing a hexagonal tiling bug

The recent PR #23 introduced a slight bug in the get_hexmesh function: it effectively called _subset_floats_from_mask twice, because it generated the uniform mesh with get_rectmesh instead of xx, yy = np.meshgrid(self.x, self.y). I corrected this, along with a few other cosmetic changes, on the fix_get_hexmesh branch of my fork and will submit a PR.
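
An illustrative, standalone version of the corrected logic (not the actual floater source; the hexagonal stagger below is a stand-in for whatever get_hexmesh really does):

import numpy as np

def get_hexmesh_sketch(x, y, dx, mask=None):
    # build the uniform mesh directly with meshgrid rather than via get_rectmesh,
    # so any mask subsetting happens only once, here
    xx, yy = np.meshgrid(x, y)
    xx[1::2] += dx / 2            # hypothetical stagger of alternate rows
    if mask is not None:
        xx, yy = xx[mask], yy[mask]
    return xx, yy

xm, ym = get_hexmesh_sketch(np.arange(5.0), np.arange(4.0), dx=1.0)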

installation of floater

Dear all,
As explained in #18, I'm trying to install floater but I am running into some problems. Could you help me?

I use conda with Python 2.7.
I installed floater this way:
sudo su
/softs/anaconda2/bin/conda create -n envmitgcm python=2.7.13 anaconda
source /softs/anaconda2/bin/activate envmitgcm
/softs/anaconda2/bin/conda install -n envmitgcm xarray
/softs/anaconda2/bin/conda install -n envmitgcm netcdf4
cd /home/mazoyer/Documents/sources/pyMITGM/floater
ipython setup.py install

Then, when I try to use floater_convert with the MITgcm example outputs, nothing happens.
/home/mazoyer/Documents/sources/pyMITGM/floater/build/scripts-2.7/floater_convert --output_format netcdf --ref_time 2013-01-01 --step_time 13 float_trajectories --progress

If I ask for the pandas output format, an h5 file is created.

Do you have an idea about what could be the problem with my installation?
Thanks a lot,
Camille

common Lagrangian data structure

I have been wondering whether we need a common Lagrangian data structure, analogous to xarray for coordinate-aware n-dimensional datasets, to describe large numbers of Lagrangian particles. These data generally consist of a time series of positions, plus associated data along each Lagrangian track. Examples are the simulated Lagrangian trajectories here, the GDP drifter dataset, the Argo float dataset, as well as quasi-Lagrangian datasets such as tropical cyclone best tracks and mesoscale eddies.

As far as I know, a pandas.DataFrame is typically used to represent such data, with at least three columns: time, x_pos, and y_pos. This is efficient and clear. However, we sometimes need to attach extra information to the dataframe, such as ID, name, type, status, etc. So I think we could design a common Lagrangian data structure with which all of these (quasi-)Lagrangian data and their associated datasets can be described, accessed, stored, and manipulated efficiently.

A first sketch would be to define a Particle class with ID, name, and records as its fields, where records is a pandas.DataFrame that stores the Lagrangian data. By overriding some of Particle's operators, we can make a Particle almost as simple to use as a pandas.DataFrame. Through subclassing, we can further define Drifter, Float, and TropicalCyclone classes tailored to each case.
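
A minimal sketch of that idea (everything here is illustrative, assuming only that records is a pandas.DataFrame with time, x_pos, and y_pos columns):

import pandas as pd

class Particle:
    def __init__(self, pid, name, records):
        self.id = pid                  # particle ID
        self.name = name
        self.records = records         # pandas.DataFrame of the Lagrangian track

    def __getitem__(self, key):
        # delegate indexing to the underlying DataFrame so Particle feels like one
        return self.records[key]

    def __len__(self):
        return len(self.records)

class Drifter(Particle):
    """Specialization for surface drifters (e.g. the GDP dataset)."""
    pass

records = pd.DataFrame({'time': [0, 1], 'x_pos': [140.0, 140.1], 'y_pos': [45.0, 45.05]})
d = Drifter(pid=1, name='example', records=records)
print(len(d), d['x_pos'].iloc[-1])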

Do you guys have any comment on this?

Compatibility with outputs from other Lagrangian tools

Hello!

First of all, thank you for making this tool public! After reading a few papers that use the LAVD to detect and track RCLVs (mainly Abernathey and Haller, 2018), I became very interested in trying to apply these methods to my own case study. If I run a Lagrangian simulation using ROMS model output and OceanParcels for particle advection, is it then possible to run floater on the result?

Cheers,
Cláudio

FloatSet object is sensitive to numpy version

A recent update of the numpy package to version 1.13.0 changes the behavior of generators.py. Here is an example.

import numpy as np
from floater.generators import FloatSet

lon = np.arange(0, 9, dtype=np.float32)
lat = np.arange(-4, 5, dtype=np.float32)
land_mask = np.full(81, True, dtype=bool)
land_mask.shape = (len(lat), len(lon))
land_mask[:,0:2] = False
land_mask
array([[False, False,  True,  True,  True,  True,  True,  True,  True],
       [False, False,  True,  True,  True,  True,  True,  True,  True],
       [False, False,  True,  True,  True,  True,  True,  True,  True],
       [False, False,  True,  True,  True,  True,  True,  True,  True],
       [False, False,  True,  True,  True,  True,  True,  True,  True],
       [False, False,  True,  True,  True,  True,  True,  True,  True],
       [False, False,  True,  True,  True,  True,  True,  True,  True],
       [False, False,  True,  True,  True,  True,  True,  True,  True],
       [False, False,  True,  True,  True,  True,  True,  True,  True]], dtype=bool)
model_grid = {'lon': lon, 'lat': lat, 'land_mask': land_mask}
fs = FloatSet(xlim=(0, 9), ylim=(-4, 5), model_grid=model_grid)
fs.get_rectmesh()
np.__version__
'1.12.0'
fs.ocean_bools
array([False, False,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False, False,  True,  True,  True,  True,  True,  True,  True,
       False, False,  True,  True,  True,  True,  True,  True,  True], dtype=bool)
np.__version__
'1.13.0'
fs.ocean_bools
array([False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
       False,  True,  True,  True,  True,  True,  True,  True,  True], dtype=bool)

cc: @rabernat, @nathanieltarshish

Refactor workflow for pre/post processing

Right now we have one code path for preprocessing float input data (via generators.py) and a different one for postprocessing (via input.py). But the postprocessing scripts (example) always end up having to manually re-create the FloatSet object that was used to generate the inputs. This seems inefficient and error-prone.

It would be nice to have a more end-to-end workflow. I don't know exactly how this would look, but I am opening this issue so we can discuss it.
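
One possible shape for this, sketched only as a starting point for discussion (the pickle filename convention is made up; note that floater_convert already exposes a --pkl_path option):

import pickle
from floater import generators

# at generation time, save the FloatSet next to the MITgcm input file ...
fs = generators.FloatSet((140, 150), (45, 55), dx=0.25, dy=0.25)
fs.to_mitgcm_format('flt_init.bin')
with open('flt_init.floatset.pkl', 'wb') as f:
    pickle.dump(fs, f)

# ... and at postprocessing time, load it back instead of re-creating it by hand
with open('flt_init.floatset.pkl', 'rb') as f:
    fs_post = pickle.load(f)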

cc: @anirban89 @nathanieltarshish

use bisection search for outermost contour

A reviewer of @nathanieltarshish's recent paper rightly criticized our contour-search method for using a fixed step size. I have always been a bit worried about the sensitivity to this parameter. We make it small in order to avoid false negatives, but that makes the method very expensive.

Instead, we should use a more intelligent bisection method.

The relevant section of the code is here:
https://github.com/rabernat/floater/blob/master/floater/rclv.py#L322-L371
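
A bare-bones sketch of what such a bisection could look like (illustrative only; is_valid stands in for whatever test rclv.py applies to the contour at a given level):

def bisect_outermost_level(is_valid, lo, hi, tol=1e-3):
    # assumes is_valid(lo) is True and is_valid(hi) is False;
    # returns the largest level (within tol) that still passes the test
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_valid(mid):
            lo = mid
        else:
            hi = mid
    return lo

# toy usage: levels below 0.7 are "valid"
print(bisect_outermost_level(lambda c: c < 0.7, 0.0, 1.0))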

cc @anirban89, @geosciz

bug in contour_ji_to_geo

If I do this

contour_latlon = rclv.contour_ji_to_geo(contour, ds.x0, ds.y0)

I get the following error:

NameErrorTraceback (most recent call last)
<ipython-input-81-ad88f7f185ba> in <module>()
      1 for ji, contour, area in rclvs:
----> 2     contour_latlon = rclv.contour_ji_to_geo(contour, ds.x0, ds.y0)
      3     plt.plot(*contour.T)

/home/rpa/floater/floater/rclv.py in contour_ji_to_geo(contour_ji, lon, lat)
    546     dlat = abs(abs(lat[1]) - abs(lat[0]))
    547 
--> 548     j,i  = countour_ji.T
    549 
    550     x = lon[0] + dlon*i

NameError: global name 'countour_ji' is not defined
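
From the traceback this looks like a simple misspelling in rclv.py: the argument is named contour_ji, but the body references countour_ji. A corrected version would look roughly like this (the lines not visible in the traceback are guesses):

import numpy as np

def contour_ji_to_geo(contour_ji, lon, lat):
    dlon = abs(lon[1] - lon[0])            # not shown in the traceback; assumed
    dlat = abs(abs(lat[1]) - abs(lat[0]))
    j, i = contour_ji.T                    # was: countour_ji.T -> NameError
    x = lon[0] + dlon * i
    y = lat[0] + dlat * j                  # assumed, by analogy with the x line
    return np.stack([x, y], axis=-1)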

Correcting float longitudes generated by `get_oceancoords`

Looking over code I used to visualize with datashader reminded me of a problem that I had totally forgotten to fix in generators.py. Whoops!

Currently, get_oceancoords() returns the longitudes of the floats in the interval (0, 360) (due to the trig functions used when mapping to xyz for the cKDTree). For my data and specific FloatSet, I am using a longitude domain of (-280, 80). So when I was visualizing the data, I realized that I had to remap the longitudes according to

#float data is between 0 and 360, and has to be remapped to (-280, 80)
float_x[float_x > 80] -= 360

In the generic case, the simple fix is

float_x[float_x > fs.xlim[1]] -= 360

since the only way floats can escape the longitude domain is due to this problem. I meant to include this in PR #14, but forgot to. Will make this change and submit a PR.

I believe this issue is independent of #20. Having out-of-domain floats has not generated that error in the past, e.g. when I ran the CM2p6 grid (xlim = (-280, 80)) with your verification float file (xlim = (180, 230)).
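
As a quick sanity check of the generic remap with a toy longitude array (illustrative values only):

import numpy as np

xlim = (-280.0, 80.0)
float_x = np.array([10.0, 90.0, 350.0])     # longitudes as returned in [0, 360)
float_x[float_x > xlim[1]] -= 360.0
print(float_x)                               # [  10. -270.  -10.]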

update readme and documentation

The readme and documentation for floater are out of date with how the package is currently used. We should write a bit more about what floater is actually good for.
