echopype's Issues

Add TS calculation in `ModelEK60`

ModelAZFP already has a method to calculate TS (target strength). We need the same for ModelEK60, in parallel with the existing Sv calculation via .calibrate() for data from both echosounders.
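For context, the main difference between the two calibrations is the range-compensation (TVG) term: Sv uses 20 log10(R) spreading while TS uses 40 log10(R), both with the two-way absorption term. A minimal sketch (the function name and signature are hypothetical, not echopype's API):

```python
import numpy as np

def tvg_range_compensation(power_db, r, alpha, mode="Sv"):
    """Apply time-varied gain to received power (dB).

    Hypothetical sketch: Sv uses 20*log10(R) spreading, TS uses 40*log10(R);
    both add the two-way absorption term 2*alpha*R (alpha in dB/m, r in m).
    """
    spreading = 20.0 if mode == "Sv" else 40.0
    return power_db + spreading * np.log10(r) + 2.0 * alpha * r
```

The real calibration also involves gain, equivalent beam angle, and pulse-length terms; this only illustrates why a separate TS path is needed alongside Sv.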

Combine functions from PyEcholab

PyEcholab and echopype were based on the same original code (EchoLab in MATLAB), so we want to merge them, along with the changes that make use of xarray and pandas.

Tasks:

  • Identify functions that are missing from either package
  • Decide which functions/methods to keep for unpacking
  • Decide which functions/methods to keep for data manipulation and visualization
  • Change functions as in #10 to use xarray/pandas as the underlying packages instead of numpy.

Revise command line tool using uniform interface

Need to update the command line tool now that the convert and model modules have a uniform interface for AZFP and EK60 data (#53).

Specifically, we need to change the optional input parameter to accommodate the additional .XML file needed for AZFP unpacking.

The goal is to be able to do:

  • $ echopype_converter -s ek60 path_to_data_files (wildcards allowed)
  • $ echopype_converter -s azfp -x xml_file path_to_data_files (wildcards allowed)

In the above:

  • -s/--system indicates the sonar system
  • -x/--xml-file specifies the XML file used to parse AZFP data
  • issue warning “An XML file is needed for converting AZFP echosounder data” if -s azfp is passed but no XML file is specified
  • issue warning “The XML file is ignored as it is not needed for converting EK60 data” if -x is passed together with -s ek60
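The interface above could be sketched with argparse roughly as follows (option names follow the issue; the warning for the EK60-plus-XML case is truncated in the issue, so its exact wording here is an assumption, as is returning warnings from a helper instead of printing them):

```python
import argparse

def build_parser():
    # Hypothetical sketch of the echopype_converter interface described above.
    parser = argparse.ArgumentParser(prog="echopype_converter")
    parser.add_argument("-s", "--system", required=True, choices=["ek60", "azfp"],
                        help="sonar system that produced the raw files")
    parser.add_argument("-x", "--xml-file", default=None,
                        help="XML file used to parse AZFP data")
    parser.add_argument("data_files", nargs="+",
                        help="raw data files (shell wildcards expand to many paths)")
    return parser

def check_xml(args):
    # Return the warnings described above as strings, for testability.
    warnings = []
    if args.system == "azfp" and args.xml_file is None:
        warnings.append("An XML file is needed for converting AZFP echosounder data")
    if args.system != "azfp" and args.xml_file is not None:
        warnings.append("The XML file is ignored as it is not needed for converting EK60 data")
    return warnings
```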

AZFP Unpack Improvement

  • (major) unpacked AZFP data is not being saved
    • will require debugging script to interrupt before plotting routine and write out data
    • unpacked AZFP data needs to be packed into netCDF format
  • (major) refactor code, get rid of object-oriented paradigm
    • this is literally MacGyvered together from object-oriented code
    • to properly create a standalone utility:
      • break script during file read routines
      • extract necessary config information, metadata, and parsing instructions
      • replicate only parsing routine in a new script and copy in static configs and meta
  • (major) reverse-engineer the MATLAB scripts to check for compatibility
    • test current script against other sample data
    • check for flexibility of code to handle all use cases reading all AZFP data
      • make sensor/config agnostic
      • validate against vendor code (matlab)

Sound speed update not consistent

Right now the sound speed is recalculated when environmental parameters change, but modelbase.recalculate_environment() uses the default uwa option (Mackenzie 1981), while for AZFP the formula supplied by the manufacturer should be used.

A consistent way to go is to call .get_sound_speed() of each child class from .recalculate_environment(). Currently this method is complete for AZFP, but for EK60 it only reads the value stored in the .raw file.

Action items:

  • Update .get_sound_speed() in ModelEK60
  • Change .recalculate_environment() to update self.sound_speed via .get_sound_speed(), so that the child-class methods are called.
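The dispatch pattern described above can be sketched as plain polymorphism (class names follow the issue; the bodies and return values are placeholders, not echopype's actual formulas):

```python
class ModelBase:
    def get_sound_speed(self):
        # default: a generic formula (e.g. Mackenzie 1981 via uwa)
        raise NotImplementedError

    def recalculate_environment(self):
        # dispatch to the child-class implementation, so AZFP and EK60
        # each use the appropriate formula or data source
        self.sound_speed = self.get_sound_speed()

class ModelAZFP(ModelBase):
    def get_sound_speed(self):
        return 1500.0  # placeholder: manufacturer-supplied formula goes here

class ModelEK60(ModelBase):
    def get_sound_speed(self):
        return 1480.0  # placeholder: value stored in the .raw file
```

Because the base class only ever calls `self.get_sound_speed()`, adding a new echosounder just means overriding that one method.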

Add solar radiation data to notebook

Note to self:

  • add fetching of solar radiation data to the demo OOI notebook
  • for direct comparison with the sonar output.

This is to show what I had here (and obviously, by restructuring echopype and moving it around GitHub, I have broken the link in this post... 😬)

New release

@leewujung are you planning on using this package during OHW19? If so, can we have a new release so I can package it and make it easy to install with conda?

Parallelize calibration in model module

  • Enable reading multiple converted .nc files and calibrating them in parallel
    • need to revise model/modelbase.py to allow passing multiple files into ModelBase and all derived classes; right now the object only accepts a single file
  • Enable parallel computation when calling .calibrate with multiple .nc data files opened at the same time
  • Enable parallel saving of calibrated Sv data. Users will have the option to:
    • save calibrated Sv into multiple files with filenames corresponding to the uncalibrated .nc files (for example, if fname1.nc and fname2.nc were read together into a model object, the calibrated Sv data will be saved into fname1_Sv.nc and fname2_Sv.nc)
    • save calibrated Sv from multiple uncalibrated data files into 1 _Sv.nc file. Users need to provide the output filename as an input argument, as below:
      echo_data = EchoData(FILENAME_LIST)  # open a list of files
      # save all calibrated Sv into 1 file if filename_out is provided
      echo_data.calibrate(save_opt=True,
                          filename_out='COMBINED_NAME.nc')  # default filename_out=None

Update to be compatible with xarray=0.14.1

The xarray version is currently pinned at 0.13.0. Need to fix the following problem:

echopype/model/modelbase.py:426: in get_MVBS
    drop({'add_idx', 'range_bin_bins'})
.tox/py37/lib/python3.7/site-packages/xarray/core/dataarray.py:1938: in drop
    ds = self._to_temp_dataset().drop(labels, dim, errors=errors)
.tox/py37/lib/python3.7/site-packages/xarray/core/dataset.py:3643: in drop
    return self.drop_sel(labels, errors=errors)
.tox/py37/lib/python3.7/site-packages/xarray/core/dataset.py:3689: in drop_sel
    labels = either_dict_or_kwargs(labels, labels_kwargs, "drop")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pos_kwargs = {'add_idx', 'range_bin_bins'}, kw_kwargs = {}, func_name = 'drop'
    def either_dict_or_kwargs(
        pos_kwargs: Optional[Mapping[Hashable, T]],
        kw_kwargs: Mapping[str, T],
        func_name: str,
    ) -> Mapping[Hashable, T]:
        if pos_kwargs is not None:
            if not is_dict_like(pos_kwargs):
                raise ValueError(
>                   "the first argument to .%s must be a dictionary" % func_name
                )
E               ValueError: the first argument to .drop must be a dictionary
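From the traceback, the old call passes a set of variable names to .drop, which newer xarray interprets as a dict-like positional argument. The fix that works on xarray>=0.14 is to use drop_vars with a list of names (the tiny Dataset below is illustrative only; the variable names are taken from the traceback):

```python
import numpy as np
import xarray as xr

# Minimal stand-in for the dataset inside get_MVBS
ds = xr.Dataset({
    "Sv": ("range_bin", np.zeros(3)),
    "add_idx": ("range_bin", np.arange(3)),
    "range_bin_bins": ("range_bin", np.arange(3)),
})

# Old call that breaks on xarray>=0.14.1: ds.drop({'add_idx', 'range_bin_bins'})
cleaned = ds.drop_vars(["add_idx", "range_bin_bins"])
```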

No pip wheels

Overview

I notice that there are no pip wheels on PyPI: https://pypi.org/project/echopype/#files

I suggest building and uploading a wheel for this package to make installation through pip easier for users.

Once you update the package, I can go ahead and create a conda recipe so that this package can be installed through conda, which would make it super easy! 😄

Plus, I'd like to use it in my yodapy package possibly so it would be good to make the install easy! 😉

Whitespace in path breaks EK60 conversions

Example path: "/home/user/source/oops i spaced again/filename.raw"

Code to reproduce: very first file conversion example in the docs

from echopype.convert import Convert
data_tmp = Convert('/a/path/that has spaces/FILENAME.raw')
data_tmp.raw2nc()

Traceback after running my test (and crashing):

  File "/Users/user/source/oops i spaced again/test.py", line 7, in <module>
    ek60_converter.raw2nc()
  File "/Users/user/source/echopype/venv/lib/python3.7/site-packages/echopype/convert/ek60.py", line 888, in raw2nc
    grp.set_toplevel(_set_toplevel_dict())  # top-level group
  File "/Users/user/source/echopype/venv/lib/python3.7/site-packages/echopype/convert/ek60.py", line 695, in _set_toplevel_dict
    out_dict['date_created'] = dt.strptime(fm.group('date') + '-' + fm.group('time'),
AttributeError: 'NoneType' object has no attribute 'group'

Tested on (and failing on) Mac OS 10.14.6 and Windows 10 Pro 1903.

Workaround: if I remove the spaces from my test paths, the files are parsed correctly.

SerializationWarning when saving EK60 xarray variables with dtype=object

Some data variables in the beam and NMEA groups for EK60 raise a SerializationWarning when saved. These variables are channel_id and gpt_software_version in the beam group and NMEA_datagram in the NMEA group.

The full warning:
SerializationWarning: variable channel_id has data in the form of a dask array with dtype=object, which means it is being loaded into memory to determine a data type that can be safely stored on disk. To avoid this, coerce this variable to a fixed-size dtype with astype() before saving it.
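Following the warning's own suggestion, the fix is to coerce the object-dtype variables to a fixed-size string dtype before writing. A sketch (the tiny Dataset and values are illustrative; only the variable name comes from the issue):

```python
import numpy as np
import xarray as xr

# Stand-in for the beam group with an object-dtype string variable
ds = xr.Dataset({
    "channel_id": ("frequency",
                   np.array(["GPT 38 kHz", "GPT 120 kHz"], dtype=object)),
})

# Coerce to a fixed-size dtype before calling ds.to_netcdf(...),
# which avoids the SerializationWarning
ds["channel_id"] = ds["channel_id"].astype(str)
```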

Allow combining multiple raw files when converting to nc/zarr

Add an option for both netCDF and zarr output to combine multiple input raw files into 1 big file.

  • under this option, default to combining all files passed in as a list into 1 big .nc file
  • give user the option to save all data within each month to individual big .nc files
  • something like below:
    fileconvert = Convert(FILE_LIST)  # pass in a list of files to be converted
    
    # default to convert each raw file to 1 .nc file
    fileconvert.raw2nc(combine_opt=None)  
    
    # convert all data in input raw files to 1 big .nc file
    fileconvert.raw2nc(combine_opt='all')

Currently this is partially implemented: we can convert multiple files into one output file at once, BUT the combined file uses the filename of the first input file. Need to change this so that the user can specify the combined output filename. The filename part is related to #88 .

Make uniform convert and data model interface

We need a wrapper for the convert and data model classes so that users do not have to explicitly call the class corresponding to a specific sonar instrument to convert and manipulate data.

The command line tool echopype_converter also needs to be updated accordingly.

Add zarr support to Convert module

Add zarr as an output option in addition to netCDF, via .raw2zarr.

Probably the most straightforward approach is to turn the current .raw2nc into a private method .raw_unpack(output_type), where output_type='nc' or 'zarr', and call it from the new .raw2nc and .raw2zarr.
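The refactor described above amounts to one private unpacking routine with thin public wrappers per output format. A sketch (the class shell and return values are placeholders; only the method names come from the issue):

```python
class Convert:
    # Sketch of the proposed refactor: a single private unpacking routine
    # parameterized by output format, with thin public wrappers.
    def _raw_unpack(self, output_type):
        assert output_type in ("nc", "zarr")
        # ... parse the raw file here, then write with the chosen backend ...
        return f"converted to .{output_type}"

    def raw2nc(self):
        return self._raw_unpack("nc")

    def raw2zarr(self):
        return self._raw_unpack("zarr")
```

This keeps the parsing logic in one place, so adding another output backend later only touches `_raw_unpack`.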

Allow interoperating calibration files from Simrad and EchoView

It will be useful to allow reading and exporting calibration files that are commonly used in the community. These include the .xml file produced by Simrad and the .ecs file produced by EchoView after calibration data is read into the software. The Simrad .xml contains ping-by-ping measurements, which should really be saved into a data file; at the end of the file, however, are the derived calibration coefficients, which we should parse.

Parallelize MVBS calculation and noise removal

This is related to #74, as parallelization and chunking of the dask arrays used under the hood by xarray after open_mfdataset are locked together; see here and this issue.

What we need is to:

  • Parallelize MVBS (mean volume backscattering strength) calculation
    • The actual MVBS operation is to calculate the mean value of each calibrated Sv tile (x-axis: ping or time, y-axis: depth or range)
  • Parallelize noise removal -- the operation is almost identical to MVBS computation
    • The noise removal operation uses the minimum of the same tiled mean as in MVBS along the same vertical averaged column (across a range of pings) as the noise threshold.
    • It's the algorithm described in this paper.

This is related to #54.
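The tile-mean at the core of both operations can be sketched in plain numpy. Note that the averaging must happen in the linear domain, not in dB; the function name, the edge-trimming behavior, and the plain-array interface are assumptions for illustration, not echopype's implementation (which works on labeled xarray dimensions):

```python
import numpy as np

def compute_mvbs(Sv_db, ping_tile, range_tile):
    """Tile-mean of calibrated Sv: average in the linear domain over
    (ping x range) tiles, then convert back to dB. Sketch only."""
    n_ping, n_range = Sv_db.shape
    linear = 10 ** (np.asarray(Sv_db, dtype=float) / 10)
    # trim to a whole number of tiles for simplicity
    linear = linear[: n_ping // ping_tile * ping_tile,
                    : n_range // range_tile * range_tile]
    tiled = linear.reshape(linear.shape[0] // ping_tile, ping_tile,
                           linear.shape[1] // range_tile, range_tile)
    return 10 * np.log10(tiled.mean(axis=(1, 3)))
```

Noise removal would then take the minimum of these tile means along range and use it as the noise threshold, per the paper cited above.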

Memory usage when combining multiple files

When unpacking and combining data from multiple files, currently everything is unpacked into memory and written at once to the output .nc or .zarr file. This is obviously prone to memory issues. For example, the EK60 eclipse example notebook runs locally if memory is large enough, but the kernel dies on Binder.

@ngkavin is working on doing this more efficiently by creating a file and then appending data from subsequent files. This is currently supported by xarray for zarr but not yet for netCDF.

MVBS calculation based on either time or ping number

The get_MVBS method should have options to average based on either ping number (current) or ping time (not implemented yet), like below:

# Average based on time
EchoData.get_MVBS(average_type='time', MVBS_time_bin=time)  # time : timedelta
# Average based on ping numbers
EchoData.get_MVBS(average_type='ping', MVBS_ping_bin=ping_size)  # ping_size : int

It should also accommodate averaging across the file boundaries when many consecutive files were collected in a mission. This requires using xarray.open_mfdataset. Most likely also need to look into efficiency issue re. chunking and invoking dask correctly.

Redundant data in NMEA group

At the moment the entire NMEA group is copied into all files that are split due to changes in range_bin in the middle of the raw file. The NMEA datagrams should instead be split across the part01, part02, ... files. Let's take care of this once we refactor the code that combines/appends data from multiple files.

Create netCDF checker and file naming convention

We need to make a file checker that, during the initial file parsing (.raw or .01a to netCDF), checks whether certain acoustic parameters change. If any one of these parameters changes in the middle of a .raw or .01a file, writing to the netCDF should stop, the netCDF should be saved, and a new netCDF should be opened and written to.

  • Power
  • Pulse duration
  • Sample unit (ping rate)
  • Receiver bandwidth
  • Sound speed
  • Start/Stop of a file

FILE NAMING. From The SONAR-netCDF4 convention for sonar data, Version 1.0: "SONAR-netCDF4 files should always end with a “.nc” suffix to indicate that they are a netCDF file. It is recommended that the filename should sort alphanumerically into chronological order (e.g. date and time of the first ping in the file; thus: YYYYMMDDHHMMSS.nc). This facilitates file management and use in analysis systems."
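Both pieces are small in code terms. A sketch (the function names, the parameter-dict keys, and the split logic are assumptions for illustration; the filename format is taken verbatim from the convention quoted above):

```python
from datetime import datetime

# Hypothetical keys for the parameters listed above
TRACKED_PARAMS = ["power", "pulse_duration", "sample_interval",
                  "receiver_bandwidth", "sound_speed"]

def needs_new_file(prev, curr):
    # Start a new netCDF file when any tracked parameter changes mid-file
    return any(prev[k] != curr[k] for k in TRACKED_PARAMS)

def nc_filename(first_ping_time):
    # SONAR-netCDF4 v1.0: alphanumeric sort order == chronological order
    return first_ping_time.strftime("%Y%m%d%H%M%S") + ".nc"
```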

numcodecs wheels problem

tox -r errors out on Python 3.7 due to a problem building the wheels for numcodecs, which is required by zarr>=2.3.2 (the latest version at the moment).

This seems related to issues #70 and #210 in zarr.

For now I changed the default requirement to python==3.8, but it would be nice to be able to run tox for py37.

Allow updating calibration coefficient

Currently users can update environment parameters, and the procedure is documented. However, the same needs to be added for updating other calibration-related coefficients needed for EK60 and AZFP.

Revise/subclass `SetGroups`

Currently the class SetGroups under convert/set_nc_groups.py is written only for EK60 data and will likely error out when saving data unpacked from other echosounders.

We can either add switches to the current class to accommodate the differences arising from different raw data formats, or write a subclass for each format.

@SvenGastauer @valentina-s: Thoughts?

Correct split-beam angle parsing

Currently the angle data are not parsed correctly for EK60 and EK80 CW mode files.

Changes need to be made in convert/utils/ek_raw_parsers.py for both RAW0 and RAW3 non-complex parsing, to be:

data['angle'] = np.frombuffer(raw_string[indx:indx + block_size], dtype='int8')
data['angle'] = data['angle'].reshape((-1,2))

The first column is the athwartship angle and the second column is the alongship angle, both in "counts" -- i.e., angle conversion has to be implemented as a method in the model class.

This will have cascading changes in returning data with different range_bin lengths and in convert/utils/set_groups_ek60.py.

For the latter, change the split_beam_angle into two data variables angle_alongship and angle_athwartship, each with coordinates ['frequency', 'ping_time', 'range_bin'].

@ngkavin I added this as an issue so that we can keep track of it.
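The count-to-degrees conversion mentioned above could look roughly like the sketch below. Caution: the scaling here assumes the Simrad convention (electrical angle = count * 180 / 128, then divide by the angle sensitivity and subtract the angle offset); the function name and signature are hypothetical, and the formula should be validated against vendor documentation before use:

```python
import numpy as np

def count_to_angle(angle_count, sensitivity, offset=0.0):
    """Convert split-beam angle from raw 'counts' to degrees.

    Assumed Simrad convention (verify against vendor docs):
      electrical angle [deg] = count * 180 / 128
      mechanical angle [deg] = electrical / angle_sensitivity - angle_offset
    """
    electrical = np.asarray(angle_count, dtype=float) * 180.0 / 128.0
    return electrical / sensitivity - offset
```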

Merge the two README files

There are currently two README files:

  • The .rst file contains installation and usage info that are displayed on PyPI.
  • The .md file contains more project background and format info.

These two files need to be merged and the ideal outcome will be:

  • One README (use .rst for PyPI) containing: project background, installation, 1 quick usage example.
  • A separate doc file to explain the converted netCDF format. This should probably be structured within docs.

Resolve zarr convert error in next release

There is a bug in setting the Provenance group when converting to zarr. This bug is fixed in the repo and will be included in the next release. As a result it is not possible to install echopype from conda at the moment since the feedstock build currently fails.

Install using pip from the repo if you want to convert files into zarr.

Allow over-writing existing nc file

Users sometimes need to overwrite an existing .nc file that has been converted before. Currently the convert module issues a warning and aborts if a file with the same filename as the raw binary file already exists. This is a good default behavior, but we should give users the option to overwrite.

Something like:

tmp = Convert(DATA_FILE)
tmp.raw2nc(overwrite=True)   # default overwrite=False
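The proposed behavior inside the converter could be sketched as below (the helper name and warning text are assumptions, not echopype's code; only the safe-by-default semantics come from the issue):

```python
import os
import warnings

def resolve_output(raw_file, overwrite=False):
    # Keep the safe default (abort on existing file), but let users opt in.
    nc_file = os.path.splitext(raw_file)[0] + ".nc"
    if os.path.exists(nc_file) and not overwrite:
        warnings.warn(f"{nc_file} already exists; pass overwrite=True to replace it")
        return None  # caller aborts the conversion
    return nc_file
```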

Basic echogram plots

We'll add a thin wrapper around the xarray plotting functions to plot basic echograms (Sv or MVBS).

  • x-axis:
    • actual time stamp (coordinate ping_time) or
    • ping number (just plain numbers 1-n)
  • y-axis: three options here:
    • range in meters
    • depth in meters (corrected for tilt if known)
    • time in milliseconds
  • Efficiency: we'll need to decide when an echogram becomes too large to be plotted directly with matplotlib and needs tools like datashader. Need to experiment with loading/plotting many files. With the reduced size of MVBS this is likely not an issue, but plotting the calibrated Sv or raw data can be slow depending on the ping rate and data time span.
  • Colormap: colormap is an issue here. The fisheries acoustics community is used to the so-called EK500 colormap (below), and MATLAB users are very much used to jet. We should promote colorblind-friendly colormaps; magma seems to work well among the matplotlib choices.

  • EK500 colormap RGB values, from IMOS BASOOP. We should make this an option for plotting echogram.

    function [EK500cmap] = EK500colourmap()
    % EK500colourmap is the colour map used by EK500
        EK500cmap = [255 255 255    % white
                     159 159 159    % light grey
                      95  95  95    % grey
                       0   0 255    % dark blue
                       0   0 127    % blue
                       0 191   0    % green
                       0 127   0    % dark green
                     255 255   0    % yellow
                     255 127   0    % orange
                     255   0 191    % pink
                     255   0   0    % red
                     166  83  60    % light brown
                     120  60  40]./255;  % dark brown
    end

    something like this:
    [screenshot: example echogram rendered with the EK500 colormap]
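For the matplotlib side, the same table translates directly into a ListedColormap. The RGB values below are copied from the MATLAB function above; the colormap name and module placement are assumptions:

```python
import numpy as np
from matplotlib.colors import ListedColormap

# RGB values from the MATLAB EK500colourmap above, scaled to [0, 1]
EK500_RGB = np.array([
    [255, 255, 255],  # white
    [159, 159, 159],  # light grey
    [ 95,  95,  95],  # grey
    [  0,   0, 255],  # dark blue
    [  0,   0, 127],  # blue
    [  0, 191,   0],  # green
    [  0, 127,   0],  # dark green
    [255, 255,   0],  # yellow
    [255, 127,   0],  # orange
    [255,   0, 191],  # pink
    [255,   0,   0],  # red
    [166,  83,  60],  # light brown
    [120,  60,  40],  # dark brown
]) / 255.0

ek500_cmap = ListedColormap(EK500_RGB, name="ek500")
# usage sketch: ds.Sv.plot(cmap=ek500_cmap)
```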

Transfer issues over from OWH/echopype repo

When the echopype repo was transferred from OWH2018, the issues did not come with it. They need to be manually transferred over and checked against the current issues. I also need to figure out how to assign the issues to myself.

remove_noise for AZFP data

I receive the following error when trying to use the remove_noise function with AZFP data: ValueError: conflicting sizes for dimension 'frequency': length 838 on 'range' and length 4 on 'frequency'

I am currently using echopype version 0.3.0+2.gd494fd0, but also had this issue with 0.2.0.

To reproduce the error, please see the attached code, which is based on code from an echopype Jupyter notebook.
TestNoise2.txt

Thanks!

Add echo metrics to process

Add calculation of "echo metrics" to process.

Original code from the paper is here.

My thought is that the analysis module should operate on the data model/netCDF so that it is independent of the raw data format. In theory the metrics are based only on the calibrated Sv data (and maybe time), so hopefully this is relatively straightforward.

Save generated files at user-specified path

Currently all files generated from raw-binary-to-netCDF conversion, as well as from calibration and MVBS calculation, are saved in the same folder as the raw binary files. This is not very convenient. We should allow users to specify the path where generated files are saved.

For conversion, this can simply be:

tmp = Convert(RAW_FILE)
tmp.raw2nc(save_path='PATH_TO_SAVE_NC_FILE')

For Sv and MVBS calculation, this can probably be:

tmp = EchoData(NC_FILE)
tmp.calibrate(save=True, save_path='PATH_TO_SAVE_SV_FILE')
tmp.get_MVBS(save=True, save_path='PATH_TO_SAVE_MVBS_FILE')
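Under the hood, resolving the output path is a small helper. A sketch (the function name, suffix convention, and fallback-to-source-folder behavior are assumptions consistent with the examples above):

```python
import os

def resolve_save_path(source_file, suffix, save_path=None):
    """Hypothetical helper: build the output filename next to the source
    file by default, or under a user-specified directory."""
    base = os.path.splitext(os.path.basename(source_file))[0] + suffix
    directory = save_path if save_path else os.path.dirname(source_file)
    return os.path.join(directory, base)
```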

Use xarray/pandas to read/manipulate EK60 data

Currently numpy is used to read and manipulate EK60 data, but it is neither as efficient nor as convenient as xarray and pandas.

Tasks:

  • Identify functions that need to be changed
  • Check variable/attribute naming along with #6
  • Change functions so that the unpacked data are read in as xarray objects and saved directly into the netCDF file

Produce test data set

Produce small data sets that contain only a couple of pings for testing purposes (#4, travis-ci).

Need data from:

  • EK60
  • EK80
  • AZFP

Plot data cube

We should implement this:
[screenshot: 3-D echogram “data cube” example]
source: https://aslenv.com/AZFP-data.html

The axes are depth, time of day, and dates -- just one more dimension than the flat echogram.

Let's also see how to relate this to multi-frequency and broadband data. :)

install_requires doesn't actually do anything

Overview

There's an issue with install_requires: it does not actually list the dependencies for this package. So when performing pip install, it assumes you already have all the dependencies in your Python environment; otherwise, I can't use the package right away!

install_requires=INSTALL_REQUIRES,
