osoceanacoustics / echopype Goto Github PK

View Code? Open in Web Editor NEW

94.0 19.0 74.0 97.97 MB

Enabling interoperability and scalability in ocean sonar data analysis

Home Page: https://echopype.readthedocs.io/

License: Apache License 2.0

Python 99.33% Dockerfile 0.01% CSS 0.42% HTML 0.09% Jinja 0.14%

ocean acoustics echosounder sonar netcdf ek60 azfp xarray ek80 zarr

echopype's People

Contributors

Stargazers

Watchers

Forkers

bnwkeys amarburg leviner kuoth86 luisr-ramirez luis-ramirez-r valentina-s lsetiawan svengastauer lsxinh2 cyrf0006 emiliom glien doron2402 nilsolav ngkavin leewujung timryan99 rhtowler ocefpaf fletcherft yiliu-coding dlwangsan alejandro-ariza kevinboswell tubbz-alt ericjmartin emlynjdavies imranmaj nickmortimer gavinmacaulay mbdunn glazejimmy ze-sys b-reyes cxzhangqi thomas-boehrer zhixionggong sessink kstebb15 liliaguillet erinann sindrevatnehol smohid26 oceanstreamio nathan-rousselot praneethratna sohambutala anantmittal ctuguinay erlingdevold valschmidt xtor3 mohamednasser8 skald1311 akashkumar7902 kshitijpatil16 yilovezy hq786 anujsinha3 uw-farlab oftfrfbf patel999jay oceanstreamio dash-uvic parvatijay2901 dimasikcrypto spacetimeengineer

echopype's Issues

Add TS calculation in `ModelEK60`

ModelAZFP already has the method to calculate TS. We need this for ModelEK60 as well just as we already have for calculating Sv via .calibrate() for data from both echosounders.

Error when appending data to a group under a Zarr file

Calling Dataset.to_zarr() raises a KeyError when appending to a .zarr group. This happens when combining raw files into a single Zarr file

Combine functions from PyEcholab

PyEchoLab and echopype were based off the same code (EchoLab in Matlab), so we want to merge them along with the changes to make use of xarray and pandas.

Tasks:

Identify functions that are missing in either of them
Decided which functions/methods to keep for unpacking
Decided which functions/methods to keep for data manipulation and visualization
Change function as in #10 to use xarray/pandas as the underlying packages instead of numpy.

Revise command line tool using uniform interface

Need to update the command line tool now that the convert and model modules have uniform interface for AZFP and EK60 data (#53).

Specifically, need to change the input optional parameter to accommodate the additional .XML needed for AZFP unpacking.

The goal is to be able to do:

$ echopype_converter -s ek60 path_to_data_files(allow wildcard)
$ echopype_converter -s azfp -x xml_file path_to_data_files(allow wildcard)

In the above:

-s/--system indicate sonar system
-x/--xml-file xml file to use to parse azfp data
issue warning “An XML file is needed for converting AZFL echosounder data” if -s azfp is passed but xml file not specified
issue warning “The XML file is ignored as it is not needed for converting

Set up for packaging and continuous integration

Change general folder structure for packaging
Set up Travis-ci for continuous integration in the package development.

Resources:

Shablona - A template for small scientific python projects
Travis Python Template repo

AZFP Unpack Improvement

(major) unpacked AZFP data is not being saved
- will require debugging script to interrupt before plotting routine and write out data
- unpacked AZFP data needs to be packed into netCDF format
(major) refactor code, get rid of object-oriented paradigm
- this is literally MacGyvered together from object-oriented code
- to properly create a standalone utility:
  - break script during file read routines
  - extract necessary config information, metadata, and parsing instructions
  - replicate only parsing routine in a new script and copy in static configs and meta
(major) rev-engineer matlab scripts to check for compatibility
- test current script against other sample data
- check for flexibility of code to handle all use cases reading all AZFP data
  - make sensor/config agnostic
  - validate against vendor code (matlab)

Sound speed update not consistent

Right now sound speed is recalculated if environmental parameters are changed, but in modelbase.recalculate_environment() the default uwa option (Mackenzie 1981) is used while for AZFP the formula supplied by the manufacturer should be used.

A consistent way to go is to call .get_sound_speed() for each of the child classes in .recalculate_environment() . Currently this method is complete for AZFP, but only reads from the values stored in .raw file for EK60.

Action items:

Update .get_sound_speed() in ModelEK60
Change to update self.sound_speed using .get_sound_speed() so that methods for child classes are called.

Add solar radiation data to notebook

Note to:

add fetching solar radiation data to the demo OOI notebook
for direct comparison with sonar output.

This is for showing what I had here (and obviously by restructuring echopype and moving it around github I have broken the link in this post... 😬)

New release

@leewujung are you planning on using this package during the OHW19? If so can e have a new release so I can package it and make it easy to install with conda?

Clean up file ek80 code

Parallelize calibration in model module

Enable reading multiple converted .nc files and calibrate them in parallel
- need to revise model/modelbase.py to allow passing in multiple files into the ModelBase and all derived classes; right now the objects only accepts passing in 1 file
Enable parallel computation when calling .calibrate when multiple .nc data files are opened at the same time
Enable parallel saving calibrated Sv data. User will have the option to:
- save calibrated Sv into multiple files with filenames corresponding to the uncalibrated .nc files (for example, if fname1.nc and fname2.nc were read together into a model object, the calibrated Sv data will be save into fname1_Sv.nc and fname1_Sv.nc)
- save calibrated Sv from multiple uncalibrated data files into 1 _Sv.nc file. User need to provide output filename as an input argument as below:
```
echo_data = EchoData(FILENAME_LIST)  # open a list of files
# save all calibrated Sv into 1 file if filename_out is provided
echo_data.calibrate(save_opt=True,
                    filename_out='COMBINED_NAME.nc')  # default filename_out=None
```

Write function to define xarray in ICES netCDF4 format

This function will create a common framework for unpacking functions of different file formats to save data into.

Ref:

ICES SONAR-netCDF4 convention

Add NMEA parser to EK60 converter

Add NMEA parser to EK60 .raw converter. Can reference PyEchoLab as mentionedin #11.

Update to be compatible with xarray=0.14.1

Current xarray version is fixed at 0.13.0. Need to fix the following problem:

echopype/model/modelbase.py:426: in get_MVBS
    drop({'add_idx', 'range_bin_bins'})
.tox/py37/lib/python3.7/site-packages/xarray/core/dataarray.py:1938: in drop
    ds = self._to_temp_dataset().drop(labels, dim, errors=errors)
.tox/py37/lib/python3.7/site-packages/xarray/core/dataset.py:3643: in drop
    return self.drop_sel(labels, errors=errors)
.tox/py37/lib/python3.7/site-packages/xarray/core/dataset.py:3689: in drop_sel
    labels = either_dict_or_kwargs(labels, labels_kwargs, "drop")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pos_kwargs = {'add_idx', 'range_bin_bins'}, kw_kwargs = {}, func_name = 'drop'
    def either_dict_or_kwargs(
        pos_kwargs: Optional[Mapping[Hashable, T]],
        kw_kwargs: Mapping[str, T],
        func_name: str,
    ) -> Mapping[Hashable, T]:
        if pos_kwargs is not None:
            if not is_dict_like(pos_kwargs):
                raise ValueError(
>                   "the first argument to .%s must be a dictionary" % func_name
                )
E               ValueError: the first argument to .drop must be a dictionary

Find a way to handle test files that are growing larger

Currently the test files sit with the repo and the size of these files are becoming bigger now that EK80 data come into play. Seems like enabling Git LFS and caching would work with Travis?

Some refs:

Sound speed and absorption coeff calculation

Directly import existing package or have a separate module for calculating sound speed and absorption coefficient, for easier maintenance.

No pip wheels

Overview

I notice that there is no pip wheel in pypi. https://pypi.org/project/echopype/#files

I suggest creating and adding pip wheel for this package for ease of install for the user through pip.

Though once you update the package, I can go ahead and create a conda recipe, so that this package can be installed through conda, which would make it super easy! 😄

Plus, I'd like to use it in my yodapy package possibly so it would be good to make the install easy! 😉

Whitespace in path breaks EK60 conversions

Example path: "/home/user/source/oops i spaced again/filename.raw"

Code to reproduce: very first file conversion example in the docs

from echopype.convert import Convert
data_tmp = Convert('/a/path/that has spaces/FILENAME.raw')
data_tmp.raw2nc()

Traceback after running my test (and crashing):

  File "/Users/user/source/oops i spaced again/test.py", line 7, in <module>
    ek60_converter.raw2nc()
  File "/Users/user/source/echopype/venv/lib/python3.7/site-packages/echopype/convert/ek60.py", line 888, in raw2nc
    grp.set_toplevel(_set_toplevel_dict())  # top-level group
  File "/Users/user/source/echopype/venv/lib/python3.7/site-packages/echopype/convert/ek60.py", line 695, in _set_toplevel_dict
    out_dict['date_created'] = dt.strptime(fm.group('date') + '-' + fm.group('time'),
AttributeError: 'NoneType' object has no attribute 'group'

Tested on (and failing on) Mac OS 10.14.6 and Windows 10 Pro 1903.

Solution: if I remove the spaces in my test paths, files are parsed correctly.

SerializationWarning when saving EK60 xarray variables with dtype=object

Some data variables in the beam and NMEA groups for EK60 raise a SerializationWarning when saved. These variables are channel_id and gpt_software_version in the beam group and NMEA_datagram in the NMEA group.

The full warning:
SerializationWarning: variable channel_id has data in the form of a dask array with dtype=object, which means it is being loaded into memory to determine a data type that can be safely stored on disk. To avoid this, coerce this variable to a fixed-size dtype with astype() before saving it.

Allow combining multiple raw files when converting to nc/zarr

Add option for both netcdf and zarr to combine multiple input raw files into 1 big file.

under this option, default to combining all files passed in as a list into 1 big .nc file
give user the option to save all data within each month to individual big .nc files

something like below:

fileconvert = Convert(FILE_LIST)  # pass in a list of files to be converted

# default to convert each raw file to 1 .nc file
fileconvert.raw2nc(combine_opt=None)  

# convert all data in input raw files to 1 big .nc file
fileconvert.raw2nc(combine_opt='all')

Currently this is partially implemented: we can convert multiple files into one output file at once,
BUT the converted file use the filename of the first file. Need to change it such that user can specify the combined output filename. The filename part is related to #88 .

Make uniform convert and data model interface

We need a wrapper for the convert and data model classes so that users do not have to explicitly call the class corresponding to specific sonar instrument to convert and manipulate data.

The command line tool echopype_converter also needs to be updated accordingly.

Add zarr support to Convert module

Add zarr as an output option for output in addition to netcdf via .raw2zarr.

Probably the most straightforward is to make the current .raw2nc into a private method .raw_unpack(output_type) where output_type='nc' or 'zarr', and call function called from the new .raw2nc and .raw2zarr

Allow interoperating calibration files from Simrad and EchoView

It will be useful to allow reading and exporting calibration files that are commonly used in the community. These include the .xml file produces by Simrad and the .ecs file produced by EchoView after calibration data is read into the software. The Simrad .xml contains ping by ping measurements which should really be saved into a data file, but at the end of the file are the derived calibration coefficients which we should parse.

Parallelize MVBS calculation and noise removal

This is related to #74 as parallelization and chunking in the dask array under the hood in xarray after open_mfdataset are locked together, see here and this issue.

What we need is to:

Parallelize MVBS (mean volume backscattering strength) calculation
- The actual MBVS operation is to calculate mean values of each calibrated Sv tile (x-axis: ping or time, y-axis: depth or range)
Parallelize noise removal -- the operation is almost identical to MVBS computation
- The noise removal operation uses the minimum of the same tiled mean as in MVBS along the same vertical averaged column (across a range of pings) as the noise threshold.
- It's the algorithm described in this paper.

This is related to #54.

Memory usage when combining multiple files

When unpacking and combining data from multiple files, currently all unpacked into memory and written at once to the output .nc or .zarr file. This obviously is prone to memory usage issues. For example, the EK60 eclipse example notebook runs locally if memory is large enough but the kernel would die on Binder.

@ngkavin is working on doing this more efficiently by creating a file and then appending data from subsequent files. This is currently supported by xarray for zarr but not yet supported for netCDF.

MVBS calculation based on either time or ping number

The get_MVBS method should have options to average based on either ping number (current) or ping time (not implemented yet), like below:

# Average based on time
EchoData.get_MVBS(average_type='time', MVBS_time_bin=time)  # time : timedelta
# Average based on ping numbers
EchoData.get_MVBS(average_type='ping', MVBS_ping_bin=ping_size)  # ping_size : int

It should also accommodate averaging across the file boundaries when many consecutive files were collected in a mission. This requires using xarray.open_mfdataset. Most likely also need to look into efficiency issue re. chunking and invoking dask correctly.

Redundant data in NMEA group

At the moment the entire NMEA group is copied for all files that are split due to changes in range_bin in the middle of the raw file. The NMEA datagrams should be split into each of the part01, part02, ... files. Let's take care of this once we refactor the code that combines/appends data from multiple files.

Create netCDF checker and file naming convention

We need to make a file checker that, during the initial file parsing (.raw or .01a to netCDF), checks to see if certain acoustic parameters change. If any one of these parameters change in the middle of a .raw or .01a file, writing to the netCDF should stop, the netCFD should be saved, and a new netCDF should be opened and written to.

Power
Pulse duration
Sample unit (ping rate)
Receiver bandwidth
Sound speed
Start/Stop of a file

FILE NAMING. From The SONAR-netCDF4 convention for sonar data, Version 1.0.
"SONAR-netCDF4 files should always end with a “.nc” suffix to indicate that they are
a netCDF file. It is recommended that the filename should sort alphanumerically into
chronological order (e.g. date and time of the first ping in the file; thus: YYYYMMDDHHMMSS.
nc). This facilitates file management and use in analysis systems."

numcodecs wheels problem

tox -r errors out in python 3.7 due to problem in building the wheels for numcodecs, which is required in zarr>=2.3.2 (latest version at the moment).

This seems related to issues #70 and #210 in zarr.

For now I changed the default requirement to be python==3.8, but it would be nice to be able to run tox for py37.

Allow updating calibration coefficient

Currently users can update environment parameters and the procedure is documented. However the same needs to be added to update other calibration-related coefficients needed for EK60 and AZFP.

Revise/subclass `SetGroups`

Currently the class SetGroups under convert/set_nc_groups.py is written only for EK60 data and will likely error out when saving data unpacked from other echosounders.

We can just add switches in the current class to accommodate differences arising from different raw data formats, or we can write subclass for different formats.

@SvenGastauer @valentina-s: Thoughts?

Correct split-beam angle parsing

Currently the angle data are not parsed correctly for EK60 and EK80 CW mode files.

Changes need to be make in convert/utils/ek_raw_parsers.py for both RAW0 and RAW3 non-complex parsing to be:

data['angle'] = np.frombuffer(raw_string[indx:indx + block_size], dtype='int8')
data['angle'] = data['angle'].reshape((-1,2))

The first column is athwardship angle, the second column is alongship angle, both in "count" -- i.e., angle conversion has to be implemented as a method in the model class.

This will have cascading changes in returning data with different range_bin lengths and in convert/utils/set_groups_ek60.py.

For the latter, change the split_beam_angle into two data variables angle_alongship and angle_athwartship, each with coordinates ['frequency', 'ping_time', 'range_bin'].

@ngkavin I added this as an issue so that we can keep track of it.

Merge the two README files

There are currently two README files:

The .rst file contains installation and usage info that are displayed on PyPI.
The .md file contains more project background and format info.

These two files need to be merged and the ideal outcome will be:

One README (use .rst for PyPI) containing: project background, installation, 1 quick usage example.
A separate doc file to explain the converted netCDF format. This should probably be structured within docs.

MVBS calculation documentation and provenance

Documentation on how to set bin size or slicing along range and ping time for get_MVBS and remove_noise is needed.

RAW data parser (for ek80 code)

Resolve zarr convert error in next release

There is a bug in setting the Provenance group when converting to zarr. This bug is fixed in the repo and will be included in the next release. As a result it is not possible to install echopype from conda at the moment since the feedstock build currently fails.

Install using pip from the repo if you want to convert files into zarr.

Allow over-writing existing nc file

Users sometimes need to over-write an existing .nc file that has been converted before. Currently the convert module will issue a warning and abort if there is already an existing file with the same filename as the raw binary file. This is a good default behavior but should give user this option.

Something like:

tmp = Convert(DATA_FILE)
tmp.raw2nc(overwrite=True)   # default overwrite=False

Basic echogram plots

We'll add a thin wrapper to the xarray plotting function to plot basic echograms (Sv or MVBS).

x-axis:
- actual time stamp (coordinate ping_time) or
- ping number (just plain numbers 1-n)
y-axis: three options here:
- range in meters
- depth in meters (corrected for tilt if known)
- time in milliseconds
Efficiency: we'll need to decide when the echogram becomes too large to be plotted directly using matplotlib and need to use tools like datashader. Need to experiment with loading/plotting many files. With the reduced size of MVBS it's likely not an issue, but for plotting the calibrated Sv or raw data, it can be slow depending on the ping rate and data time span.

Colormap: colormap is an issue here. The fisheries acoustics community is used the so called EK500 colormap (below), and Matlab users are very much used to jet. We should promote colorblind-friendly colormaps. Magma seems to work well among the matplotlib choices.

EK500 colormap RGB values, from IMOS BASOOP. We should make this an option for plotting echogram.

function [EK500cmap]=EK500colourmap()
  % EK500colourmap is the colour map used by EK500
      EK500cmap = [255   255   255   % white
          159   159   159   % light grey
          95    95    95   % grey
          0     0   255   % dark blue
          0     0   127   % blue
          0   191     0   % green
          0   127     0   % dark green
          255   255     0   % yellow
          255   127     0   % orange
          255     0   191   % pink
          255     0     0   % red
          166    83    60   % light brown
          120    60    40]./255;  % dark brown
  end

something like this:

Fix loading for XML header in EK80 data

Just work on "unpack_ek60.py", but fix load_ek60_raw() function so it succeeds with EK80 (define it as new function)

Decide license

Apache or MIT?

Transfer issues over from OWH/echopype repo

When echopype repo was transferred from OWH2018 the issues did not come with it. They need to be manually transferred over and checked against the current issues. I also need to figure out how I can assign myself the issue.

remove_noise for AZFP data

I receive the following error trying to use the remove_noise function with AZFP data: ValueError: conflicting sizes for dimension 'frequency': length 838 on 'range' and length 4 on 'frequency'

I am currently using Echopype version: 0.3.0+2.gd494fd0 but also had this issue using 0.2.0

To reproduce error, please see attached code which is based on code from an echopype jupyter notebook.
TestNoise2.txt

Thanks!

Add echo metrics to process

Add calculation of "echo metrics" to process.

Original code from the paper is here.

My thought on this is that the analysis module should be able to operate based on the data model/netCDF so that it's independent from the data format. In theory the metrics are based on the calibrated Sv data (and maybe time) only so hopefully this is relatively straightforward.

Save generated files at user-specified path

Currently all files generated from raw binary file to netCDF conversion as well as calibration and MVBS calculation are saved in the same folder as the raw binary files. This is not very convenient to use. We should allow users to specify the path to save generated files.

For conversion, this can simply be:

tmp = Convert(RAW_FILE)
tmp.raw2nc(save_path='PATH_TO_SAVE_NC_FILE')

For Sv and MVBS calculation, this can probably be:

tmp = EchoData(NC_FILE)
tmp.calibrate(save=True, save_path='PATH_TO_SAVE_SV_FILE')
tmp.get_MVBS(save=True, save_path='PATH_TO_SAVE_MVBS_FILE')

read XML structure into desired variables (environment, etc)

Use xarray/pandas to read/manipulate EK60 data

Currently numpy is used for read/manipulate EK60 data, but it's not as efficient nor convenient to use compared to xarray and pandas.

Task:

Identify functions that need to be changed
Check variable/attribute naming along with #6
Change functions so that the unpacked data are read in as xarray object and directly saved into the netCDF file

Produce test data set

Produce small data sets that contains only a couple pings for testing purposes (#4 travis-ci)

Need data from:

EK60
EK80
AZFP

Plot data cube

We should implement this:

source: https://aslenv.com/AZFP-data.html

The axes are depth, time of day, and dates -- just one more dimension than the flat echogram.

Let's also see how to relate this to multi-frequency and broadband data. :)

Flexibility for converting data with different number of bins per ping or frequency

Potential solutions:

Saving echo data from different frequency channels into different groups when the range_bin dimension is different?
Padding ping-by-ping echo data with NaNs if range_bin is different for each ping?

This issue is from #6.

Install requires doesn't actually do anything.

Overview

There's an issue with install requires, which is not really listing the dependencies for this package. So when performing pip install, it assumes you have all the dependencies already on your python environment. Otherwise, I can't use the package right away!

echopype/setup.py

Line 29 in e8b735c

install_requires=INSTALL_REQUIRES,

osoceanacoustics / echopype Goto Github PK

echopype's People

Contributors

Stargazers

Watchers

Forkers

echopype's Issues

Overview

Overview

Recommend Projects

Recommend Topics

Recommend Org