osoceanacoustics / echopype
Enabling interoperability and scalability in ocean sonar data analysis
Home Page: https://echopype.readthedocs.io/
License: Apache License 2.0
ModelAZFP already has a method to calculate TS. We need the same for ModelEK60, just as we already have for calculating Sv via .calibrate() for data from both echosounders.
Calling Dataset.to_zarr() raises a KeyError when appending to a .zarr group. This happens when combining raw files into a single Zarr file.
PyEchoLab and echopype were based on the same code (EchoLab in MATLAB), so we want to merge them, along with the changes that make use of xarray and pandas.
Tasks:
Need to update the command line tool now that the convert and model modules have a uniform interface for AZFP and EK60 data (#53).
Specifically, we need to change the optional input parameter to accommodate the additional .XML file needed for AZFP unpacking.
The goal is to be able to do:
In the above:
Resources:
Right now sound speed is recalculated when environmental parameters are changed, but in modelbase.recalculate_environment() the default uwa option (Mackenzie 1981) is used, while for AZFP the formula supplied by the manufacturer should be used. A consistent way to handle this is to call .get_sound_speed() for each of the child classes in .recalculate_environment(). Currently this method is complete for AZFP, but for EK60 it only reads the values stored in the .raw file.
Action items:
- Complete .get_sound_speed() in ModelEK60.
- Set self.sound_speed using .get_sound_speed() so that the methods of the child classes are called.
Note:
This is for showing what I had here (and obviously by restructuring echopype and moving it around github I have broken the link in this post... 😬)
@leewujung are you planning on using this package during OHW19? If so, can we have a new release so I can package it and make it easy to install with conda?
.calibrate has a bug when multiple .nc data files are opened at the same time: if fname1.nc and fname2.nc were read together into a model object, the calibrated Sv data will be saved into fname1_Sv.nc and fname1_Sv.nc, i.e., both outputs take the first file's name.
Proposed interface:
echo_data = EchoData(FILENAME_LIST)  # open a list of files
# save all calibrated Sv into 1 file if filename_out is provided
echo_data.calibrate(save_opt=True,
                    filename_out='COMBINED_NAME.nc')  # default filename_out=None
This will create a common framework into which the unpacking functions for different file formats can save data.
Ref:
Add an NMEA parser to the EK60 .raw converter. Can reference PyEchoLab as mentioned in #11.
The xarray version is currently pinned at 0.13.0. Need to fix the following problem:
echopype/model/modelbase.py:426: in get_MVBS
drop({'add_idx', 'range_bin_bins'})
.tox/py37/lib/python3.7/site-packages/xarray/core/dataarray.py:1938: in drop
ds = self._to_temp_dataset().drop(labels, dim, errors=errors)
.tox/py37/lib/python3.7/site-packages/xarray/core/dataset.py:3643: in drop
return self.drop_sel(labels, errors=errors)
.tox/py37/lib/python3.7/site-packages/xarray/core/dataset.py:3689: in drop_sel
labels = either_dict_or_kwargs(labels, labels_kwargs, "drop")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pos_kwargs = {'add_idx', 'range_bin_bins'}, kw_kwargs = {}, func_name = 'drop'
def either_dict_or_kwargs(
pos_kwargs: Optional[Mapping[Hashable, T]],
kw_kwargs: Mapping[str, T],
func_name: str,
) -> Mapping[Hashable, T]:
if pos_kwargs is not None:
if not is_dict_like(pos_kwargs):
raise ValueError(
> "the first argument to .%s must be a dictionary" % func_name
)
E ValueError: the first argument to .drop must be a dictionary
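The traceback comes from passing a set to .drop(), which newer xarray routes to label-based drop_sel and rejects. A minimal sketch of the fix on a toy dataset (variable names taken from the traceback; the real MVBS intermediate is assumed to look roughly like this):

```python
import numpy as np
import xarray as xr

# Toy stand-in for the intermediate result inside get_MVBS
ds = xr.Dataset(
    {
        "Sv": ("ping", np.zeros(3)),
        "add_idx": ("ping", np.arange(3)),
        "range_bin_bins": ("ping", np.arange(3)),
    }
)
# drop_vars() removes variables by name and accepts any iterable of names,
# avoiding the dict/label ambiguity that .drop({'add_idx', ...}) triggers
cleaned = ds.drop_vars(["add_idx", "range_bin_bins"])
```

Switching to drop_vars should work on both the pinned and newer xarray versions, since it takes variable names directly.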
Currently the test files sit in the repo, and they are getting bigger now that EK80 data come into play. Seems like enabling Git LFS and caching would work with Travis?
Some refs:
Directly import an existing package, or keep a separate module for calculating sound speed and absorption coefficient, for easier maintenance.
I notice that there is no pip wheel on PyPI: https://pypi.org/project/echopype/#files
I suggest building and uploading a wheel for this package for ease of installation through pip.
Though once you update the package, I can go ahead and create a conda recipe, so that this package can be installed through conda, which would make it super easy! 😄
Plus, I'd like to use it in my yodapy package possibly so it would be good to make the install easy! 😉
Example path: "/home/user/source/oops i spaced again/filename.raw"
Code to reproduce: very first file conversion example in the docs
from echopype.convert import Convert
data_tmp = Convert('/a/path/that has spaces/FILENAME.raw')
data_tmp.raw2nc()
Traceback after running my test (and crashing):
File "/Users/user/source/oops i spaced again/test.py", line 7, in <module>
ek60_converter.raw2nc()
File "/Users/user/source/echopype/venv/lib/python3.7/site-packages/echopype/convert/ek60.py", line 888, in raw2nc
grp.set_toplevel(_set_toplevel_dict()) # top-level group
File "/Users/user/source/echopype/venv/lib/python3.7/site-packages/echopype/convert/ek60.py", line 695, in _set_toplevel_dict
out_dict['date_created'] = dt.strptime(fm.group('date') + '-' + fm.group('time'),
AttributeError: 'NoneType' object has no attribute 'group'
Tested on (and failing on) Mac OS 10.14.6 and Windows 10 Pro 1903.
Workaround: if I remove the spaces in my test paths, files are parsed correctly.
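A likely cause is that the date/time regex is matched against the full path, so fm comes back None when the path contains spaces and fm.group() then raises AttributeError. A defensive sketch, assuming a Simrad-style filename pattern such as D20170615-T190214.raw (the actual regex in echopype's converter may differ):

```python
import os
import re
from datetime import datetime as dt

# Hypothetical pattern modeled on Simrad-style raw filenames, e.g. 'D20170615-T190214.raw'
FILENAME_MATCHER = re.compile(r"D(?P<date>\d{8})-T(?P<time>\d{6})")

def parse_date_created(raw_path):
    # Search the basename only, so spaces elsewhere in the path cannot break the match
    fm = FILENAME_MATCHER.search(os.path.basename(raw_path))
    if fm is None:
        return None  # degrade gracefully instead of crashing on fm.group()
    return dt.strptime(fm.group("date") + "-" + fm.group("time"), "%Y%m%d-%H%M%S")
```

Matching on the basename and checking for a failed match covers both spaces in directory names and filenames without an embedded timestamp.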
Some data variables in the Beam and NMEA groups for EK60 raise a SerializationWarning when saved: channel_id and gpt_software_version in the Beam group, and NMEA_datagram in the NMEA group.
The full warning:
SerializationWarning: variable channel_id has data in the form of a dask array with dtype=object, which means it is being loaded into memory to determine a data type that can be safely stored on disk. To avoid this, coerce this variable to a fixed-size dtype with astype() before saving it.
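As the warning suggests, coercing these object-dtype string variables to a fixed-size dtype before writing avoids the load-into-memory step. A minimal sketch on a toy channel_id variable:

```python
import numpy as np
import xarray as xr

# Object-dtype string variable, like channel_id in the Beam group
da = xr.DataArray(
    np.array(["GPT  38 kHz 009072033fa2 1-1 ES38B"], dtype=object),
    dims="frequency",
    name="channel_id",
)
# astype(str) coerces to a fixed-width unicode dtype ('<U...'),
# which the netCDF/zarr backends can size without inspecting every chunk
fixed = da.astype(str)
```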
Add option for both netcdf and zarr to combine multiple input raw files into 1 big file.
fileconvert = Convert(FILE_LIST) # pass in a list of files to be converted
# default to convert each raw file to 1 .nc file
fileconvert.raw2nc(combine_opt=None)
# convert all data in input raw files to 1 big .nc file
fileconvert.raw2nc(combine_opt='all')
Currently this is partially implemented: we can convert multiple files into one output file at once, BUT the converted file uses the filename of the first input file. Need to change it so that the user can specify the combined output filename. The filename part is related to #88.
We need a wrapper for the convert and data model classes so that users do not have to explicitly call the class corresponding to a specific sonar instrument to convert and manipulate data.
The command line tool echopype_converter
also needs to be updated accordingly.
Add zarr as an output option in addition to netCDF, via .raw2zarr. Probably the most straightforward approach is to turn the current .raw2nc into a private method .raw_unpack(output_type), where output_type='nc' or 'zarr', and call it from the new .raw2nc and .raw2zarr.
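A minimal sketch of that refactor (method and parameter names follow the proposal above; the body is a placeholder, not echopype's actual unpacking logic):

```python
class Convert:
    """Sketch only: both public output methods route to one shared unpacker."""

    def raw_unpack(self, output_type):
        if output_type not in ("nc", "zarr"):
            raise ValueError(f"unsupported output type: {output_type}")
        # ... shared unpacking + serialization logic would live here ...
        return f"out.{output_type}"

    def raw2nc(self):
        return self.raw_unpack("nc")

    def raw2zarr(self):
        return self.raw_unpack("zarr")
```

This keeps the parsing logic in one place, so adding a third output format later only means adding one thin public method.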
It will be useful to allow reading and exporting calibration files that are commonly used in the community. These include the .xml file produced by Simrad and the .ecs file produced by EchoView after calibration data are read into the software. The Simrad .xml file contains ping-by-ping measurements, which should really be saved into a data file, but at the end of the file are the derived calibration coefficients, which we should parse.
This is related to #74, as parallelization and chunking of the dask arrays under the hood in xarray after open_mfdataset are locked together; see here and this issue.
What we need is to:
This is related to #54.
When unpacking and combining data from multiple files, currently everything is unpacked into memory and written at once to the output .nc or .zarr file. This is obviously prone to memory-usage issues. For example, the EK60 eclipse example notebook runs locally if memory is large enough, but the kernel dies on Binder.
@ngkavin is working on doing this more efficiently by creating a file and then appending data from subsequent files. This is currently supported by xarray for zarr but not yet supported for netCDF.
The get_MVBS method should have options to average based on either ping number (current behavior) or ping time (not implemented yet), like below:
# Average based on time
EchoData.get_MVBS(average_type='time', MVBS_time_bin=time) # time : timedelta
# Average based on ping numbers
EchoData.get_MVBS(average_type='ping', MVBS_ping_bin=ping_size) # ping_size : int
It should also accommodate averaging across file boundaries when many consecutive files were collected in a mission. This requires using xarray.open_mfdataset. Most likely we also need to look into efficiency issues regarding chunking and invoking dask correctly.
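The two averaging modes map naturally onto xarray operations. A minimal sketch on toy Sv data, averaging in the linear domain (10^(Sv/10)) as MVBS requires; bin sizes and names are illustrative:

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2019-07-01", periods=6, freq="1s")
sv = xr.DataArray(np.arange(6.0), coords={"ping_time": times}, dims="ping_time")

linear = 10 ** (sv / 10)  # average in the linear domain, not in dB
# average_type='time': fixed time bins via resample (2-second bins here)
mvbs_time = 10 * np.log10(linear.resample(ping_time="2s").mean())
# average_type='ping': fixed number of pings per bin via coarsen
mvbs_ping = 10 * np.log10(linear.coarsen(ping_time=2).mean())
```

resample handles uneven ping intervals and file gaps for the time-based mode, while coarsen gives the simple fixed-ping-count behavior that get_MVBS currently implements.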
At the moment the entire NMEA group is copied for all files that are split due to changes in range_bin in the middle of a raw file. The NMEA datagrams should instead be split into each of the part01, part02, ... files. Let's take care of this once we refactor the code that combines/appends data from multiple files.
We need to make a file checker that, during the initial file parsing (.raw or .01a to netCDF), checks whether certain acoustic parameters change. If any of these parameters changes in the middle of a .raw or .01a file, writing to the netCDF should stop, the netCDF should be saved, and a new netCDF should be opened and written to.
FILE NAMING. From The SONAR-netCDF4 convention for sonar data, Version 1.0.
"SONAR-netCDF4 files should always end with a “.nc” suffix to indicate that they are
a netCDF file. It is recommended that the filename should sort alphanumerically into
chronological order (e.g. date and time of the first ping in the file; thus: YYYYMMDDHHMMSS.nc). This facilitates file management and use in analysis systems."
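A tiny helper following that recommendation (the function name is ours, not part of the convention):

```python
from datetime import datetime

def sonar_netcdf4_filename(first_ping_time: datetime) -> str:
    # YYYYMMDDHHMMSS.nc sorts alphanumerically into chronological order
    return first_ping_time.strftime("%Y%m%d%H%M%S") + ".nc"
```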
tox -r errors out in Python 3.7 due to a problem building the wheels for numcodecs, which is required by zarr>=2.3.2 (the latest version at the moment). This seems related to issues #70 and #210 in zarr. For now I changed the default requirement to python==3.8, but it would be nice to be able to run tox for py37.
Currently users can update environment parameters, and the procedure is documented. However, the same needs to be added for updating other calibration-related coefficients needed for EK60 and AZFP.
Currently the class SetGroups under convert/set_nc_groups.py is written only for EK60 data and will likely error out when saving data unpacked from other echosounders. We can either add switches in the current class to accommodate differences arising from different raw data formats, or write subclasses for the different formats.
@SvenGastauer @valentina-s: Thoughts?
Currently the angle data are not parsed correctly for EK60 and EK80 CW-mode files. Changes need to be made in convert/utils/ek_raw_parsers.py for both RAW0 and RAW3 non-complex parsing to be:
data['angle'] = np.frombuffer(raw_string[indx:indx + block_size], dtype='int8')
data['angle'] = data['angle'].reshape((-1, 2))
The first column is the athwartship angle and the second column is the alongship angle, both in "counts" -- i.e., angle conversion has to be implemented as a method in the model class.
This will have cascading changes in returning data with different range_bin lengths and in convert/utils/set_groups_ek60.py. For the latter, change split_beam_angle into two data variables, angle_alongship and angle_athwartship, each with coordinates ['frequency', 'ping_time', 'range_bin'].
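The count-to-degree conversion mentioned above could then look like the sketch below. The convention assumed here (one count = 180/128 electrical degrees; mechanical angle = electrical / angle_sensitivity - angle_offset) follows the usual Simrad description, and the sensitivity value is purely illustrative:

```python
import numpy as np

def angle_count_to_degrees(counts, angle_sensitivity, angle_offset=0.0):
    # Assumed Simrad convention: one count = 180/128 electrical degrees;
    # mechanical angle = electrical / angle_sensitivity - angle_offset
    electrical = counts.astype("float64") * 180.0 / 128.0
    return electrical / angle_sensitivity - angle_offset

# Columns as parsed above: [athwartship, alongship] counts as int8
raw_counts = np.array([[-128, 0], [64, 127]], dtype="int8")
angles = angle_count_to_degrees(raw_counts, angle_sensitivity=21.9)
```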
@ngkavin I added this as an issue so that we can keep track of it.
There are currently two README files:
These two files need to be merged and the ideal outcome will be:
Documentation on how to set the bin size or slice along range and ping time for get_MVBS and remove_noise is needed.
There is a bug in setting the Provenance group when converting to zarr. This bug has been fixed in the repo and will be included in the next release. As a result, it is currently not possible to install echopype from conda, since the feedstock build fails. Install from the repo using pip if you want to convert files into zarr.
Users sometimes need to overwrite an existing .nc file that has been converted before. Currently the convert module issues a warning and aborts if there is already a file with the same name as the raw binary file. This is a good default behavior, but we should give the user the option to overwrite.
Something like:
tmp = Convert(DATA_FILE)
tmp.raw2nc(overwrite=True) # default overwrite=False
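Internally this can be a small guard before writing; a sketch (the helper name is ours, not echopype's):

```python
import os

def check_output_path(out_path, overwrite=False):
    # Default behavior: refuse to clobber a previously converted file
    if os.path.exists(out_path) and not overwrite:
        raise FileExistsError(
            f"{out_path} already exists; pass overwrite=True to replace it")
    return out_path
```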
We'll add a thin wrapper to the xarray plotting function to plot basic echograms (Sv or MVBS).
- Axes: ping number or time (ping_time).
- Colormap: the colormap is an issue here. The fisheries acoustics community is used to the so-called EK500 colormap (below), and Matlab users are very much used to jet. We should promote colorblind-friendly colormaps; magma seems to work well among the matplotlib choices.
EK500 colormap RGB values, from IMOS BASOOP. We should make this an option for plotting echograms.
function [EK500cmap]=EK500colourmap()
% EK500colourmap is the colour map used by EK500
EK500cmap = [255 255 255 % white
159 159 159 % light grey
95 95 95 % grey
0 0 255 % dark blue
0 0 127 % blue
0 191 0 % green
0 127 0 % dark green
255 255 0 % yellow
255 127 0 % orange
255 0 191 % pink
255 0 0 % red
166 83 60 % light brown
120 60 40]./255; % dark brown
end
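For the Python side, the same table can be wrapped in a matplotlib ListedColormap (a sketch, assuming matplotlib is available; the RGB triplets are those in the MATLAB snippet above):

```python
import numpy as np
from matplotlib.colors import ListedColormap

# EK500 RGB values from the MATLAB table above, scaled to [0, 1]
EK500_RGB = np.array([
    [255, 255, 255],  # white
    [159, 159, 159],  # light grey
    [95, 95, 95],     # grey
    [0, 0, 255],      # dark blue
    [0, 0, 127],      # blue
    [0, 191, 0],      # green
    [0, 127, 0],      # dark green
    [255, 255, 0],    # yellow
    [255, 127, 0],    # orange
    [255, 0, 191],    # pink
    [255, 0, 0],      # red
    [166, 83, 60],    # light brown
    [120, 60, 40],    # dark brown
]) / 255.0
ek500_cmap = ListedColormap(EK500_RGB, name="ek500")
```

The colormap can then be passed as cmap=ek500_cmap to any matplotlib or xarray plotting call.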
Just work on "unpack_ek60.py" for now, but fix the load_ek60_raw() function so it succeeds with EK80 (define it as a new function).
Apache or MIT?
When the echopype repo was transferred from OHW2018, the issues did not come with it. They need to be manually transferred over and checked against the current issues. I also need to figure out how I can assign myself an issue.
I receive the following error when trying to use the remove_noise function with AZFP data: ValueError: conflicting sizes for dimension 'frequency': length 838 on 'range' and length 4 on 'frequency'
I am currently using echopype version 0.3.0+2.gd494fd0, but also had this issue with 0.2.0.
To reproduce the error, please see the attached code, which is based on code from an echopype Jupyter notebook.
TestNoise2.txt
Thanks!
Add calculation of "echo metrics" to process. Original code from the paper is here. My thought is that the analysis module should operate on the data model/netCDF so that it is independent of the data format. In theory the metrics are based only on the calibrated Sv data (and maybe time), so hopefully this is relatively straightforward.
Currently all files generated from raw-to-netCDF conversion, as well as from calibration and MVBS calculation, are saved in the same folder as the raw binary files. This is not very convenient. We should allow users to specify the path where generated files are saved.
For conversion, this can simply be:
tmp = Convert(RAW_FILE)
tmp.raw2nc(save_path='PATH_TO_SAVE_NC_FILE')
For Sv and MVBS calculation, this can probably be:
tmp = EchoData(NC_FILE)
tmp.calibrate(save=True, save_path='PATH_TO_SAVE_SV_FILE')
tmp.get_MVBS(save=True, save_path='PATH_TO_SAVE_MVBS_FILE')
Currently numpy is used to read and manipulate EK60 data, but it is neither as efficient nor as convenient as xarray and pandas.
Task:
Produce small data sets that contain only a couple of pings for testing purposes (#4, travis-ci).
Need data from:
We should implement this:
source: https://aslenv.com/AZFP-data.html
The axes are depth, time of day, and dates -- just one more dimension than the flat echogram.
Let's also see how to relate this to multi-frequency and broadband data. :)
Potential solutions need to cover:
- What if the range_bin dimension is different?
- What if range_bin is different for each ping?
This issue is from #6.
There's an issue with install_requires, which does not actually list the dependencies of this package. So when performing pip install, it assumes you already have all the dependencies in your Python environment; otherwise I can't use the package right away!
Line 29 in e8b735c