
python-dts-calibration's Issues

Updating the documentation with the newly published article

Describe the bug
Updating the documentation with the newly published article. Most likely, it contains better explanations than currently provided in the documentation. The calibration in this package is consistent with what is published in the article.

Are we allowed to copy full sections from the article, given the proper citation?

Additional context
des Tombe, B., Schilperoort, B., & Bakker, M. (2020). Estimation of Temperature and Associated Uncertainty from Fiber-Optic Raman-Spectrum Distributed Temperature Sensing. Sensors, 20(8), 2235. https://doi.org/10.3390/s20082235

Fix gamma not working for double ended calibration

When attempting to fix only the gamma parameter, an error is returned. When fixing no parameters, the calibration runs as expected.

The error claims there is a size conflict between 'alpha' and 'ST', even though no fixed differential attenuation was provided.


Metadata following netCDF CF-conventions

Is your feature request related to a problem? Please describe.
Stored data should be self-describing, with the ability to store coordinates and a projection complying with the netCDF CF conventions.

Following the standard allows for easy interaction with other programs.

Describe the solution you'd like
Change the DataStore class to account for CF conventions.
See: http://cfconventions.org/cf-conventions/cf-conventions.html

Each measurement point should have a local coordinate (length along the cable) and a world coordinate (e.g., WGS84 coordinates)

Vertical temperature profiles can use 'timeSeriesProfile' (chapter 5.2 of the conventions)

Describe alternatives you've considered
Adding metadata might be out of the scope of this project

Additional context
The load and save functions should continue to work.

Precision of the storage of world coordinates in the to_netcdf function should be increased.
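
A minimal sketch of what CF-style metadata and full-precision world coordinates could look like on a DataStore; the variable names 'lat'/'lon' and the attribute values are illustrative assumptions, not an existing API:

import numpy as np

# Illustrative only: attach CF-style attributes to the local coordinate ('x')
# and to world coordinates ('lat', 'lon') defined along the cable.
ds['x'].attrs = {'long_name': 'length along the fiber', 'units': 'm'}
ds.coords['lat'] = ('x', np.zeros(ds.x.size))  # placeholder world coordinates
ds.coords['lon'] = ('x', np.zeros(ds.x.size))
ds['lat'].attrs = {'standard_name': 'latitude', 'units': 'degrees_north'}
ds['lon'].attrs = {'standard_name': 'longitude', 'units': 'degrees_east'}

# Store the world coordinates at full precision when writing to netCDF.
ds.to_netcdf('output.nc', encoding={'lat': {'dtype': 'float64'},
                                    'lon': {'dtype': 'float64'}})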

Improve noise variance of (anti-) Stokes estimate

Describe the solution you'd like
Before taking the standard deviation of all residuals, subtract the ST.mean(dim='time') from ST.

Additional context
The temporally constant deviation from the decaying exponential is then not included in the estimate of the standard deviation.
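
A minimal sketch of the suggested change, assuming the Stokes residuals are available as an xarray DataArray named resid with a time dimension:

# Remove the time-mean of the residuals per location before estimating the
# noise standard deviation, so the temporally constant deviation from the
# decaying exponential does not inflate the estimate.
resid_demeaned = resid - resid.mean(dim='time')
var_I = float(resid_demeaned.std() ** 2)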

Sensornet data inconsistent backward channel direction

For double ended Sensornet .ddf datafiles (Halo DTS v1.0), our test data has the backward channel reversed (strong signal near 0), and thus we flip the backward channel data.

For other devices (at least Oryx v4), the backward channel is already properly aligned, and is now needlessly flipped.

For proper support we will have to check the file versions, and use that to either flip the backward channel or not.
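
A hypothetical sketch of such a version check; the device strings and where this would hook into the Sensornet reader are assumptions, not the actual implementation:

def needs_backward_flip(hardware_description):
    """Return True if this device stores the backward channel reversed.

    The device strings listed here are illustrative assumptions.
    """
    flipped_devices = ('Halo DTS v1.0',)
    return any(dev in hardware_description for dev in flipped_devices)

# Example usage with the hardware string read from the .ddf header:
print(needs_backward_flip('Halo DTS v1.0'))   # True  -> flip rst/rast along x
print(needs_backward_flip('Oryx DTS v4.00'))  # False -> keep as-is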

Add diverging colormap with 0 bias to residual plots

Suggestion from Karl: change the colormap of the residuals to a diverging one, centered at 0.

    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib import colors

    # Normalize the color scale so that 0 is the center of the diverging map.
    # Note: DivergingNorm was renamed to TwoSlopeNorm in later matplotlib versions.
    divnorm = colors.DivergingNorm(vmin=np.min(bath_lims),
                                   vcenter=0,
                                   vmax=np.max(bath_lims))

...
    # Use the normalized colormap with a zero value at the center.
    im = ax_map.pcolormesh(val.time, val.LAF, val.T,
                           cmap=plt.get_cmap('RdBu'),
                           norm=divnorm)

Double-ended measurements made by an Oryx are not reversed

Describe the bug
The reversed Stokes and anti-Stokes are not aligned with the Stokes and anti-Stokes of the forward direction.

(figure from Johannes showing the Stokes of a forward and of a backward measurement)

Expected behavior
It is expected that the reverse Stokes are strongest at the far end of the fiber.

Least significant saved digit is too strict

Describe the bug
Currently only the first three digits are saved when saving to netCDF. This is too strict: not only are the Stokes signals cropped, but so are the other saved variables. Variables with small values, such as alpha, are affected the most.

To Reproduce

Expected behavior
Set the scale and least significant digit per variable in dtscalibration.get_default_encoding(). Remain on the safe side: only encode predefined variables that have been checked to work, and don't apply encoding to the other variables.

Additional context
The encoding saves me a factor of 5 in disk space (though it is currently causing problems), so it is worth investigating.
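
A sketch of what a safer per-variable encoding could look like; the variable names and digit choices below are illustrative assumptions, not the package defaults:

# Only compress/truncate variables known to tolerate it; everything else,
# including small-valued parameters such as alpha, is written at full precision.
encoding = {
    'st':   {'zlib': True, 'least_significant_digit': 3},
    'ast':  {'zlib': True, 'least_significant_digit': 3},
    'tmpw': {'zlib': True, 'least_significant_digit': 3},
}
ds.to_netcdf('calibrated.nc', encoding=encoding)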

Reduce coefficient matrix size for double ended calibration.

Is your feature request related to a problem? Please describe.
Currently the shifted integrated differential attenuation is put into the coefficient matrix. Instead, only do this for the shifted integrated differential attenuation along the reference sections. The other points along the fiber are not correlated to the other parameters and can be estimated separately.

Take the temporal mean of the shifted integrated differential attenuation as the estimate of the shifted integrated differential attenuation. Use the diagonal matrix of its variance as the covariance matrix.

Check for NaNs matrix X in calibration

Is your feature request related to a problem? Please describe.
I am wondering whether it would be nice to have a check for NaNs in the matrix X when attempting to calibrate. One of my reference bath probes didn't quite reach the end of period for which I had DTS data. When I tried to calibrate, it gave me a vague error in scipy lsqr, which I eventually followed back to there being NaNs in my probe data. This took me a while, so I thought maybe it would help to warn the user before the crash and burn inside lsqr.

I saw that there were a bunch of checks already in wls_sparse for checking NaNs in the observations, the weights and x0 but not for the matrix X, so I figured why not also check for NaNs there?

Describe the solution you'd like
Check for NaNs in the matrix X in wls_sparse or some other way.
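
A minimal sketch of such a check for the sparse coefficient matrix; where exactly it would be called inside wls_sparse is an assumption:

import numpy as np
import scipy.sparse as sp

def assert_no_nans_in_X(X):
    """Raise a clear error if the coefficient matrix contains NaNs."""
    data = X.data if sp.issparse(X) else np.asarray(X)
    if np.any(np.isnan(data)):
        raise ValueError(
            'NaNs found in the coefficient matrix X. Check your reference '
            'temperature time series for gaps before calibrating.')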

Describe alternatives you've considered
Be a better user, and actually check all your data before attempting calibration.

Inconsistent behaviour of sections in single ended ols/wls

Describe the bug
When calibrating single ended OLS, the sections are passed as an argument in the function.
When calibrating single ended WLS, the sections have to be defined in the DataStore ('ds.sections').

Expected behavior
A consistent behaviour of how sections have to be passed to the calibration

Export to netCDF fails when Silixa xml files do not have a configuration description

Describe the bug
Export of a calibrated dataset to netCDF fails on an attribute. "TypeError: Invalid value for attr: None must be a number string, ndarray or a list/tuple of numbers/strings for serialization to netCDF files".

To Reproduce
Steps to reproduce the behavior:

  1. Have a Silixa xml file without anything in the configuration description ""
  2. Load the xml files
  3. Attempt to export to netcdf

Expected behavior
'None' types should be replaced with an empty string, to allow export of the dataset.
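
A sketch of the proposed fix as a user-side workaround, replacing None-valued attributes before exporting:

# Replace None-valued attributes (e.g. an empty configuration description)
# with an empty string so that to_netcdf can serialize them.
ds.attrs = {k: ('' if v is None else v) for k, v in ds.attrs.items()}
for name in ds.variables:
    ds[name].attrs = {k: ('' if v is None else v)
                      for k, v in ds[name].attrs.items()}
ds.to_netcdf('calibrated.nc')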

Temperature reduction by applying a user-defined function over a timeseries

Describe the solution you'd like
Something like:

def ufunc(time, temp, *args, **kwargs):
    # super complicated function
    ...
    return velocity

ds['velocity'] = ds.apply_user_func(ufunc, time_label='time', temp_label='TMPW', ufunc_args=None, ufunc_kwargs=None)

where ds['velocity'] is a DataArray with dimension x (no time dimension). ufunc_args and ufunc_kwargs are passed directly to ufunc. Normally one would do this by iterating with a for loop, applying the ufunc to each point along the cable. Instead, we could implement something like that in parallel, using chunks with Dask.
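
A sketch of how this could be built on xarray's apply_ufunc with Dask; apply_user_func does not exist yet, and the variable name 'tmpw' and the example function are assumptions:

import numpy as np
import xarray as xr

def ufunc(temp_1d):
    # placeholder for the "super complicated function" applied to the
    # temperature time series of one point along the cable
    return np.nanmax(temp_1d) - np.nanmin(temp_1d)

# Apply the function to every x in parallel; the time dimension is reduced
# away, leaving a result with dimension x only.
result = xr.apply_ufunc(
    ufunc, ds['tmpw'],
    input_core_dims=[['time']],
    vectorize=True,
    dask='parallelized',
    output_dtypes=[float])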

My question is: is this of interest? And what additional features would you need yourself?

The timezone keyword argument in read_silixa_files is not working

Describe the bug
The timezone_input_files argument of read_silixa_files is overruled by the timezone used in the xml files.

To Reproduce
Steps to reproduce the behavior:
Run the first example notebook with a different timezone

Expected behavior
Honour the timezone argument

Additional context
The solution may simply be to delete the argument.

keyword error when attempting plot_residuals_reference_sections

Describe the bug
When running dtscalibration.plot.plot_residuals_reference_sections(resid, sections=sections), python returns an 'unexpected keyword' error.

To Reproduce
Steps to reproduce the behavior:

  1. Run plot_residuals_reference_sections
  2. See error

Additional context
OS: Windows 7
Package version: just downloaded new version from github
xarray version: 0.10.9
matplotlib version: 2.1.2

Traceback (most recent call last):

  File "<ipython-input-28-e13673e52739>", line 1, in <module>
    calplot.plot_residuals_reference_sections(resid, sections=sections)

  File "D:\github\dtscal\dtscalibration\plot.py", line 67, in plot_residuals_reference_sections
    ax=main_ax, cbar_ax=cbar_ax, cbar_kwargs={'aspect': 20}, robust=robust)

  File "D:\Continuum\anaconda3\lib\site-packages\xarray\plot\plot.py", line 469, in __call__
    return plot(self._da, **kwargs)

  File "D:\Continuum\anaconda3\lib\site-packages\xarray\plot\plot.py", line 185, in plot
    return plotfunc(darray, **kwargs)

  File "D:\Continuum\anaconda3\lib\site-packages\xarray\plot\plot.py", line 771, in newplotfunc
    cbar = plt.colorbar(primitive, **cbar_kwargs)

  File "D:\Continuum\anaconda3\lib\site-packages\matplotlib\pyplot.py", line 2201, in colorbar
    ret = gcf().colorbar(mappable, cax = cax, ax=ax, **kw)

  File "D:\Continuum\anaconda3\lib\site-packages\matplotlib\figure.py", line 1864, in colorbar
    cb = cbar.colorbar_factory(cax, mappable, **kw)

  File "D:\Continuum\anaconda3\lib\site-packages\matplotlib\colorbar.py", line 1368, in colorbar_factory
    cb = Colorbar(cax, mappable, **kwargs)

  File "D:\Continuum\anaconda3\lib\site-packages\matplotlib\colorbar.py", line 946, in __init__
    ColorbarBase.__init__(self, ax, **kw)

TypeError: __init__() got an unexpected keyword argument 'aspect'

Varying length of Silixa .xml files

Sorry for the lengthy title. My problem might provide more context for this.

Due to cable stretch throughout the experiment, the overall number of Stokes, anti-Stokes, etc. measurements increases with time (only very slightly, e.g. from 4000 to 4001). Whilst this is almost guaranteed in glaciological applications, I can see it happening in longer-term DTS installations in other environmental projects.

Unsurprisingly, the Dask module throws an error when an increase in array dimensions occurs, and I am not sure how to get around it. The optimal solution (feature 1) would be to allow input of data files of varying length as part of e.g. read_silixa_files. A second solution (feature 2) is to create an empty DataStore of the required dimensions and fill in the data from np.arrays (e.g. ds['time'].values = date_np). In my case I have all the data in np.arrays from my previous processing script, from before I found out about this library, so this would be easy to do. The dtscalibration library functions could then be used on the modified DataStore.

A quick fix to create a DataStore of required dimensions would be to fill a folder full of the correct number of .xml files of maximum length and then amend the values following this, but the other options would be of greater use to other users!
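
A sketch of the "feature 2" approach described above, building a DataStore directly from numpy arrays; the variable names, shapes, and random data are illustrative assumptions:

import numpy as np
import xarray as xr
from dtscalibration import DataStore

x = np.linspace(0., 1000., 4001)                               # length along the cable
time = np.arange('2020-01-01', '2020-01-02', dtype='datetime64[h]')
st = np.random.rand(x.size, time.size)                         # Stokes, padded/trimmed to a fixed length
ast = np.random.rand(x.size, time.size)                        # anti-Stokes

ds = DataStore(xr.Dataset(
    data_vars={'st': (('x', 'time'), st), 'ast': (('x', 'time'), ast)},
    coords={'x': x, 'time': time}))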

I am happy to put a lot of work into this, but I don't know quite where to begin.

Many thanks,

Rob

Sequential calibration with support for a priori estimates

Is your feature request related to a problem? Please describe.

  • Calibration of a large number of measurements can still require a large amount of memory. The currently suggested approach is to calibrate the parameters that remain constant over time on a subset of the measurements, and in a second step, fix those constants and calibrate the time-variant parameters in chunks.
  • Currently we are able to fix parameters with a certain variance, but all covariances to other parameters are neglected.

Describe the solution you'd like

  • Sequential calibration that allows for calibration in chunks
  • Bayesian flavoured least squares optimisation
  • In practice, this would be a sparse least squares solver that supports a p0_sol and a p0_cov as a priori arguments (a minimal sketch follows below)
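
A minimal sketch of weighted least squares with an a priori solution, folding the prior in by row augmentation. It simplifies the requested p0_cov to a diagonal prior variance; the function name and signature are assumptions, not an existing solver in the package:

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

def wls_with_prior(X, y, w, p0_sol, p0_var):
    """Weighted least squares with an a priori solution and diagonal prior variance.

    Sketch only: the prior is folded in by augmenting the system with extra
    rows. The full-covariance (p0_cov) version requested above would need a
    Cholesky factor of the inverse prior covariance instead of the diagonal.
    """
    Xw = sp.diags(np.sqrt(w)) @ X           # weight observation rows by 1/sigma
    yw = np.sqrt(w) * y
    Xp = sp.diags(1.0 / np.sqrt(p0_var))    # prior rows: I * p ~= p0_sol
    yp = p0_sol / np.sqrt(p0_var)
    A = sp.vstack([Xw, Xp])
    b = np.concatenate([yw, yp])
    return lsqr(A.tocsr(), b)[0]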

Describe alternatives you've considered
Fixing parameters works well, but neglecting covariance has downsides.

Additional context
Chapter 1 and 2 of John L. Crassidis and John L. Junkins. 2011. Optimal Estimation of Dynamic Systems, Second Edition (Chapman & Hall/CRC Applied Mathematics & Nonlinear Science) (2nd. ed.). Chapman & Hall/CRC.

Tox netcdf4/HDF5 issue

Describe the bug
When attempting to run tox (on any version), the tests fail due to an HDF5 issue related to installing netCDF4 version 1.5.0. When pinning netCDF4 to <=1.4.3, the tests run fine.

This issue is showing up now because netCDF4 was version-bumped to 1.5.0 a few days ago.

Additional context
OS: Windows 7
dtscalibration version: any

Possible fix
Pin netCDF4 to <=1.4.3

change in align double-ended result between v0.8 and v1.0?

Describe the bug
I'm not sure whether this is a bug or just a question. I'm attempting to align a double-ended measurement. Something seems to have changed between versions 0.8 and 1.0 and I'm not sure what it is exactly. The following two plots were made using identical scripts (besides the label names for the stokes and anti-stokes data). Something weird seems to be happening in the function suggesting the best shifts to align the data. But I'm not really sure where to start looking for the issue. So any tips/pointers/help is greatly appreciated!

v0.8:
(alignment check figure for channel 1)

v1.0:
(alignment check figure for channel 1)

To Reproduce
Steps to reproduce the behavior (but this might be specific to my dataset):

  1. Read in Sensornet files.
  2. Shift data using the suggested shift (with suggest_cable_shift_double_ended and shift_double_ended).
  3. Plot two lines to check lining up of reverse and forward channels: (anti-)stokes / reverse (anti-)stokes.

Expected behavior
I'm thinking the result of the older version looks better. Also I think the new result is probably leading to other issues during calibration.

Additional context
dtscalibration versions: 0.8 vs 1.0
xarray: 0.15.1
OS: Ubuntu 20.04 (though the exact same results are achieved on Windows)

Acquisition time may be checking the wrong variable

Describe the bug
The example notebooks all have a perfectly uniform acquisition time. i.e., always a perfect round number of seconds. However, the Ultima and XT both have acquisition times that vary with something inside the instrument. That suggests to me that you are using the acquisitionTime reported in the channelConfiguration and not the acquisitionTime reported immediately after the data, which reports the actual duration of the measurement. If that is the case, then the acquisitionTime will always be uniform for a given configuration.

The reason this seems like a bug is that the datastore_utils.check_timestep_allclose check used by some functions allows for a 1% variation. That variation is not possible with the channelConfiguration acquisitionTime.

If this is the intended behavior, then the check_timestep_allclose should actually be looking for non-zero differences in the acquisition time dimension as that would mean the user adjusted the time step of the configuration file.

To Reproduce
Not relevant for this case (I think?).

Expected behavior
I put the expected acquisition time into our xarray Datasets as a pandas time string, e.g., '3s' since the finalized datasets are interpolated to a uniform time stamp. I would love to be able to skip creating an acquisition time dimension filled with uniform numpy.timedeltas by providing this scalar. Similarly, I would like to be able to skip the check_timestep_allclose since this has posed some integration issues for me.

Alternatively, since the confidence intervals are sensitive to variations in the observing interval would it make more sense to check the time varying acquisitionTime?

I understand if this is not something you want to adjust.

shift_double_ended throws away unexpected data variables

Is your feature request related to a problem? Please describe.
I find this printed message unnecessary and clutters output.
shift_double_ended prints the message:
I dont know what to do with the following data [data_var]

An example of this behavior can be seen in the last cell of the example notebook 11Merge_single_measurements_into_double.ipynb

ds = ds.sel(x=slice(-10, cable_length + 10))

shift1, shift2 = suggest_cable_shift_double_ended(ds.isel(time=[0,-1]).compute(),
                                                  np.arange(-10, 10, 1, dtype=int))

ds = shift_double_ended(ds, shift1)

which then prints the above message a handful of times.

Describe the solution you'd like
Requiring the user to sanitize DataStore objects to only have expected fields is totally acceptable, but since the documentation states that it will only return the backscatter intensities, the printed message is unnecessary.

Describe alternatives you've considered
My preference would be that shift_double_ended accepts a list of forward and reverse data variables, or allows the use of a dimension labeling the channel, so I can shift an arbitrary variable. For instance, aligning the single-ended calibrated temperatures for a double-ended configuration could have some utility.

Additional context
I can reproduce this behavior on version 1.0.0 for my own examples.

Drop 32-bit Windows tests

Describe the bug
Currently, the automated 32-bit tests on the Windows platform, run by AppVeyor, are failing (#80). It is quite annoying to debug because I don't have a 32-bit computer lying around.

I am not aware of any computer currently being sold that is 32-bit, except for the Raspberry Pi. Furthermore, since we are using a language that is quite far removed from the bits and bytes, 32-bit vs. 64-bit should not be our concern.

What do you think?

To Reproduce
What I think is going on:

  1. Run 32-bit Python on 64-bit Windows, on a 64-bit machine.
  2. The linear algebra libraries then run natively in 64-bit.
  3. Python works in 32-bit.
  4. SciPy gets confused.

Expected behavior
See 64-bit Python behavior

Additional context
Seems to be related to scipy/scipy#4524

ds.variance_stokes_exponential returns empty array

When trying out the different variance estimations, variance_stokes_exponential seems to return an array with shape () for var_I. The residual DataArray is of the shape of ds.st, and does contain correct data.

Cable shift suggestion uncertain with 'smooth' Stokes data

If the Stokes signal is too smooth, the cable shift suggestion does not seem to work properly.
Perhaps using the Anti-Stokes data would help with accurately suggesting a cable shift.

An image illustrating the problem was attached to the original issue.

Subaxes in residuals plot function do not share axes

Describe the bug
The left axes do not share the same x-axis. Furthermore, my preference would be to flip the x-axis, so that it increases to the left.

To Reproduce
With the following function: from dtscalibration.plot import plot_residuals_reference_sections

CI tests fail (even for older versions)

The Travis & AppVeyor tests currently fail for all commits. When re-running a previously passing test, it now fails.

I am unable to reproduce the bugs locally, and tox tells me nothing is wrong.

Superfluous/non functioning code in calibration_single_ended

@bdestombe I was looking through the single-ended calibration routine to add a check for st_var and ast_var (that they're not None if the calibration is wls), and I noticed this block of code.

As far as I can tell it does nothing, as i_var is not used anywhere else. Has this code been superseded, or is this a bug? I think it's not functioning and not needed, but I just want to be sure.

for i_var, i_label in zip(
        [st_var, ast_var], ['st', 'ast']):
    if callable(i_var):
        ix_sec = self.ufunc_per_section(x_indices=True,
                                        calc_per='all')
        i_var = i_var(self[i_label].isel(x=ix_sec)).values
    else:
        i_var = np.asarray(i_var, dtype=float)

Allow support for calibrating chunks of files

When calibrating larger datasets (>1000 files), RAM is an issue. With WLS in single-ended mode, memory errors occur quite quickly.

To avoid this, data could be calibrated in chunks. However, the read_xml_dir function asks for a path to the data, not for a selection of files. With support for a selection of files, people can write their own routine to calibrate chunks of files.
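
A sketch of what the requested workflow could look like. Passing an explicit file list to the reader is the feature being requested here, so the filepathlist argument and the use of read_silixa_files are assumptions:

import glob
import os

from dtscalibration import read_silixa_files

all_files = sorted(glob.glob(os.path.join('data', 'double_ended', '*.xml')))
chunk_size = 500

for i in range(0, len(all_files), chunk_size):
    chunk = all_files[i:i + chunk_size]
    ds = read_silixa_files(filepathlist=chunk)   # assumed argument name
    # ... calibrate ds and write the result to disk before the next chunk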

Error on file read with read_silixa_files

Describe the bug
When attempting to load the double-ended test files, the following error is returned:

TypeError: Cannot convert tz-naive timestamps, use tz_localize to localize


To Reproduce
Steps to reproduce the behavior:

  1. Install a new environment
  2. Try to use read_silixa_files

Additional context
Might be related to the pandas version...
Versions;

  • Python: 3.7.1
  • Pandas: 0.23.4
  • xarray: 0.11.3
  • dtscalibration: 0.6.2

resample_dataset switches dimensions around

After loading in measurement data, the order of the dimensions is (x, time).
When running resample_dataset, this changes to (time, x).

Not having the dimensions in the right order breaks the following functions:

  • ds.temperature_residuals
  • ds.variance_stokes

Notebook 04 on stokes variance is outdated

Since the addition of the different types of estimation of the Stokes variance (constant/exponential/linear), the notebook on providing signal variance estimates for the calibration has remained unchanged.

I think it has to be updated to clearly explain the differences, and to give users advice on which estimation to pick.

Binder: Error in getting sections

When trying to run the WLS double-ended calibration notebook in Binder, the residual calculation fails.
There seems to be a yaml problem: trying to request 'ds.sections' returns an error.

Many of the Binder notebooks seem to be affected by this as well.


Error in double-ended calibration with wls solver

Describe the bug
I can successfully calibrate my double-ended dataset using the ols solver, but not the wls solver. Here is the double-ended DataStore object:

Sections:
    coldprobe              (  9.31 +/- 0.01°C)	16.70 - 18.50
    warmprobe              (  6.72 +/- 0.00°C)	12.90 - 14.68 and 913.91 - 915.82
Dimensions:                (time: 197, x: 14432)
Coordinates:
  * time                   (time) datetime64[ns] 2020-03-10T04:00:06 ... 2020-03-10T04:10:00
    channel                int64 1
  * x                      (x) float64 0.068 0.195 0.322 ... 1.834e+03 1.834e+03
    cold                   (x) object dask.array<chunksize=(14432,), meta=np.ndarray>
    calibration            (x) object dask.array<chunksize=(14432,), meta=np.ndarray>
    warm                   (x) object dask.array<chunksize=(14432,), meta=np.ndarray>
Data variables:
    warmprobe              (time) float64 dask.array<chunksize=(97,), meta=np.ndarray>
    coldprobe              (time) float64 dask.array<chunksize=(97,), meta=np.ndarray>
    userAcquisitionTimeFW  (time) datetime64[ns] 2020-03-10T04:00:06 ... 2020-03-10T04:10:00
    userAcquisitionTimeBW  (time) datetime64[ns] 2020-03-10T04:00:00 ... 2020-03-10T04:09:54
    st                     (x, time) float64 dask.array<chunksize=(14432, 97), meta=np.ndarray>
    ast                    (x, time) float64 dask.array<chunksize=(14432, 97), meta=np.ndarray>
    rst                    (x, time) float64 1.517e+03 1.52e+03 ... 4.169e+03
    rast                   (x, time) float64 958.5 956.9 ... 2.906e+03 2.908e+03
Attributes:
    _sections:      coldprobe:\n- !!python/object/apply:builtins.slice\n  - 1...
    isDoubleEnded:  1

I can successfully execute

double.calibration_double_ended(sections=sections,
                                store_tmpw='tmpw',
                                method='ols')

to yield

Sections:
    coldprobe              (  9.31 +/- 0.01°C)	16.70 - 18.50
    warmprobe              (  6.72 +/- 0.00°C)	12.90 - 14.68 and 913.91 - 915.82
Dimensions:                (mc: 50, params1: 14827, time: 197, x: 14432)
Coordinates:
  * time                   (time) datetime64[ns] 2020-03-10T04:00:06 ... 2020-03-10T04:10:00
    channel                int64 1
  * x                      (x) float64 0.068 0.195 0.322 ... 1.834e+03 1.834e+03
    cold                   (x) object dask.array<chunksize=(14432,), meta=np.ndarray>
    calibration            (x) object dask.array<chunksize=(14432,), meta=np.ndarray>
    warm                   (x) object dask.array<chunksize=(14432,), meta=np.ndarray>
  * mc                     (mc) int64 0 1 2 3 4 5 6 7 ... 43 44 45 46 47 48 49
Dimensions without coordinates: params1
Data variables:
    warmprobe              (time) float64 dask.array<chunksize=(97,), meta=np.ndarray>
    coldprobe              (time) float64 dask.array<chunksize=(97,), meta=np.ndarray>
    userAcquisitionTimeFW  (time) datetime64[ns] 2020-03-10T04:00:06 ... 2020-03-10T04:10:00
    userAcquisitionTimeBW  (time) datetime64[ns] 2020-03-10T04:00:00 ... 2020-03-10T04:09:54
    st                     (x, time) float64 dask.array<chunksize=(14432, 97), meta=np.ndarray>
    ast                    (x, time) float64 dask.array<chunksize=(14432, 97), meta=np.ndarray>
    rst                    (x, time) float64 1.517e+03 1.52e+03 ... 4.169e+03
    rast                   (x, time) float64 958.5 956.9 ... 2.906e+03 2.908e+03
    gamma                  float64 444.2
    alpha                  (x) float64 0.007955 0.005639 ... -0.1873 -0.2025
    df                     (time) float64 1.288 1.288 1.287 ... 1.286 1.287
    db                     (time) float64 1.195 1.195 1.196 ... 1.193 1.192
    gamma_var              float64 2.946
    alpha_var              (x) float64 7.136e-06 5.804e-06 ... 7.185e-06
    df_var                 (time) float64 3.77e-05 3.77e-05 ... 3.769e-05
    db_var                 (time) float64 3.77e-05 3.77e-05 ... 3.769e-05
    tmpf                   (x, time) float64 dask.array<chunksize=(14432, 97), meta=np.ndarray>
    tmpb                   (x, time) float64 -3.249 -3.983 ... -20.17 -19.99
    gamma_mc               (mc) float64 444.1 444.7 445.1 ... 442.9 444.6 443.4
    df_mc                  (mc, time) float64 1.288 1.288 1.286 ... 1.284 1.283
    db_mc                  (mc, time) float64 1.194 1.196 1.196 ... 1.19 1.19
    alpha_mc               (mc, x) float64 0.008277 0.000818 ... -0.1901 -0.2043
    tmpw                   (x, time) float64 dask.array<chunksize=(14432, 97), meta=np.ndarray>
    p_val                  (params1) float64 444.2 1.288 ... -0.1873 -0.2025
Attributes:
    _sections:      coldprobe:\n- !!python/object/apply:builtins.slice\n  - 1...
    isDoubleEnded:  1

However, when I use the wls solver:

double.calibration_double_ended(sections=sections,
                                store_tmpw='tmpw',
                                method='wls')

I get the error

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-41-8e89c7f1507f> in <module>
     40 double.calibration_double_ended(sections=sections,
     41                                 store_tmpw='tmpw',
---> 42                                 method='wls')
     43 
     44 print('===========================================')

~/anaconda3/envs/dts-calibration/lib/python3.7/site-packages/dtscalibration/datastore.py in calibration_double_ended(self, sections, st_var, ast_var, rst_var, rast_var, store_df, store_db, store_gamma, store_alpha, store_ta, store_tmpf, store_tmpb, store_tmpw, tmpw_mc_size, store_p_cov, store_p_val, variance_suffix, method, solver, p_val, p_var, p_cov, remove_mc_set_flag, reduce_memory_usage, transient_asym_att_x, fix_gamma, fix_alpha, matching_sections, matching_indices, verbose, **kwargs)
   2590                 da_random_state=None,
   2591                 remove_mc_set_flag=remove_mc_set_flag,
-> 2592                 reduce_memory_usage=reduce_memory_usage)
   2593 
   2594         elif store_tmpw and method == 'ols':

~/anaconda3/envs/dts-calibration/lib/python3.7/site-packages/dtscalibration/datastore.py in conf_int_double_ended(self, p_val, p_cov, store_ta, st_var, ast_var, rst_var, rast_var, store_tmpf, store_tmpb, store_tmpw, store_tempvar, conf_ints, mc_sample_size, var_only_sections, da_random_state, remove_mc_set_flag, reduce_memory_usage, **kwargs)
   3541 
   3542             else:
-> 3543                 st_vari_da = da.from_array(st_vari, chunks=memchunk[1:])
   3544 
   3545             self[k] = (

~/anaconda3/envs/dts-calibration/lib/python3.7/site-packages/dask/array/core.py in from_array(x, chunks, name, lock, asarray, fancy, getitem, meta)
   2746 
   2747     chunks = normalize_chunks(
-> 2748         chunks, x.shape, dtype=x.dtype, previous_chunks=previous_chunks
   2749     )
   2750 

AttributeError: 'NoneType' object has no attribute 'shape'

To Reproduce
Steps to reproduce the behavior:

  1. Example notebook 11.
  2. Create a new cell at the end of the notebook after running it
  3. Insert:
# Just make up some sections. LAFs Copied from a random notebook.
sections = {
    'probe1Temperature': [slice(7.5, 17.), slice(70., 80.)],  # cold bath
    'probe2Temperature': [slice(24., 34.), slice(85., 95.)],  # warm bath
    }
ds.sections = sections

st_var, resid = ds.variance_stokes(st_label='st')
ast_var, _ = ds.variance_stokes(st_label='ast')
rst_var, _ = ds.variance_stokes(st_label='rst')
rast_var, _ = ds.variance_stokes(st_label='rast')

ds.calibration_double_ended(sections=sections,
                                st_var=st_var,
                                ast_var=ast_var,
                                rst_var=rst_var,
                                rast_var=rast_var,
                                store_tmpw='tmpw',
                                method='wls')
  4. See error

Expected behavior
I am unsure if I am doing something fundamentally wrong for the wls method.

Additional context
The ols method works using the code described in the steps to reproduce.

(edited to refer to example notebook 11)

"uncontrolled noise" check in calibration a little overzealous?

When attempting to run the calibration_single_ended on the same dataset briefly shown in issue #81, I trigger the uncontrolled noise error:

ultima_flyfox_raw_ds.calibration_single_ended(st_label='Ps',
                                              ast_label='Pas',
                                              method='ols')

yields

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-44-99ef32957281> in <module>
      1 ultima_flyfox_raw_ds.calibration_single_ended(st_label='Ps',
      2                                               ast_label='Pas',
----> 3                                               method='ols')

/anaconda/envs/dts-calibration/lib/python3.7/site-packages/dtscalibration/datastore.py in calibration_single_ended(self, sections, st_label, ast_label, st_var, ast_var, store_c, store_gamma, store_dalpha, store_alpha, store_tmpf, store_p_cov, store_p_val, variance_suffix, method, solver, p_val, p_var, p_cov, fix_gamma, fix_dalpha)
   1423         assert not np.any(
   1424             self[st_label] <= 0.), \
-> 1425             'There is uncontrolled noise in the ST signal'
   1426         assert not np.any(
   1427             self[ast_label] <= 0.), \

AssertionError: There is uncontrolled noise in the ST signal

The issue is that the internal coil of the Ultima allows for non-positive Stokes and anti-Stokes values, as shown in the plot attached to the original issue.

I would suggest changing the check to only look at positive LAF (or x) values, so that someone doesn't naively trigger the error, and making the phrase "uncontrolled noise" more descriptive.
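
A sketch of the suggested check, restricted to non-negative x so the internal coil is excluded; the label names ('st', 'ast') are illustrative:

import numpy as np

st_ext = ds['st'].sel(x=slice(0, None))    # external fiber only
ast_ext = ds['ast'].sel(x=slice(0, None))
assert not np.any(st_ext <= 0.), (
    'Non-positive ST values found along the external fiber (x >= 0); '
    'check the signal quality before calibrating.')
assert not np.any(ast_ext <= 0.), (
    'Non-positive AST values found along the external fiber (x >= 0).')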

Loading in files fails when an array of files is provided instead of a list

Providing an array of filepaths used to work. Now it fails with:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In the reading script this check is run:

if not filepathlist:
    filepathlist = sorted(glob.glob(os.path.join(directory, file_ext)))

which works for lists, but not for arrays.
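
A sketch of a check that would accept both lists and numpy arrays, as a drop-in for the quoted fragment (surrounding variables and imports as in the reader):

if filepathlist is None or len(filepathlist) == 0:
    filepathlist = sorted(glob.glob(os.path.join(directory, file_ext)))
else:
    filepathlist = list(filepathlist)   # also accept numpy arrays and tuples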

Expected behavior
Arrays and lists should both be accepted as input of filepaths

Performing WLS calibration on a DataStore that has been averaged across time

I think this would be most accurately described as a bug, but I could be misunderstanding. Basically, whilst ds.calibration_double_ended and ds.conf_int_double_ended function correctly when the DataStore is comprised of only one DTS file, an error is thrown when using a DataStore that has previously been averaged through ds.mean(dim='time', keep_attrs=True).

As far as I can tell, this is because the DataStore loses its time dimension, though it does still represent a time, even if just an average one. I have a workaround where I take everything out of the averaged DataStore and create a new one with a time dimension, which does work, but built-in capability might help others avoid this problem.
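
A sketch of the workaround described above, restoring a length-one time dimension after averaging; re-setting the sections and attributes afterwards may still be needed:

from dtscalibration import DataStore

ds_avg = ds.mean(dim='time', keep_attrs=True)
ds_avg = ds_avg.expand_dims('time')                              # length-1 time dimension
ds_avg = ds_avg.assign_coords(time=('time', [ds['time'].values[0]]))
ds_avg = DataStore(ds_avg)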

Kind regards,

Rob

Adding supported filetypes/DTS sensors to the readme

To make it clear for people who are linked to the package or stumble upon it, it would be nice to list all the devices currently supported, e.g.:

Devices currently supported:

  • Silixa Ltd.
    • Ultima .xml files (up to version 7.0)
    • XT-DTS .xml files (up to version 7.0)
  • Sensornet Ltd.
    • Oryx .ddf files
    • Halo .ddf files
  • AP Sensing GmbH
    • single ended .xml files
    • .tra files cannot be supported, as they lack the raw signals

Varying gain parameter between double ended channels.

In double-ended measurements with a longer integration time per channel, the measurement device's properties can vary between the measurement of each channel, e.g. the gain used in the amplification circuit varies between the forward and backward measurement.

This variation is currently assumed to be 0, and the same gain correction is used for both channels. This results in a differing and incorrect calibrated temperature.

See the residual plots attached to the original issue; the difference in 'gain' occurs at ~4:30.

Cannot calculate conf. ints. in a double-ended calibration with trans.att.

After successfully calibrating a double-ended dataset with a transient attenuation location added, there is an assertion error when trying to calculate the confidence intervals.

Apparently p_val is not of the expected size

Calculating the confidence intervals works fine when not adding the trans. att.

vmin, vmax in plot_residuals_reference_sections

Describe the bug
In plot_residuals_reference_sections there is a bit of code that sets vmin to -vmax if vmin is less than 0:

# vmin cannot be below 0 for the diverging cmap to work
if vmin < 0.:
    vmin = -vmax

This causes an error when both vmin and vmax are negative (I'm sure that also means the result isn't too good, but anyway). In that situation, vmin becomes positive while vmax stays negative, causing an error in colors.DivergingNorm. It probably also fails if both are positive...

And another thing, wouldn't it make more sense to use the largest of the two to set up your colormap? Now vmax determines what the maximum value should be. Maybe something along these lines would be nicer:

maxv = np.max(np.abs([vmin, vmax]))
vmin = -maxv
vmax = maxv

To Reproduce
Plot using plot_residuals_reference_sections with residuals in which the quantiles 0.02 and 0.98 are both negative.

Expected behavior
Don't crash when someone tries that?

requires updated xarray version

Describe the bug
The hasattr(self, '_sections') assertion in datastore.py generates an error for older versions of xarray, but works just fine with version 0.15.1. I've tested that this for sure happens with version 0.12, but I don't know at which version it starts working again.

To Reproduce
Steps to reproduce the behavior:
First: conda install xarray=0.12 and downgrade
Here is a minimum working example

import xarray as xr
import dtscalibration

ds_raw = xr.open_dataset('original_data.nc')  # This is just a single xml converted to nc
dstore = dtscalibration.DataStore(ds_raw)
dstore = dstore.rename({'LAF': 'x'})
# The error only occurs if you assign something to attrs before setting the sections
dstore.attrs = {'dt': '3s'}
dstore.sections = {'probe1Temperature': [slice(0, 10)]}
print(dstore)

with version 0.12.1 of xarray this yields AssertionError: first set the sections. With version 0.15.1 of xarray this code executes without a problem and builds the _sections attribute.

Expected behavior
I'm really not sure. I think the order of operations I used should have generated an error regardless of xarray version.

Intensity-dependent variance of the noise in (anti-) Stokes measurements

In the estimation of the variance of the noise in the measurements, the noise is currently assumed not to be intensity-dependent. Due to the avalanche photodiode, there is an intensity-dependent component in the noise of the Stokes and anti-Stokes measurements.
This improvement will mainly be necessary for long fiber-optic cable setups.
(See dissertation Bas des Tombe, Appendix A)
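
A sketch of what an intensity-dependent noise model could look like, fitting the residual variance as a linear function of the Stokes intensity; the binning approach and function name are illustrative assumptions, not the package's estimator:

import numpy as np

def fit_intensity_dependent_variance(intensity, residuals, n_bins=20):
    """Fit var(I) = a * I + b by binning residuals on intensity.

    'intensity' and 'residuals' are flattened arrays from the reference sections.
    """
    order = np.argsort(intensity)
    bins = np.array_split(order, n_bins)
    bin_mean_int = np.array([intensity[b].mean() for b in bins])
    bin_var = np.array([residuals[b].var() for b in bins])
    a, b = np.polyfit(bin_mean_int, bin_var, 1)
    return a, b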

DataStore object assumes dimension names, creating errors

I am inserting externally observed bath temperatures into a DataStore object. The DataStore object looks like this:

# Convert the existing xarray object to a DataStore object.
ultima_flyfox_raw_ds = dtscal.DataStore(ultima_flyfox_raw)
print(ultima_flyfox_raw_ds)

which gives

<dtscalibration.DataStore>
Sections:                  ()
Dimensions:  (LAF: 17956, time: 3857)
Coordinates:
  * LAF      (LAF) float64 -82.28 -82.16 -82.03 ... 2.2e+03 2.2e+03 2.2e+03
  * time     (time) datetime64[ns] 2019-07-18T03:10:02.892000 ... 2019-07-18T06:53:58.880000
Data variables:
    Ps       (time, LAF) float64 dask.array<shape=(3857, 17956), chunksize=(86, 17956)>
    Pas      (time, LAF) float64 dask.array<shape=(3857, 17956), chunksize=(86, 17956)>
    temp     (time, LAF) float64 dask.array<shape=(3857, 17956), chunksize=(86, 17956)>
Attributes:
    _sections:  null\n...\n

Then I try to assign the reference sections, following example notebook#3

# Example for github
probe_section = [slice(154.336, 158.343),
                   slice(2161.0, 2164.0)]
sections = {
            'warmprobe':    probe_section,  # warm bath
            }
ultima_flyfox_raw_ds.sections = sections

This results in the following error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/anaconda/envs/dts-calibration/lib/python3.7/site-packages/xarray/core/dataset.py in _construct_dataarray(self, name)
    975         try:
--> 976             variable = self._variables[name]
    977         except KeyError:

KeyError: 'x'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-8-c17af5536441> in <module>
     17             probe2_name:    probe2_sections,  # cold bath
     18             }
---> 19 ultima_flyfox_raw_ds.sections = sections

/anaconda/envs/dts-calibration/lib/python3.7/site-packages/xarray/core/common.py in __setattr__(self, name, value)
    191                     "style assignment (e.g., `ds['name'] = ...`) instead to "
    192                     "assign variables." % (name, type(self).__name__))
--> 193         object.__setattr__(self, name, value)
    194 
    195     def __dir__(self):

/anaconda/envs/dts-calibration/lib/python3.7/site-packages/dtscalibration/datastore.py in sections(self, sections)
    238 
    239                     x_dim = self.get_x_dim()
--> 240                     assert self[x_dim].sel(x=vi).size > 0, \
    241                         f'Better define the {k} section. You tried {vi}, ' \
    242                         'which is out of reach'

/anaconda/envs/dts-calibration/lib/python3.7/site-packages/xarray/core/dataset.py in __getitem__(self, key)
   1056 
   1057         if hashable(key):
-> 1058             return self._construct_dataarray(key)
   1059         else:
   1060             return self._copy_listed(np.asarray(key))

/anaconda/envs/dts-calibration/lib/python3.7/site-packages/xarray/core/dataset.py in _construct_dataarray(self, name)
    977         except KeyError:
    978             _, name, variable = _get_virtual_variable(
--> 979                 self._variables, name, self._level_coords, self.dims)
    980 
    981         needed_dims = set(variable.dims)

/anaconda/envs/dts-calibration/lib/python3.7/site-packages/xarray/core/dataset.py in _get_virtual_variable(variables, key, level_vars, dim_sizes)
     80         ref_var = dim_var.to_index_variable().get_level_variable(ref_name)
     81     else:
---> 82         ref_var = variables[ref_name]
     83 
     84     if var_name is None:

KeyError: 'x'

To Reproduce
Steps to reproduce the behavior:

  1. Go to 03Define_sections.ipynb
  2. Go to the fourth cell (the one where you assign sections to ds)
  3. Insert this line prior to assigning sections: ds = ds.rename({'x': 'LAF'})
  4. Run all cells

Expected behavior
I would expect to be able to use arbitrary dimension names or to have a check warning the user that the dimension name must be x. It looks like to do this I just need to change
assert self[x_dim].sel(x=vi).size > 0 to assert self[x_dim].sel({x_dim: vi}).size > 0

Additional context
I just updated to version 0.8, so this should be a current issue.
