cschoel / nolds Goto Github PK
View Code? Open in Web Editor NEWNonlinear measures for dynamical systems (based on one-dimensional time series)
License: MIT License
Nonlinear measures for dynamical systems (based on one-dimensional time series)
License: MIT License
The CI pipeline is currently broken. This concerns Python 2.7 specifically and apparently some of the actions we use.
@v2
). They might need to be updated.I'm not sure what is wrong with this code but if you perform
nolds.hurst_rs(numpy.random.normal(0,1,1000))
you consistently get hurst exponents larger than .5 which should not be the case...
The code calculating metrics for the Lorenz system of ODEs in examples.py
already contains examples for all major nolds algorithms. However, I want to double-check if all the results make sense or if there is still something to tweak with the parameters. For example, #25 has shown that some changes might be required for corr_dim
.
lag
for corr_dim
.tau
parameter for Lorenz system closer to values in Grassberger Procaccia 1983.Hi Christopher,
Thanks for doing nolds package.
In nolds 0.5.1, corr_dim ignores fit argument when it does poly_fit (line 1373). Fit argument should be passed to poly_fit call to match with function description.
When using the Hurst Exponent function, I got the following, numpy related warning:
FutureWarning: reshape is deprecated and will raise in a subsequent release. Please use .values.reshape(...) instead
I'm trying to compute the entropy and the fractal dimension of human physiological signals (EEG and ECG). However, I don't know on what basis to choose the embedding dimension. Especially in corr_dim
where emb_dim has no defaults.
What defaults do you suggest? Or on what basis shoud we chose those values? Thanks!
Hi Christopher,
Thanks very much for the nolds package.
I'm very new to DFA and I am trying to interpret the return value. I have seen others use DFA and report what they call an "exponent" that ranges from 0-1.
For example, in this Frontiers article you can see the following figure that applies DFA to EEG.
Do you happen to know how to get from the value that nolds.dfa
returns, to this so-called "exponent"? It seems to be the slope within a selected window. I'm sure I'm missing something straightforward but I'm still trying to wrap my head around this measure.
Nolds already supports more algorithms than are advertised in the README.
mfhurst_b
.mfhurst_dm
.The function lyap_e
is reliable when it comes to the sign of the largest Lyapunov exponent, but not so much when it comes to the absolute size. Changing the parameters emb_dim
, matrix_dim
, min_nb
, and min_tsep
can give wildly different ranges of values.
This could be due to an error in the normalization regarding the number of matrices created, dimensions, and tau. I went through the code and paper again, but could not find anything that would support that. The other explanation could be that the higher exponents that turn up are actually "spurious" exponents that are also described in the original paper.
To investigate this, one would first need a good reference using the algorithm that mentions explicit settings for all the above parameters as well as the input data. The examples in the original paper unfortunately omit some of these. ๐
โน๏ธ I don't plan to take this issue on myself, but only leave it here in case someone is interested.
lyap_e
and lists all parameters used.Since the example using the Lorenz system in examples.py
has reference values for all major nolds algorithms, it would be great to use those in the unit tests.
sampen
.corr_dim
.lyap_r
.lyap_e
.hurst_rs
.dfa
.Hi,
I have a time series (data files attached below) that I suspect could be chaotic (plot below), and I'm trying to use nolds to compute its Lyapunov exponent. However, it gives me a -inf (with runtime warnings). Am I doing something silly here?
Code to produce the plot:
import numpy as np
time = np.loadtxt('times_for_test_time_series.txt')
signal = np.loadtxt('test_time_series.txt')
pl.plot(time, signal - np.mean(signal))
pl.ylabel("Voltage")
pl.xlabel("Time (ps)")
Code to compute Lyapunov exponent:
import nolds
nolds.lyap_r(signal)
Hi,
I checked three implement include of nodls, the result are quite differents.
My target is labeling the mean-reversion
and trending
ranges, I have no clue for the output of nolds.hurst_rs (all > 0.9).
testing code
np.random.seed(42)
random_changes = 1. + np.random.randn(99999) / 1000.
series = np.cumprod(random_changes) # create a random walk from random changes
# 1
from hurst import compute_Hc, random_walk
H, c, data = compute_Hc(series, kind='price', simplified=True)
print(H)
# 2
def genhurst(S,q=2):
'''
from https://github.com/PTRRupprecht/GenHurst/blob/master/genhurst.py
'''
L=len(S)
# if L < 100:
# warnings.warn('Data series very short!')
H = np.zeros((len(range(5,20)),1))
k = 0
for Tmax in range(5,20):
x = np.arange(1,Tmax+1,1)
mcord = np.zeros((Tmax,1))
for tt in range(1,Tmax+1):
dV = S[np.arange(tt,L,tt)] - S[np.arange(tt,L,tt)-tt]
VV = S[np.arange(tt,L+tt,tt)-tt]
N = len(dV) + 1
X = np.arange(1,N+1,dtype=np.float64)
Y = VV
mx = np.sum(X)/N
SSxx = np.sum(X**2) - N*mx**2
my = np.sum(Y)/N
SSxy = np.sum( np.multiply(X,Y)) - N*mx*my
cc1 = SSxy/SSxx
cc2 = my - cc1*mx
ddVd = dV - cc1
VVVd = VV - np.multiply(cc1,np.arange(1,N+1,dtype=np.float64)) - cc2
mcord[tt-1] = np.mean( np.abs(ddVd)**q )/np.mean( np.abs(VVVd)**q )
mx = np.mean(np.log10(x))
SSxx = np.sum( np.log10(x)**2) - Tmax*mx**2
my = np.mean(np.log10(mcord))
SSxy = np.sum( np.multiply(np.log10(x),np.transpose(np.log10(mcord)))) - Tmax*mx*my
H[k] = SSxy/SSxx
k = k + 1
mH = np.mean(H)/q
return mH
print(genhurst(series, 2))
# 3
import nolds
print(nolds.hurst_rs(series))
I also tested with rolling window 200 , second one seems most correctly ( contain values below 0.5 )
def hurst1(arr):
return genhurst(arr)
def hurst2(arr):
return compute_Hc(arr, kind='price', simplified=True)[0]
def hurst3(arr):
return nolds.hurst_rs(arr, fit='poly')
Hi Christopher,
I'd like to extend my gratitude for developing such a library. I've chosen to raise a topic in the issues section, anticipating that others may also find it relevant. I aim to utilize your package to calculate the correlation dimension of various dynamical systems. However, to have confidence in the outcomes, I need to be able to replicate the commonly referenced dimension value of
d=2.05ยฑ0.01 for the Lorenz attractor with (sigma, rho, beta)=(10, 28, 8/3). Ideally, my goal is to generate a figure akin to the one attached. I find it perplexing that most packages, despite claims of this capability, seldom demonstrate this value in their documentation or examples. Could you provide any guidance or examples that successfully achieve this benchmark?
Thanks for your time,
Ricardo
Hi. I installed your library (python3) and wrote a basic script to calculate the correlation dimension, plugging in some random parameters that came into my mind
import numpy as np
from nolds.measures import corr_dim
data = np.random.normal(0,3,1000))
corr_dim(data, emb_dim=5, debug_plot=True)
I get an answer 0.01. What does that mean? I thought that the correlation dimension is supposed to estimate the true dimension in which the points are located. Since an uncorrelated gaussian blob should have full rank, I expected a result somewhere close to the embedding dimension, namely, 5 in this case. Am I doing something wrong? When I try higher embedding dimensions, like 10, I get answers 10^{-15}, is this expected? Also, do you have a version of this algorithm that works for arbitary dimensions?
Below I provide an algorithm I have written myself to calculate correlation dimension for arbitrary dimension inputs (no embedding dimension, just providing 2D arrays as inputs). It behaves more like what I expect, but still underestimates the true dimension by a factor ~2. How stable are such estimators in general?
# Return flat 1D array of matrix off-diagonal entries
def offdiag_1D(M):
idxs = ~np.eye(M.shape[0], dtype=bool)
return M[idxs]
def corr_dim_homebrew(data2D):
# Get all pairwise euclidean distances in N-dimensional scale
nPoint, nDim = data2D.shape
data3D = np.repeat(data2D[..., None], nPoint, axis=2).transpose((0,2,1))
diff = offdiag_1D(np.linalg.norm(data3D - data3D.transpose((1,0,2)), axis=2))
# Sort distances, calculate correlation integral
diffSorted = np.sort(diff)
corrIntegral = np.arange(1, len(diff)+1) / nPoint**2
# Convert to log scale
diffSortedLog = np.log10(diffSorted)
corrIntegralLog = np.log10(corrIntegral)
# Fit polynomial
coeff = np.polyfit(diffSortedLog, corrIntegralLog, 1)
plin = lambda x, c: (x, c[1] + c[0]*x)
plt.figure()
plt.plot(diffSortedLog, corrIntegralLog)
plt.plot(*plin(diffSortedLog, coeff), 'r--' )
print("Estimated dimension", coeff[0])
plt.show()
corr_dim_homebrew(aaa[...])
Best,
Aleksejs
I get some errors for time series of different lengths. As far as I can see, there aren't any warnings in the documentation about these.
For example, for lag=5
and min_tsep=10
, any time series that are shorter than 45 elements long gets the following error:
>>> nolds.lyap_r(np.asarray(time_series[:45]), lag=5, min_tsep=10)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/pyenv/lib/python3.6/site-packages/nolds/measures.py", line 280, in lyap_r
orbit = delay_embedding(data, emb_dim, lag)
File "/pyenv/lib/python3.6/site-packages/nolds/measures.py", line 106, in delay_embedding
raise ValueError(msg.format(len(data), emb_dim, lag))
ValueError: cannot embed data of length 45 with embedding dimension 10 and lag 5
However, until 54 elements, a different error is given:
>>> nolds.lyap_r(np.asarray(time_series[:54]), lag=5, min_tsep=10)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/pyenv/lib/python3.6/site-packages/nolds/measures.py", line 293, in lyap_r
nb_idx = np.argmin(dists[:ntraj, :ntraj], axis=1)
File "/pyenv/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 1019, in argmin
return _wrapfunc(a, 'argmin', axis=axis, out=out)
File "/pyenv/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
return getattr(obj, method)(*args, **kwds)
ValueError: attempt to get argmin of an empty sequence
For lag=4
and min_tsep=8
, the thresholds are 36 to 55, however 45 generates no errors. Also, for some time-series lengths, I get another error:
>>> nolds.lyap_r(np.asarray(time_series[:54]), lag=4, min_tsep=8)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/pyenv/lib/python3.6/site-packages/nolds/measures.py", line 300, in lyap_r
div_traj_k = dists[indices]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (0,) (17,)
What are the allowed timeseries lenghts?
Also, how is the lyapunov exponent defined for a timeseries which is [263, 267, 267, 267, ...]
? Currently I get this value:
>>> nolds.lyap_r(np.asarray(time_series[60:-60]), lag=4, min_tsep=8)
/project/peaclab-mon/pyenv/lib/python3.6/site-packages/nolds/measures.py:75: RuntimeWarning: RANSAC did not reach consensus, using numpy's polyfit
RuntimeWarning)
/project/peaclab-mon/pyenv/lib/python3.6/site-packages/numpy/lib/polynomial.py:584: RuntimeWarning: invalid value encountered in true_divide
lhs /= scale
/project/peaclab-mon/pyenv/lib/python3.6/site-packages/nolds/measures.py:76: RankWarning: Polyfit may be poorly conditioned
coef = np.polyfit(x, y, degree)
nan
Finally, kind of unrelated, when a pandas series is passed into the function, I get another error, thats why I use the np.asarray
. I can open a separate issue for that if necessary:
>>> nolds.lyap_r(time_series, lag=4, min_tsep=8)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/pyenv/lib/python3.6/site-packages/nolds/measures.py", line 280, in lyap_r
orbit = delay_embedding(data, emb_dim, lag)
File "/pyenv/lib/python3.6/site-packages/nolds/measures.py", line 110, in delay_embedding
return data[indices]
File "/pyenv/lib/python3.6/site-packages/pandas/core/series.py", line 642, in __getitem__
return self._get_with(key)
File "/pyenv/lib/python3.6/site-packages/pandas/core/series.py", line 674, in _get_with
return self.reindex(key)
File "/pyenv/lib/python3.6/site-packages/pandas/core/series.py", line 2426, in reindex
return super(Series, self).reindex(index=index, **kwargs)
File "/pyenv/lib/python3.6/site-packages/pandas/core/generic.py", line 2515, in reindex
fill_value, copy).__finalize__(self)
File "/pyenv/lib/python3.6/site-packages/pandas/core/generic.py", line 2528, in _reindex_axes
tolerance=tolerance, method=method)
File "/pyenv/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2882, in reindex
tolerance=tolerance)
File "/pyenv/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2598, in get_indexer
indexer = self._engine.get_indexer(target._values)
File "pandas/_libs/index.pyx", line 306, in pandas._libs.index.IndexEngine.get_indexer (pandas/_libs/index.c:7518)
File "pandas/_libs/hashtable_class_helper.pxi", line 808, in pandas._libs.hashtable.Int64HashTable.lookup (pandas/_libs/hashtable.c:14720)
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
>>> time_series.shape
(619,)
The following warning checks for small frequencies in the estimation of lyap_r; and then assigns an arbitrary period based on the length of the time-series.
This make the analysis dependent on the length of the signal.
I have very long time series with a proper frequency, but which get into the condition because the relationship with the length of the signal (gait time series from accelerometers).
How to take this into consideration?
Line 237 in b52530a
Hello, I've been using a modified version of your code to calculate fractal dimension, lyapunov exponents, etc based on a custom embedding method. Would this be something valuable to include in the main repo? If so, I can put together a PR.
The basic structure is pretty simple: if the passed time series appears to be multivariate, then skip creating a lagged embedding.
Thanks very much for this amazing library.
Hi Christopher.
I would like to use nolds for some neuroscience applications/papers and it looks very useful.
Do you have a published version of this package or the original measures that are implemented? Journals sometimes ask about peer reviewed toolboxes...
Thanks for this great package.
catubc
EDIT: I see the https://cschoel.github.io/nolds/ link that has more documentation and citations.
Q: Is nolds in process of publication as a toolbox?
In #20, I noted that my implementation of corr_dim
deviates from the specification in two details:
Apparently, I just forgot to implement the required changes. This issue is here for reminding me to not do that again. ๐
Following the discussion in NeuroKit2 (neuropsychology/NeuroKit#206) it seems that there are multiple ways to implement DFA. => Check that we really use the correct implementation according to Peng 1995 and update the reference list accordingly.
by @Ziaeemehr:
make html does not work
Encoding error: 'ascii' codec can't decode byte 0xc3 in position 21: ordinal not in range(128) The full traceback has been saved in /tmp/sphinx-err-oRRyfk.log, if you want to report the issue to the >developers. Makefile:56: recipe for target 'html' failed
hi,
When i call nolds.lay_e with my datas, i got "cannot convert float infinity to integer" because of "diffs[i] = float('inf')" in the code. Do you tell me how to fix it ?
Thanks !
setuptools
to Poetry.Hi Christopher,
Great module! I'm just starting to play around with it. I'm calculating a rolling sampen on a single vector using a window 60 points wide, and I'm noticing that on some of the windows, the method returns inf. I've confirmed that there are no nan or inf values in my data array. The odd thing is that if I expand/contract the window slightly in the regions where infs are generated it returns a normal value. What might be the reason for this?
my emb_dim is 2 and an example array producing inf is below:
array([ 3.97897359, 2.57013282, 1.45842405, 2.68826392, 7.31447341,
5.78199552, 6.56846706, 4.87548617, 6.06460367, 5.98447964,
5.10990579, 3.98096388, 7.59746302, -0.07659857, 3.16797671,
5.77983908, 3.02430642, 4.95546237, 6.07352899, 8.09528219,
5.68092534, 5.57796569, 5.78986204, 6.45694346, 5.51082336,
3.62068115, 5.45598895, 4.73611244, 6.61318103, 5.00210034,
5.06841458, 4.11860764, 5.08174585, 4.93897692, 7.28100181,
4.73825745, 4.68800652, 6.07565328, 5.53210218, 5.16863708,
2.71207448, 3.51531164, 6.94629671, 3.8649315 , -0.41799253,
2.6010292 , 2.50768146, 4.78871296, -3.20448453, 4.32893052,
4.97199359, 6.514833 , 2.73325493, 1.86146005, 5.80707187,
3.37129316, 0.34194481, 3.44398483, 0.75766866, 0.81707214])
Thanks!
I'm getting a weird segfault when using the hurst_rs function with data from a HDF5 store. I am attempting to calculate the hurst exponent for stock time series data stored in the HDF5 store (approx 2500 stocks, with c2-3000 datapoints).
Edit: using poly, not RANSAC. have not tried with ransac.
If I remove the hurst_rs call from my code it goes on to complete as expected.
I have tried to replicate the problem with the hurst_rs function using just random data but it happily processes arrays of length 5000 many thousands of times. The problem only occurs when I am pulling data from the hdf5 store.
So it would appear that some weird interaction between nolds and hdf5 store is happening.
Summary of backtrace below:
*** Error in
python': corrupted double-linked list (not small): 0x0000000007722eb0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f5717dd47e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x7d814)[0x7f5717dda814]
/lib/x86_64-linux-gnu/libc.so.6(+0x8215e)[0x7f5717ddf15e]
/lib/x86_64-linux-gnu/libc.so.6(+0x82850)[0x7f5717ddf850]
/lib/x86_64-linux-gnu/libc.so.6(realloc+0x179)[0x7f5717de0c89]
/home/jedwards/anaconda3/envs/py36/lib/python3.6/site-packages/tables/../../../libhdf5.so.10(H5MM_realloc+0x11)[0x7f56ece04781]
/home/jedwards/anaconda3/envs/py36/lib/python3.6/site-packages/tables/../../../libhdf5.so.10(+0x2510d4)[0x7f56ecf350d4]
/home/jedwards/anaconda3/envs/py36/lib/python3.6/site-packages/tables/../../../libhdf5.so.10(H5Z_pipeline+0x3a8)[0x7f56ecf34a48]
/home/jedwards/anaconda3/envs/py36/lib/python3.6/site-packages/tables/../../../libhdf5.so.10(H5D__chunk_lock+0x7d7)[0x7f56ecd5e047]
/home/jedwards/anaconda3/envs/py36/lib/python3.6/site-packages/tables/../../../libhdf5.so.10(+0x7be75)[0x7f56ecd5fe75]
/home/jedwards/anaconda3/envs/py36/lib/python3.6/site-packages/tables/../../../libhdf5.so.10(H5D__read+0x527)[0x7f56ecd6f4f7]
/home/jedwards/anaconda3/envs/py36/lib/python3.6/site-packages/tables/../../../libhdf5.so.10(H5Dread+0xfb)[0x7f56ecd6f99b]`
Hello,
I am interested in using nolds for the analysis of N-dimensional times-series.
I was wondering if there is a case for developping the support of N-dimensional data in the algorithms you have implemented (I'll be doing that work and providing a pull request down the line.) Or if running the algo on each dimensions separately and analysing the results separately makes more sense.
What do you think?
I test my installation by running some sample code with:
python -m nolds.examples all. However, I met some wrong problems. How to solve this problem.
/home/.local/lib/python2.7/site-packages/nolds/measures.py:45: RuntimeWarning: fitting mode 'RANSAC' requires the package sklearn, using'poly' instead
RuntimeWarning)
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"main", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/.local/lib/python2.7/site-packages/nolds/examples.py", line 75, in
plot_lyap()
File "/home/.local/lib/python2.7/site-packages/nolds/examples.py", line 40, in plot_lyap
trajectory_len=20)
File "/home/.local/lib/python2.7/site-packages/nolds/measures.py", line 349, in lyap_r
poly = poly_fit(ks[fit_offset:], div_traj[fit_offset:], 1, fit=fit)
File "/home/.local/lib/python2.7/site-packages/nolds/measures.py", line 51, in poly_fit
model = sklin.RANSACRegressor(sklin.LinearRegression(fit_intercept=False))
UnboundLocalError: local variable 'sklin' referenced before assignment
I use Lorenz system data for validation of the Rosenstein code but did not validate the results in Rosenstein's paper. can you explain how this nold is close to Rosenstein's paper results?
Might be as easy as running DFA multiple times. Reference: https://github.com/neuropsychology/NeuroKit/blob/cb37d83ee20d6a13a91c4848aa435f41e979e203/neurokit2/complexity/fractal_dfa.py#L8
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.