Code Monkey home page Code Monkey logo

singularspectrumanalysis.jl's Introduction

SingularSpectrumAnalysis

CI codecov

A package for performing Singular Spectrum Analysis (SSA) https://en.wikipedia.org/wiki/Singular_spectrum_analysis

Simple Usage

The example below creates a simulated signal that has two strong seasonal components. The main entry function is analyze(y,L) that returns the trend and seasonal components. y is the signal to decompose and L is a window length to use for the internal embedding.

using SingularSpectrumAnalysis, Plots
# generate some data
L   = 20                      # Window length
K   = 100
N   = K*L;                    # number of datapoints
t   = 1:N;                    # Time vector
T   = 20;                     # period of main oscillation
y   = sin.(2pi/T*t);          # Signal
y .+= (0.5sin.(2pi/T*4*t)).^2 # Add another frequency
e   = 0.1randn(N);            # Add some noise
yn = y+e;
# plot(ys)

yt, ys = analyze(yn, L, robust=true) # trend and seasonal components
plot(yt, lab="Trend")
plot!(ys, lab="Season")

The robust keyword makes the analysis robust against large, sparse outliers, at the expense of longer computational time.

ESPRIT

  • esprit(x, L, r; fs=1, robust=false) Estimates r (positive) frequencies present in signal x using a lag-correlation matrix of size L.

Advanced usage

Internally a Hankel matrix is formed and the SVD of this is calculated. The singular values of the SVD can be plotted to manually determine which singular value belongs to the trend, and which pairs belong to seasonal components (these are always pairs).

USV = hsvd(yn,L,robust=false) # Perform svd on the trajectory matrix, robust uses a robust version of svd, resistant to outliers
plot(USV, cumulative=false) # Plot normalized singular values

window

seasonal_groupings = [1:2, 4:5] # Determine pairs of singular values corresponding to seasonal components
trend_i = 3 # If some singular value lacks a buddy, this is a trend component
# trend_i, seasonal_groupings = autogroup(USV) # This uses a heuristic
pairplot(USV,seasonal_groupings) # plot phase plots for all seasonal components
yrt, yrs = reconstruct(USV, trend_i, seasonal_groupings) # Reconstruct the underlying signal without noise, based on all identified components with significant singular values
yr = sum([yrt yrs],dims = 2) # Form full reconstruction
plot([y ys yr], lab=["y" "ys" "ys" "yr"])

Forecasting

We provide the function fit_trend(yt, order) to fit an n:th order polynomial to the trend:

yt, ys = analyze(yn, L)
A,x = fit_trend(yt, 1)

This returns the regressor matrix A and the polynomial coefficients x. This fit can be used to forecast the trend. To forecast the seasonal components, we make use of the package ControlSystemIdentification.jl to fit AR(na) models. We create a simulated signal to test with:

using Random
Random.seed!(0)
L = 20
K = 10
N = K*L;
t = 1:N;
T = 20;
y = sin.(2pi/T*t);            # Add seasons
y .+= (0.5sin.(2pi/T*4*t)).^2 # Add seasons
y .+= LinRange(0,1,N)         # Add trend
e = 0.1randn(N);
yn = y+e;                     # Add noise

Next, we use SSA to find the trend and the seasonal components

yt, ys = analyze(yn, L) # trend and seasons
using ControlSystemIdentification
pd  = PredictionData(yt, ys, trend_order=1, ar_order=2)
yth = trend(pd)
ysh = seasons(pd)

Next, we visualize the trends and seasonal components estimated by both SSA and AR models.

plot(ys, layout=size(ys,2), lab="SSA", title="Estimated seasonal components")
plot!(ysh, lab="AR")

window

plot(yt, lab="SSA", title="Estimated trend")
plot!(yth, lab="Polyfit")

window

yr = yt+sum(ys, dims=2)
plot(yn, lab="Measured", title="Full reconstructions")
plot!(yr, lab="SSA")
plot!(+(yth, ysh...), subplot=1, lab="AR", l=(:dash,))

window

To perform n-step prediction, use the function pred:

pd = pred(pd,2) # Predict two steps
yth = trend(pd)
ysh = seasons(pd)

The example above is implemented in forecast.jl.

Missing data / outliers

See the keyword argument robust. The robust estimation is handled by TotalLeastSquares.jl which performs a robust PCA of the Hankel matrix. This factorization handles large but sparse outliers very well. To indicate that a value is missing, you can set it to some large value that is very far from the other values and it will be identified as an outlier by the robust factorization. To obtain the inferred values for the missing data, call the low-level function directly

X = hankel(y,L) # Form trajectory matrix
X̂, E = rpca(X)
ŷ = unhankel(X̂)

Where is a clean version of the signal. The sparse matrix E contains the estimated noise values. See also function lowrankfilter which packages this procedure.

See further documentation and examples here.

Advanced low-level usage

See the implementation of functions hsvd and reconstruct

Reading

See http://www.jds-online.com/files/JDS-396.pdf for an easy-to-read introduction to SSA

singularspectrumanalysis.jl's People

Contributors

baggepinnen avatar github-actions[bot] avatar jiangxl avatar juliatagbot avatar staticfloat avatar tkelman avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

singularspectrumanalysis.jl's Issues

ERROR: OutOfMemoryError()

I deal with recorded data with a sampling speed above 99kHz, if a signal comprises ten periods @0.1Hz the number of points is quite huge (these big data files is the reason for Julia instead of Octave).
Now my idea was, to utilize the esprit-function to calculate the initial guess for the frequencies in a signal, unfortunately Julia v1.6.7 throws the error message above, you can reproduce this error if you generate a signal with e.g. 19999901 points :-(

Multichannel SSA

Does your SingularSpectrumAnalysis also cover Multivariate/multichannel SSA as in https://arxiv.org/pdf/1309.5050 ?

I see that you follow DifferentialEquations.jl and other items in nonlinear dynamics. You may be interested to see JuliaDynamics where there is an implementation of Bloomhead and King (SSA) in nlts.jl applied to the gissinger model.

Example fails

julia> pd  = PredictionData(yt,ys, trend_order=1, ar_order=2)
ERROR: MethodError: no method matching ar(::Int64, ::Array{Float64,1}, ::Int64)
Closest candidates are:
  ar(::ControlSystemIdentification.AbstractIdData, ::Any; λ, estimator, scaleB, stochastic) at /Users/rusty/.julia/packages/ControlSystemIdentification/5Qu5M/src/arx.jl:154
Stacktrace:
 [1] (::SingularSpectrumAnalysis.var"#18#21"{Int64,Array{Float64,2}})(::Int64) at ./none:0
 [2] iterate at ./generator.jl:47 [inlined]
 [3] collect(::Base.Generator{UnitRange{Int64},SingularSpectrumAnalysis.var"#18#21"{Int64,Array{Float64,2}}}) at ./array.jl:686
 [4] PredictionData(::Array{Float64,1}, ::Array{Float64,2}; trend_order::Int64, ar_order::Int64) at /Users/rusty/.julia/packages/SingularSpectrumAnalysis/08sHl/src/SingularSpectrumAnalysis.jl:225
 [5] top-level scope at REPL[61]:1
 [6] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1088

Modification so that "analyze" works for complex-valued signal

Hello, thanks for the great work on this package!
I am working with complex valued time series and I noticed that using the analyse function with a complex-valued signal yields an "Inexact" error which comes from the reconstruct function. I think a simple way to make the method adapted to both real and complex valued signals would be to replace the line 101 of your code in the reconstruct function:

" yr = zeros(N,M)"
could be changed for
" yr = Array{Complex{Float64}}(undef,N,M)"

Cheers,
Jeanne.

[Future Plan] missing data treatment or gap filling and data forecasting

This package cannot deal with time series with missing data, would that be the future plan?
Also, SSA is an important methods in gap filling and data forecasting. will this be in future plan?

As I know, there is a such package in R, which is very powerful yet the speed is not as fast as this package in Julia.

Thanks!

esprit() fails to predict / estimate the frequencies inside a signal

Hi,
1.)
if I specify that two frequencies should be calculated (field "f"), it returns three.
2.)
Two frequencies are embedded but only one is calculated correctly

Attached my sample code:

using Printf, RobustModels
using DSP, SingularSpectrumAnalysis # both provide implementations of esprit()
# using MLJ # provides rms()

## --- compose signals: clean and disturbed --------------------------------------------------------------------------------
sampling_rate   = 10000                    # Sampling frequency
n_signal        = 3000                     # Length of signal
off_set_sign    = 1.0                      # vertical offset of signal
frequ_A         = 50.0                     # unit = Hz
frequ_B         = 120.0                    # unit = Hz
ampl_A          = 0.7                      # Amplitude @f1
ampl_B          = 1.0                      # Amplitude @f2
noise_ampl      = 0.3;                     # max noise amplitude
b_symmetry_on   = false                    # substract arithmetic mean before decomposition

# --- time vector: vec_t
delta_t         = 1.0 / sampling_rate     # Time step in sec
vec_t           = collect(range(0, step = delta_t, length = n_signal))  # Time vector

# create signal with sinus at frequ_A and frequ_B:
signal_clean     = off_set_sign .+ ampl_A .* sin.(2 * pi * frequ_A .* vec_t)  +  ampl_B .* sin.(2 * pi * frequ_B .* vec_t)
signal_disturbed = signal_clean + noise_ampl * randn(size(vec_t))
if b_symmetry_on
    signal_disturbed    = signal_disturbed .- mean(signal_disturbed)
    signal_clean        = signal_clean     .- mean(signal_clean)
end

## --- main ----------------------------------------------------------------------------------------------------------------
# --- check if values for amplitude higth are meaningful and figure out, if one or two amplitudes are larger then 0
if ampl_A > 0       &&  ampl_B == 0
    num_of_frequ = 1
elseif ampl_A == 0  &&  ampl_B > 0
    num_of_frequ = 1
elseif ampl_A > 0   &&  ampl_B > 0 
    num_of_frequ = 2
else
    error("Either amplitude A or B needs to be positive!")
end
# --- calculate frequency via function "esprit()"
# source esprit function A: https://docs.juliadsp.org/stable/estimation/
# source esprit function B: https://github.com/baggepinnen/SingularSpectrumAnalysis.jl/blob/master/src/SingularSpectrumAnalysis.jl#L158
# --- size of correlation matrix: equation is according to van der Veen and Leus: (n_signal + 1)/3.
lag_embedding_dimension = round(Int, (n_signal + 1)/3) # Size of lag embedding and the covariance matrix used (size of correlation matrix).
esprit_frequ_methode_A = DSP.Estimation.esprit(signal_clean, lag_embedding_dimension, 2*num_of_frequ, sampling_rate)
esprit_result = SingularSpectrumAnalysis.esprit(signal_clean, lag_embedding_dimension, num_of_frequ, fs = sampling_rate, robust = false)
esprit_frequ_methode_B = esprit_result.f
# ---
if size(esprit_frequ_methode_B)[1] > num_of_frequ
    error("Singal does not seem to be symetric around zero!")
end
# ---
if num_of_frequ == 2
    println(@sprintf("Frequency A: %.1f Hz, Frequ B: %.1f Hz, esprit result[1]: %.1f Hz, esprit_result[2]: %.1f Hz", 
    frequ_A, frequ_B, esprit_result.f[1], esprit_result.f[2] ))
end

Error when changing ar_order

When I change the ar_order argument in PredictionData(yt, ys, trend_order=1, ar_order=2) to anything more than 2, I get the following error (when using 6, for example):

ERROR: BoundsError: attempt to access 3-element Array{Float64,1} at index [7]

I’m not sure which 3-element Array it is trying to access, but it seems to always contain 3 elements, no matter what I do.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

streaming SSA

hi

thanks for putting this together, great package. qq: do we know if there is a way SSA or this library can handle streaming data? I have a lot of streaming data and performance is quite key.

thanks
Roh

Performance significantly slower on v1.9.0

Great package.

I've upgraded to 1.9.0 and have noticed that the performance is now significantly slower compared to 1.7.3?

I'm using analyze(TimeSeries, n, robust=Robust).

Any particular reason for this?

Thanks.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.