
A Python library for the fast symbolic approximation of time series

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 95.55%, Cython 3.45%, Shell 0.07%, TeX 0.92%
Topics: machine-learning, dimensionality-reduction, time-series, time-series-analysis, time-series-forecasting, time-series-classification, time-series-clustering, symbolic-aggregate-approximation, symbolic-representation


fABBA

An efficient symbolic aggregate approximation for temporal data 🐬


fABBA is a fast and accurate symbolic representation method for temporal data. It is based on a polygonal chain approximation of the time series followed by an aggregation of the polygonal pieces into groups. The aggregation process is sped up by sorting the polygonal pieces and exploiting early termination conditions. In contrast to the ABBA method [S. Elsworth and S. Güttel, Data Mining and Knowledge Discovery, 34:1175-1200, 2020], fABBA avoids repeated within-cluster-sum-of-squares computations, which reduces its computational complexity significantly. Furthermore, fABBA is fully tolerance-driven and does not require the number of time series symbols to be specified by the user.
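To illustrate the tolerance-driven idea, here is a minimal sketch of greedy piecewise-linear compression. Note that `compress_sketch` is a hypothetical name, and the maximum-deviation test below is a simplification of ABBA's squared-error criterion; the library's own `compress` function (shown later) is the real implementation.

```python
import numpy as np

def compress_sketch(ts, tol):
    """Greedy piecewise-linear compression (simplified illustration).

    Each piece is stored as (length, increment). A piece is extended
    while the straight line from its start to the candidate endpoint
    stays within `tol` of the original samples; ABBA/fABBA use a
    squared-error criterion instead, but the greedy structure is similar.
    """
    ts = np.asarray(ts, dtype=float)
    pieces, start = [], 0
    while start < len(ts) - 1:
        end = start + 1
        while end + 1 < len(ts):
            seg = ts[start:end + 2]
            # straight line from the piece's start to the candidate endpoint
            line = np.linspace(seg[0], seg[-1], len(seg))
            if np.max(np.abs(seg - line)) > tol:
                break
            end += 1
        pieces.append((end - start, ts[end] - ts[start]))
        start = end
    return pieces
```

A perfectly linear series collapses to a single piece, while a wavy series needs several; a smaller `tol` produces more (and shorter) pieces.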

πŸš€ Install

fABBA supports the Linux, Windows, and macOS operating systems.


fABBA has the following essential dependencies:

* cython (>=0.29.7)
* numpy (>=1.19.5)
* scipy (>=1.2.1)
* requests
* scikit-learn (>=0.17.1)
* threadpoolctl (>=2.0.0)
* matplotlib

To ensure successful Cython compilation, please update NumPy to version 1.22.0 or later.

To install the current release via pip, use:

pip install fabba

Alternatively, clone this repository:

git clone https://github.com/nla-group/fABBA.git

fABBA can also be installed via conda-forge:

conda install -c conda-forge fabba

🏁 Examples

⭐ Compress and reconstruct a time series

The following example approximately transforms a time series into a symbolic string representation (transform) and then converts the string back into a numerical format (inverse_transform). fABBA essentially requires two parameters, tol and alpha. The tolerance tol determines how closely the polygonal chain approximation follows the original time series. The parameter alpha controls how similar time series pieces need to be in order to be represented by the same symbol. A smaller tol means that more polygonal pieces are used and the polygonal chain approximation is more accurate, but it also increases the length of the string representation. A smaller alpha typically results in a larger number of symbols.

The choice of parameters depends on the application, but in practice one often just wants the polygonal chain to mimic the key features of the time series and not to approximate any noise. In this example the time series is a sine wave and the chosen parameters result in the symbolic representation aBbCbCbCbCbCbCbCA. Note how the periodicity of the time series is nicely reflected in the repetitions within its string representation.

import numpy as np
import matplotlib.pyplot as plt
from fABBA import fABBA

ts = [np.sin(0.05*i) for i in range(1000)]  # original time series
fabba = fABBA(tol=0.1, alpha=0.1, sorting='2-norm', scl=1, verbose=0)

string = fabba.fit_transform(ts)            # string representation of the time series
print(string)                               # prints aBbCbCbCbCbCbCbCA

inverse_ts = fabba.inverse_transform(string, ts[0]) # numerical time series reconstruction

Plot the time series and its polygonal chain reconstruction:

plt.plot(ts, label='time series')
plt.plot(inverse_ts, label='reconstruction')
plt.legend()
plt.grid(True, axis='y')
plt.show()

(figure: time series and its polygonal chain reconstruction)

To explore fABBA further with real datasets, we recommend users start with the UCI Repository and the UCR Archive.

⭐ Adaptive polygonal chain approximation

Instead of using fit_transform, which combines the polygonal chain approximation of the time series and the symbolic conversion into one step, both steps of fABBA can be performed independently. Here is how to obtain the compression pieces and reconstruct the time series by inversely transforming the pieces:

import numpy as np
from fABBA import compress
from fABBA import inverse_compress
ts = [np.sin(0.05*i) for i in range(1000)]
pieces = compress(ts, tol=0.1)               # pieces is a list of the polygonal chain pieces
inverse_ts = inverse_compress(pieces, ts[0]) # reconstruct polygonal chain from pieces

Similarly, the digitization step can be performed after compression as follows:

from fABBA import digitize
from fABBA import inverse_digitize
string, parameters = digitize(pieces, alpha=0.1, sorting='2-norm', scl=1) # digitization of the pieces
print(''.join(string))                                 # prints aBbCbCbCbCbCbCbCA

inverse_pieces = inverse_digitize(string, parameters)
inverse_ts = inverse_compress(inverse_pieces, ts[0])   # numerical time series reconstruction
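The speedup mentioned in the introduction (sorting plus early termination) can be illustrated with a toy greedy aggregation of the 2-D pieces. This is a simplified sketch under stated assumptions, not the library's implementation, and `aggregate_sketch` is a hypothetical helper name:

```python
import numpy as np

def aggregate_sketch(pieces, alpha):
    """Toy sorted greedy aggregation (simplified illustration).

    Pieces (2-D points) are sorted by their 2-norm; each unassigned
    piece founds a new group and absorbs later pieces within `alpha`
    of it. Sorting allows early termination: once a piece's norm
    exceeds the founding piece's norm by more than alpha, no later
    piece can join the group (reverse triangle inequality).
    """
    pts = np.asarray(pieces, dtype=float)
    norms = np.linalg.norm(pts, axis=1)
    order = np.argsort(norms)
    labels = np.full(len(pts), -1)
    group = 0
    for idx, i in enumerate(order):
        if labels[i] != -1:
            continue  # already assigned to an earlier group
        labels[i] = group
        for j in order[idx + 1:]:
            if norms[j] - norms[i] > alpha:
                break  # early termination: all later pieces are too far
            if labels[j] == -1 and np.linalg.norm(pts[j] - pts[i]) <= alpha:
                labels[j] = group
        group += 1
    return labels
```

Because | ||a|| - ||b|| | <= ||a - b||, the break is safe: any piece whose norm differs from the founder's by more than alpha is guaranteed to be farther than alpha away.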

⭐ Alternative ABBA approach

We also provide other clustering-based ABBA methods, which are easy to use together with scikit-learn clustering tools:

import numpy as np
from sklearn.cluster import KMeans
from fABBA import ABBAbase

ts = [np.sin(0.05*i) for i in range(1000)]         # original time series
# specify 5 symbols using k-means clustering
kmeans = KMeans(n_clusters=5, random_state=0, init='k-means++', n_init='auto', verbose=0)
abba = ABBAbase(tol=0.1, scl=1, clustering=kmeans)
string = abba.fit_transform(ts)                    # string representation of the time series
print(string)                                      # prints BbAaAaAaAaAaAaAaC
inverse_ts = abba.inverse_transform(string)        # reconstruction

⭐ Transforming multiple time series

Load the JABBA module and example data:

from fABBA import JABBA
from fABBA import loadData
train, test = loadData()

JABBA provides an init parameter to select the underlying ABBA method: setting init='agg' uses the fABBA (aggregation-based) method, while init='k-means' uses the clustering-based ABBA method. Use a JABBA object to fit and symbolize the training set via fit_transform, and reconstruct the time series from the symbolic representation:

jabba = JABBA(tol=0.0005, init='agg', verbose=1)
symbols = jabba.fit_transform(train) 
reconst = jabba.inverse_transform(symbols)

Note: the function loadData() is a lightweight API for loading time series datasets; it only supports part of the data in the UEA or UCR Archive, so please refer to the documentation for full usage details. JABBA processes multiple time series as well as multivariate time series, so the input must be 2-dimensional. For example, when loading a UCR dataset such as Beef, use symbols = jabba.fit_transform(train); when loading a UEA dataset such as BasicMotions, use symbols = jabba.fit_transform(train[0]). For details, we refer to https://www.cs.ucr.edu/~eamonn/time_series_data_2018/.
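As a quick check of the 2-dimensional input requirement, the wrapping of a single univariate series can be sketched in pure NumPy (this does not call fABBA itself; the variable names are illustrative):

```python
import numpy as np

# JABBA expects 2-D input of shape (n_series, n_samples), so a single
# univariate series should be wrapped before calling fit_transform.
single_series = np.sin(0.05 * np.arange(1000))
batch = np.atleast_2d(single_series)  # shape (1, 1000)
assert batch.ndim == 2 and batch.shape == (1, 1000)
```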

For out-of-sample data, use the function transform to symbolize the test time series, and reconstruct from the symbolization via the function inverse_transform:

test_symbols, start_set = jabba.transform(test) # if UEA time series is used, simply use instead jabba.transform(test[0])
test_reconst = jabba.inverse_transform(test_symbols, start_set)

⭐ Image compression

The following example shows how to apply fABBA to image data.

import matplotlib.pyplot as plt
from fABBA.load_datasets import load_images
from fABBA import image_compress
from fABBA import image_decompress
from fABBA import fABBA
from cv2 import resize                   # requires the opencv-python package
img_samples = load_images()              # load test images
img = resize(img_samples[0], (100, 100)) # use the first image for testing

fabba = fABBA(tol=0.1, alpha=0.01, sorting='2-norm', scl=1, verbose=1)
string = image_compress(fabba, img)
inverse_img = image_decompress(fabba, string)
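Applying a 1-D symbolic method to an image requires flattening it into a series and reshaping the reconstruction back. The following pure-NumPy sketch illustrates that flatten/reshape round trip (it does not call fABBA itself; the actual helpers image_compress/image_decompress wrap the symbolic steps):

```python
import numpy as np

# A grayscale image becomes a 1-D series by row-major flattening; after
# symbolic compression and reconstruction, the series is reshaped back.
img = np.arange(100 * 100, dtype=float).reshape(100, 100)  # stand-in image
ts = img.flatten()                # 1-D series of length 10000
restored = ts.reshape(img.shape)  # back to 100 x 100
assert np.array_equal(img, restored)
```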

Plot the original image:

plt.imshow(img)
plt.show()

(figure: original image)

Plot the reconstructed image:

plt.imshow(inverse_img)
plt.show()

(figure: reconstructed image)

🎨 Experiments

The folder "exp" contains all code required to reproduce the experiments in the manuscript "An efficient aggregation method for the symbolic representation of temporal data".

Some of the experiments also require the UCR Archive 2018 datasets which can be downloaded from UCR Time Series Classification Archive.

There are a number of dependencies listed below. Some of these modules are part of the Python standard library, while others (e.g., numpy, matplotlib, sklearn, pandas, tqdm, tslearn) need to be installed separately. We list them for completeness:

os, csv, time, pickle, numpy, warnings, matplotlib, math, collections, copy, sklearn, pandas, tqdm, tslearn

Please ensure that these modules are available before running the code. A NumPy version newer than 1.19.0 and less than 1.20 is required for the experiments.
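For example, the third-party packages could be installed with pip (a suggested command pinning NumPy as noted above; adjust package names and versions to your environment):

```shell
pip install "numpy>1.19.0,<1.20" pandas scikit-learn matplotlib tqdm tslearn
```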

It is necessary to compile the Cython files in the experiments folder (the main module ships precompiled, but the experiments code is kept separate). To compile the Cython extension in "src", use:

cd exp/src
python3 setup.py build_ext --inplace

or

cd exp/src
python setup.py build_ext --inplace

πŸ’Œ Others

We also provide a C++ implementation of fABBA in the cabba repository; give it a try!

Run the example:

git clone https://github.com/nla-group/fABBA.git
cd fABBA/cpp
g++ -o test runtime.cpp
./test

πŸ“Ž Citation

If you use fABBA in a scientific publication, we would appreciate your citing:

@article{10.1145/3532622,
author = {Chen, Xinye and G\"{u}ttel, Stefan},
title = {An Efficient Aggregation Method for the Symbolic Representation of Temporal Data},
year = {2023},
publisher = {ACM},
address = {USA},
volume = {17},
number = {1},
url = {https://doi.org/10.1145/3532622},
doi = {10.1145/3532622},
journal = {ACM Transactions on Knowledge Discovery from Data},
numpages = {22},
}

If you have any questions, please feel free to reach out to us!

πŸ“ License

This project is licensed under the terms of the BSD 3-Clause License.

fabba's People

Contributors

chenxinye, guettel, lrnv


fabba's Issues

Erroneous "This installation is not using Cython" message

I am consistently receiving "This installation is not using Cython" and "cython fail" messages when I use fABBA, even though I do have Cython installed. I have Cython version 3.0.8 and fABBA version 1.1.2, both installed using pip, and both packages appear in the same directory on my machine.

I'm not sure if this is affecting how fABBA actually runs, but as I follow the example code in the documentation, sometimes I get the same output, but sometimes not.

time series prediction

Hello! Thanks for developing this great library. I noticed that you also want to combine LSTM in your subsequent work to predict time series. How is your current progress? After combining fABBA with LSTM myself, the long-term prediction performance was always very poor, and it did not follow the trend.


Missing DOIs in bib

JOSS's editorialbot in openjournals/joss-reviews#6294 found missing DOIs in the paper references:

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- None

MISSING DOIs

- 10.1007/s10618-020-00689-6 may be a valid DOI for title: ABBA: adaptive Brownian bridge-based symbolic aggregation of time series
- 10.1109/tit.1982.1056489 may be a valid DOI for title: Least squares quantization in PCM
- 10.1145/882082.882086 may be a valid DOI for title: A symbolic representation of time series, with implications for streaming algorithms
- 10.1145/3532622 may be a valid DOI for title: An efficient aggregation method for the symbolic representation of temporal data
- 10.1016/j.is.2023.102294 may be a valid DOI for title: ECG classification with learning ensemble based on symbolic discretization
- 10.1007/978-3-031-24378-3_4 may be a valid DOI for title: Fast Time Series Classification with Random Symbolic Subsequences
- 10.1007/s10618-007-0064-z may be a valid DOI for title: Experiencing SAX: a novel symbolic representation of time series
- 10.1109/seed55351.2022.00016 may be a valid DOI for title: Foreseer: Efficiently Forecasting Malware Event Series with Long Short-Term Memory
- 10.1016/j.ress.2023.109123 may be a valid DOI for title: Data-driven prognostics based on time-frequency analysis and symbolic recurrent neural network for fuel cells under dynamic load
- 10.1145/3448672 may be a valid DOI for title: A Framework for Generating Summaries from Temporal Personal Health Data

INVALID DOIs

- None

If you can check whether the proposed DOIs are right for your references and, if so, include them in your .bib, that would be great :)

Discontinuous timeseries

@chenxinye Thanks for developing this great library. My questions are:

  1. Can this be applied to discontinuous time series where the sampling times / measurements are irregular?

  2. Are there specific circumstances under which it is better to apply the clustering-based compression?

  3. What is the key advantage of this method over SAX?

Thanks in advance.
