Code Monkey home page Code Monkey logo

sax's Introduction

Time series symbolic discretization (approximation) with SAX

maven build codecov.io Maven Central License

SonarCloud

This code is released under GPL v.2.0 and implements in Java:

  • Symbolic Aggregate approXimation (i.e., SAX) toolkit stack [1]
  • EMMA (Enumeration of Motifs through Matrix Approximation) algorithm for time series motif discovery [2]
  • HOT-SAX - a time series anomaly (discord) discovery algorithm [3]
  • time series bitmap-related routines [4]
Note that the most of library's functionality is also available in R and Python as well...

[1] Lin, J., Keogh, E., Patel, P., and Lonardi, S., Finding Motifs in Time Series, The 2nd Workshop on Temporal Data Mining, the 8th ACM Int'l Conference on KDD (2002)

[2] Patel, P., Keogh, E., Lin, J., Lonardi, S., Mining Motifs in Massive Time Series Databases, In Proc. ICDM (2002)

[3] Keogh, E., Lin, J., Fu, A., HOT SAX: Efficiently finding the most unusual time series subsequence, In Proc. ICDM (2005)

[4] Kumar, N., Lolla, V.N., Keogh, E.J., Lonardi, S. and Chotirat (Ann) Ratanamahatana, Time-series Bitmaps: a Practical Visualization Tool for Working with Large Time Series Databases, In SDM 2005 Apr 21 (pp. 531-535).

Citing this work:

If you are using this implementation for you academic work, please cite our Grammarviz 3.0 paper:

[Citation] Senin, P., Lin, J., Wang, X., Oates, T., Gandhi, S., Boedihardjo, A.P., Chen, C., Frankenstein, S., GrammarViz 3.0: Interactive Discovery of Variable-Length Time Series Patterns, ACM Trans. Knowl. Discov. Data, February 2018.

an alternative solution for recurrent and anomalous patterns detection:

If you are interested in more advance techniques for time series pattern discovery -- the one which allows to discover recurrent and anomalous patterns of variable length -- please check out our new tool called GrammarViz 2.0. Based on symbolic discretization, Grammatical Inference, and algorithmic (i.e., Kolmogorv complexity) this new approach facilitates linear-time variable length motifs discovery and orders of magnitude faster than HOT-SAX discords discovery (but exactness is not guaranteed).

0.0 SAX transform in a nutshell

SAX is used to transform a sequence of rational numbers (i.e., a time series) into a sequence of letters (i.e., a string). An illustration of a time series of 128 points converted into the word of 8 letters:

SAX in a nutshell

As discretization is probably the most used transformation in data mining, SAX has been widely used throughout the field. Find more information about SAX at its authors pages: SAX overview by Jessica Lin, Eamonn Keogh's SAX page, or at sax-vsm wiki page.

1.0 Building

The code is written in Java and I use maven to build it. Use the profile single to build an executable jar containing all the dependencies:

$ mvn package -P single

[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building jmotif-sax
[INFO]    task-segment: [package]

...

[INFO] Building jar: /media/Stock/git/jmotif-sax/target/jmotif-sax-1.0.1-SNAPSHOT-jar-with-dependencies.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL

2.0 Time series to SAX conversion using CLI

The jar file can be used to convert a time series (represented as a single-column text file) to SAX via sliding window in command line:

$ java -jar target/jmotif-sax-0.1.1-SNAPSHOT-jar-with-dependencies.jar
Usage: <main class> [options] 
Options:
		--alphabet_size, -a
		   SAX alphabet size, Default: 3
		--data, -d
		   The input file name
		--out, -o
   		   The output file name
		--strategy
   		   SAX numerosity reduction strategy
   		   Default: EXACT, Possible Values: [NONE, EXACT, MINDIST]
		--threads, -t
   		   number of threads to use, Default: 1
		--threshold
   		   SAX normalization threshold, Default: 0.01
		--window_size, -w
   		   SAX sliding window size, Default: 30
		--word_size, -p
   		   SAX PAA word size, Default: 4

When run, it prints the time series point index and a corresponding word:

$ java -jar "target/jmotif-sax-1.0.1-SNAPSHOT-jar-with-dependencies.jar" \ 
                      -d src/resources/test-data/ecg0606_1.csv -o test.txt
$ head test.txt
0, aabc
8, aacc
13, abcc
20, abcb
...

3.0 API usage

There two classes implementing end-to-end workflow for SAX. These are TSProcessor (implements time series-related functions) and SAXProcessor (implements the discretization). Below are typical use scenarios:

3.1 Discretizing time-series by chunking:

// instantiate classes
NormalAlphabet na = new NormalAlphabet();
SAXProcessor sp = new SAXProcessor();

// read the input file
double[] ts = TSProcessor.readFileColumn(dataFName, 0, 0);

// perform the discretization
String str = sp.ts2saxByChunking(ts, paaSize, na.getCuts(alphabetSize), nThreshold);

// print the output
System.out.println(str);

3.2 Discretizing time-series via sliding window:

// instantiate classes
NormalAlphabet na = new NormalAlphabet();
SAXProcessor sp = new SAXProcessor();

// read the input file
double[] ts = TSProcessor.readFileColumn(dataFName, 0, 0);

// perform the discretization
SAXRecords res = sp.ts2saxViaWindow(ts, slidingWindowSize, paaSize, 
	na.getCuts(alphabetSize), nrStrategy, nThreshold);

// print the output
Set<Integer> index = res.getIndexes();
for (Integer idx : index) {
	System.out.println(idx + ", " + String.valueOf(res.getByIndex(idx).getPayload()));
}

3.3 Multi-threaded discretization via sliding window:

// instantiate classes
NormalAlphabet na = new NormalAlphabet();
SAXProcessor sp = new SAXProcessor();

// read the input file
double[] ts = TSProcessor.readFileColumn(dataFName, 0, 0);

// perform the discretization using 8 threads
ParallelSAXImplementation ps = new ParallelSAXImplementation();
SAXRecords res = ps.process(ts, 8, slidingWindowSize, paaSize, alphabetSize, 
	nrStrategy, nThreshold);

// print the output
Set<Integer> index = res.getIndexes();
for (Integer idx : index) {
	System.out.println(idx + ", " + String.valueOf(res.getByIndex(idx).getPayload()));
}

3.4 Time series motif (recurrent pattern) discovery

Class EMMAImplementation implements a method for getting the most frequent structural patterns with EMMA algorithm:

// read the data
double[] series = TSProcessor.readFileColumn(DATA_FNAME, 0, 0);

// find the best motif with EMMA
MotifRecord motifsEMMA = EMMAImplementation.series2EMMAMotifs(series, MOTIF_SIZE, 
				MOTIF_RANGE, PAA_SIZE, ALPHABET_SIZE, ZNORM_THRESHOLD);
    		
// print motifs
System.out.println(motifsEMMA);

Class SAXRecords implements a method for getting the most frequent SAX words:

// read the data
double[] series = TSProcessor.readFileColumn(DATA_FNAME, 0, 0);

// instantiate classes
Alphabet na = new NormalAlphabet();
SAXProcessor sp = new SAXProcessor();

// perform discretization
saxData = sp.ts2saxViaWindow(series, WIN_SIZE, PAA_SIZE, na.getCuts(ALPHABET_SIZE),
    		NR_STRATEGY, NORM_THRESHOLD);
    		
// get the list of 10 most frequent SAX words
ArrayList<SAXRecord> motifs = saxData.getMotifs(10);
SAXRecord topMotif = motifs.get(0);
    
// print motifs
System.out.println("top motif " + String.valueOf(topMotif.getPayload()) + " seen " + 
	   		topMotif.getIndexes().size() + " times.");

3.5 Time series anomaly detection using brute-force search

The BruteForceDiscordImplementation class implements a brute-force search for discords, which is intended to be used as a reference in tests (HOTSAX and NONE yield exactly the same discords).

discordsBruteForce = BruteForceDiscordImplementation.series2BruteForceDiscords(series, 
   WIN_SIZE, DISCORDS_TO_TEST, new LargeWindowAlgorithm());
    
    for (DiscordRecord d : discordsBruteForce) {
       System.out.println("brute force discord " + d.toString());
    }

3.6 Time series anomaly (discord) discovery using HOTSAX

The HOTSAXImplementation class implements a HOTSAX algorithm for time series discord discovery:

  discordsHOTSAX = HOTSAXImplementation.series2Discords(series, DISCORDS_TO_TEST, WIN_SIZE,
      PAA_SIZE, ALPHABET_SIZE, STRATEGY, NORM_THRESHOLD);
      
  for (DiscordRecord d : discordsHOTSAX) {
    System.out.println("hotsax hash discord " + d.toString());
  }

Note, that the "proper" strategy to use with HOTSAX is NumerosityReductionStrategy.NONE but you may try others in order to speed-up the search, exactness however, is not guaranteed.

The library source code has examples (tests) for using these here and here.

4.0 Time series bitmap

The library also implements simple routines to convert a time series to bitmap following [4]. Here is an example of six datasets from the paper: Six "normal" datasets

which were converted into the digram frequencies tables and colored with Jet palette:

Six "normal" datasets as bitmaps

and then clustered (hc, ave) based on the digram occurrence frequencies (euclidean):

Six "normal" datasets clustered via bitmap

5.0 Threaded performance

The plot shows the speedup achieved when using the parallelized SAX version on the dataset 300_signal1.txt of length 536,976 points. Parameters used in the experiment: sliding window size 200, PAA size 11, alphabet size 7, and three different NR strategies.

Note, that for MINDIST numerosity reduction strategy the parallelized code performs NONE-based discretization first and prunes the result second. The difference in performance for 7+ CPUs on the plot below is due to the uneven server load, I guess.

Performance plot

Made with Aloha!

Made with Aloha!

Versions:

1.1.4

  • fixing EMMA for a case of "tie" -- chosing a motif with the smallest variance

1.1.3

  • adding an EMMA implementation with a fake ADM
  • some old code optimization (breaking some APIs)

1.1.2

  • maintenance release -- most of changes in the shingling routines, fitting its API for other projects

1.1.1

  • HOTSAX implementation parameters bug fix

1.1.0

  • zNormalization behavior for a case when SD is less than threshold is changed -- yields zeros
  • GrammarViz3.0 release

1.0.10

  • shingling/bitmap CLI fixes
  • SAX via chunking fixes -- proper symbol indexes computed (thanks s-mckay!)

1.0.9

  • fixed the error with the discord size computation
  • changed HOTSAX and BRUTEFORCE behavior by adding z-Normalization to the distance computation routine

1.0.8

  • added shingling

1.0.7

  • logback dependencies removed

1.0.5 - 1.0.6

  • added discretization approximation error computation for grammarviz3 work

1.0.4

  • fixed SAX transform via sliding window, last window is now added

1.0.3

  • improved PAA performance

1.0.1 - 1.0.2

  • more tests, bug fixes, CI

0.0.1 - 1.0.0

  • cleaning up the old JMotif code and decoupling the SAX code from the rest

sax's People

Contributors

m-lerner avatar psenin-sanofi avatar seninp avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

sax's Issues

Feature Request: String to PAA

My request is motivated by the need to calculate the slope of a pattern.

So as to not let this request be too specific, it might be more generally useful to convert a pattern to a PAA (double[]) which can then be used to do calculations on it.

PAA conversion like SAXProcessor#ts2string() but in reverse.

TSProcessor and SAXProcessor exception handling improvement

Ideally, create and throw SAXRuntimeException.

In the following case, a checked exception is thrown on a condition similar to where typically java.lang.IndexOutOfBoundsException is thrown.

However IndexOutOfBoundsException extends RuntimeException.

There is little chance in the following case to meaningfully catch the checked exception and recover from it. Once it is thrown, it is game over because of a programming error.

SAXProcessor
public char[] ts2string(double[] ts, int paaSize, double[] cuts, double nThreshold)
throws SAXException {

throws it because it calls

TSProcessor
public double[] paa(double[] ts, int paaSize) throws SAXException {
if (len < paaSize) {
throw new SAXException("PAA size can't be greater than the timeseries size.");
}

What one ends up doing as a developer using this library is to ring fence the exception in a way like this:

final char[] ts2string;
try {
ts2string = sp.ts2string(windowSubseries, paaSize, na.getCuts(alphabetSize), normalizationThreshold);
} catch (SAXException ex) {
throw new RuntimeException(ex);
}

which disrupts the work flow when writing software.

SAXProcessor doesn't process series of length = window size

In SAXProcessor, the following loop prevents from processing the last windowSize elements of a series.
E.g. if the series is the same size of the window, no record is generated.

for (int i = 0; i < ts.length - windowSize; i++) 

I guess this should be

for (int i = 0; i <= ts.length - windowSize; i++) 

Using sax on gps points

Hi, I have a gps points dataset. It includes timestamp, position, velocity and other attributes.
I want to symbolize the velocity using the sax.
If the time interval between two consecutive point is not fixed, does sax support this case?

Maven repo availability

Hi Pavel, are there any plans to turn this lib available in any maven repository?

Thanks.

Java API implementing test time series string conversion by individual time series(rows) from CSV

@seninp

Have you got any equivalent implementation (java API) for the function that read test time series (CSV) dataset row-by-row and then converts them into SAX wordbag with each bag corresponding to individual time series?

I'm looking for the equivalence to the following R-code...

for (i in c(1:length(data_test[,1]))) {
print(paste(i))
series = data_test[i,]

bag = series_to_wordbag(series, w, p, a, "exact", 0.01)

Appreciate your quick help !

Thanks,
KMB

PAA doesn't behave as expected for series whose length is not multiple of paa size

I was starting from your code to create a SAX processor that operates reactively.
While coding PAA to operates on enumerables/observable collections I found something that I guess it's an error in TSProcessor PAA implementation
.
Take for example an array [1,2,3,4,5,6,7] and PAA size = 4.

The following R code (taken from this repo) generates what seems to me the correct output

x <- array(1:7, dim=c(7))
l <- length(x)
PAA_number = 4
PAA <- array(0, PAA_number)
for (i in 1:PAA_number) {
  PAA[i] <- mean(x[round((i - 1) * l/PAA_number + 1):round(i * l/PAA_number)])
  print(paste("segment",i,"[",      # a debug output
          round((i - 1) * l/PAA_number + 1),
          ":",round(i * l/PAA_number),"] ->",
          PAA[i]))
}

Output is:

[1] "segment 1 [ 1 : 2 ] -> 1.5"
[1] "segment 2 [ 3 : 4 ] -> 3.5"
[1] "segment 3 [ 4 : 5 ] -> 4.5"
[1] "segment 4 [ 6 : 7 ] -> 6.5"

To the contrary, the TSProcessor paa function generates:

[0] = 1.4285714285714286
[1] = 3.1428571428571428
[2] = 4.8571428571428568
[3] = 6.5714285714285712

Can you check?

do you have any examples of using shingles in time series classification?

Hi Senin,

Recently, I came across approach (cosine similarity) that tries to use k-shingles to help classify text documents based on similarity in subsequent string patterns?
My quick search found that you actually have implemented public classes in Java (in SAXProcessor.java that indeed represents timeseries into Shingles using:

  1. TS2Shingles - for single time series
  2. manySeriesToShingles - for multiple time series datasets

However, my question to you is have you attempted classifying time series based on k-shingles ? If yes, does it fair well than SAX-VSM? Any other findings from such..pls advice.

Thanks,
KMB

Generalization of Alphabet implementation

A while ago I read a bit about SAX, and played around with different implementation options. The results have not been published. This was just some recreational thing. I now reviewed this code, and thought about bringing it into a publishable shape, but a quick websearch about java sax time series led me to your implementation - and I'll probably not put any efforts into my implementation, because it most likely won't be able to compete with your library anyhow.

However, I was curious about how you implemented the breakpoint- and distance matrix computations. And ... admittedly, when I saw the giant arrays and switches in https://github.com/jMotif/SAX/blob/master/src/main/java/net/seninp/jmotif/sax/alphabet/NormalAlphabet.java , I twitched a little ;-)

The following is an implementation of your Alphabet class that does the computation of the cuts and the distance matrix generically, on the fly, for arbitrary alphabet sizes.

package net.seninp.jmotif.sax.alphabet;

import net.seninp.jmotif.sax.SAXException;

public class GeneralAlphabet extends Alphabet
{
    @Override
    public Integer getMaxSize()
    {
        return Integer.MAX_VALUE;
    }

    @Override
    public double[] getCuts(Integer size) throws SAXException
    {
        return computeEquiprobableRegions(size);
    }

    @Override
    public double[][] getDistanceMatrix(Integer size) throws SAXException
    {
        return computeDistanceMatrix(getCuts(size));
    }

    /**
     * The square root of two
     */
    private static final double SQRT_TWO = Math.sqrt(2);

    /**
     * Create an array that describes the breakpoints for dividing a
     * Gaussian distribution into equiprobable regions.  
     *  
     * @param n The number of bisection points
     * @return The array
     */
    private static double[] computeEquiprobableRegions(int n)
    {
        double b[] = new double[n - 1];
        for (int s = 1; s <= n - 1; s++)
        {
            double x = ((double) s / n);
            b[s - 1] = probit(x);
        }
        return b;
    }

    /**
     * The probit function.<br> 
     * <br>
     * (See http://en.wikipedia.org/wiki/Probit)
     * 
     * @param x The argument
     * @return The result
     */
    private static double probit(double x)
    {
        return SQRT_TWO * erfInv(2 * x - 1);
    }

    /**
     * The inverse error function.<br>
     * <br> 
     * (See http://en.wikipedia.org/wiki/Error_function)
     * 
     * @param x The argument
     * @return The result
     */
    private static double erfInv(double x)
    {
        // From https://people.maths.ox.ac.uk/gilesm/files/gems_erfinv.pdf
        double w = -Math.log((1.0 - x) * (1.0 + x));
        double p = 0;
        if (w < 5.0)
        {
            w = w - 2.5;
            p = 2.81022636e-08;
            p = 3.43273939e-07 + p * w;
            p = -3.5233877e-06 + p * w;
            p = -4.39150654e-06 + p * w;
            p = 0.00021858087 + p * w;
            p = -0.00125372503 + p * w;
            p = -0.00417768164 + p * w;
            p = 0.246640727 + p * w;
            p = 1.50140941 + p * w;
        }
        else
        {
            w = Math.sqrt(w) - 3.0;
            p = -0.000200214257;
            p = 0.000100950558 + p * w;
            p = 0.00134934322 + p * w;
            p = -0.00367342844 + p * w;
            p = 0.00573950773 + p * w;
            p = -0.0076224613 + p * w;
            p = 0.00943887047 + p * w;
            p = 1.00167406 + p * w;
            p = 2.83297682 + p * w;
        }
        return p * x;
    }

    /**
     * Computes the MINDIST lookup matrix, as described in the SAX paper
     * 
     * @param b The breakpoints, as computed with 
     * {@link #computeEquiprobableRegions(int)}
     * @return The distance matrix
     */
    private static double[][] computeDistanceMatrix(double b[])
    {
        int n = b.length + 1;
        double result[][] = new double[n][n];
        for (int r = 0; r < n; r++)
        {
            for (int c = r; c < n; c++)
            {
                double value = 0;
                if (Math.abs(r - c) > 1)
                {
                    value = b[Math.max(r, c) - 1] - b[Math.min(r, c)];
                }
                result[r][c] = value;
                result[c][r] = value;
            }
        }
        return result;
    }    
}

I'm not so sure abut the usage pattern of instances of this class. If the methods are called repeatedly and are performance critical, one could consider "caching" the results, so that the overhead for the computation will vanish, and the performance will eventually be the same as for the current, switch-based implementation - roughly like that:

package net.seninp.jmotif.sax.alphabet;

import java.util.LinkedHashMap;
import java.util.Map;

import net.seninp.jmotif.sax.SAXException;

public class CachingAlphabet extends Alphabet
{
    private final Alphabet delegate;
    private final Map<Integer, double[]> cuts;
    private final Map<Integer, double[][]> distanceMatrices;

    CachingAlphabet(Alphabet delegate)
    {
        this.delegate = delegate;
        this.cuts = new LinkedHashMap<Integer, double[]>();
        this.distanceMatrices = new LinkedHashMap<Integer, double[][]>();
    }

    @Override
    public Integer getMaxSize()
    {
        return delegate.getMaxSize();
    }

    @Override
    public double[] getCuts(Integer size) throws SAXException
    {
        double result[] = cuts.get(size);
        if (result == null)
        {
            result = delegate.getCuts(size);
            cuts.put(size, result);
        }
        return result;
    }

    @Override
    public double[][] getDistanceMatrix(Integer size) throws SAXException
    {
        double result[][] = distanceMatrices.get(size);
        if (result == null)
        {
            result = delegate.getDistanceMatrix(size);
            distanceMatrices.put(size, result);
        }
        return result;
    }
}

To be used as

Alphabet alphabet = new CachingAlphabet(new GenericAlphabet());

Regardless of the possible caching, here's a quick test showing that the results delivered by this implementation are epsilon-equal to the results from your current implementation.

package net.seninp.jmotif.sax.alphabet;

import java.util.Locale;

import net.seninp.jmotif.sax.SAXException;

public class AlphabetTest
{
    public static void main(String[] args) throws SAXException
    {
        NormalAlphabet a0 = new NormalAlphabet();
        GeneralAlphabet a1 = new GeneralAlphabet();

        int max = Math.min(a0.getMaxSize(), a1.getMaxSize());
        for (int i = 2; i < max; i++)
        {
            double[] c0 = a0.getCuts(i);
            double[] c1 = a1.getCuts(i);

            System.out.println("For " + i);
            //System.out.println("c1: "+Arrays.toString(c0));
            //System.out.println("c0: "+Arrays.toString(c1));

            double d0[][] = a0.getDistanceMatrix(i);
            double d1[][] = a1.getDistanceMatrix(i);

            //System.out.println("d0:\n" + createString(d0));
            //System.out.println("d1:\n" + createString(d1));

            System.out.println("maxAbsoluteDifference for cuts           : " + 
                maxAbsoluteDifference(c0, c1));
            System.out.println("maxAbsoluteDifference for distance matrix: " + 
                maxAbsoluteDifference(d0, d1));
        }
    }

    private static double maxAbsoluteDifference(double a0[], double a1[])
    {
        double maxAbsoluteDifference = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < a0.length; i++)
        {
            double d = Math.abs(a0[i] - a1[i]);
            maxAbsoluteDifference = Math.max(maxAbsoluteDifference, d);
        }
        return maxAbsoluteDifference;
    }

    private static double maxAbsoluteDifference(double a0[][], double a1[][])
    {
        double maxAbsoluteDifference = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < a0.length; i++)
        {
            maxAbsoluteDifference = Math.max(maxAbsoluteDifference, 
                maxAbsoluteDifference(a0[i], a1[i]));
        }
        return maxAbsoluteDifference;
    }

    private static String createString(double a[][])
    {
        StringBuilder sb = new StringBuilder();
        for (int r=0; r<a.length; r++)
        {
            if (r > 0)
            {
                sb.append("\n");
            }
            for (int c=0; c<a[r].length; c++)
            {
                if (c > 0)
                {
                    sb.append("  ");
                }
                sb.append(String.format(Locale.ENGLISH, "%5.5f", a[r][c]));
            }
        }
        return sb.toString();
    }
}

I considered opening this as a pull request, but think that it might be better to post this here, because there are several degrees of freedom for a possible integration. (You could even use this class to generate the lookup tables for your implementation, so that at least the 26 (instead of 20) letters can be covered). But of course, whether you use it or not is up to you.

If I use and publish this code in one of my libraries, it will be under the MIT/X11 license. But, from my perspective, the above code can be considered as "public domain" (one function refers to a paper, the link is in the code), so everybody may do with this whatever he wants.

Performance improvement in TSProcessor

Impressive API. Thanks :)

Please let me suggest small performance improvements that I have verified to be effective with JMH.

net.seninp.jmotif.sax.TSProcessor

public double mean(double[] series) {
double res = 0D;
int count = 0;// series.length has it
for (double tp : series) {
res += tp;
count += 1;
}
if (count > 0) {
return res / ((Integer) count).doubleValue();// (double)count is faster as it does not need boxing
}
return Double.NaN;
}

So we get:

public double mean(double[] series) {
double res = 0D;
for (double tp : series) {
res += tp;

}
    if (series.length > 0) {
        return res / (double)series.length;
}
return Double.NaN;

}

and

public double mean(int[] series) {
double res = 0D;
for (int tp : series) {
res += (double) tp;
}
if (series.length > 0) {
return res / (double)series.length;
}
return Double.NaN;
}

and

public double var(double[] series) {
double res = 0D;
double mean = mean(series);
for (double tp : series) {
res += (tp - mean) * (tp - mean);
}
if (series.length > 0) {
return res / (double)series.length;
}
return Double.NaN;
}

and

public double stDev(double[] series) {
double num0 = 0D;
double sum = 0D;
for (double tp : series) {
num0 = num0 + tp * tp;
sum = sum + tp;
}
double len = (double)series.length;
return Math.sqrt((len * num0 - sum * sum) / (len * (len - 1)));
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.