Our cluster runs Apache Spark 2.1.1 and Hadoop 2.7.3. To run the code on a different Spark version, update the Spark version in the build.sbt file of the source code and recompile.
The code is written in Scala and built with SBT.
Create a directory with mkdir ./etc and put the config.conf file in it. Then edit the settings in config.conf. Parameters are grouped by prefix:
- rw** : parameters for generating raw time series;
- idx** : parameters for building the TARDIS index;
- cl** : parameters for creating the ground truth for kNN queries;
- eq** : parameters that control the exact-match query and the kNN query.
Before running the program:
- create the etc directory and put the config.conf file in it.
- create the log directory.
- put spark-defaults.conf in the working directory (the command below reads ./spark-defaults.conf), or use Spark's default configuration.
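The preparation steps above can be sketched as a few shell commands. This is a minimal sketch, assuming config.conf sits in the directory you launch from; that source location is an assumption, so adjust the path to wherever your copy of the file lives.

```shell
# Create the directories the program expects, relative to the launch directory.
mkdir -p ./etc ./log

# Copy config.conf into ./etc if a copy exists in the current directory.
# (Assumption: the file is here; change the source path if it is elsewhere.)
if [ -f ./config.conf ]; then
  cp ./config.conf ./etc/config.conf
fi
```

mkdir -p does nothing when the directories already exist, so the commands are safe to rerun.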
~/spark/bin/spark-submit --class org.apache.spark.edu.wpi.dsrg.tardis.TARDIS --properties-file ./spark-defaults.conf tardis_2.11-1.0.jar -h
- -h : display help information;
- -g : generate raw time series;
- -b : build index;
- -c knn : create ground truth for kNN queries;
- -q : run time series similarity queries.
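The options above are each passed to the same spark-submit command. The sketch below prints one command line per step instead of running it, so it can be checked without a cluster. The helper name tardis_cmd, the SPARK_SUBMIT variable, and the order generate, build, ground truth, query are assumptions made for illustration; only the flags and the command line itself come from this README.

```shell
# Path to spark-submit; override SPARK_SUBMIT if Spark is installed elsewhere.
SPARK_SUBMIT="${SPARK_SUBMIT:-$HOME/spark/bin/spark-submit}"

# Hypothetical helper: prints the full spark-submit command for the given options.
tardis_cmd() {
  echo "$SPARK_SUBMIT" \
    --class org.apache.spark.edu.wpi.dsrg.tardis.TARDIS \
    --properties-file ./spark-defaults.conf \
    tardis_2.11-1.0.jar "$@"
}

tardis_cmd -g       # generate raw time series
tardis_cmd -b       # build the index
tardis_cmd -c knn   # create ground truth for kNN queries
tardis_cmd -q       # run the query configured by the eq** parameters
```

To execute a step rather than print it, run the printed line in a shell on a machine where Spark and the jar are available.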
The eq** parameters select the query type:
- eqQueryType = exact: exact-match query; knn: approximate kNN query.
- eqKnnType = 0: target node access; 1: one partition access; 2: multi-partition access.
Set the cluster application configuration in spark-defaults.conf.
For the cluster-wide environment, consult your cluster administrator. The configuration files should be stored under the hadoop/etc/hadoop/ and spark/conf directories.
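As an illustration of the application-level settings, the sketch below writes a small example properties file. The property names are standard Spark configuration keys, but every value is a placeholder assumption and not the configuration used by the authors; the file name spark-defaults.conf.example is also hypothetical, chosen so that an existing spark-defaults.conf is not overwritten.

```shell
# Write an example properties file; all values are placeholders to adapt.
cat > ./spark-defaults.conf.example <<'EOF'
spark.master            yarn
spark.driver.memory     4g
spark.executor.memory   4g
spark.executor.cores    2
EOF
```

After adjusting the values for your cluster, copy the file to ./spark-defaults.conf so that the --properties-file option picks it up.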
If you use our program or code, please cite this paper:
Liang Zhang, Noura Alghamdi, Mohamed Y. Eltabakh, Elke A. Rundensteiner. TARDIS: Distributed Indexing Framework for Big Time Series Data. Proceedings of the 35th IEEE International Conference on Data Engineering (ICDE), 2019.