SINDY is a Scalable INclusion DependencY algorithm that is based on Apache Flink and described in a BTW'15 paper. In contrast to the original algorithm in the publication, this repository contains derivatives of SINDY that support n-ary inclusion dependency (IND) discovery and approximate/partial IND discovery.
SINDY is written in Java 8 and can be built using Maven 3.*. Although SINDY is based on Apache Flink, you don't need a cluster to run it (but you can use one). Flink is included as a library and will still parallelize across the cores of a single machine.
There are three options to run SINDY:
- You can integrate it as a library into your code (see the module `sindy-core`). Currently you can find the algorithms `Sindy` (n-ary IND discovery) and `Sandy` (unary partial IND discovery).
- You can use SINDY as a Metanome algorithm (see the module `sindy-metanome`). Running Maven's `package` lifecycle phase builds a Metanome algorithm jar file. Note that Metanome currently does not support partial INDs.
- You can run SINDY together with the Metadata Management System (MDMS) for further analysis of the discovered INDs (see the module `sindy-mdms`). Running Maven's `package` lifecycle phase builds a small distribution for the algorithm. Note that MDMS currently does not support partial INDs, either.
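
For readers unfamiliar with the terminology, the following is a minimal, single-machine sketch of what a unary IND and a partial IND express. It is not SINDY's implementation and does not use the SINDY or Flink APIs. The class `IndSketch`, its method names, and the coverage measure (the fraction of distinct dependent values found in the referenced column) are assumptions made for illustration; the exact measure used by `Sandy` may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; not part of sindy-core.
public class IndSketch {

    /** Exact unary IND: every value of the dependent column occurs in the referenced column. */
    public static boolean isInd(List<String> dependent, List<String> referenced) {
        return new HashSet<>(referenced).containsAll(dependent);
    }

    /** Assumed measure: fraction of distinct dependent values that occur in the referenced column. */
    public static double coverage(List<String> dependent, List<String> referenced) {
        Set<String> distinctDependent = new HashSet<>(dependent);
        if (distinctDependent.isEmpty()) {
            return 1.0; // an empty column is trivially included
        }
        Set<String> distinctReferenced = new HashSet<>(referenced);
        long covered = distinctDependent.stream().filter(distinctReferenced::contains).count();
        return (double) covered / distinctDependent.size();
    }

    /** Partial IND: the inclusion holds for at least minCoverage of the distinct dependent values. */
    public static boolean isPartialInd(List<String> dependent, List<String> referenced, double minCoverage) {
        return coverage(dependent, referenced) >= minCoverage;
    }

    public static void main(String[] args) {
        // Made-up example columns: orders.customer_id and customers.id.
        List<String> orderCustomerIds = Arrays.asList("c1", "c2", "c2", "c9");
        List<String> customerIds = Arrays.asList("c1", "c2", "c3");

        // "c9" has no match, so the exact IND is violated.
        System.out.println("exact IND:   " + isInd(orderCustomerIds, customerIds));
        // Two of the three distinct dependent values are covered.
        System.out.println("partial IND: " + isPartialInd(orderCustomerIds, customerIds, 0.6));
    }
}
```

An n-ary IND generalizes the same idea from single columns to combinations of columns, where tuples of values rather than single values must be included.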