Polyite is a tool that iteratively optimizes the schedule of a program that is representable in the Polyhedron Model in order to exploit the computing capacity of given multi-core hardware.
The exploration of the schedule search space can be done either at random or in a guided fashion using a genetic algorithm.
We describe the approach in detail in our TACO'2017 article Iterative Schedule Optimization for Parallelization in the Polyhedron Model.
Polyite is released under MIT license.
Polyite depends on LLVM, Clang, Polly, isl, libbarvinok, and Armin Größlinger's implementation of Chernikova's algorithm. LLVM, Clang, and Polly are released under the LLVM Release License or derivatives of it. isl and Chernikova are released under the MIT license. libbarvinok is GPL licensed.
We start from the bottom up with the dependences of Polyite and install everything inside the same directory, which could be your IDE's workspace.
We explain how to get Polyite running for the benchmarks from the Polybench 4.1 benchmark suite.
export BASE_DIR=$PWD
Polyite uses an extended version of Polly (we provide a version based on LLVM 3.9.1 and the older version from January 2016 that we refer to in our publications) that is capable of importing schedule trees from our extended JSCOP format and of further transforming the imported schedules, for instance by tiling them. Therefore, you must clone LLVM, Clang and Polly from our specially provided repositories.
- Create a root directory for LLVM
mkdir llvm_root
export LLVM_ROOT="${BASE_DIR}/llvm_root"
- Clone LLVM
cd ${LLVM_ROOT}
git clone https://github.com/stganser/llvm.git
- Get Clang
cd ${LLVM_ROOT}/llvm/tools
git clone https://github.com/stganser/clang.git
- Get Polly
cd ${LLVM_ROOT}/llvm/tools
git clone https://github.com/stganser/polly.git
- Create a build directory for LLVM and build it using cmake
mkdir ${LLVM_ROOT}/llvm_build
cd ${LLVM_ROOT}/llvm_build
cmake ${LLVM_ROOT}/llvm
make
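By default, CMake configures a debug build of LLVM, which is slow to compile and needs a lot of disk space. If that is a concern, a release build can be configured instead; this is a sketch using standard CMake/LLVM options, not a Polyite requirement:

```bash
cd ${LLVM_ROOT}/llvm_build
cmake -DCMAKE_BUILD_TYPE=Release ${LLVM_ROOT}/llvm
make -j "$(nproc)"   # parallelize the build
```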
- Make sure you have libgmp and libclang (version 3.8) installed on your system (both including headers), as well as libtool.
- Get and build isl:
cd ${BASE_DIR}
git clone https://github.com/stganser/isl.git
cd isl
mkdir install
export ISL_INSTALL="${PWD}/install"
./autogen.sh
./configure --prefix=${ISL_INSTALL} --with-jni-include=/usr/lib/jvm/default-java/include/ --with-clang=system
make install
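The `--with-jni-include` path must point to a directory containing `jni.h`. If `/usr/lib/jvm/default-java` does not exist on your system, locate the header first and substitute its directory; a quick check:

```bash
# Locate jni.h among the installed JDKs:
find /usr/lib/jvm -name jni.h
# After `make install`, the shared libraries should be in place:
ls ${ISL_INSTALL}/lib
```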
- Generate the bindings
cd ${BASE_DIR}/isl/interface
make isl-scala.jar
cp -r java/gen src
cp scala/src/isl/Conversions.scala src/isl
zip -r isl-scala.jar src
The last three steps add the source code of the bindings to the generated library.
The scala-isl-utils subproject makes the isl Scala bindings accessible to Polyite.
- Clone the repository:
cd ${BASE_DIR}
git clone https://github.com/stganser/scala-isl-utils.git
cd ${BASE_DIR}/scala-isl-utils
export ISL_UTILS_ROOT=${BASE_DIR}/scala-isl-utils
mkdir libs
cp ${BASE_DIR}/isl/interface/isl-scala.jar libs
cp ${ISL_INSTALL}/lib/libisl*so* libs
libbarvinok provides an implementation of Barvinok's counting algorithm, which can be used to determine a polyhedron's volume. Since we do not have Scala bindings for libbarvinok, Polyite calls a small C binary whenever it needs to determine a polyhedron's volume.
- Clone the repositories:
cd ${BASE_DIR}
git clone http://repo.or.cz/barvinok.git
cd barvinok
git checkout barvinok-0.39
export BARVINOK_INSTALL="${BASE_DIR}/barvinok/install"
cd ${BASE_DIR}
git clone https://github.com/stganser/barvinok_binary.git
export BARVINOK_BINARY_ROOT=${BASE_DIR}/barvinok_binary
- Follow the projects' build instructions and make sure that you install libbarvinok to `${BASE_DIR}/barvinok/install`.
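For orientation, a typical autotools build of barvinok could look as follows; this is a sketch, and the project's own README is authoritative (`get_submodules.sh` ships with the barvinok repository):

```bash
cd ${BASE_DIR}/barvinok
./get_submodules.sh    # fetch the bundled isl and polylib sources
sh autogen.sh
./configure --prefix=${BASE_DIR}/barvinok/install
make
make install
```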
The Chernikova subproject is a Scala library that provides an implementation of Chernikova's algorithm for switching between the constraint representation and the generator representation of polyhedra.
cd ${BASE_DIR}
git clone https://github.com/stganser/chernikova.git
export CHERNIKOVA_ROOT=${BASE_DIR}/chernikova
The following steps describe how to obtain Polyite itself and put everything together to run benchmarks from Polybench 4.1. Since we do not provide a configuration for a build tool such as sbt, the easiest way to build Polyite is probably to use Scala IDE.
- Get the code:
cd ${BASE_DIR}
git clone https://github.com/stganser/polyite.git
export POLYITE_ROOT=${BASE_DIR}/polyite
- Download required libraries
cd ${POLYITE_ROOT}
mkdir libs
Now, download Scala 2.11.6 from http://www.scala-lang.org/download/2.11.6.html and copy scala-2.11.6/lib/scala-library.jar and scala-2.11.6/lib/scala-parser-combinators_2.11-1.0.3.jar to ${POLYITE_ROOT}/libs. Further, download Apache Commons Lang version 3.4 and Apache Commons Math version 3.6.1. Again, copy the downloaded JAR files to ${POLYITE_ROOT}/libs.
Polyite relies on OpenMPI for its distributed genetic algorithm. To be able to build Polyite, the Java bindings for OpenMPI must be in Polyite's classpath. Follow the instructions on https://www.open-mpi.org/faq/?category=java#java_build to build the bindings. Copy the file mpi.jar to ${POLYITE_ROOT}/libs. Polyite works with OpenMPI version 2.1.1.
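For orientation, building the bindings roughly amounts to configuring OpenMPI with Java support; this is a sketch (the linked FAQ is authoritative, and the install prefix here is an arbitrary example):

```bash
# Inside the unpacked OpenMPI 2.1.1 source tree:
./configure --enable-mpi-java --prefix=${BASE_DIR}/openmpi_install
make install
# The Java bindings are typically installed to <prefix>/lib:
cp ${BASE_DIR}/openmpi_install/lib/mpi.jar ${POLYITE_ROOT}/libs
```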
- Import the projects polyite and scala-isl-utils into Scala IDE / Eclipse and build everything. Make sure that the libraries in `${POLYITE_ROOT}/libs` are in your build path, as well as `${ISL_UTILS_ROOT}/libs/isl-scala.jar`. Make sure that your Scala compiler assumes Scala version 2.11. Make chernikova and scala-isl-utils depend on the downloaded Scala version as well, and make chernikova depend on scala-isl-utils.
- To use Polyite, execute one of the following scripts, depending on the desired execution mode. The scripts assume that you have OpenJDK 8 installed in `/usr/lib/jvm/java-1.8.0-openjdk-amd64/` (the default location on Debian based systems).
  * `run_genetic_opt` to execute the genetic algorithm
  * `run_rand_exploration` to execute random exploration
  * `run_rand_exploration_letsee` to execute random exploration using the search space construction described in Pouchet et al., PLDI'08.
The script `print_scop_size` extracts several SCoP metrics during benchmark
preparation.
In each of the scripts, set
* `ISL_UTILS_LOC` to the value of `${ISL_UTILS_ROOT}`
* `POLYITE_LOC` to the value of `${POLYITE_ROOT}`
* `CHERNIKOVA_LOC` to the value of `${CHERNIKOVA_ROOT}`
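For example, with the directory layout used throughout this guide (expand `${BASE_DIR}` to the literal absolute path, since the scripts are executed outside this shell session):

```bash
ISL_UTILS_LOC=${BASE_DIR}/scala-isl-utils
POLYITE_LOC=${BASE_DIR}/polyite
CHERNIKOVA_LOC=${BASE_DIR}/chernikova
```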
- To compile and execute a schedule in order to determine its profitability, Polyite starts the script `measure_polybench.bash`. At the top of the script, set `POLLY_INSTALL_DIR` to the value of `${LLVM_ROOT}/llvm_build`.
- The scripts in `${POLYITE_ROOT}/polybench_scripts/` are used to prepare Polybench 4.1 benchmarks for optimization with Polyite. They generate prepared LLVM-IR code, execute baseline measurements, compute reference output, and generate configuration files for random exploration or optimization with the genetic algorithm.
  * In `measureBaseline.bash` set `POLLY_INSTALL_DIR` to the value of `${LLVM_ROOT}/llvm_build`.
  * In `prepare_benchmarks.bash` set `POLLY_INSTALL_DIR` to the value of `${LLVM_ROOT}/llvm_build` and `POLYITE_LOC` to the value of `${POLYITE_ROOT}` (see the example after this list). This script can later be called to prepare a list of Polybench 4.1 benchmarks. You may want to adapt the values of the default configuration files that the script generates for each benchmark. Compile `${POLYITE_ROOT}/config_help.tex` to get documentation of Polyite's configuration options.
  * The file `polly_configurations.txt` contains a list of Polly configurations that `prepare_benchmarks.bash` considers during the baseline measurements.
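For example, again using the layout from this guide (the scripts do not inherit your shell's variables, so expand these to literal absolute paths):

```bash
# In measure_polybench.bash, measureBaseline.bash and prepare_benchmarks.bash:
POLLY_INSTALL_DIR=/absolute/path/to/llvm_root/llvm_build
# Additionally, in prepare_benchmarks.bash:
POLYITE_LOC=/absolute/path/to/polyite
```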
- Install `libpapi` version 5.4.3.
- Download Polybench 4.1 and unpack the archive to `${POLYITE_ROOT}/polybench-c-4.1`.
- Create symbolic links in `polybench-c-4.1`:
cd ${POLYITE_ROOT}/polybench-c-4.1
ln -s ../polybench_scripts/baselineCollectData.bash baselineCollectData.bash
ln -s ../polybench_scripts/polly_configurations.txt polly_configurations.txt
ln -s ../polybench_scripts/benchmarks.txt benchmarks.txt
ln -s ../polybench_scripts/collectAllBaselineResults.bash collectAllBaselineResults.bash
ln -s ../polybench_scripts/generateRefOut.bash generateRefOut.bash
ln -s ../polybench_scripts/measureBaseline.bash measureBaseline.bash
ln -s ../polybench_scripts/prepare_benchmarks.bash prepare_benchmarks.bash
Most tools that are part of Polyite will print a help message if invoked without command line parameters.
To prepare the benchmark gemm, perform the following steps:
cd ${POLYITE_ROOT}/polybench-c-4.1
./prepare_benchmarks.bash true false false gemm
This creates the directory `polybench-c-4.1/gemm` with the following content:
config_ga_gemm_kernel_gemm_%entry.split---%for.end40.properties
config_rand_gemm_kernel_gemm_%entry.split---%for.end40.properties
gemm.c
gemm.h
kernel_gemm___%entry.split---%for.end40.jscop
polybench.c
polybench.h
ref_output
There are
- configuration files for random exploration and for optimization with the genetic algorithm. To understand and modify these, compile and read `${POLYITE_ROOT}/config_help.tex`.
- all source files that are required to compile the benchmark.
- a JSCOP file that contains the model of the SCoP to optimize.
- a file containing reference output that was produced by a binary compiled with -O0.
To change the code regions to optimize (Polyite can optimize one SCoP at a time) or to change data set sizes, edit the file `polybench-c-4.1/benchmarks.txt`.
Now, you can run schedule optimization with the genetic algorithm:
cd ${POLYITE_ROOT}
./run_genetic_opt polybench-c-4.1/gemm/kernel_gemm___%entry.split---%for.end40.jscop polybench-c-4.1/gemm/config_ga_gemm_kernel_gemm_%entry.split---%for.end40.properties
Depending on your configuration, this produces one JSON file per generation of the genetic algorithm, one CSV file and a directory containing one JSCOP file per schedule for manual application of the generated schedules. The JSON files contain all attributes of the generated schedules and can be read by Polyite, for instance, in order to restart an interrupted run of the genetic algorithm or to generate further generations.
Analogously, random exploration can be run using
cd ${POLYITE_ROOT}
./run_rand_exploration polybench-c-4.1/gemm/kernel_gemm___%entry.split---%for.end40.jscop polybench-c-4.1/gemm/config_rand_gemm_kernel_gemm_%entry.split---%for.end40.properties
To use SLURM for the evaluation of your schedules, put something like the following into your configuration file:
numMeasurementThreads=42
measurementCommand=srun -Aduckdonald -pthebigcluster --constraint=fastest_cpu_available_plx --mem=32768 --exclusive -t31 -n1 ${POLYITE_ROOT}/measure_polybench.bash
It is important to use `srun`, since Polyite must be able to communicate interactively with the benchmarking script via STDIN and STDOUT. Polyite can pass a given numactl configuration to the benchmarking script.
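If you evaluate schedules on the local machine instead of on a cluster, a minimal configuration might look as follows; this is an untested sketch that assumes `measurementCommand` is invoked directly for each measurement process:

```
numMeasurementThreads=1
measurementCommand=${POLYITE_ROOT}/measure_polybench.bash
```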
In the evaluation of our papers about Polyite, pre-vectorization by Polly was disabled (`-polly-vectorizer=none`). At least in Polly 3.8/3.9, Polyite will break Polly's schedule tree optimization, as the pre-vectorizer operates on any schedule tree band node that has a coincident member, yet Polly seems to expect that coincident members occur only in band nodes whose iteration domain contains a single statement. We provide the configuration option `expectPrevectorization` which, if set to `true`, causes Polyite to produce only schedule trees that do not break Polly's pre-vectorizer; see the example below. This feature is experimental and is only meant as basic support for pre-vectorization with Polly. It is unclear whether Polyite, in its current state, is suitable for a schedule search space exploration with pre-vectorization enabled.
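The option uses the same key-value format as the configuration settings shown above:

```
expectPrevectorization=true
```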
Suggestions for improvements are welcome :-)
Polyite can learn performance models from the results of previous iterative optimization runs. It can use these models to speed up the iterative optimization process by means of a classifier that identifies likely unprofitable schedules. To train the classifier, Polyite relies on Python 3.5 and scikit-learn version 0.19.2. Follow the scikit-learn project's instructions to install scikit-learn and its dependencies via pip (a sketch follows).
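A minimal installation sketch, assuming Python 3.5 and pip are available (the package name is the standard one on PyPI; pip pulls in NumPy and SciPy automatically):

```bash
python3.5 -m pip install --user scikit-learn==0.19.2
```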
We provide tools for generating training data for the classifier. The data must originate from runs of Polyite's random exploration or genetic algorithm during which benchmarking was used to assess the schedules' fitness.
The tool `merge_populations` can be used to merge sets of schedules and their evaluation results, for instance the populations of a run of the genetic algorithm, each of which Polyite writes to a separate file. Schedules that belong to different SCoPs cannot be merged into a single file. `merge_populations` has the following command line interface:
./merge_populations <JSCOP file> <output JSON file> <output CSV file> <num. execution time measurements> <population files>
- A JSCOP file that represents the SCoP
- The output JSON file that will contain the merged set of schedules.
- A CSV file in Polyite's output CSV format that contains isl union map representations of the schedules in the merged set of schedules together with the results from benchmarking the schedules.
- The number of execution time measurements per schedule during the generation of the input data.
- The JSON files containing the input data.
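For illustration, a hypothetical invocation for the gemm SCoP prepared above; the population file names and the number of measurements depend on your run and configuration:

```bash
cd ${POLYITE_ROOT}
./merge_populations \
  polybench-c-4.1/gemm/kernel_gemm___%entry.split---%for.end40.jscop \
  merged_gemm.json merged_gemm.csv 5 \
  population*.json
```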
The tool `fitness_calculator` labels schedules and calculates their feature vectors. It has the following command line interface:
./fitness_calculator <JSCOP file> <configuration file> <input CSV file> <output CSV file>
- A JSCOP file that models the SCoP to which the schedules in the input CSV file correspond.
- A Polyite configuration file.
- A CSV file containing the schedules to be labeled, together with their measured execution times.
- The output CSV file that Polyite will use to train its classifier.
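Continuing the hypothetical gemm example, using the CSV file produced by `merge_populations` above:

```bash
cd ${POLYITE_ROOT}
./fitness_calculator \
  polybench-c-4.1/gemm/kernel_gemm___%entry.split---%for.end40.jscop \
  polybench-c-4.1/gemm/config_ga_gemm_kernel_gemm_%entry.split---%for.end40.properties \
  merged_gemm.csv training_gemm.csv
```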
Polyite can train its classifier using training data from multiple CSV files. Read the documentation of Polyite's configuration options for details.
© 2017-2019, Stefan Ganser