Code Monkey home page Code Monkey logo

asl-hdbscan's Introduction

ASL21

Project work for the course Advanced Systems Lab, FS2021 @ETHZ

Team members (alphabetical order)

  • Mark Frey
  • Beat Hubmann
  • Martin Tschechne
  • Robin Worreby

Report

Team 33 Final Report

Code

Building and running

Create the build directory and run cmake and then make. The optimization level can be set from the command line using

cmake -D CMAKE_CXX_FLAGS="-O3"

This will generate two files, unit_tests and hdbscan. For testing just execute unit_tests.
hdbscan takes 6 required and 6 optional command line arguments. The minimal command to run is

./hdbscan -points_file=../data/data_set_medium.csv -hierarchy_file=output/hierarchy.csv -tree_file=output/tree.csv -visualization_file=output/results.vis -partition_file=output/partition.csv -outlier_score_file=output/outlier_scores.csv -num_points 1000 -num_dimension 2

The optional arguments are

  • constraints path to the constraints csv
  • min_pts minimum number of points (default: 8)
  • min_cl_size minimal cluster size (default: 8)
  • compact Whether or not to compact the output (default: true)
  • dist_function Which distance function to use (default: euclidean)
  • optimization_level Which optimization level to use (default: no_optimization)

This information can also be accessed by executing

hdbscan -help

Benchmarking

Different build variants (compiler optimizations) can be added in CMakeLists. To create a new build variant add

build_hdbscan_benchmarking("-O3")

to CMakeLists and change -O3 to whatever flag you want. All builds will be written to build/benchmarking with name hdbscan<OPTIMIZATION FLAG> (without <>). These executables can then e.g. be used by a python script to create the plots.

Benchmark MST

From the plotting folder execute

python3 plot_mst_benchmark.py --points_file=../data/unit_tests/example_high_dim_10000_32.csv --num_points=10000 --num_dimension=32 --recompute --output_name=results_10k --title_heatmap="Relative speed-up 10k points, 32 dimensions" --title_bar="Cycles by optimization, 10k points, 32 dimensions" --disable_nobitset --num_runs=2

Adapt the num_runs parameter to taste

Benchmarking with LIKWID

In order to benchmark with LIKWID you need to checkout out the 82-benchmark-with-likwid-v2 branch. Depending on your setup you might have to adjust the MSR rights on your machine, which can be done the following way:

sudo modprobe msr

An example on how to run LIKWID is given in run_files/run_likwid.sh. For further information on LIKWID you can check the official documentation here.

LIKWID benchmarking results

Benchmarking results

final comparison

Raw distance benchmarking results

Raw MST benchmarking results

Literature

Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection
Accelerated Hierarchical Density Clustering

asl-hdbscan's People

Contributors

beathubmann avatar martintschechne avatar rworreby avatar

Stargazers

 avatar

Watchers

 avatar

Forkers

kungrainbow

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.