Code Monkey home page Code Monkey logo

parchain's Introduction

ParChain: A Framework for Parallel Hierarchical Agglomerative Clustering using Nearest-Neighbor Chain

This repository contains code for ParChain: A Framework for Parallel Hierarchical Agglomerative Clustering using Nearest-Neighbor Chain.

To get the submodules:

git pull
git submodule update --init

Implementations

Each folder contains an implementation that we tested.

parchain Our ParChain algorithms for complete linkage with Euclidean distance, Ward's linkage with Euclidean distance, average linkage with Euclidean distance and average linkage with squared Euclidean distance. This program is adapted from the problem based benchmark suite (PBBS).

multi-threaded-NN-chain Jeon et al.'s parallel nearest-neighbor chain implementation. It supports average linkage with Euclidean distance.

fast_protein_cluster fast_protein_cluster parallel HAC implementation that works for complete and average linkage with Euclidean distance metric. The original code works for root-mean-square deviation (RMSD) and template modeling score (TM-score) measures.

fastcluster fastcluster is a sequential HAC implementation in C++ with python interface.

scipy and sklearn can be installed from python.

althaus Althaus et al.'s is a parallel HAC algorithm for complete linkage. We implemented it in C++.

Installation

Compiler:

  • g++ = 7.5.0 with support for Cilk Plus
  • python3 >= 3.6.9
  • make

Python libraries:

  • scipy >= 1.5.4
  • sklearn >= 0.24.2
  • pandas
  • numpy

Data

You can download our datasets here

Running Tests

For running time tests, we use numactl. It can be installed using apt install numactl. For the memory tests, valgrind >= 3.17.0 is required. You can download here.

To run our scripts, the datasets need to be in the "datasets" folder under this folder.

PC, PC-m, and PC-mr

The runtime/memory experiment result of our algorithm runs the following. The result files will be in outputs/outputs_{method} and outputs/memo_outputs. You can modify datasets and dims in pc_exp.sh for the datasets to run.

./pc_exp.sh complete
./pc_exp.sh ward
./pc_exp.sh avg 32
./pc_exp.sh avgsq 

./pc_exp_memo.sh

The runtime/memory experiment result of our algorithm runs the following. The result files will be in outputs/outputs_matrix and outputs/memo_outputs. You can modify datasets and dims in pc_exp.sh for the datasets to run.

./pc_matrix_exp.sh complete
./pc_matrix_exp.sh ward
./pc_matrix_exp.sh avg
./pc_matrix_exp.sh avgsq 

./pc_matrix_exp_memo.sh

The runtime/memory experiment result of our algorithm runs the following. The result files will be in outputs/outputs_matrix_range and outputs/memo_outputs. You can modify datasets and dims in pc_exp.sh for the datasets to run.

./pc_matrix_range_exp.sh complete
./pc_matrix_range_exp.sh ward
./pc_matrix_range_exp.sh avg
./pc_matrix_range_exp.sh avgsq 

./pc_matrix_exp_range_memo.sh

Jeon

The runtime/memory experiment result of Jeon runs the following. The result files will be in outputs/outputs_jeon and outputs/memo_outputs.

./jeon_exp.sh
./jeon_exp_memo.sh

fastprotein

The runtime/memory experiment result of fastprotein runs the following. The result files will be in outputs/outputs_fastprotein and outputs/memo_outputs.

./fastprotein_exp.sh
./fastprotein_exp_memo.sh

Scipy and Sklearn and fastcluster

The runtime/memory experiment result of scipy and sklearn runs the following. The result files will be in outputs/outputs_scipy, outputs/outputs_sklearn, outputs/outputs_fastcluster and outputs/memo_outputs. You need to install Scipy, Sklearn, and fastcluster first. Scipy and Sklearn and be installed using pip. The fastcluster submodule in this repo includes some additional methods (linear memory nearest neighbor chain for Ward's and average linkage with squared Euclidean distance metric). You can refer to fastcluster/ for installation method.

./python_libs_exp.sh
./python_libs_exp_memo.sh

Althaus

The runtime/memory experiment result of fastprotein runs the following. The result files will be in outputs/outputs_clink and outputs/memo_outputs.

./clink_exp.sh

parchain's People

Contributors

yushangdi avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.