bsa-spmm_euro-par-2024_

Accelerated Block-Sparsity-Aware Matrix Reordering for Leveraging Tensor Cores in Sparse Matrix-Multivector Multiplication

Prerequisites

  • g++ $\ge$ 11
  • cmake $\ge$ 3.14
  • git
  • python $\ge$ 3.9
  • CUDA $=$ 12.1
  • NVIDIA GPU with sm $\ge$ 80
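Before starting, it can help to confirm that the installed tools meet the minimum versions listed above. The following is a minimal sketch, not part of the repository: the helper names `version_ge` and `check_tool` are hypothetical, and the comparison assumes GNU `sort -V` is available. It checks lower bounds only, so the exact CUDA 12.1 requirement still needs to be confirmed by eye.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with this artifact) that report whether
# the prerequisites listed above appear to be satisfied on this machine.

# version_ge A B: succeeds when version A >= version B (uses GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME MIN_VERSION FOUND_VERSION: prints one status line.
check_tool() {
    local name="$1" min="$2" found="$3"
    if [ -z "$found" ]; then
        echo "$name: not found (need >= $min)"
    elif version_ge "$found" "$min"; then
        echo "$name: $found (ok, need >= $min)"
    else
        echo "$name: $found (too old, need >= $min)"
    fi
}

check_tool "g++"    11   "$(g++ -dumpversion 2>/dev/null)"
check_tool "cmake"  3.14 "$(cmake --version 2>/dev/null | head -n 1 | awk '{print $3}')"
check_tool "python" 3.9  "$(python3 --version 2>/dev/null | awk '{print $2}')"
# The prerequisites ask for exactly CUDA 12.1; this only checks the lower bound.
check_tool "nvcc"   12.1 "$(nvcc --version 2>/dev/null | sed -n 's/.*release \([0-9.]*\).*/\1/p')"
```

Each line of output names the tool, the version found (if any), and whether it meets the minimum.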

Step 1. Setup and Download

Set up the environment variables

Edit two variables, CUDA_PATH and CUDA_ARCH, in the env.sh file to match your machine. CUDA_PATH is the path where CUDA (and therefore nvcc) is installed. CUDA_ARCH should match the sm version of your GPU (for example, 86 for sm_86). The other environment variables are set up automatically.

export CUDA_PATH=/usr/local/cuda-12.1
export CUDA_ARCH=86
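If you are unsure which value to use for CUDA_ARCH, one option is to ask the driver for the GPU's compute capability and drop the dot. This is a sketch under assumptions: the function names `cap_to_arch` and `detect_cuda_arch` are hypothetical, and the `compute_cap` query field is only available in reasonably recent versions of `nvidia-smi`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of env.sh.

# cap_to_arch CAP: converts a compute capability such as "8.6" into the
# two-digit form used by CUDA_ARCH ("86").
cap_to_arch() {
    printf '%s\n' "$1" | tr -d '. '
}

# detect_cuda_arch: prints the CUDA_ARCH value for the first GPU, or nothing
# when nvidia-smi is missing or does not support the compute_cap field.
detect_cuda_arch() {
    local cap
    cap="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1 || true)"
    if [ -n "$cap" ]; then
        cap_to_arch "$cap"
    fi
}

arch="$(detect_cuda_arch)"
echo "Suggested CUDA_ARCH: ${arch:-unknown (set it manually)}"
```

The printed value is only a suggestion; the setting that takes effect is still the one written in env.sh.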

Then run the env.sh file with the source command to export the environment variables and install the Python packages.

source env.sh
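After sourcing env.sh, a quick sanity check of the two hand-edited variables can catch path or architecture mistakes before the build. The sketch below is an assumption-laden illustration: `check_cuda_env` is a hypothetical function, and it assumes the usual toolkit layout in which nvcc lives at `$CUDA_PATH/bin/nvcc`.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check, not part of the repository.
# check_cuda_env CUDA_PATH CUDA_ARCH: verifies that nvcc exists under the
# given path (assumed layout: <path>/bin/nvcc) and that the architecture
# is a number of at least 80, as the prerequisites require.
check_cuda_env() {
    local cuda_path="$1" cuda_arch="$2"
    if [ ! -x "$cuda_path/bin/nvcc" ]; then
        echo "error: no executable nvcc under $cuda_path/bin"
        return 1
    fi
    case "$cuda_arch" in
        ''|*[!0-9]*)
            echo "error: CUDA_ARCH must be a number, got '$cuda_arch'"
            return 1
            ;;
    esac
    if [ "$cuda_arch" -lt 80 ]; then
        echo "error: CUDA_ARCH=$cuda_arch is below the required sm 80"
        return 1
    fi
    echo "ok: CUDA_PATH=$cuda_path CUDA_ARCH=$cuda_arch"
}
```

It could be called as `check_cuda_env "$CUDA_PATH" "$CUDA_ARCH"` in the same shell where env.sh was sourced.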

Install one of the baselines, Sputnik

bash install_sputnik.sh

Download the dataset

bash download_data.sh

Install required package (for Debian)

Debian users should install the bc package as shown below, because it is not pre-installed on Debian systems.

sudo apt-get install bc
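On other distributions it may be worth confirming that bc is present before running the experiments. A minimal sketch, where `have_cmd` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the named command is on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd bc; then
    echo "bc is available"
else
    echo "bc is missing; on Debian, install it with: sudo apt-get install bc"
fi
```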

Step 2. Compile and run the experiments

After each shell script below finishes, the corresponding figure file is generated in the plots directory.

Compile the source code

bash build.sh

To reproduce Figure 4

Benchmarking all algorithms in Figure 4 on the large DLMC dataset takes more than 5 hours. The paper includes ASpT-RR as a baseline in Figure 4, but because it is not currently open source, we cannot provide it, and it is not included in the released artifact.

bash run_fig4_dlmc.sh

To shorten the execution time, run the brief version, run_fig4_dlmc_short.sh, instead. This script runs the experiment on only 2 matrices for each sparsity in a subfigure.

bash run_fig4_dlmc_short.sh # Brief version

To reproduce Figure 5

Running the experiment and plotting the figure takes about 30 minutes.

bash run_fig5_dlmc.sh

As with Figure 4, there is a brief version for Figure 5 that takes about 5 minutes to run.

bash run_fig5_dlmc_short.sh # Brief version

To reproduce Figure 6

Running the experiment and plotting the figure takes about 30 minutes.

bash run_fig6_dlmc.sh
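For a quick end-to-end pass, the brief variants for Figures 4 and 5 can be chained and timed from the repository root. This is a sketch, not a script shipped with the artifact: `run_timed` and `run_brief_experiments` are hypothetical names, and only the two brief scripts named above are invoked.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of the repository.

# run_timed LABEL CMD...: runs a command, reports its wall-clock time in
# seconds, and returns the command's exit status.
run_timed() {
    local label="$1"
    shift
    local start end status=0
    start="$(date +%s)"
    "$@" || status=$?
    end="$(date +%s)"
    echo "$label: exit $status after $((end - start))s"
    return "$status"
}

# run_brief_experiments: runs the brief Figure 4 and Figure 5 scripts in
# order, stopping at the first failure, then lists the generated plots.
run_brief_experiments() {
    run_timed "figure 4 (brief)" bash run_fig4_dlmc_short.sh &&
    run_timed "figure 5 (brief)" bash run_fig5_dlmc_short.sh &&
    ls -l plots
}
```

Calling `run_brief_experiments` after `bash build.sh` has completed prints one timing line per script, followed by the contents of the plots directory.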

Contributors

  • dleunji