
bspmm's Introduction

NWChem's BSPMM communication pattern

Block sparse matrix multiplication (BSPMM) is the dominant cost in the CCSD and CCSD(T) quantum chemical many-body methods of NWChem, a prominent quantum chemistry application suite for large-scale simulations of chemical and biological systems. NWChem implements its BSPMM with dense matrix operations using a get-compute-update pattern: each worker (processing entity) uses MPI_Get to retrieve the submatrices it needs, and after the multiplication it uses an MPI_Accumulate to update the memory at the target location.
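The get-compute-update pattern described above can be sketched for a single pair of tiles as follows. This is a minimal illustration in plain C, not the NWChem or mini-app code: the "get" and "update" steps are local stand-ins for MPI_Get and MPI_Accumulate, the update is assumed to be a sum, and `TILE_DIM` and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE_DIM 2                        /* hypothetical tile edge length */
#define TILE_ELEMS (TILE_DIM * TILE_DIM)

/* "Get": copy a tile into a local buffer (local stand-in for MPI_Get). */
static void tile_get(double *local, const double *remote)
{
    memcpy(local, remote, TILE_ELEMS * sizeof(double));
}

/* "Compute": dense c += a * b on row-major TILE_DIM x TILE_DIM tiles. */
static void tile_multiply_add(double *c, const double *a, const double *b)
{
    for (size_t i = 0; i < TILE_DIM; i++)
        for (size_t k = 0; k < TILE_DIM; k++)
            for (size_t j = 0; j < TILE_DIM; j++)
                c[i * TILE_DIM + j] += a[i * TILE_DIM + k] * b[k * TILE_DIM + j];
}

/* "Update": element-wise sum into the target tile
 * (local stand-in for MPI_Accumulate; a sum is assumed here). */
static void tile_accumulate(double *target, const double *local)
{
    for (size_t i = 0; i < TILE_ELEMS; i++)
        target[i] += local[i];
}

/* One get-compute-update step for a single pair of input tiles. */
static void get_compute_update(double *c_target,
                               const double *a_remote,
                               const double *b_remote)
{
    assert(c_target != NULL && a_remote != NULL && b_remote != NULL);

    double a[TILE_ELEMS], b[TILE_ELEMS];
    double c[TILE_ELEMS] = {0};           /* local product buffer */

    tile_get(a, a_remote);
    tile_get(b, b_remote);
    tile_multiply_add(c, a, b);
    tile_accumulate(c_target, c);
}
```

Calling `get_compute_update(c, a, b)` adds the product of the two input tiles to whatever `c` already holds, which is why contributions from several workers to the same target tile can be combined.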

This repository contains mini-apps that implement a 2D version of BSPMM to compute A × B = C, where the input matrices A and B are composed of tiles. The nonzero tiles are evenly distributed among the ranks in a round-robin fashion. Each rank maintains a work-unit table that lists all the multiplication operations that the workers cooperatively execute. Rank 0 hosts a global counter, which the workers atomically fetch and increment (MPI_Fetch_and_op). The fetched value serves as an index into the work-unit table. Each worker accumulates its C tile locally until the next fetched work unit corresponds to a different C tile, at which point the worker issues an MPI_Accumulate to update that C tile. A worker is a process in the MPI-everywhere version and a thread in the MPI+threads versions.
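The scheduling logic described above can be sketched without MPI. In this rough illustration, a C11 atomic stands in for the global counter on rank 0, and the worker records which C tiles it would flush instead of calling MPI_Accumulate. The `work_unit` layout, the function names, and the assumption that tile ids enumerate the nonzero tiles are all hypothetical and not taken from the mini-apps.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical work unit: multiply tile a_tile by tile b_tile into c_tile. */
typedef struct {
    int a_tile;
    int b_tile;
    int c_tile;
} work_unit;

/* Round-robin ownership: nonzero tile t is assumed to live on rank t % nranks. */
static int tile_owner(int tile, int nranks)
{
    assert(nranks > 0);
    return tile % nranks;
}

/* Stand-in for the atomic fetch-and-add on the global counter
 * (MPI_Fetch_and_op in the mini-apps). Returns the value before the add. */
static int fetch_next_unit(atomic_int *counter)
{
    return atomic_fetch_add(counter, 1);
}

/*
 * Worker loop. Records the id of each C tile at the point where the
 * worker would issue MPI_Accumulate. `flushed` must have room for
 * nunits entries. Returns the number of flushes.
 */
static size_t run_worker(const work_unit *table, int nunits,
                         atomic_int *counter, int *flushed)
{
    size_t nflush = 0;
    int current_c = -1;                   /* no C tile held yet */

    for (;;) {
        int idx = fetch_next_unit(counter);
        if (idx >= nunits)
            break;                        /* table exhausted */

        const work_unit *wu = &table[idx];
        if (current_c != -1 && wu->c_tile != current_c)
            flushed[nflush++] = current_c; /* C tile changed: flush the old one */
        current_c = wu->c_tile;

        /* Get the A and B tiles, multiply them, and add the product
         * to the local C buffer (omitted in this sketch). */
    }

    if (current_c != -1)
        flushed[nflush++] = current_c;    /* flush the last C tile */
    return nflush;
}
```

With a single worker and a table whose units are grouped by C tile, the loop flushes once per C tile. With several workers sharing the counter, consecutive fetches by one worker may land on different C tiles, so the number of flushes depends on how the fetches interleave.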

This mini-app was written after discussions with Pavan Balaji, Min Si, and Shintaro Iwasaki from Argonne National Laboratory, and with Jeff Hammond from Intel.

Versions

bspmm_single.c

  • MPI everywhere version (flat MPI)

bspmm_multiple.c

  • MPI+OpenMP version using MPI_THREAD_MULTIPLE
  • Exposes no logical parallelism to the MPI library

bspmm_multiple_nwins.c

  • MPI+OpenMP version using MPI_THREAD_MULTIPLE
  • Exposes logical parallelism to the MPI library
    • The MPI_Get operations of each thread are independent. Hence, each thread uses its own window.
    • All threads must use the same window for MPI_Accumulate operations because atomicity across windows for the same memory location is undefined. However, MPI_Accumulate operations from multiple threads can still be issued independently because BSPMM does not require these operations to be ordered. Hence, we set the accumulate_ordering=none info hint on the window used for MPI_Accumulate operations.

