penn-graphics-research / claymore

License: MIT License

CMake 3.26% C++ 35.61% Cuda 60.77% Python 0.36%

cuda cpp14 mpm simulation-framework high-performance-computing gpu-computing physics-based-simulation

claymore's Introduction

A Massively Parallel and Scalable Multi-GPU Material Point Method

Documentation

Description

This is the open-source code for the SIGGRAPH 2020 paper:

A Massively Parallel and Scalable Multi-GPU Material Point Method

page, pdf, supp, video

Authors: Xinlei Wang*, Yuxing Qiu*, Stuart R. Slattery, Yu Fang, Minchen Li, Song-Chun Zhu, Yixin Zhu, Min Tang, Dinesh Manocha, Chenfanfu Jiang (* Equal contributions)

Harnessing the power of modern multi-GPU architectures, we present a massively parallel simulation system based on the Material Point Method (MPM) for simulating physical behaviors of materials undergoing complex topological changes, self-collision, and large deformations. Our system makes three critical contributions. First, we introduce a new particle data structure that promotes coalesced memory access patterns on the GPU and eliminates the need for complex atomic operations on the memory hierarchy when writing particle data to the grid. Second, we propose a kernel fusion approach using a new Grid-to-Particles-to-Grid (G2P2G) scheme, which efficiently reduces GPU kernel launches, improves latency, and significantly reduces the amount of global memory needed to store particle data. Finally, we introduce optimized algorithmic designs that allow for efficient sparse grids in a shared memory context, enabling us to best utilize modern multi-GPU computational platforms for hybrid Lagrangian-Eulerian computational patterns. We demonstrate the effectiveness of our method with extensive benchmarks, evaluations, and dynamic simulations with elastoplasticity, granular media, and fluid dynamics. In comparisons against an open-source and heavily optimized CPU-based MPM codebase on an elastic sphere colliding scene with particle counts ranging from 5 to 40 million, our GPU MPM achieves over 100X per-time-step speedup on a workstation with an Intel 8086K CPU and a single Quadro P6000 GPU, exposing exciting possibilities for future MPM simulations in computer graphics and computational science. Moreover, compared to the state-of-the-art GPU MPM method, we not only achieve 2X acceleration on a single GPU but our kernel fusion strategy and Array-of-Structs-of-Array (AoSoA) data structure design also generalizes to multi-GPU systems. 
Our multi-GPU MPM exhibits near-perfect weak and strong scaling with 4 GPUs, enabling performant and large-scale simulations on a 1024x1024x1024 grid with close to 100 million particles with less than 4 minutes per frame on a single 4-GPU workstation and 134 million particles with less than 1 minute per frame on an 8-GPU workstation.
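
The Array-of-Structs-of-Array (AoSoA) layout mentioned in the abstract can be illustrated with a minimal host-side C++ sketch. The names `kBinSize`, `ParticleBin`, and `ParticleStore` are hypothetical and are not taken from the Claymore codebase, and the bin width of 32 is an assumption chosen to match a CUDA warp. The sketch only shows how a global particle index maps to a bin and a lane, so that the same attribute of neighboring particles is contiguous in memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bin width (assumption): 32 particles per bin.
constexpr std::size_t kBinSize = 32;

// One bin stores each attribute in its own small array (struct of arrays),
// so consecutive lanes of one attribute are adjacent in memory.
struct ParticleBin {
    std::array<float, kBinSize> x;
    std::array<float, kBinSize> mass;
};

// The outer container is an array of such bins (array of structs of arrays).
class ParticleStore {
public:
    explicit ParticleStore(std::size_t count)
        : count_(count), bins_((count + kBinSize - 1) / kBinSize) {}

    std::size_t size() const { return count_; }
    std::size_t bin_count() const { return bins_.size(); }

    // Map a global particle index to its bin and to its lane inside the bin.
    static std::size_t bin_of(std::size_t i) { return i / kBinSize; }
    static std::size_t lane_of(std::size_t i) { return i % kBinSize; }

    float& x(std::size_t i) {
        assert(i < count_);
        return bins_[bin_of(i)].x[lane_of(i)];
    }

    float& mass(std::size_t i) {
        assert(i < count_);
        return bins_[bin_of(i)].mass[lane_of(i)];
    }

private:
    std::size_t count_;
    std::vector<ParticleBin> bins_;
};
```

With this bin width, particle 33 lands in bin 1, lane 1, and `x(33)` and `x(34)` are adjacent floats, which is the property that allows neighboring GPU threads to read one attribute with coalesced accesses.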

Compilation

This is a cross-platform C++/CUDA CMake project. It requires CMake 3.15 or newer; the latest version is generally recommended. The required CUDA version is 10.2 or 11.

Currently supported operating systems are Windows 10 and Ubuntu (>=18.04). Tested compilers include gcc 8.4, MSVC v142, and clang-9 (includes msvc version).

Build

Run the following commands in the root directory, creating the build directory first if it does not already exist. When compiling with MSVC, add "--config Release" to the last command.

cd build
cmake ..
cmake --build .

Or configure the project using the CMake Tools extension in Visual Studio Code (recommended).

Data

Currently, binary position data and level-set (signed distance field) data are accepted as input files for particles. Models can also be initialized by uniformly sampling particles from analytic geometries.
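
As an illustration of sampling particles from an analytic geometry, the following is a minimal C++ sketch that places points on a regular lattice and keeps those inside a sphere. The function name `sample_sphere`, the `Vec3` alias, and the lattice-based strategy are assumptions made for this example; they do not describe the sampling routine used in the Projects.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Vec3 = std::array<float, 3>;

// Hypothetical helper: visit lattice points with the given spacing around
// `center` and keep those whose distance to the center is at most `radius`,
// i.e. whose signed distance to the sphere surface is non-positive.
std::vector<Vec3> sample_sphere(const Vec3& center, float radius, float spacing) {
    assert(radius > 0.0f && spacing > 0.0f);
    std::vector<Vec3> points;
    // Enough lattice steps per half-axis to cover the sphere; the distance
    // test below discards the lattice points that fall outside.
    const int n = static_cast<int>(std::ceil(radius / spacing));
    for (int i = -n; i <= n; ++i) {
        for (int j = -n; j <= n; ++j) {
            for (int k = -n; k <= n; ++k) {
                const float dx = static_cast<float>(i) * spacing;
                const float dy = static_cast<float>(j) * spacing;
                const float dz = static_cast<float>(k) * spacing;
                if (dx * dx + dy * dy + dz * dz <= radius * radius) {
                    points.push_back(
                        Vec3{{center[0] + dx, center[1] + dy, center[2] + dz}});
                }
            }
        }
    }
    return points;
}
```

For a unit sphere with unit spacing, the lattice yields 7 points: the center and its six axis neighbors. A smaller spacing gives a denser, still uniform, particle distribution.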

Run Demos

The project provides the following GPU-based schemes for MPM:

GMPM: improved single-GPU pipeline
MGSP: static geometry (particle) partitioning multi-GPU pipeline

Go to Projects/** and run the executable.

Code Usage

Use the codebase in another CMake C++ project.

Directly include the codebase as a submodule, and follow the examples in the Projects.

Develop upon the codebase.

Create a sub-folder in Projects with a cmake file at its root.

Bibtex

Please cite our paper if you use this code for your research:

@article{Wang2020multiGMPM,
    author = {Xinlei Wang* and Yuxing Qiu* and Stuart R. Slattery and Yu Fang and Minchen Li and Song-Chun Zhu and Yixin Zhu and Min Tang and Dinesh Manocha and Chenfanfu Jiang},
    title = {A Massively Parallel and Scalable Multi-GPU Material Point Method},
    journal = {ACM Transactions on Graphics},
    year = {2020},
    volume = {39},
    number = {4},
    articleno = {Article 30}
}

Credits

This project draws inspiration from Taichi and GMPM.

Acknowledgement

We thank Yuanming Hu for useful discussions and proofreading, and Feng Gao for his help in configuring workstations. We appreciate Prof. Chenfanfu Jiang and Yuanming Hu for their insightful advice on the documentation.

Dependencies

The following libraries are adopted in our project development:

cub (now replaced by Thrust)
fmt

For particle data IO and generation, we use these two libraries in addition:

Because compiling CUDA (10.2) code supports at most C++14, we also import the following libraries:

claymore's Issues

Error: "Assertion failed: IsObject(), file D:\claymore\build\_deps\rapidjson-src\include\rapidjson/document.h, line 1274" while running GMPM

When I run the GMPM executable with the input files "knot1.poly" or "dragon_particles.bin", I keep getting the error mentioned in the title.

Any help is appreciated. Thank you.

incorrect implementation (?) of NACC algorithm in "constitutive_models.cuh"

In the function "compute_stress<float, MaterialE::NACC>" that updates the deformation gradient there is the variable "s_hat_trial_sqrnorm" that stands for the squared norm of the matrix s_hat. However, in the paper "CD-MPM: Continuum Damage Material Point Methods for Dynamic Fracture Animation: Supplemental Document", just the norm is used, not the squared norm.

Executable issue with "cudaErrorInvalidDevice", any solution?

I tried to run MGSP with a single GPU, but it reports the following error:

[Init -- Begin] Cuda
[InitInfo -- DevNum] Detected 1 CUDA Capable device(s)
[InitInfo -- DevNum] Prepare to use 1 device(s) in Multi-GPU test
[InitInfo -- Dev Property] GPU device 0 (0-th group on board)
global memory: 25390546944 bytes,
shared memory per block: 49152 bytes,
registers per SM: 65536,
Multi-Processor count: 84,
SM compute capabilities: 8.6.
[InitInfo -- stream] Create 32 streams for device 0
monotonic allocator alignment (Bytes): 512 size (MB): 2900.7
[InitInfo -- memory] device 0
free bytes/total bytes: 24332795904/25390546944,
pre-allocated size: 3041599488 bytes

    [Init] CudaContext 0
    [InitInfo -- Default Dev] Default context: 0

[Init -- End] == Finished 'Cuda' initialization

CUDA error at /home/chi/MyCode/GitHub/claymore/Library/MnSystem/Cuda/Cuda.h:73 code=101(cudaErrorInvalidDevice) "cudaSetDevice(dev_id)"

Any idea?

GMPM crashes for me, even before trying to load objects.

Where are benchmarks?

Hi,

I've checked your projects.

You compared with Taichi in your paper, but the code in this project contains only initialization.

Do you have any sources written in Taichi?
If you have any sources, could you share with us?

Best regards,

cmake error with "fopen_s"

Hi there,
I recently encountered the following error while compiling gmpm: "error: identifier 'fopen_s' is undefined". I don't understand the cause of the error. How can I solve this problem? Thank you so much in advance.

My system info: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, NVIDIA driver version 515.65.01, CUDA version 11.7
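
For context, `fopen_s` is provided by the Microsoft C runtime (it is also specified in the optional Annex K of C11), and glibc does not implement it, which is consistent with the error appearing under gcc on Ubuntu. One possible workaround, assuming the offending call site can be edited, is a small wrapper that falls back to `std::fopen` outside MSVC. The helper name `open_file` is hypothetical and not part of the Claymore codebase.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper (assumption, not Claymore code): uses fopen_s when
// building with MSVC and std::fopen elsewhere. Returns nullptr on failure.
inline std::FILE* open_file(const char* path, const char* mode) {
    assert(path != nullptr && mode != nullptr);
#if defined(_MSC_VER)
    std::FILE* fp = nullptr;
    return fopen_s(&fp, path, mode) == 0 ? fp : nullptr;
#else
    return std::fopen(path, mode);
#endif
}
```

A call such as `fopen_s(&fp, path, "rb")` would then become `fp = open_file(path, "rb")`, followed by a check for `nullptr`.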

Can't figure out how to get rid of this error

Severity: Error
Code: C2668
Description: 'mn::logic_and': ambiguous call to overloaded function
Project: mncuda
File: C:\Users\janis\Downloads\claymore-master\claymore-master\Library\MnBase\Meta\Meta.h
Line: 37

CUDA error at D:\claymore\Projects\MGSP\mgsp_benchmark.cuh:33 code=2(cudaErrorMemoryAllocation) "cudaMalloc(&ret, bytes)"

Hi,
When I run MGSP with case 2 and "constexpr int g_device_cnt = 1;", the program generates 11 frames. Then, at step 4238, the error CUDA error at D:\claymore\Projects\MGSP\mgsp_benchmark.cuh:33 code=2(cudaErrorMemoryAllocation) "cudaMalloc(&ret, bytes)" appears and the program hangs. (The output for steps 4237 and 4238 is at the end of this post.) What could be the problem?

I am using a Windows machine with a Quadro P2000. The initialization output is the following:

   [InitInfo -- DevNum] Detected 1 CUDA Capable device(s)
   [InitInfo -- DevNum] Prepare to use 1 device(s) in Multi-GPU test
   [InitInfo -- Dev Property] GPU device 0 (0-th group on board)
   global memory: 4294967296 bytes                
   shared memory per block: 49152 bytes,               
   registers per SM: 65536,
   Multi-Processor count: 6,
   SM compute capabilities: 6.1.
   [InitInfo -- stream] Create 32 streams for device 0
   monotonic allocator alignment (Bytes): 512      size (MB): 421.366
   [InitInfo -- memory] device 0                
   free bytes/total bytes: 3534671054/0,
   pre-allocated size: 441833881 bytes
  [Init] CudaContext 0
  [InitInfo -- Default Dev] Default context: 0
  [Init -- End] == Finished 'Cuda' initialization

The output for step 4237 and step 4238 is:

GPU[0] frame 11 step 4237 grid_update_query: 0.261056 ms
----------------------------------------------------------------
0.42346576 --0.0001--> 0.45833334, defaultDt: 0.0001, maxVel: 2.8010573
GPU[0] frame 11 step 4237 halo_g2p2g: 0.221856 ms
----------------------------------------------------------------
----------------------------------------------------------------
GPU[0] step 4237 collect_send_halo_grid: 0.0012 ms
----------------------------------------------------------------
GPU[0] frame 11 step 4237 non_halo_g2p2g: 13.597696 ms
----------------------------------------------------------------
GPU[0] step 4237 receive_reduce_halo_grid: 0.0171 ms
----------------------------------------------------------------
GPU[0] frame 11 step 4237 update_partition: 2.519936 ms
GPU[0] frame 11 step 4237 build_partition_for_grid: 0.105344 ms
GPU[0] frame 11 step 4237 copy_grid_blocks: 0.494464 ms
----------------------------------------------------------------
----------------------------------------------------------------
----------------------------------------------------------------
GPU[0] step 4237 halo_tagging: 0.291648 ms
halo particle blocks[0]: 0
halo grid blocks[0][0]: 0
----------------------------------------------------------------
block count on device 0: 5383, 9291, 13522 [18000]; 42058 [62500]
GPU[0] frame 11 step 4237 build_partition_for_particles: 0.202016 ms
----------------------------------------------------------------
resizing blocks 13522 -> 27000
GPU[0] frame 11 step 4238 grid_update_query: 0.262624 ms
----------------------------------------------------------------
0.42356575 --0.0001--> 0.45833334, defaultDt: 0.0001, maxVel: 2.8178828
GPU[0] frame 11 step 4238 halo_g2p2g: 0.235968 ms
----------------------------------------------------------------
----------------------------------------------------------------
GPU[0] step 4238 collect_send_halo_grid: 0.0011 ms
----------------------------------------------------------------
GPU[0] frame 11 step 4238 non_halo_g2p2g: 13.612352 ms
----------------------------------------------------------------
GPU[0] step 4238 receive_reduce_halo_grid: 0.0201 ms
----------------------------------------------------------------
GPU[0] frame 11 step 4238 update_partition: 2.341984 ms
GPU[0] frame 11 step 4238 build_partition_for_grid: 0.13248 ms
CUDA error at D:\Work\claymore\Projects\MGSP\mgsp_benchmark.cuh:33 code=2(cudaErrorMemoryAllocation) "cudaMalloc(&ret, bytes)"