I have been successful in running nvm-latency-bench without a GPU. The output of that is

Issue when using the cuda example/benchmark about ssd-gpu-dma HOT 12 CLOSED

enfiskutensykkel commented on July 17, 2024

Issue when using the cuda example/benchmark

from ssd-gpu-dma.

Comments (12)

ZaidQureshi commented on July 17, 2024

Thank you for your responses. Your suggestions worked; I believe the problem was the missing Module.symvers file.

By the way, I have tested your code on kernel 4.15 and it seems to be working fine.

This is not really an issue, but I have a question: with the cuda-latency test or nvm-latency with the --gpu flag, is the CPU completely out of the data and control path, as you are using both GPUDirect RDMA and Async?

Thanks.

enfiskutensykkel commented on July 17, 2024

Hi,

Could you try passing --gpu 0 as argument to nvm-latency-bench and post the results?

Also, please run dmesg and post any output from the libnvm helper kernel module (if there is any).

It could also be worth trying to build the project in debug mode (using -DCMAKE_BUILD_TYPE=Debug as an argument to CMake). Have you verified that the IOMMU is disabled?

Regards,
Jonas

enfiskutensykkel commented on July 17, 2024

Could you also verify that the IOMMU is disabled? (For example, show the output of cat /proc/cmdline | grep iommu.) I plan on implementing support for it in newer kernels, but haven't gotten that far yet.

Regards,
Jonas

ZaidQureshi commented on July 17, 2024

When I run nvm-latency-bench I get the following output

./bin/nvm-latency-bench --ctrl=/dev/libnvm0 --blocks=1000  --queue="no=128,location=local" --gpu 0

Resetting controller... DONE
Preparing queues... DONE
Preparing buffers and transfer lists... FAIL
Unexpected runtime error: Failed to map device memory for controller: Invalid argument

This is the output after I recompiled with debug mode

./bin/nvm-latency-bench --ctrl=/dev/libnvm0 --blocks=1000  --queue="no=128,location=local" --gpu 0

Resetting controller... DONE
Preparing queues... DONE
Preparing buffers and transfer lists... [map_memory] Page mapping kernel request failed: Invalid argument
FAIL
Unexpected runtime error: Failed to map device memory for controller: Invalid argument

The output in the kernel log after running the above program is

[Aug 3 09:50] Unknown ioctl command from process 28198: 1075347458
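As an aside, the number in that log line can be unpacked to see which request the module rejected. This is a minimal sketch, assuming the generic Linux _IOC bit layout (8-bit command number, 8-bit type, 14-bit size, 2-bit direction, as used on x86). The function name decode_ioctl is made up for illustration and is not part of libnvm.

```python
def decode_ioctl(request: int) -> dict:
    """Split a Linux ioctl request number into its _IOC fields.

    Assumes the generic asm-generic/ioctl.h layout:
    bits 0-7 nr, bits 8-15 type, bits 16-29 size, bits 30-31 direction.
    """
    directions = {0: "none", 1: "write", 2: "read", 3: "read/write"}
    return {
        "nr": request & 0xFF,              # command number within the type
        "type": (request >> 8) & 0xFF,     # driver "magic" byte
        "size": (request >> 16) & 0x3FFF,  # size of the argument struct in bytes
        "dir": directions[(request >> 30) & 0x3],
    }


fields = decode_ioctl(1075347458)  # value taken from the kernel log line above
print(hex(1075347458), fields)
```

Under that assumption the value is 0x40188002: command number 2 of type 0x80, carrying a 24-byte argument from user space to the kernel. A module that logs this as unknown would be consistent with a build that lacks the handler for that command.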

IOMMU should be disabled as there is no output printed for cat /proc/cmdline | grep iommu
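The same check as the grep above can be scripted. This is a small sketch; the helper names iommu_params and read_cmdline are hypothetical, and the two sample strings are invented for illustration, not taken from the machine in this thread.

```python
def iommu_params(cmdline: str) -> list:
    """Return the kernel command-line tokens that mention the IOMMU."""
    return [tok for tok in cmdline.split() if "iommu" in tok.lower()]


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()


# Invented sample command lines (not from the reporter's system)
without_iommu = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet"
with_iommu = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro intel_iommu=on iommu=pt"

print(iommu_params(without_iommu))
print(iommu_params(with_iommu))
```

On a real system, iommu_params(read_cmdline()) gives the same information as the grep. Note that an empty result only shows that no IOMMU option was passed at boot; whether the IOMMU is actually active may also depend on the kernel's built-in defaults and firmware settings, so the kernel log is probably worth checking as well.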

enfiskutensykkel commented on July 17, 2024

Hi,

It appears that the kernel module has not been compiled with CUDA support. When you run CMake in a clean build directory, the status output should say Using NVIDIA driver found in ${driver_dir} and Configuring kernel module with CUDA.

The build script tries to locate the driver automatically, but the lookup might fail (for example if the Module.symvers file hasn't been generated), in which case you probably need to point it to the driver path manually. For example:
cmake .. -DNVIDIA=/usr/src/nvidia-384-384.111

The driver folder also needs to contain a file called Module.symvers. If it doesn't, you probably need to run make in that directory so that it is generated.
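A quick way to check for the file before running CMake is sketched below. The helper name has_module_symvers is hypothetical, and the directory is simply the example path from above; substitute the driver source directory on your own system.

```python
from pathlib import Path


def has_module_symvers(driver_dir) -> bool:
    """Return True if driver_dir contains a regular file named Module.symvers."""
    return (Path(driver_dir) / "Module.symvers").is_file()


# Example path from the CMake command above; adjust for your driver version.
driver_dir = "/usr/src/nvidia-384-384.111"

if has_module_symvers(driver_dir):
    print(f"Module.symvers found in {driver_dir}")
else:
    print(f"Module.symvers missing in {driver_dir}; try running make there first")
```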

Let me know if this solves your problem or not. :)

Regards,
Jonas

enfiskutensykkel commented on July 17, 2024

Great to hear :)

When you use the --gpu flag on nvm-latency-bench the CPU is still responsible for submitting commands and processing completions, but the disk is writing data directly into (or reading directly from) GPU memory. So this example only uses the GPUDirect RDMA feature.

It's only for the nvm-cuda-bench that the CPU is entirely out of the control path and everything is controlled by the GPU. The relevant code for this is in benchmarks/cuda/main.cu, namely the readSingleBuffered and readDoubleBuffered CUDA kernels. In other words, this example uses both Async and RDMA.

Also note that this benchmark has lower bandwidth because it is moving memory from an input buffer into an output buffer (with an offset) in order to emulate a GPU workload.

ZaidQureshi commented on July 17, 2024

By nvidia-cuda-bench did you mean nvm-cuda-bench?
Thank you. I am actually interested in the nvm-cuda-bench example, as it completely removes the CPU from the control path. I will look at the code that you pointed at. Thank you so much for the hints.

ZaidQureshi commented on July 17, 2024

Actually, I just decided to test nvm-cuda-bench, and either there is an issue in some calculations in the code or something is broken, because when I run it I get the following output:

./bin/nvm-cuda-bench --ctrl=/dev/libnvm0 

CUDA device           : 0 Tesla V100-PCIE-16GB (0000:09:00.0)
Controller page size  : 4096 B
Namespace block size  : 512 B
Number of threads     : 32
Chunks per thread     : 32
Pages per chunk       : 1
Total number of pages : 1024
Total number of blocks: 8192
Double buffering      : no
Event time elapsed    : 8.192 µs
Estimated bandwidth   : 512000.001 MiB/s

That is an insanely high bandwidth. My SSD is supposed to have a bandwidth of about 6 GB/s.
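The mismatch can be made concrete with a little arithmetic on the numbers printed above. This is only a sanity check on the reported values, not a statement about how the benchmark computes its estimate; the 6 GB/s figure is the drive's nominal bandwidth as quoted here.

```python
PAGE_SIZE = 4096        # "Controller page size" in bytes
TOTAL_PAGES = 1024      # "Total number of pages"
ELAPSED_US = 8.192      # "Event time elapsed" in microseconds
SSD_BYTES_PER_S = 6e9   # nominal drive bandwidth, about 6 GB/s

total_bytes = PAGE_SIZE * TOTAL_PAGES             # data moved in one run
bytes_per_us = total_bytes / ELAPSED_US           # bytes per microsecond
implied_gb_per_s = bytes_per_us * 1e6 / 1e9       # same rate in GB/s
expected_us = total_bytes / SSD_BYTES_PER_S * 1e6 # time the drive would need

print(f"total data          : {total_bytes} B")
print(f"bytes per µs        : {bytes_per_us:.3f}")
print(f"implied bandwidth   : {implied_gb_per_s:.1f} GB/s")
print(f"expected at 6 GB/s  : {expected_us:.1f} µs")
```

The run covers 4 MiB, and dividing that by the measured 8.192 µs gives about 512000 bytes per microsecond, which matches the printed estimate numerically and corresponds to roughly 512 GB/s. At 6 GB/s the same transfer should take on the order of 700 µs, so the elapsed time is too short for the reads to have actually happened.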

When I run it with stats set to true all the columns have the value of either 0 or -nan.

enfiskutensykkel commented on July 17, 2024

I suspect this may be caused by me not having tested on GPUs more recent than Pascal and not compiling SM code for newer architectures.

Please try specifying -Dnvidia_archs=70 to CMake and rebuild (you might need to do a make clean first).

Let me know if that works :)

Regards,
Jonas

enfiskutensykkel commented on July 17, 2024

Hi @ZaidQureshi,

Just a follow-up: did setting the nvidia_archs flag work for you?

Regards,
Jonas

enfiskutensykkel commented on July 17, 2024

I hope that the issue was resolved. Please reopen it or make any additional comments if there is anything else.
