
Comments (12)

ZaidQureshi commented on July 17, 2024

Thank you for your responses. Your suggestions worked; I believe the problem was the missing Module.symvers file.

By the way, I have tested your code on kernel 4.15 and it seems to be working fine.

This is not really an issue, but I have a question: with the cuda-latency test, or nvm-latency with the --gpu flag, is the CPU completely out of the data and control path, given that you are using both GPUDirect RDMA and GPUDirect Async?

Thanks.

from ssd-gpu-dma.

enfiskutensykkel commented on July 17, 2024

Hi,

Could you try passing --gpu 0 as argument to nvm-latency-bench and post the results?

Also, please run dmesg and post any output from the libnvm helper kernel module (if there is any).

It could also be worth building the project in debug mode (by passing -DCMAKE_BUILD_TYPE=Debug as an argument to CMake). Have you verified that the IOMMU is disabled?

Regards,
Jonas


enfiskutensykkel commented on July 17, 2024

Could you also verify that the IOMMU is disabled? (For example, show the output of cat /proc/cmdline | grep iommu.) I plan on implementing support for it in newer kernels, but haven't gotten that far yet.

Regards,
Jonas


ZaidQureshi commented on July 17, 2024

When I run nvm-latency-bench I get the following output

./bin/nvm-latency-bench --ctrl=/dev/libnvm0 --blocks=1000  --queue="no=128,location=local" --gpu 0

Resetting controller... DONE
Preparing queues... DONE
Preparing buffers and transfer lists... FAIL
Unexpected runtime error: Failed to map device memory for controller: Invalid argument

This is the output after I recompiled with debug mode

./bin/nvm-latency-bench --ctrl=/dev/libnvm0 --blocks=1000  --queue="no=128,location=local" --gpu 0

Resetting controller... DONE
Preparing queues... DONE
Preparing buffers and transfer lists... [map_memory] Page mapping kernel request failed: Invalid argument
FAIL
Unexpected runtime error: Failed to map device memory for controller: Invalid argument

The output in the kernel log after running the above program is

[Aug 3 09:50] Unknown ioctl command from process 28198: 1075347458
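The kernel log prints the rejected command as a decimal number, which is hard to match against the module's request definitions. A minimal sketch of decoding it, assuming the generic Linux ioctl bit layout used on x86 and ARM (some architectures use different field widths); the helper name decode_ioctl is hypothetical:

```python
# Field widths of the generic Linux _IOC() encoding (asm-generic/ioctl.h).
# Assumption: the machine uses this generic layout (true for x86 and ARM).
NR_BITS, TYPE_BITS, SIZE_BITS, DIR_BITS = 8, 8, 14, 2
NR_SHIFT = 0
TYPE_SHIFT = NR_SHIFT + NR_BITS        # 8
SIZE_SHIFT = TYPE_SHIFT + TYPE_BITS    # 16
DIR_SHIFT = SIZE_SHIFT + SIZE_BITS     # 30

# Direction is seen from user space: "write" means user space passes data in.
DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}


def decode_ioctl(cmd):
    """Split an ioctl command number into its direction, type, nr and size."""
    return {
        "dir": DIRECTIONS[(cmd >> DIR_SHIFT) & ((1 << DIR_BITS) - 1)],
        "type": (cmd >> TYPE_SHIFT) & ((1 << TYPE_BITS) - 1),
        "nr": (cmd >> NR_SHIFT) & ((1 << NR_BITS) - 1),
        "size": (cmd >> SIZE_SHIFT) & ((1 << SIZE_BITS) - 1),
    }


if __name__ == "__main__":
    cmd = 1075347458  # value taken from the kernel log line above
    fields = decode_ioctl(cmd)
    print(hex(cmd), fields)
```

Under that assumption the number is 0x40188002: a write-direction request with type 0x80, request number 2 and a 24-byte argument. Comparing those fields with the ioctl definitions in the kernel module source should show which request the loaded module does not recognise.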

IOMMU should be disabled as there is no output printed for cat /proc/cmdline | grep iommu
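An empty grep result only shows that no IOMMU option was passed on the kernel command line; some kernels are configured to enable the IOMMU by default. A small sketch of a slightly broader check, assuming a Linux system where /sys/kernel/iommu_groups is populated when an IOMMU is active (the function names are hypothetical):

```python
import os


def iommu_params(cmdline):
    """Return the kernel command-line tokens that mention 'iommu'."""
    return [tok for tok in cmdline.split() if "iommu" in tok.lower()]


def count_iommu_groups(path="/sys/kernel/iommu_groups"):
    """Count IOMMU groups exposed in sysfs; 0 if the directory is absent."""
    return len(os.listdir(path)) if os.path.isdir(path) else 0


if __name__ == "__main__":
    # Only run the live check where the Linux procfs file actually exists.
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as f:
            print("command-line options:", iommu_params(f.read()))
        print("IOMMU groups in sysfs:", count_iommu_groups())
```

No matching options together with zero groups is a stronger hint that the IOMMU is off than the grep alone, although the firmware setup remains the authoritative place to check.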


enfiskutensykkel commented on July 17, 2024

Hi,

It appears that the kernel module has not been compiled with CUDA support. When you run CMake in a clean build directory, the status output should say Using NVIDIA driver found in ${driver_dir} and Configuring kernel module with CUDA.

The build script tries to locate the driver automatically, but the lookup might fail (for example, if the Module.symvers file hasn't been generated), in which case you probably need to point it to the driver path manually. For example:
cmake .. -DNVIDIA=/usr/src/nvidia-384-384.111

The driver folder also needs to contain a file called Module.symvers. If it doesn't, you probably need to run make in that directory so that the file is generated.

Let me know if this solves your problem or not. :)

Regards,
Jonas


enfiskutensykkel commented on July 17, 2024

Great to hear :)

When you use the --gpu flag on nvm-latency-bench the CPU is still responsible for submitting commands and processing completions, but the disk is writing data directly into (or reading directly from) GPU memory. So this example only uses the GPUDirect RDMA feature.

Only with nvm-cuda-bench is the CPU entirely out of the control path, with everything controlled by the GPU. The relevant code for this is in benchmarks/cuda/main.cu, namely the readSingleBuffered and readDoubleBuffered CUDA kernels. In other words, this example uses both Async and RDMA.

Also note that this benchmark has lower bandwidth because it is moving memory from an input buffer into an output buffer (with an offset) in order to emulate a GPU workload.


ZaidQureshi commented on July 17, 2024

By nvidia-cuda-bench did you mean nvm-cuda-bench?
Thank you. I am actually interested in the nvm-cuda-bench example, as it completely removes the CPU from the control path. I will look at the code that you pointed to. Thank you so much for the hints.


ZaidQureshi commented on July 17, 2024

Actually, I just decided to test nvm-cuda-bench. Either there is an issue in some of the calculations in the code or something is broken, because when I run it I get the following output:

./bin/nvm-cuda-bench --ctrl=/dev/libnvm0 

CUDA device           : 0 Tesla V100-PCIE-16GB (0000:09:00.0)
Controller page size  : 4096 B
Namespace block size  : 512 B
Number of threads     : 32
Chunks per thread     : 32
Pages per chunk       : 1
Total number of pages : 1024
Total number of blocks: 8192
Double buffering      : no
Event time elapsed    : 8.192 µs
Estimated bandwidth   : 512000.001 MiB/s

That is an insanely high bandwidth. My SSD is supposed to have a bandwidth of about 6 GB/s.

When I run it with stats set to true all the columns have the value of either 0 or -nan.
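As a sanity check on the figures printed above, the arithmetic can be reproduced by hand. This is only a sketch: the variable names are made up, and the 6 GB/s value is the rated SSD bandwidth quoted in this comment, not a measurement.

```python
# Values copied from the benchmark output above.
PAGE_SIZE_BYTES = 4096
TOTAL_PAGES = 1024
ELAPSED_US = 8.192

# Rated SSD bandwidth quoted in the comment: 6 GB/s is 6000 bytes per microsecond.
SSD_BYTES_PER_US = 6000.0

total_bytes = PAGE_SIZE_BYTES * TOTAL_PAGES         # bytes moved in one run
reported_rate = total_bytes / ELAPSED_US            # bytes per microsecond
expected_min_us = total_bytes / SSD_BYTES_PER_US    # fastest plausible run time

if __name__ == "__main__":
    print(f"total bytes      : {total_bytes}")
    print(f"bytes per µs     : {reported_rate:.3f}")
    print(f"minimum time (µs): {expected_min_us:.2f}")
    print(f"speed-up vs. SSD : {reported_rate / SSD_BYTES_PER_US:.1f}x")
```

The byte count divided by the elapsed microseconds comes out at about 512000, which matches the printed estimate, so the bandwidth calculation itself looks consistent with its inputs. At the rated 6 GB/s, moving the same 4 MiB should take roughly 700 µs, so an elapsed time of about 8 µs suggests that the timed region did not contain the actual transfers, rather than that the disk is fast.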


enfiskutensykkel commented on July 17, 2024

I suspect this is because I have not tested on GPUs newer than Pascal and the build does not compile SM code for newer architectures.

Please try specifying -Dnvidia_archs=70 to CMake and rebuild (you might need to do a make clean first).

Let me know if that works :)

Regards,
Jonas


enfiskutensykkel commented on July 17, 2024

Hi @ZaidQureshi,

Just following up: did setting the nvidia_archs flag work for you?

Regards,
Jonas


enfiskutensykkel commented on July 17, 2024

I hope that the issue was resolved. Please reopen it or make any additional comments if there is anything else.

