Comments (12)
Thank you for your responses. Your suggestions worked; I believe the issue was the missing Module.symvers file.
By the way, I have tested your code on kernel 4.15 and it seems to be working fine.
This is not really an issue, but I have a question: with the cuda-latency test, or nvm-latency with the --gpu flag, is the CPU completely out of the data and control path, since you are using both GPUDirect RDMA and ASYNC?
Thanks.
from ssd-gpu-dma.
Hi,
Could you try passing --gpu 0 as an argument to nvm-latency-bench and post the results?
Also, please run dmesg and post any output from the libnvm helper kernel module (if there is any).
It could also be worth building the project in debug mode (using -DCMAKE_BUILD_TYPE=Debug as an argument to CMake). Have you verified that the IOMMU is disabled?
Regards,
Jonas
from ssd-gpu-dma.
Could you also verify that the IOMMU is disabled? (For example, show the output of cat /proc/cmdline | grep iommu.) I plan on implementing support for it in newer kernels, but haven't gotten that far yet.
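The check requested above can be sketched as a small shell helper. The function name cmdline_mentions_iommu is made up for illustration. An empty result only shows that no IOMMU option was passed at boot; some kernels may enable the IOMMU by default, so the kernel log is worth a look as well.

```shell
# Report whether a kernel command line mentions any IOMMU option.
# The command line is passed as an argument so the helper can be tried
# on any string, not just the running kernel's.
cmdline_mentions_iommu() {
    case "$1" in
        *iommu*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Check the running kernel (Linux only; skipped if /proc/cmdline is absent).
if [ -r /proc/cmdline ]; then
    if cmdline_mentions_iommu "$(cat /proc/cmdline)"; then
        echo "An iommu option is present on the kernel command line"
    else
        echo "No iommu option on the kernel command line"
    fi
fi
```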
Regards,
Jonas
from ssd-gpu-dma.
When I run nvm-latency-bench I get the following output:
./bin/nvm-latency-bench --ctrl=/dev/libnvm0 --blocks=1000 --queue="no=128,location=local" --gpu 0
Resetting controller... DONE
Preparing queues... DONE
Preparing buffers and transfer lists... FAIL
Unexpected runtime error: Failed to map device memory for controller: Invalid argument
This is the output after I recompiled in debug mode:
./bin/nvm-latency-bench --ctrl=/dev/libnvm0 --blocks=1000 --queue="no=128,location=local" --gpu 0
Resetting controller... DONE
Preparing queues... DONE
Preparing buffers and transfer lists... [map_memory] Page mapping kernel request failed: Invalid argument
FAIL
Unexpected runtime error: Failed to map device memory for controller: Invalid argument
The output in the kernel log after running the above program is:
[Aug 3 09:50] Unknown ioctl command from process 28198: 1075347458
The IOMMU should be disabled, as cat /proc/cmdline | grep iommu prints nothing.
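The command number in the kernel log line above can be split into its fields, which makes it easier to match against the ioctl definitions in the helper module's source. This is a sketch that assumes the generic Linux ioctl encoding from asm-generic/ioctl.h (some architectures use a different layout); decode_ioctl is a made-up helper name.

```shell
# Split a Linux ioctl request number into its fields, assuming the generic
# layout: nr in bits 0-7, type ("magic") in bits 8-15, argument size in
# bits 16-29, direction in bits 30-31 (0 = none, 1 = write, 2 = read).
decode_ioctl() {
    req=$1
    nr=$((    req         & 0xff ))
    magic=$(( (req >> 8)  & 0xff ))
    size=$((  (req >> 16) & 0x3fff ))
    dir=$((   (req >> 30) & 0x3 ))
    printf 'dir=%d type=0x%02x nr=%d size=%d\n' "$dir" "$magic" "$nr" "$size"
}

# Number reported by the kernel module in the log line above.
decode_ioctl 1075347458
# prints: dir=1 type=0x80 nr=2 size=24
```

Under that assumption the request is a write-direction command, number 2 of type 0x80, carrying a 24-byte argument. Which named command this corresponds to has to be confirmed against the project's own ioctl header.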
from ssd-gpu-dma.
Hi,
It appears that the kernel module has not been compiled with CUDA support. When you run CMake in a clean build directory, the status output should say Using NVIDIA driver found in ${driver_dir} and Configuring kernel module with CUDA.
The build script tries to locate the driver automatically, but the lookup might fail (for example, if the Module.symvers file hasn't been generated), in which case you probably need to point it to the driver path manually. For example:
cmake .. -DNVIDIA=/usr/src/nvidia-384-384.111
The driver folder also needs to contain a file called Module.symvers; if it doesn't, you probably need to run make in that directory so that it is generated.
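The two conditions described above (the driver folder exists and contains Module.symvers) can be checked before running CMake. The following is a minimal sketch; driver_dir_ready and the NVIDIA_DIR variable are hypothetical names, and the default path is simply the example from this thread, so it needs adjusting to the installed driver version.

```shell
# Hypothetical helper: succeed only if the given NVIDIA driver source
# directory exists and already contains a generated Module.symvers file.
driver_dir_ready() {
    [ -d "$1" ] && [ -f "$1/Module.symvers" ]
}

# Example path taken from this thread; override by setting NVIDIA_DIR.
NVIDIA_DIR=${NVIDIA_DIR:-/usr/src/nvidia-384-384.111}

if driver_dir_ready "$NVIDIA_DIR"; then
    echo "Module.symvers found; configure with: cmake .. -DNVIDIA=$NVIDIA_DIR"
else
    echo "Module.symvers not found; run make in $NVIDIA_DIR first"
fi
```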
Let me know if this solves your problem or not. :)
Regards,
Jonas
from ssd-gpu-dma.
Great to hear :)
When you use the --gpu flag on nvm-latency-bench, the CPU is still responsible for submitting commands and processing completions, but the disk is writing data directly into (or reading directly from) GPU memory. So this example only uses the GPUDirect RDMA feature.
It's only for nvm-cuda-bench that the CPU is entirely out of the control path and everything is controlled by the GPU. The relevant code for this is in benchmarks/cuda/main.cu, namely the readSingleBuffered and readDoubleBuffered CUDA kernels. In other words, this example uses both Async and RDMA.
Also note that this benchmark has lower bandwidth because it is moving memory from an input buffer into an output buffer (with an offset) in order to emulate a GPU workload.
from ssd-gpu-dma.
By nvidia-cuda-bench did you mean nvm-cuda-bench?
Thank you. I am actually interested in the nvm-cuda-bench example, as it completely removes the CPU from the control path. I will look at the code that you pointed to. Thank you so much for the hints.
from ssd-gpu-dma.
Actually, I just decided to test nvm-cuda-bench, and either there is an issue in some calculations in the code or something is broken, because when I run it I get the following output:
./bin/nvm-cuda-bench --ctrl=/dev/libnvm0
CUDA device : 0 Tesla V100-PCIE-16GB (0000:09:00.0)
Controller page size : 4096 B
Namespace block size : 512 B
Number of threads : 32
Chunks per thread : 32
Pages per chunk : 1
Total number of pages : 1024
Total number of blocks: 8192
Double buffering : no
Event time elapsed : 8.192 µs
Estimated bandwidth : 512000.001 MiB/s
That is an insanely high bandwidth. My SSD is supposed to have a bandwidth of about 6 GB/s.
When I run it with stats set to true, all the columns have a value of either 0 or -nan.
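As a sanity check, the bandwidth implied by the printed page count, page size and elapsed time can be recomputed by hand. The sketch below uses only the figures from the output above; implied_bandwidth is a made-up helper name.

```shell
# Recompute the bandwidth implied by the benchmark output.
# Arguments: number of pages, page size in bytes, elapsed time in microseconds.
# Bytes per microsecond equals MB/s (decimal); MiB/s is derived from that.
implied_bandwidth() {
    awk -v pages="$1" -v page_size="$2" -v usec="$3" 'BEGIN {
        bytes = pages * page_size
        mb_per_s = bytes / usec
        mib_per_s = mb_per_s * 1000000 / (1024 * 1024)
        printf "%d bytes: %.0f MB/s (%.0f MiB/s)\n", bytes, mb_per_s, mib_per_s
    }'
}

implied_bandwidth 1024 4096 8.192
# prints: 4194304 bytes: 512000 MB/s (488281 MiB/s)
```

The printed estimate appears to match bytes per microsecond, that is about 512 GB/s, roughly 85 times the drive's rated 6 GB/s. An elapsed time of 8.192 µs for 4 MiB is therefore not a plausible measurement of real disk I/O, which fits the observation that the statistics are all 0 or -nan.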
from ssd-gpu-dma.
I suspect this may be caused by me not having tested on GPUs more recent than Pascal and not compiling SM code for newer architectures.
Please try specifying -Dnvidia_archs=70 as an argument to CMake and rebuild (you might need to do a make clean first).
Let me know if that works :)
Regards,
Jonas
from ssd-gpu-dma.
Hi @ZaidQureshi,
Just a follow-up: did setting the nvidia_archs flag work for you?
Regards,
Jonas
from ssd-gpu-dma.
I hope that the issue was resolved. Please reopen it or make any additional comments if there is anything else.
from ssd-gpu-dma.