Comments (10)

tbennun commented on September 26, 2024

It looks like cmake is not finding CUBLAS. What is your cmake version? Is CUBLAS installed (it should be)?

from mgbench.

whytehuang commented on September 26, 2024

Hi tbennun,
I tried RHEL 8.0 (AArch64) and got the same result. The cmake version is cmake-3.11.4-3.el8.aarch64.
RHEL 8.1 ships the same cmake, 3.11.4-3.el8.

As for CUBLAS, I only used yum to install nvidia-driver-cuda and cuda.

tbennun commented on September 26, 2024

The tests require CUBLAS too (in order to exhaustively utilize the GPU in some benchmarks). Please try installing CUBLAS as well and see if it works.

If you do not want to install CUBLAS, comment out these three lines:
https://github.com/tbennun/mgbench/blob/master/CMakeLists.txt#L88-L90
and every line in run.sh that has sgemm in it.
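
For the run.sh part of that workaround, a minimal shell sketch follows. It assumes run.sh is a plain shell script in which `#` starts a comment; the helper name comment_out_sgemm is made up for illustration and is not part of mgbench. It prints an edited copy instead of changing the file, so the result can be reviewed first.

```shell
#!/bin/sh
# Print a copy of a script with every line that mentions "sgemm" commented out.
# comment_out_sgemm is a hypothetical helper name, not something mgbench provides.
comment_out_sgemm() {
    # For each line matching "sgemm", insert "# " at the start of the line.
    sed '/sgemm/s/^/# /' "$1"
}
```

For example, `comment_out_sgemm run.sh > run-nosgemm.sh` (the output file name is arbitrary). If an sgemm command spans several lines through backslash continuations, only the line containing the word is commented out, so the result should be checked by hand.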

whytehuang commented on September 26, 2024

Hi tbennun,
Can you tell me which package contains CUBLAS? I tried "yum install CUBLAS", but no such package was found.

tbennun commented on September 26, 2024

I don't know, sorry. My best guess would be to go to: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

whytehuang commented on September 26, 2024

Hi tbennun,
Sorry for the late reply.
I ran the command: yum list cublas
It shows:
[root@localhost mgbench]# yum list cublas
Updating Subscription Management repositories.
Unable to read consumer identity
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Last metadata expiration check: 1:29:43 ago on Mon 02 Mar 2020 12:54:16 AM EST.
Installed Packages
libcublas-devel.aarch64 10.2.2.91-1 @cuda-10-2-local-10.2.91-435.17.01
libcublas10.aarch64 10.2.2.91-1 @cuda-10-2-local-10.2.91-435.17.01

Does this tool support the ARM platform? (OS: RHEL 8.1)
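
Since the packages are installed but cmake still does not find CUBLAS, it can help to see where the library files actually live. The sketch below is one way to check. The default search roots are assumptions about a typical layout, and find_cublas is a made-up helper name. A common cause with CUDA 10.1 and later is that package installs place cuBLAS in a system library directory rather than under the toolkit root; whether that applies here is not confirmed in this thread.

```shell
#!/bin/sh
# List cuBLAS shared libraries found under the given directories.
# With no arguments, search a few common install roots (assumed paths).
# find_cublas is a hypothetical helper name used only for illustration.
find_cublas() {
    if [ "$#" -eq 0 ]; then
        set -- /usr/local/cuda /usr/lib64 /usr/lib
    fi
    for root in "$@"; do
        # Skip roots that do not exist on this machine.
        [ -d "$root" ] || continue
        # -H follows a symlinked start directory such as /usr/local/cuda.
        find -H "$root" -name 'libcublas.so*' 2>/dev/null
    done
}
```

On an RPM-based system, `rpm -ql libcublas-devel` lists the files owned by the installed package shown above, and `yum provides '*/libcublas.so*'` reports which package ships the library.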

tbennun commented on September 26, 2024

Yes, it's standard C++, CUDA and CMake, and as far as I remember, it was tested on the TX1.

Try to remove this line: https://github.com/tbennun/mgbench/blob/master/CMakeLists.txt#L89
and instead, replace the line after it with:

target_link_libraries(sgemm gflags-static cublas ${EXTRA_LIBS})

Does that work?

whytehuang commented on September 26, 2024

Dear tbennun,
I followed your steps and it works!
Is this the correct procedure? I ask because I ran run.sh, checked the L1 logs, and found some strange performance results.
It looks like DMA is either not supported or disabled?

Install
[root@localhost mgbench]# ./build.sh
mkdir: cannot create directory ‘build’: File exists
-- Configuring done
-- Generating done
-- Build files have been written to: /root/mgbench/build
Scanning dependencies of target devinfo
Scanning dependencies of target gflags-static
Scanning dependencies of target numgpus
Scanning dependencies of target gflags_nothreads-static
[ 23%] Building CXX object deps/gflags/CMakeFiles/gflags-static.dir/src/gflags_completions.cc.o
[ 23%] Building CXX object deps/gflags/CMakeFiles/gflags-static.dir/src/gflags.cc.o
[ 23%] Building CXX object deps/gflags/CMakeFiles/gflags_nothreads-static.dir/src/gflags.cc.o
[ 23%] Building CXX object deps/gflags/CMakeFiles/gflags_nothreads-static.dir/src/gflags_reporting.cc.o
[ 23%] Building CXX object deps/gflags/CMakeFiles/gflags-static.dir/src/gflags_reporting.cc.o
[ 23%] Building CXX object deps/gflags/CMakeFiles/gflags_nothreads-static.dir/src/gflags_completions.cc.o
[ 30%] Building CXX object CMakeFiles/devinfo.dir/src/L0/devinfo.cpp.o
[ 30%] Building CXX object CMakeFiles/numgpus.dir/src/L0/numgpus.cpp.o
[ 34%] Linking CXX executable numgpus
[ 34%] Built target numgpus
[ 38%] Linking CXX executable devinfo
[ 38%] Built target devinfo
[ 42%] Linking CXX static library lib/libgflags.a
[ 42%] Built target gflags-static
[ 50%] Building NVCC (Device) object CMakeFiles/sgemm.dir/src/L2/sgemm/sgemm_generated_sgemm.cu.o
[ 50%] Building NVCC (Device) object CMakeFiles/gol.dir/src/L2/gol/gol_generated_golsample.cu.o
Scanning dependencies of target fullduplex
Scanning dependencies of target halfduplex
Scanning dependencies of target scatter
[ 53%] Building NVCC (Device) object CMakeFiles/uva.dir/src/L1/uva_generated_uva.cu.o
[ 61%] Building CXX object CMakeFiles/halfduplex.dir/src/L1/halfduplex.cpp.o
[ 61%] Building CXX object CMakeFiles/fullduplex.dir/src/L1/fullduplex.cpp.o
[ 65%] Building CXX object CMakeFiles/scatter.dir/src/L1/scatter.cpp.o
[ 69%] Linking CXX static library lib/libgflags_nothreads.a
[ 69%] Built target gflags_nothreads-static
[ 73%] Linking CXX executable fullduplex
[ 76%] Linking CXX executable halfduplex
[ 76%] Built target fullduplex
[ 76%] Built target halfduplex
[ 80%] Linking CXX executable scatter
[ 80%] Built target scatter
Scanning dependencies of target uva
[ 84%] Linking CXX executable uva
[ 84%] Built target uva
Scanning dependencies of target gol
[ 88%] Building CXX object CMakeFiles/gol.dir/src/L2/gol/main.cpp.o
[ 92%] Linking CXX executable gol
[ 92%] Built target gol
Scanning dependencies of target sgemm
[ 96%] Building CXX object CMakeFiles/sgemm.dir/src/L2/sgemm/main.cpp.o
[100%] Linking CXX executable sgemm
[100%] Built target sgemm
[root@localhost mgbench]# ./run.sh
Number of GPUs: 3
Found nvidia-smi at /usr/bin/nvidia-smi

L0 diagnostics

1/2 Computer information
2/2 Device information

L1 Tests

1/8 Half-duplex (unidirectional) memory copy
2/8 Full-duplex (bidirectional) memory copy
3/8 Half-duplex DMA Read
4/8 Full-duplex DMA Read
5/8 Half-duplex DMA Write
6/8 Full-duplex DMA Write
7/8 Scatter-Gather
8/8 Scaling

L2 Tests

1/7 Matrix multiplication (correctness)
2/7 Matrix multiplication (performance, single precision)
3/7 Matrix multiplication (performance, double precision)
4/7 Stencil (correctness)
5/7 Stencil (performance)
6/7 Stencil (single GPU correctness)
7/7 Cooling

Result:
l0-devices:

DMA access:
   | 1 2 3
---+---------
 1 | x 0 0
 2 | 0 x 0
 3 | 0 0 x

l1-uvafull
Exchanging between GPU 0 and GPU 1: No DMA
Exchanging between GPU 0 and GPU 2: No DMA
Exchanging between GPU 1 and GPU 2: No DMA
l1-uvahalf

Copying from host to GPU 0: 11884.63 MB/s (8.414230 ms)
Copying from host to GPU 1: 11912.26 MB/s (8.394710 ms)
Copying from host to GPU 2: 11888.95 MB/s (8.411170 ms)
Copying from GPU 0 to GPU 1: No DMA
Copying from GPU 0 to GPU 2: No DMA
Copying from GPU 1 to GPU 0: No DMA
Copying from GPU 1 to GPU 2: No DMA
Copying from GPU 2 to GPU 0: No DMA
Copying from GPU 2 to GPU 1: No DMA
l1-uvawfull
Exchanging between GPU 0 and GPU 1: No DMA
Exchanging between GPU 0 and GPU 2: No DMA
Exchanging between GPU 1 and GPU 2: No DMA
l1-uvawhalf
Copying from GPU 0 to host: 9900.07 MB/s (10.100940 ms)
Copying from GPU 1 to host: 12925.89 MB/s (7.736410 ms)
Copying from GPU 2 to host: 12925.17 MB/s (7.736840 ms)
Copying from GPU 1 to GPU 0: No DMA
Copying from GPU 2 to GPU 0: No DMA
Copying from GPU 0 to GPU 1: No DMA
Copying from GPU 2 to GPU 1: No DMA
Copying from GPU 0 to GPU 2: No DMA
Copying from GPU 1 to GPU 2: No DMA

tbennun commented on September 26, 2024

Great! I'll modify the CMake file if the CUBLAS fix works everywhere for me.

I don't know what is behind the DMA problem. The configuration is highly platform-dependent, and you may need to change some BIOS settings. Try running nvidia-smi topo -m to see which hardware capabilities are enabled.
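
To compare the mgbench results against that topology output, the "No DMA" lines in an L1 log can be reduced to a list of affected GPU pairs. This is only a sketch: it assumes the log uses the exact line format shown earlier in this thread, and no_dma_pairs is a made-up helper name.

```shell
#!/bin/sh
# Print each GPU pair that an mgbench L1 log reports as "No DMA", once per pair.
# no_dma_pairs is a hypothetical helper; the line format is taken from the
# log excerpts in this thread ("... GPU 0 ... GPU 1: No DMA").
no_dma_pairs() {
    # Extract the two GPU indices from every "No DMA" line.
    sed -n 's/^.*GPU \([0-9][0-9]*\) .* GPU \([0-9][0-9]*\): No DMA$/\1 \2/p' "$1" |
        # Order each pair so that "1 0" and "0 1" count as the same pair.
        awk '{ a = $1; b = $2; if (a > b) { t = a; a = b; b = t } print a "-" b }' |
        sort -u
}
```

Pass the path of one of the L1 log files as the only argument. Any pair it prints can then be looked up in the matrix printed by nvidia-smi topo -m.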

As this is out of the scope of mgbench, is this issue considered resolved?

whytehuang commented on September 26, 2024

Dear tbennun, thanks for your support. This issue is fixed, so you can close it.
Many thanks.
