Hi, I have a problem, I want to use MGbench at ARM TX2 platform. But when I ex


Compilation on ARM TX2 fails (mgbench issue, closed)

whytehuang commented on September 26, 2024

Compiler in ARM TX2 will fail

from mgbench.

Comments (10)

tbennun commented on September 26, 2024

It looks like cmake is not finding CUBLAS. What is your cmake version? Is CUBLAS installed (it should be)?


whytehuang commented on September 26, 2024

Hi tbennun,
I tried RHEL 8.0 (AArch64) and the result is still the same. cmake version: cmake-3.11.4-3.el8.aarch64.
RHEL 8.1 ships the same cmake, 3.11.4-3.el8.

As for CUBLAS, I only used yum to install nvidia-driver-cuda and cuda.


tbennun commented on September 26, 2024

The tests require CUBLAS too (in order to fully utilize the GPU in some benchmarks). Please try to install CUBLAS as well and see if it works.

If you do not want to install CUBLAS, comment out these three lines:
https://github.com/tbennun/mgbench/blob/master/CMakeLists.txt#L88-L90
and every line in run.sh that has sgemm in it.
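
The run.sh part of this workaround can be sketched in a few lines of Python. This is only an illustration, not part of mgbench: the helper names `comment_out_matching` and `comment_out_in_file` are hypothetical, and the sketch assumes every sgemm invocation sits on a single line (a command continued over several lines with a backslash would need manual editing).

```python
from pathlib import Path


def comment_out_matching(lines, needle="sgemm", prefix="# "):
    """Return a copy of `lines` where every line containing `needle` is commented out."""
    result = []
    for line in lines:
        already_commented = line.lstrip().startswith("#")
        if needle in line and not already_commented:
            result.append(prefix + line)
        else:
            result.append(line)
    return result


def comment_out_in_file(path, needle="sgemm"):
    """Rewrite `path` in place, keeping the original next to it as <name>.bak."""
    path = Path(path)
    original = path.read_text().splitlines(keepends=True)
    # Save a backup first so the change is easy to undo.
    path.with_name(path.name + ".bak").write_text("".join(original))
    path.write_text("".join(comment_out_matching(original, needle)))
```

Calling `comment_out_in_file("run.sh")` from the mgbench directory would then disable the sgemm steps while leaving run.sh.bak as the untouched original.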


whytehuang commented on September 26, 2024

Hi tbennun,
Can you tell me which package contains CUBLAS? I tried "yum install CUBLAS" but could not find it.


tbennun commented on September 26, 2024

I don't know, sorry. My best guess would be to go to: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html


whytehuang commented on September 26, 2024

Hi tbennun,
Sorry for the late reply.
I ran the command yum list cublas, and it shows:
[root@localhost mgbench]# yum list cublas
Updating Subscription Management repositories.
Unable to read consumer identity
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Last metadata expiration check: 1:29:43 ago on Mon 02 Mar 2020 12:54:16 AM EST.
Installed Packages
libcublas-devel.aarch64 10.2.2.91-1 @cuda-10-2-local-10.2.91-435.17.01
libcublas10.aarch64 10.2.2.91-1 @cuda-10-2-local-10.2.91-435.17.01
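
As a complement to the package listing above, the following sketch checks whether the system's library search can locate a CUBLAS shared library at all. It uses only Python's standard `ctypes.util.find_library`; the helper name `find_cublas` is hypothetical. A `None` result does not prove that CUBLAS is missing (the library may live in a directory the search does not cover), and the check says nothing about what CMake itself detects.

```python
import ctypes.util


def find_cublas(names=("cublas",)):
    """Return the first library name the system search resolves, or None."""
    for name in names:
        found = ctypes.util.find_library(name)
        if found:
            return found
    return None


if __name__ == "__main__":
    result = find_cublas()
    if result is None:
        print("libcublas was not found by the default library search")
    else:
        print("found:", result)
```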

Does this tool support the ARM platform? (OS: RHEL 8.1)


tbennun commented on September 26, 2024

Yes, it's standard C++, CUDA and CMake, and as far as I remember, it was tested on the TX1.

Try to remove this line: https://github.com/tbennun/mgbench/blob/master/CMakeLists.txt#L89
and instead, replace the line after it with:

target_link_libraries(sgemm gflags-static cublas ${EXTRA_LIBS})

Does that work?


whytehuang commented on September 26, 2024

Dear tbennun,
Following your steps, it works!
Is this the correct procedure? I ran run.sh, checked the L1 log, and found some strange performance results.
It looks like DMA is not supported or is disabled?

Install
[root@localhost mgbench]# ./build.sh
mkdir: cannot create directory ‘build’: File exists
-- Configuring done
-- Generating done
-- Build files have been written to: /root/mgbench/build
Scanning dependencies of target devinfo
Scanning dependencies of target gflags-static
Scanning dependencies of target numgpus
Scanning dependencies of target gflags_nothreads-static
[ 23%] Building CXX object deps/gflags/CMakeFiles/gflags-static.dir/src/gflags_completions.cc.o
[ 23%] Building CXX object deps/gflags/CMakeFiles/gflags-static.dir/src/gflags.cc.o
[ 23%] Building CXX object deps/gflags/CMakeFiles/gflags_nothreads-static.dir/src/gflags.cc.o
[ 23%] Building CXX object deps/gflags/CMakeFiles/gflags_nothreads-static.dir/src/gflags_reporting.cc.o
[ 23%] Building CXX object deps/gflags/CMakeFiles/gflags-static.dir/src/gflags_reporting.cc.o
[ 23%] Building CXX object deps/gflags/CMakeFiles/gflags_nothreads-static.dir/src/gflags_completions.cc.o
[ 30%] Building CXX object CMakeFiles/devinfo.dir/src/L0/devinfo.cpp.o
[ 30%] Building CXX object CMakeFiles/numgpus.dir/src/L0/numgpus.cpp.o
[ 34%] Linking CXX executable numgpus
[ 34%] Built target numgpus
[ 38%] Linking CXX executable devinfo
[ 38%] Built target devinfo
[ 42%] Linking CXX static library lib/libgflags.a
[ 42%] Built target gflags-static
[ 50%] Building NVCC (Device) object CMakeFiles/sgemm.dir/src/L2/sgemm/sgemm_generated_sgemm.cu.o
[ 50%] Building NVCC (Device) object CMakeFiles/gol.dir/src/L2/gol/gol_generated_golsample.cu.o
Scanning dependencies of target fullduplex
Scanning dependencies of target halfduplex
Scanning dependencies of target scatter
[ 53%] Building NVCC (Device) object CMakeFiles/uva.dir/src/L1/uva_generated_uva.cu.o
[ 61%] Building CXX object CMakeFiles/halfduplex.dir/src/L1/halfduplex.cpp.o
[ 61%] Building CXX object CMakeFiles/fullduplex.dir/src/L1/fullduplex.cpp.o
[ 65%] Building CXX object CMakeFiles/scatter.dir/src/L1/scatter.cpp.o
[ 69%] Linking CXX static library lib/libgflags_nothreads.a
[ 69%] Built target gflags_nothreads-static
[ 73%] Linking CXX executable fullduplex
[ 76%] Linking CXX executable halfduplex
[ 76%] Built target fullduplex
[ 76%] Built target halfduplex
[ 80%] Linking CXX executable scatter
[ 80%] Built target scatter
Scanning dependencies of target uva
[ 84%] Linking CXX executable uva
[ 84%] Built target uva
Scanning dependencies of target gol
[ 88%] Building CXX object CMakeFiles/gol.dir/src/L2/gol/main.cpp.o
[ 92%] Linking CXX executable gol
[ 92%] Built target gol
Scanning dependencies of target sgemm
[ 96%] Building CXX object CMakeFiles/sgemm.dir/src/L2/sgemm/main.cpp.o
[100%] Linking CXX executable sgemm
[100%] Built target sgemm
[root@localhost mgbench]# ./run.sh
Number of GPUs: 3
Found nvidia-smi at /usr/bin/nvidia-smi

L0 diagnostics

1/2 Computer information
2/2 Device information

L1 Tests

1/8 Half-duplex (unidirectional) memory copy
2/8 Full-duplex (bidirectional) memory copy
3/8 Half-duplex DMA Read
4/8 Full-duplex DMA Read
5/8 Half-duplex DMA Write
6/8 Full-duplex DMA Write
7/8 Scatter-Gather
8/8 Scaling

L2 Tests

1/7 Matrix multiplication (correctness)
2/7 Matrix multiplication (performance, single precision)
3/7 Matrix multiplication (performance, double precision)
4/7 Stencil (correctness)
5/7 Stencil (performance)
6/7 Stencil (single GPU correctness)
7/7 Cooling

Result:
l0-devices:

DMA access:
   | 1 2 3
---+---------
 1 | x 0 0
 2 | 0 x 0
 3 | 0 0 x

l1-uvafull
Exchanging between GPU 0 and GPU 1: No DMA
Exchanging between GPU 0 and GPU 2: No DMA
Exchanging between GPU 1 and GPU 2: No DMA
l1-uvahalf

Copying from host to GPU 0: 11884.63 MB/s (8.414230 ms)
Copying from host to GPU 1: 11912.26 MB/s (8.394710 ms)
Copying from host to GPU 2: 11888.95 MB/s (8.411170 ms)
Copying from GPU 0 to GPU 1: No DMA
Copying from GPU 0 to GPU 2: No DMA
Copying from GPU 1 to GPU 0: No DMA
Copying from GPU 1 to GPU 2: No DMA
Copying from GPU 2 to GPU 0: No DMA
Copying from GPU 2 to GPU 1: No DMA
l1-uvawfull
Exchanging between GPU 0 and GPU 1: No DMA
Exchanging between GPU 0 and GPU 2: No DMA
Exchanging between GPU 1 and GPU 2: No DMA
l1-uvawhalf
Copying from GPU 0 to host: 9900.07 MB/s (10.100940 ms)
Copying from GPU 1 to host: 12925.89 MB/s (7.736410 ms)
Copying from GPU 2 to host: 12925.17 MB/s (7.736840 ms)
Copying from GPU 1 to GPU 0: No DMA
Copying from GPU 2 to GPU 0: No DMA
Copying from GPU 0 to GPU 1: No DMA
Copying from GPU 2 to GPU 1: No DMA
Copying from GPU 0 to GPU 2: No DMA
Copying from GPU 1 to GPU 2: No DMA
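
The L1 results above can be condensed with a short script that separates the measured bandwidths from the transfers reported as "No DMA". This is a minimal sketch, assuming the log lines have exactly the shape shown in this output; the function name `summarize_l1_log` is hypothetical and not part of mgbench.

```python
import re

# Matches lines such as:
#   Copying from host to GPU 0: 11884.63 MB/s (8.414230 ms)
#   Exchanging between GPU 0 and GPU 1: No DMA
LINE_RE = re.compile(
    r"^(?P<what>(?:Copying from|Exchanging between) .+?): "
    r"(?:(?P<mbps>[\d.]+) MB/s \((?P<ms>[\d.]+) ms\)|(?P<nodma>No DMA))\s*$"
)


def summarize_l1_log(text):
    """Split mgbench L1 result lines into measured transfers and 'No DMA' transfers.

    Returns (measured, no_dma): measured is a list of (description, MB/s) tuples,
    no_dma is a list of descriptions. Section headers and other lines are skipped.
    """
    measured = []
    no_dma = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        if match.group("nodma"):
            no_dma.append(match.group("what"))
        else:
            measured.append((match.group("what"), float(match.group("mbps"))))
    return measured, no_dma
```

Applied to the log above, every GPU-to-GPU entry would land in the second list and only the host transfers in the first, which makes the pattern (host copies work, peer copies do not) easy to see.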


tbennun commented on September 26, 2024

Great! I'll modify the CMake file if the CUBLAS fix works everywhere for me.

I don't know what is causing the DMA problem. The configuration is highly platform-dependent, and you may need to change some BIOS settings. Try running nvidia-smi topo -m to see which hardware capabilities are enabled.

As this is out of the scope of mgbench, is this issue considered resolved?


whytehuang commented on September 26, 2024

Dear tbennun, thanks for your support. This issue is fixed, so you can close it.
Many thanks.

