wu-kan / hpl-ai
An implementation of HPL-AI Mixed-Precision Benchmark based on hpl-2.3
Home Page: https://wu-kan.cn/2021/03/14/HPL-AI/
License: MIT License
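At work I need to implement an accelerator on an FPGA that supports FP64-FP16 mixed precision. Building on your work, I only needed a few modifications to complete the delivery.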
$ spack debug report
* **Spack:** 0.16.1
* **Python:** 3.7.5
* **Platform:** linux-centos7-aarch64
$ spack find -v --loaded
==> 26 installed packages
-- linux-centos7-aarch64 / [email protected] ----------------------------
[email protected]~binutils~bootstrap~nvptx~piclibs~strip languages=c,c++,fortran patches=b620e9257ec6b9845f961931b0aa92c35b37e72b15d88ee392c7b67620ebaa2f,dc1ca240b7fb70112ae6cc47cd86925adf78d29ed9d0c26b0c51d52e40ceca0e
[email protected]
[email protected]
[email protected]
[email protected] patches=7a6dd71bcda4803d6b89612706a17b8816e1acd5dd9bf1bec29cf748f3b60008
[email protected]+optimize+pic+shared
-- linux-centos7-aarch64 / [email protected] ----------------------------
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]~cuda~ipo+openmp+shared build_type=RelWithDebInfo cuda_arch=none
[email protected]
[email protected]~cairo~cuda~gl~libudev+libxml2~netloc~nvml+pci+shared
[email protected]
[email protected]
[email protected]
[email protected]~python
[email protected]+sigsegv patches=3877ab548f88597ab2327a2230ee048d2d07ace1062efe81fc92e91b7f39cd00,fc9b61654a3ba1a8d6cd78ce087e7c96366c290bc8d2c299f09828d793b853c8
[email protected]~symlinks+termlib
[email protected] patches=4e1d78cbbb85de625bad28705e748856033eaafab92a66dffd383a3d7e00cc94
[email protected]~consistent_fpcsr~ilp64+pic+shared threads=openmp
[email protected]~atomics~cuda~cxx~cxx_exceptions+gpfs~java~legacylaunchers~lustre~memchecker~pmi~singularity~sqlite3+static~thread_multiple+vt+wrapper-rpath fabrics=none schedulers=none
[email protected]+cpanm+shared+threads
[email protected]
[email protected]~pic
[email protected]+optimize+pic+shared
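I ran the hpl-ai+ascend code on an Ascend device and got a PASS result. As I understand it, the rough procedure is to first obtain an approximate result in half precision and then use numerical iteration to reach a final result that meets the accuracy requirement. I located the iteration code, HPL_pir, and made the program skip HPL_pir entirely. I expected to get an approximate but inaccurate result, but the output was this: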
================================================================================
HPL-AI Mixed-Precision Benchmark v2.3d -- April 23, 2021
Written by Junkang Huang and Kan Wu, Sun Yat-sen University
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 32768
NB : 4096 4096
PMAP : Row-major process mapping
P : 2
Q : 4
PFACT : Right
NBMIN : 2
NDIV : 2
RFACT : Right
BCAST : 1ring
DEPTH : 1
SWAP : Spread-roll (long)
L1 : no-transposed form
U : transposed form
EQUIL : yes
ALIGN : 16 HPLAI_T_AFLOAT precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10R2R2 32768 4096 2 4 39.43 5.9493e+02
HPLAI_pdgesv() start time Thu Aug 26 04:52:00 2021
HPLAI_pdgesv() end time Thu Aug 26 04:52:40 2021
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 2.74877907e+11 ...... FAILED
||Ax-b||_oo . . . . . . . . . . . . . . . . . = 0.499998
||A||_oo . . . . . . . . . . . . . . . . . . . = 114629.501221
||A||_1 . . . . . . . . . . . . . . . . . . . = 74376.728001
||x||_oo . . . . . . . . . . . . . . . . . . . = 0.000000
||x||_1 . . . . . . . . . . . . . . . . . . . = 0.000000
||b||_oo . . . . . . . . . . . . . . . . . . . = 0.499998
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10R2R2 32768 4096 2 4 33.78 6.9446e+02
HPLAI_pdgesv() start time Thu Aug 26 04:52:42 2021
HPLAI_pdgesv() end time Thu Aug 26 04:53:16 2021
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 2.74877907e+11 ...... FAILED
||Ax-b||_oo . . . . . . . . . . . . . . . . . = 0.499998
||A||_oo . . . . . . . . . . . . . . . . . . . = 114629.501221
||A||_1 . . . . . . . . . . . . . . . . . . . = 74376.728001
||x||_oo . . . . . . . . . . . . . . . . . . . = 0.000000
||x||_1 . . . . . . . . . . . . . . . . . . . = 0.000000
||b||_oo . . . . . . . . . . . . . . . . . . . = 0.499998
================================================================================
Finished 2 tests with the following results:
0 tests completed and passed residual checks,
2 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------
End of Tests.
================================================================================
The norm of x is 0, so x does not seem to have been computed at all.
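As a quick sanity check on that reading, the scaled residual formula printed in the log can be evaluated by hand with x set to zero. The sketch below copies N, ||A||_oo and ||b||_oo from the output above and assumes eps is the double-precision unit roundoff 2^-53 (which prints as 1.110223e-16). The function name `scaled_residual` is only for illustration and is not taken from the hpl-ai source.

```python
# Values copied from the benchmark output above.
eps = 2.0 ** -53          # assumption: the eps printed as 1.110223e-16
n = 32768                 # N
norm_A = 114629.501221    # ||A||_oo
norm_b = 0.499998         # ||b||_oo


def scaled_residual(res_norm, norm_x):
    """||Ax-b||_oo / (eps * (||A||_oo * ||x||_oo + ||b||_oo) * N), as printed in the log."""
    return res_norm / (eps * (norm_A * norm_x + norm_b) * n)


# If x is exactly zero, Ax - b = -b, so ||Ax-b||_oo equals ||b||_oo.
value = scaled_residual(res_norm=norm_b, norm_x=0.0)
print(f"{value:.8e}")
```

With x = 0 the expression collapses to 1 / (eps * N) = 2^38, which matches the 2.74877907e+11 reported in both runs. That is consistent with x never having been written, rather than with an approximate half-precision solution.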
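I am currently not sure whether this is a problem with how I compiled and ran the code, a misunderstanding of the code on my part, or a bug in the code.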
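For reference, my understanding of the procedure (a low-precision solve followed by iterative correction in double precision) can be sketched as below. This is not the code from hpl-ai and not what HPL_pir does internally. It is classical iterative refinement in NumPy, with float32 standing in for FP16, a small diagonally dominant test matrix chosen for illustration, and a hypothetical helper name `refine_solve`. For brevity it re-solves with the low-precision matrix each step instead of reusing a stored LU factorization.

```python
import numpy as np


def refine_solve(A, b, iters=5):
    """Low-precision solve followed by float64 residual correction."""
    A_lo = A.astype(np.float32)  # stand-in for the half-precision factorization
    # Initial approximate solution, computed entirely in low precision.
    x = np.linalg.solve(A_lo, b.astype(np.float32)).astype(np.float64)
    history = []
    for _ in range(iters):
        r = b - A @ x                      # residual in float64
        history.append(np.linalg.norm(r, np.inf))
        # Correction from a low-precision solve against the residual.
        d = np.linalg.solve(A_lo, r.astype(np.float32)).astype(np.float64)
        x = x + d
    return x, history


rng = np.random.default_rng(0)
n = 200
# Diagonally dominant matrix so the low-precision solve is well behaved.
A = rng.uniform(-0.5, 0.5, (n, n)) + n * np.eye(n)
b = rng.uniform(-0.5, 0.5, n)

x, history = refine_solve(A, b)
final = np.linalg.norm(b - A @ x, np.inf)
print("residual before refinement:", history[0])
print("residual after refinement: ", final)
```

In this sketch, skipping the loop still leaves a nonzero x with a small but not double-precision residual, which is the behaviour I expected when skipping HPL_pir.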
My steps were as follows:
spack unload -a
spack load [email protected]
spack load [email protected]%[email protected]
spack load [email protected]%[email protected]
spack load [email protected]%[email protected]
spack load [email protected]%[email protected]
spack load [email protected]%[email protected] arch=linux-ubuntu18.04-aarch64
autoreconf -ivf
export DDK_PATH=/usr/local/Ascend/ascend-toolkit/latest/arm64-linux
./configure LIBS="-L$DDK_PATH/fwkacllib/lib64 -lascendcl -lacl_op_compiler" CPPFLAGS="-I$DDK_PATH/fwkacllib/include -DHPLAI_ACL_BLASPP_GEMM"
make -j
OMP_NUM_THREADS=24 $(which mpirun) -n 8 testing/xhpl_ai