hpl-ai's People

Contributors

wu-kan

hpl-ai's Issues

Thanks to the author for releasing a working HPL-AI

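At work I need to implement an accelerator on an FPGA that supports FP64-FP16 mixed precision. Building on your work, I only needed a few modifications to complete the delivery.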

HPL failed residual checks on linux-centos7-aarch64

$ spack debug report
* **Spack:** 0.16.1
* **Python:** 3.7.5
* **Platform:** linux-centos7-aarch64
$ spack find -v --loaded
==> 26 installed packages
-- linux-centos7-aarch64 / [email protected] ----------------------------
[email protected]~binutils~bootstrap~nvptx~piclibs~strip languages=c,c++,fortran patches=b620e9257ec6b9845f961931b0aa92c35b37e72b15d88ee392c7b67620ebaa2f,dc1ca240b7fb70112ae6cc47cd86925adf78d29ed9d0c26b0c51d52e40ceca0e
[email protected]
[email protected]
[email protected]
[email protected] patches=7a6dd71bcda4803d6b89612706a17b8816e1acd5dd9bf1bec29cf748f3b60008
[email protected]+optimize+pic+shared

-- linux-centos7-aarch64 / [email protected] ----------------------------
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]~cuda~ipo+openmp+shared build_type=RelWithDebInfo cuda_arch=none
[email protected]
[email protected]~cairo~cuda~gl~libudev+libxml2~netloc~nvml+pci+shared
[email protected]
[email protected]
[email protected]
[email protected]~python
[email protected]+sigsegv patches=3877ab548f88597ab2327a2230ee048d2d07ace1062efe81fc92e91b7f39cd00,fc9b61654a3ba1a8d6cd78ce087e7c96366c290bc8d2c299f09828d793b853c8
[email protected]~symlinks+termlib
[email protected] patches=4e1d78cbbb85de625bad28705e748856033eaafab92a66dffd383a3d7e00cc94
[email protected]~consistent_fpcsr~ilp64+pic+shared threads=openmp
[email protected]~atomics~cuda~cxx~cxx_exceptions+gpfs~java~legacylaunchers~lustre~memchecker~pmi~singularity~sqlite3+static~thread_multiple+vt+wrapper-rpath fabrics=none schedulers=none
[email protected]+cpanm+shared+threads
[email protected]
[email protected]~pic
[email protected]+optimize+pic+shared

Somewhat confused by the results of running hpl-ai+ascend

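I ran the hpl-ai+ascend code on an Ascend device and successfully got a PASS result. As I understand it, the rough procedure is to first obtain an approximate result in half precision and then use numerical iteration to reach a final result that meets the accuracy requirement. I located the iterative code, HPL_pir, and made the program skip HPL_pir entirely. I expected to get an approximate but inaccurate result, but the output was this: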

================================================================================
HPL-AI Mixed-Precision Benchmark v2.3d  --  April 23, 2021
Written by Junkang Huang and Kan Wu,  Sun Yat-sen University
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   32768 
NB     :    4096     4096 
PMAP   : Row-major process mapping
P      :       2 
Q      :       4 
PFACT  :   Right 
NBMIN  :       2 
NDIV   :       2 
RFACT  :   Right 
BCAST  :   1ring 
DEPTH  :       1 
SWAP   : Spread-roll (long)
L1     : no-transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 16 HPLAI_T_AFLOAT precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR10R2R2       32768  4096     2     4              39.43             5.9493e+02
HPLAI_pdgesv() start time Thu Aug 26 04:52:00 2021

HPLAI_pdgesv() end time   Thu Aug 26 04:52:40 2021

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   2.74877907e+11 ...... FAILED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . =           0.499998
||A||_oo . . . . . . . . . . . . . . . . . . . =      114629.501221
||A||_1  . . . . . . . . . . . . . . . . . . . =       74376.728001
||x||_oo . . . . . . . . . . . . . . . . . . . =           0.000000
||x||_1  . . . . . . . . . . . . . . . . . . . =           0.000000
||b||_oo . . . . . . . . . . . . . . . . . . . =           0.499998
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR10R2R2       32768  4096     2     4              33.78             6.9446e+02
HPLAI_pdgesv() start time Thu Aug 26 04:52:42 2021

HPLAI_pdgesv() end time   Thu Aug 26 04:53:16 2021

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   2.74877907e+11 ...... FAILED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . =           0.499998
||A||_oo . . . . . . . . . . . . . . . . . . . =      114629.501221
||A||_1  . . . . . . . . . . . . . . . . . . . =       74376.728001
||x||_oo . . . . . . . . . . . . . . . . . . . =           0.000000
||x||_1  . . . . . . . . . . . . . . . . . . . =           0.000000
||b||_oo . . . . . . . . . . . . . . . . . . . =           0.499998
================================================================================

Finished      2 tests with the following results:
              0 tests completed and passed residual checks,
              2 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================

The norm of x is 0, so it seems x was never computed.
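
As a quick sanity check, the scaled residual in the log can be reproduced from the printed norms alone. The sketch below plugs the logged values into the formula printed in the benchmark header. It assumes eps is exactly 2^-53, which is consistent with the printed 1.110223e-16 but is not confirmed by the log; the variable names are illustrative only.

```python
# Norms copied from the benchmark log above.
n = 32768                # N, order of the matrix
norm_r = 0.499998        # ||Ax-b||_oo
norm_a = 114629.501221   # ||A||_oo
norm_x = 0.0             # ||x||_oo
norm_b = 0.499998        # ||b||_oo

# Assumption: eps is exactly 2^-53 (the log prints it rounded as 1.110223e-16).
eps = 2.0 ** -53

# Scaled residual as defined in the benchmark header:
# ||Ax-b||_oo / ( eps * ( ||x||_oo * ||A||_oo + ||b||_oo ) * N )
scaled = norm_r / (eps * (norm_x * norm_a + norm_b) * n)

print(f"{scaled:.8e}")
```

With ||x||_oo = 0 the ||A||_oo term drops out, and because ||Ax-b||_oo equals ||b||_oo the ratio collapses to 1 / (eps * N) = 2^38, which matches the 2.74877907e+11 in the log. In other words, the reported number is what the formula gives for x = 0, independent of A.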
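At this point I am not sure whether the problem lies in how I built and ran the code, in my understanding of the code, or in a bug in the code itself.
My steps were as follows: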

spack unload -a
spack load [email protected]
spack load [email protected]%[email protected]
spack load [email protected]%[email protected]
spack load [email protected]%[email protected]
spack load [email protected]%[email protected]
spack load [email protected]%[email protected] arch=linux-ubuntu18.04-aarch64

autoreconf -ivf
export DDK_PATH=/usr/local/Ascend/ascend-toolkit/latest/arm64-linux
./configure LIBS="-L$DDK_PATH/fwkacllib/lib64 -lascendcl -lacl_op_compiler" CPPFLAGS="-I$DDK_PATH/fwkacllib/include -DHPLAI_ACL_BLASPP_GEMM"
make -j
OMP_NUM_THREADS=24 $(which mpirun) -n 8 testing/xhpl_ai
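
For reference, the flow described in this issue (a low-precision factorization followed by iterative refinement in higher precision) can be sketched in pure Python. This is only an illustration under stated assumptions, not the code in HPL_pir or anywhere else in hpl-ai: the helper names (to_half, lu_half, lu_solve, solve_mixed), the 32x32 diagonally dominant test matrix, the unpivoted LU, and the classic residual-correction loop are all hypothetical choices made for the sketch.

```python
import random
import struct


def to_half(x):
    """Round a Python float (FP64) to the nearest IEEE FP16 value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def lu_half(a):
    """LU-factor a copy of `a` without pivoting, rounding each result to FP16."""
    n = len(a)
    lu = [[to_half(v) for v in row] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lu[i][k] = to_half(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = to_half(lu[i][j] - lu[i][k] * lu[k][j])
    return lu


def lu_solve(lu, b):
    """Solve L U x = b in FP64 using the low-precision factors."""
    n = len(lu)
    x = list(b)
    for i in range(n):  # forward substitution with unit-diagonal L
        x[i] -= sum(lu[i][j] * x[j] for j in range(i))
    for i in reversed(range(n)):  # back substitution with U
        x[i] = (x[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


def residual(a, x, b):
    """Return r = b - A x, computed in FP64."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(a, b)]


def inf_norm(v):
    """Infinity norm of a vector."""
    return max(abs(t) for t in v)


def solve_mixed(a, b, refine_steps):
    """FP16 factorization, then `refine_steps` rounds of FP64 refinement."""
    lu = lu_half(a)
    x = lu_solve(lu, b)  # approximate solution from the FP16 factors
    for _ in range(refine_steps):
        r = residual(a, x, b)  # residual in FP64
        d = lu_solve(lu, r)    # correction from the same factors
        x = [xi + di for xi, di in zip(x, d)]
    return x


# Hypothetical test problem: small, strictly diagonally dominant matrix,
# so that LU without pivoting is safe in this sketch.
random.seed(0)
n = 32
a = [[random.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
for i in range(n):
    a[i][i] += n
b = [random.uniform(-0.5, 0.5) for _ in range(n)]

x_unrefined = solve_mixed(a, b, 0)   # refinement skipped
x_refined = solve_mixed(a, b, 10)    # ten refinement steps

res_unrefined = inf_norm(residual(a, x_unrefined, b))
res_refined = inf_norm(residual(a, x_refined, b))
print("||b - A x||_oo without refinement:", res_unrefined)
print("||b - A x||_oo with refinement:   ", res_refined)
```

In this sketch, calling solve_mixed(a, b, 0) skips refinement and still returns a nonzero, approximate x, which is the behaviour the issue expected. Whether skipping HPL_pir in hpl-ai leaves x unset depends on where the solution vector is actually written, which this sketch does not model.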
