Comments (9)
Good idea. I haven't tried it yet, but I was planning on re-running some of the experiments, and I'll make sure to try it out.
I'll let you know!
from pyhpc-benchmarks.
Great, thanks!
It seems like using fastmath leads to a 10-20% performance improvement, and does produce the same outputs (as measured by np.allclose).
The following timings are with fastmath=True. You can compare the Δ values to the original benchmarks from the Readme.
Equation of state
benchmarks.equation_of_state
============================
Running on CPU
size backend calls mean stdev min 25% median 75% max Δ
------------------------------------------------------------------------------------------------------------------
4,096 numba 10,000 0.001 0.001 0.001 0.001 0.001 0.001 0.015 2.727
4,096 numpy 1,000 0.002 0.001 0.002 0.002 0.002 0.002 0.016 1.000
16,384 numba 10,000 0.003 0.001 0.002 0.002 0.002 0.002 0.017 2.787
16,384 numpy 1,000 0.007 0.001 0.007 0.007 0.007 0.007 0.020 1.000
65,536 numba 1,000 0.010 0.001 0.010 0.010 0.010 0.010 0.023 4.127
65,536 numpy 100 0.040 0.002 0.037 0.040 0.040 0.041 0.046 1.000
262,144 numba 1,000 0.035 0.001 0.035 0.035 0.035 0.035 0.055 5.014
262,144 numpy 100 0.177 0.006 0.171 0.173 0.176 0.177 0.204 1.000
1,048,576 numba 100 0.146 0.001 0.144 0.145 0.146 0.147 0.149 5.387
1,048,576 numpy 10 0.787 0.014 0.779 0.781 0.783 0.784 0.827 1.000
4,194,304 numba 10 0.573 0.004 0.571 0.572 0.572 0.573 0.585 6.106
4,194,304 numpy 10 3.501 0.016 3.490 3.492 3.494 3.496 3.537 1.000
16,777,216 numba 10 2.050 0.001 2.048 2.049 2.049 2.050 2.052 6.609
16,777,216 numpy 10 13.546 0.004 13.538 13.543 13.546 13.549 13.552 1.000
Isoneutral mixing
benchmarks.isoneutral_mixing
============================
Running on CPU
size backend calls mean stdev min 25% median 75% max Δ
------------------------------------------------------------------------------------------------------------------
4,096 numba 10,000 0.001 0.000 0.001 0.001 0.001 0.001 0.007 3.707
4,096 numpy 1,000 0.004 0.000 0.004 0.004 0.004 0.004 0.009 1.000
16,384 numba 1,000 0.005 0.000 0.005 0.005 0.005 0.005 0.009 2.657
16,384 numpy 1,000 0.014 0.000 0.014 0.014 0.014 0.014 0.019 1.000
65,536 numba 1,000 0.024 0.000 0.024 0.024 0.024 0.024 0.028 2.329
65,536 numpy 100 0.057 0.001 0.056 0.056 0.056 0.057 0.061 1.000
262,144 numba 100 0.096 0.003 0.092 0.094 0.096 0.096 0.110 2.384
262,144 numpy 100 0.228 0.007 0.223 0.223 0.224 0.232 0.254 1.000
1,048,576 numba 10 0.446 0.000 0.446 0.446 0.446 0.446 0.446 2.556
1,048,576 numpy 10 1.141 0.004 1.136 1.137 1.139 1.144 1.149 1.000
4,194,304 numba 10 1.834 0.002 1.828 1.835 1.835 1.835 1.836 2.634
4,194,304 numpy 10 4.832 0.013 4.814 4.819 4.834 4.842 4.851 1.000
16,777,216 numba 10 7.476 0.009 7.465 7.470 7.474 7.478 7.499 3.016
16,777,216 numpy 10 22.551 0.028 22.508 22.536 22.548 22.572 22.600 1.000
Turbulent kinetic energy
benchmarks.turbulent_kinetic_energy
===================================
Running on CPU
size backend calls mean stdev min 25% median 75% max Δ
------------------------------------------------------------------------------------------------------------------
4,096 numba 10,000 0.001 0.000 0.001 0.001 0.001 0.001 0.003 1.891
4,096 numpy 1,000 0.002 0.000 0.002 0.002 0.002 0.002 0.004 1.000
16,384 numba 1,000 0.004 0.000 0.004 0.004 0.004 0.004 0.005 1.763
16,384 numpy 1,000 0.007 0.000 0.007 0.007 0.007 0.007 0.009 1.000
65,536 numba 1,000 0.013 0.000 0.012 0.013 0.013 0.013 0.014 2.020
65,536 numpy 1,000 0.026 0.000 0.026 0.026 0.026 0.026 0.030 1.000
262,144 numba 100 0.042 0.001 0.041 0.042 0.042 0.042 0.044 2.384
262,144 numpy 100 0.100 0.002 0.099 0.100 0.100 0.100 0.110 1.000
1,048,576 numba 100 0.176 0.002 0.168 0.176 0.176 0.177 0.185 3.035
1,048,576 numpy 10 0.534 0.001 0.533 0.533 0.534 0.534 0.537 1.000
4,194,304 numba 10 0.676 0.007 0.668 0.670 0.675 0.678 0.693 3.011
4,194,304 numpy 10 2.034 0.004 2.029 2.032 2.033 2.035 2.046 1.000
16,777,216 numba 10 2.571 0.002 2.566 2.570 2.571 2.572 2.573 3.942
16,777,216 numpy 10 10.135 0.003 10.131 10.132 10.135 10.137 10.140 1.000
Thanks! Very interesting, 10-20% is quite significant. Glancing at the Readme, it looks like this would put numba ahead of the others in a few more places here and there.
I always use numba with fastmath=True, and so far I have never had a problem with it.
Thanks again!
This is a good rundown of -ffast-math for the gcc compiler.
https://stackoverflow.com/questions/7420665/what-does-gccs-ffast-math-actually-do
In most cases, fastmath won't matter in terms of results, but in some use cases it will have a great effect. I think it is typically used by default and turned off when problems arise.
Thanks @leonfoks, that is a great resource. Now the interesting question is: is -ffast-math with gcc the same as fastmath=True with numba (hence llvm)?
This is the answer, I assume: https://llvm.org/docs/LangRef.html#fast-math-flags
Also interesting to see that in some cases it can mean up to twice as fast: https://numba.pydata.org/numba-doc/dev/user/performance-tips.html#fastmath (for something as simple as a summed square root).
One more question @dionhaefner: Is your NumPy compiled against Intel's MKL? (Did you use pip or conda to install NumPy?) This would have a huge influence on most backends, not just Numba, I assume; at least also for the pure NumPy results (https://numba.pydata.org/numba-doc/dev/user/performance-tips.html#linear-algebra).
And, in the case of numba, having icc_rt installed could help too: conda install -c numba icc_rt (https://numba.pydata.org/numba-doc/dev/user/performance-tips.html#intel-svml).
These benchmarks don't use any linear algebra (apart from a small part of the TKE benchmark), so I wouldn't expect them to make any calls to BLAS or MKL.
I haven't tried icc_rt, but it sounds interesting. Maybe I'll do a test run soon.