
dionhaefner avatar dionhaefner commented on May 25, 2024

Good idea. I haven't tried it yet, but I was planning on re-running some of the experiments, and I'll make sure to try it out.

I'll let you know!

from pyhpc-benchmarks.

prisae avatar prisae commented on May 25, 2024

Great, thanks!


dionhaefner avatar dionhaefner commented on May 25, 2024

It seems like using fastmath leads to a 10-20% performance improvement and produces the same outputs (as measured by np.allclose).

The following timings are with fastmath=True. You can compare the Δ values to the original benchmarks from the Readme.

Equation of state

benchmarks.equation_of_state
============================
Running on CPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
       4,096  numba         10,000     0.001     0.001     0.001     0.001     0.001     0.001     0.015     2.727
       4,096  numpy          1,000     0.002     0.001     0.002     0.002     0.002     0.002     0.016     1.000

      16,384  numba         10,000     0.003     0.001     0.002     0.002     0.002     0.002     0.017     2.787
      16,384  numpy          1,000     0.007     0.001     0.007     0.007     0.007     0.007     0.020     1.000

      65,536  numba          1,000     0.010     0.001     0.010     0.010     0.010     0.010     0.023     4.127
      65,536  numpy            100     0.040     0.002     0.037     0.040     0.040     0.041     0.046     1.000

     262,144  numba          1,000     0.035     0.001     0.035     0.035     0.035     0.035     0.055     5.014
     262,144  numpy            100     0.177     0.006     0.171     0.173     0.176     0.177     0.204     1.000

   1,048,576  numba            100     0.146     0.001     0.144     0.145     0.146     0.147     0.149     5.387
   1,048,576  numpy             10     0.787     0.014     0.779     0.781     0.783     0.784     0.827     1.000

   4,194,304  numba             10     0.573     0.004     0.571     0.572     0.572     0.573     0.585     6.106
   4,194,304  numpy             10     3.501     0.016     3.490     3.492     3.494     3.496     3.537     1.000

  16,777,216  numba             10     2.050     0.001     2.048     2.049     2.049     2.050     2.052     6.609
  16,777,216  numpy             10    13.546     0.004    13.538    13.543    13.546    13.549    13.552     1.000

Isoneutral mixing

benchmarks.isoneutral_mixing
============================
Running on CPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
       4,096  numba         10,000     0.001     0.000     0.001     0.001     0.001     0.001     0.007     3.707
       4,096  numpy          1,000     0.004     0.000     0.004     0.004     0.004     0.004     0.009     1.000

      16,384  numba          1,000     0.005     0.000     0.005     0.005     0.005     0.005     0.009     2.657
      16,384  numpy          1,000     0.014     0.000     0.014     0.014     0.014     0.014     0.019     1.000

      65,536  numba          1,000     0.024     0.000     0.024     0.024     0.024     0.024     0.028     2.329
      65,536  numpy            100     0.057     0.001     0.056     0.056     0.056     0.057     0.061     1.000

     262,144  numba            100     0.096     0.003     0.092     0.094     0.096     0.096     0.110     2.384
     262,144  numpy            100     0.228     0.007     0.223     0.223     0.224     0.232     0.254     1.000

   1,048,576  numba             10     0.446     0.000     0.446     0.446     0.446     0.446     0.446     2.556
   1,048,576  numpy             10     1.141     0.004     1.136     1.137     1.139     1.144     1.149     1.000

   4,194,304  numba             10     1.834     0.002     1.828     1.835     1.835     1.835     1.836     2.634
   4,194,304  numpy             10     4.832     0.013     4.814     4.819     4.834     4.842     4.851     1.000

  16,777,216  numba             10     7.476     0.009     7.465     7.470     7.474     7.478     7.499     3.016
  16,777,216  numpy             10    22.551     0.028    22.508    22.536    22.548    22.572    22.600     1.000

Turbulent kinetic energy

benchmarks.turbulent_kinetic_energy
===================================
Running on CPU

size          backend     calls     mean      stdev     min       25%       median    75%       max       Δ
------------------------------------------------------------------------------------------------------------------
       4,096  numba         10,000     0.001     0.000     0.001     0.001     0.001     0.001     0.003     1.891
       4,096  numpy          1,000     0.002     0.000     0.002     0.002     0.002     0.002     0.004     1.000

      16,384  numba          1,000     0.004     0.000     0.004     0.004     0.004     0.004     0.005     1.763
      16,384  numpy          1,000     0.007     0.000     0.007     0.007     0.007     0.007     0.009     1.000

      65,536  numba          1,000     0.013     0.000     0.012     0.013     0.013     0.013     0.014     2.020
      65,536  numpy          1,000     0.026     0.000     0.026     0.026     0.026     0.026     0.030     1.000

     262,144  numba            100     0.042     0.001     0.041     0.042     0.042     0.042     0.044     2.384
     262,144  numpy            100     0.100     0.002     0.099     0.100     0.100     0.100     0.110     1.000

   1,048,576  numba            100     0.176     0.002     0.168     0.176     0.176     0.177     0.185     3.035
   1,048,576  numpy             10     0.534     0.001     0.533     0.533     0.534     0.534     0.537     1.000

   4,194,304  numba             10     0.676     0.007     0.668     0.670     0.675     0.678     0.693     3.011
   4,194,304  numpy             10     2.034     0.004     2.029     2.032     2.033     2.035     2.046     1.000

  16,777,216  numba             10     2.571     0.002     2.566     2.570     2.571     2.572     2.573     3.942
  16,777,216  numpy             10    10.135     0.003    10.131    10.132    10.135    10.137    10.140     1.000


prisae avatar prisae commented on May 25, 2024

Thanks! Very interesting; 10-20% is quite significant. Glancing at the Readme, it looks like this would put numba ahead of the other backends in a few more places here and there.

I always use numba with fastmath=True, and so far I have never had a problem with it.

Thanks again!


leonfoks avatar leonfoks commented on May 25, 2024

This is a good rundown of -ffast-math for the gcc compiler:

https://stackoverflow.com/questions/7420665/what-does-gccs-ffast-math-actually-do

In most cases, fastmath won't change the results, but in some use cases it has a large effect. I think it is typically enabled by default and turned off when problems arise.
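A concrete instance of what those flags license: fast-math allows the compiler to reassociate floating-point sums, and floating-point addition is not associative, so results can change at the last-bit level. A minimal illustration in plain Python:

```python
# Floating-point addition is not associative, so the reassociation
# permitted by fast-math flags can change results.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # a and b cancel exactly, then 1.0 survives
right = a + (b + c)  # 1.0 is absorbed into -1e16 by rounding

assert left == 1.0
assert right == 0.0
```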


prisae avatar prisae commented on May 25, 2024

Thanks @leonfoks, that is a great resource. Now the interesting question is: is -ffast-math in gcc the same as fastmath=True in numba (hence LLVM)?


prisae avatar prisae commented on May 25, 2024

This is the answer, I assume: https://llvm.org/docs/LangRef.html#fast-math-flags

Also interesting to see that in some cases it can be up to twice as fast: https://numba.pydata.org/numba-doc/dev/user/performance-tips.html#fastmath (for something as simple as a summed square root).


prisae avatar prisae commented on May 25, 2024

One more question @dionhaefner: Is your NumPy compiled with Intel's MKL? (Did you use pip or conda to install NumPy?) I assume this would have a huge influence on most backends, not just numba; at least also on the pure NumPy results (https://numba.pydata.org/numba-doc/dev/user/performance-tips.html#linear-algebra).

And, in the case of numba, having icc_rt installed could help too, conda install -c numba icc_rt (https://numba.pydata.org/numba-doc/dev/user/performance-tips.html#intel-svml).
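One way to check the MKL question is NumPy's own build introspection (np.show_config is a standard NumPy call; whether it reports MKL depends on how NumPy was installed):

```python
import numpy as np

# Prints the BLAS/LAPACK libraries NumPy was built against.
# conda's defaults channel typically links MKL; pip wheels
# usually ship OpenBLAS instead.
np.show_config()
```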


dionhaefner avatar dionhaefner commented on May 25, 2024

These benchmarks don't use any linear algebra (apart from a small part of the TKE benchmark), so I wouldn't expect them to make any calls to BLAS or MKL.

I haven't tried icc_rt, but it sounds interesting. Maybe I'll do a test run soon.

