Comments (9)
Good idea. I haven't tried it yet, but I was planning on re-running some of the experiments, and I'll make sure to try it out.
I'll let you know!
from pyhpc-benchmarks.
Great, thanks!
It seems like using fastmath leads to a 10-20% performance improvement, and does produce the same outputs (as measured by np.allclose).
The following timings are with fastmath=True. You can compare the Δ values to the original benchmarks from the Readme.
Equation of state
benchmarks.equation_of_state
============================
Running on CPU
size backend calls mean stdev min 25% median 75% max Δ
------------------------------------------------------------------------------------------------------------------
4,096 numba 10,000 0.001 0.001 0.001 0.001 0.001 0.001 0.015 2.727
4,096 numpy 1,000 0.002 0.001 0.002 0.002 0.002 0.002 0.016 1.000
16,384 numba 10,000 0.003 0.001 0.002 0.002 0.002 0.002 0.017 2.787
16,384 numpy 1,000 0.007 0.001 0.007 0.007 0.007 0.007 0.020 1.000
65,536 numba 1,000 0.010 0.001 0.010 0.010 0.010 0.010 0.023 4.127
65,536 numpy 100 0.040 0.002 0.037 0.040 0.040 0.041 0.046 1.000
262,144 numba 1,000 0.035 0.001 0.035 0.035 0.035 0.035 0.055 5.014
262,144 numpy 100 0.177 0.006 0.171 0.173 0.176 0.177 0.204 1.000
1,048,576 numba 100 0.146 0.001 0.144 0.145 0.146 0.147 0.149 5.387
1,048,576 numpy 10 0.787 0.014 0.779 0.781 0.783 0.784 0.827 1.000
4,194,304 numba 10 0.573 0.004 0.571 0.572 0.572 0.573 0.585 6.106
4,194,304 numpy 10 3.501 0.016 3.490 3.492 3.494 3.496 3.537 1.000
16,777,216 numba 10 2.050 0.001 2.048 2.049 2.049 2.050 2.052 6.609
16,777,216 numpy 10 13.546 0.004 13.538 13.543 13.546 13.549 13.552 1.000
Isoneutral mixing
benchmarks.isoneutral_mixing
============================
Running on CPU
size backend calls mean stdev min 25% median 75% max Δ
------------------------------------------------------------------------------------------------------------------
4,096 numba 10,000 0.001 0.000 0.001 0.001 0.001 0.001 0.007 3.707
4,096 numpy 1,000 0.004 0.000 0.004 0.004 0.004 0.004 0.009 1.000
16,384 numba 1,000 0.005 0.000 0.005 0.005 0.005 0.005 0.009 2.657
16,384 numpy 1,000 0.014 0.000 0.014 0.014 0.014 0.014 0.019 1.000
65,536 numba 1,000 0.024 0.000 0.024 0.024 0.024 0.024 0.028 2.329
65,536 numpy 100 0.057 0.001 0.056 0.056 0.056 0.057 0.061 1.000
262,144 numba 100 0.096 0.003 0.092 0.094 0.096 0.096 0.110 2.384
262,144 numpy 100 0.228 0.007 0.223 0.223 0.224 0.232 0.254 1.000
1,048,576 numba 10 0.446 0.000 0.446 0.446 0.446 0.446 0.446 2.556
1,048,576 numpy 10 1.141 0.004 1.136 1.137 1.139 1.144 1.149 1.000
4,194,304 numba 10 1.834 0.002 1.828 1.835 1.835 1.835 1.836 2.634
4,194,304 numpy 10 4.832 0.013 4.814 4.819 4.834 4.842 4.851 1.000
16,777,216 numba 10 7.476 0.009 7.465 7.470 7.474 7.478 7.499 3.016
16,777,216 numpy 10 22.551 0.028 22.508 22.536 22.548 22.572 22.600 1.000
Turbulent kinetic energy
benchmarks.turbulent_kinetic_energy
===================================
Running on CPU
size backend calls mean stdev min 25% median 75% max Δ
------------------------------------------------------------------------------------------------------------------
4,096 numba 10,000 0.001 0.000 0.001 0.001 0.001 0.001 0.003 1.891
4,096 numpy 1,000 0.002 0.000 0.002 0.002 0.002 0.002 0.004 1.000
16,384 numba 1,000 0.004 0.000 0.004 0.004 0.004 0.004 0.005 1.763
16,384 numpy 1,000 0.007 0.000 0.007 0.007 0.007 0.007 0.009 1.000
65,536 numba 1,000 0.013 0.000 0.012 0.013 0.013 0.013 0.014 2.020
65,536 numpy 1,000 0.026 0.000 0.026 0.026 0.026 0.026 0.030 1.000
262,144 numba 100 0.042 0.001 0.041 0.042 0.042 0.042 0.044 2.384
262,144 numpy 100 0.100 0.002 0.099 0.100 0.100 0.100 0.110 1.000
1,048,576 numba 100 0.176 0.002 0.168 0.176 0.176 0.177 0.185 3.035
1,048,576 numpy 10 0.534 0.001 0.533 0.533 0.534 0.534 0.537 1.000
4,194,304 numba 10 0.676 0.007 0.668 0.670 0.675 0.678 0.693 3.011
4,194,304 numpy 10 2.034 0.004 2.029 2.032 2.033 2.035 2.046 1.000
16,777,216 numba 10 2.571 0.002 2.566 2.570 2.571 2.572 2.573 3.942
16,777,216 numpy 10 10.135 0.003 10.131 10.132 10.135 10.137 10.140 1.000
Thanks! Very interesting, 10-20% is quite significant. Glancing at the Readme, it looks like this would put numba ahead of the others in a few more places here and there.
I always use numba with fastmath=True, and so far I have never had a problem with it.
Thanks again!
This is a good rundown of -ffast-math for the gcc compiler.
https://stackoverflow.com/questions/7420665/what-does-gccs-ffast-math-actually-do
In most cases, fastmath won't matter in terms of results, but in some use cases it will have a great effect. I think it is typically used by default and turned off when problems arise.
Thanks @leonfoks, that is a great resource. Now the interesting question is: is -ffast-math with gcc the same as fastmath=True with numba (hence llvm)?
This is the answer, I assume: https://llvm.org/docs/LangRef.html#fast-math-flags
Also interesting to see that in some cases it can mean up to twice as fast: https://numba.pydata.org/numba-doc/dev/user/performance-tips.html#fastmath (for something as simple as a summed square root).
One more question @dionhaefner: Is your NumPy compiled against Intel's MKL? (Did you use pip or conda to install NumPy?) This would have a huge influence on most backends, not just Numba, I assume; at least also for the pure NumPy results (https://numba.pydata.org/numba-doc/dev/user/performance-tips.html#linear-algebra).
And, in the case of numba, having icc_rt installed could help too: conda install -c numba icc_rt (https://numba.pydata.org/numba-doc/dev/user/performance-tips.html#intel-svml).
These benchmarks don't use any linear algebra (apart from a small part of the TKE benchmark), so I wouldn't expect them to make any calls to BLAS or MKL.
I haven't tried icc_rt, but it sounds interesting. Maybe I'll do a test run soon.