Comments (14)
At first I thought this might be due to acos
precision, but now I don't think it is.
@manodeep I think the method we're using to compute delta-theta is not numerically stable. We're doing
DOUBLE costheta = (one - half*sqr_chord_sep);
where sqr_chord_sep
will be approximately (0.02/180*np.pi)**2 = 1.2e-7
for an angular separation of 0.02 deg. Then the costheta
subtraction underflows to 1, unsurprisingly. It seems to me we should be able to do better than this.
Perhaps we could switch to theta = 2*arcsin(C/2)
at some small angle; the expression is exact, but slower because it involves both arcsin
and sqrt
.
Or perhaps we could switch to the small angle approximation theta = C
. For 0.02 deg, the error is only 2e-12. The next term in the expansion is C^3/24
, which is also cheap to compute once we have C
. It should be possible to compute the angle for which the error due to 1-C^2
is greater than the error due to the small angle approximation, and use that as the switchover point.
Or perhaps there's another trig trick that avoids 1-C^2
?
from corrfunc.
Thanks @JonLoveday for reporting the issue! It's not entirely unexpected - computing with floating points numbers can cause such seemingly odd behaviour. Corrfunc was not designed to be resilient against such numerical issues - for example, rpavg/weightavg
values can differ between float and double precision.
What I would also be curious to know is if a brute-force python computation with float-precision produced results more consistent with double-precision.
@lgarrison Yeah
from corrfunc.
Thanks @lgarrison and @manodeep! In the past, I have always used theta = 2*arcsin(C/2). It just occurred to me that you can avoid both the arcsin and sqrt calculations by converting the bins from theta -> 4 sin^2(theta/2).
from corrfunc.
Currently Corrfunc can compute angular separations between [0, 180] - i.e.,
from corrfunc.
sin^2(theta/2) is a unique mapping of theta over [0, 180]
from corrfunc.
Right!
from corrfunc.
Thanks for the great suggestion, @JonLoveday! I made a proof-of-concept in #297. The biggest hurdle remaining is probably deciding how to compute theta for purposes of returning the average binned theta. We could bite the bullet and do 2*arcsin(sqrt(C^2)/2)
... but for most angles arccos(1 - C^2/2)
is probably accurate enough, and theta = C
might suffice for small angles (maybe with an additional C^3/24
term).
On the other hand, whatever we choose has to be implemented in the SIMD kernels, which makes a theta computation with two branches less appealing.
I've updated the title to reflect the underlying issue.
from corrfunc.
Thanks for the great suggestion, @JonLoveday! I made a proof-of-concept in #297. The biggest hurdle remaining is probably deciding how to compute theta for purposes of returning the average binned theta. We could bite the bullet and do
2*arcsin(sqrt(C^2)/2)
... but for most anglesarccos(1 - C^2/2)
is probably accurate enough, andtheta = C
might suffice for small angles (maybe with an additionalC^3/24
term).
I think for most practical purposes, it would be good enough to calculate the average binned C^2 value, and then transform to an average theta.
from corrfunc.
While looking at #301, I realised that we could easily recast the current calculation to be:
costheta := x1*x2 + y1*y2 + z1*z2
and remove the entire chord_sep
calculation. That should be better behaved and much easier to implement within the existing framework
from corrfunc.
I think this issue is less about avoiding the chord_sep
calculation and more about choice of variables. If cos(theta) is the variable, no matter how you compute it, it's going to have very poor precision for small angles. A dot product wouldn't fix the original issue, for example—the cosine of an angle less than ~0.02 degrees is indistinguishable from 1 in float32, regardless of how you get there. So I don't think we'd see much gain from a dot-product approach, but I'd be happy to be proven wrong!
from corrfunc.
I thought that the issue was arising because 1 - 0.5*C^2 for small C was numerically unstable, and switching to the dot product would add terms that were all of similar magnitude, and hence better-behaved numerically.
@lgarrison I can confirm that your assertion is correct - even with the dot-product method (in #303), the second bin still contains 0 elements for a float32
calculation. Now I understand what is going on - the difference between the two bin-edges for the second bin is 3e-8 and is smaller than float-eps (few*10^-7) - therefore, the bin can not be populated with a float32
. Maybe that should be the criteria to switch to a different variable or even check that condition is satisfied or even "refuse" to compute in float32
?
from corrfunc.
Modifying the reproducer slightly to check the bins:
import numpy as np
def corrfunc_test(nran=10000, tmin=0.01, tmax=10, nbins=20):
"""Random-random pair count test."""
bins = np.geomspace(tmin, tmax, nbins + 1)
for typ in [np.float32, np.float64]:
cosbins = np.cos(bins*np.pi/180., dtype=typ)
diff = np.diff(cosbins)
print("float type = {} max. diff = {}".format(typ, max(diff)))
produces
float type = <class 'numpy.float32'> max. diff = 0.0
float type = <class 'numpy.float64'> max. diff = -1.5158711841323225e-08
from corrfunc.
Maybe that should be the criteria to switch to a different variable or even check that condition is satisfied or even "refuse" to compute in float32?
I think using sep^2 as the variable (i.e. #297) is generally a better method, since we don't care as much about precision at large angles. I don't even think we'd need to switch between variables. However, for computing thetaavg, different methods are probably better for different separations (#296 (comment)), which is annoying for SIMD kernels.
And the warnings in #299 are all tunable; I chose a 1% loss of precision in theta as the threshold, but this can be changed or promoted to an error as desired.
from corrfunc.
@lgarrison I don't quite follow why . The 0-counts issue that @JonLoveday has reported is occurring because the second bin has zero width when costheta
is computed with 32-bit floats and obviously there can not be any pairs in a bin when the associated bin-width is zero. We should add a check within Corrfunc that errors out when a zero-width bin is passed (and the call from DDtheta_mocks would pass the relevant precision costheta
bins)
Separately, there is the issue of what's the best numerical method to use with 32-bit floats. To me, the calculation of costheta
as a dot product seems to be a better candidate than the calculation with (1-C^2/2)
since it is possible that C << 1
. If you see in #303, I have completely removed the chord_sep
calculation within the innermost loop, and done the costheta
calculation as a dot-product. On my M2 laptop, the new DDtheta fallback kernels run faster than the current ones with chord_sep
- i.e., we might be getting faster and more precise with the dot-product.
Regardless of the method, we will need an inverse trig call when computing theta-averages - so I am not sure whether the timings will differ too much between arccos()
and arcsin(sqrt(...))
from corrfunc.
Related Issues (20)
- script hangs when using unbuffered output HOT 17
- separation average HOT 7
- ld: library not found for -liomp5 when installing HOT 5
- DDsmu and DDsmu_mocks give different results HOT 4
- cz vs z check and fixing by speed of light on comoving distances HOT 2
- test failure after pip install HOT 8
- Can't install Corrfunc on google colab HOT 2
- pytest fails on local supercomputer (Skylake cpus) HOT 3
- Custom compiler instruction for pip does not work HOT 2
- Corrfunc DDtheta_mocks DEC range HOT 4
- Some questions about API and integrated test suite HOT 4
- GitHub CI runners need to be updated
- Add asv benchmarks HOT 1
- M2 compiling issues HOT 22
- Help with installing on macOS Ventura 13 HOT 3
- Xi from LRGs.dat HOT 3
- Option to use S/DGEMM within lattice-pairs HOT 11
- Readthedocs build fails on changing the CPython interpreter to 3.x HOT 1
- Add new AVX/SSE calls for cos/sin functions HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from corrfunc.