manodeep / corrfunc
⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
Home Page: https://corrfunc.readthedocs.io
License: MIT License
When calling countpairs_rp_pi_mocks with lower redshift bin edges close to 0.1 or below, Corrfunc returns the error:
Error in file: ../../utils/gridlink_mocks.c func: gridlink3D line: 593 with expression `idec >=0 && idec < ngrid_dec[iz]'
Declination index for particle position = -1 must be within [0, 0) for cz-bin = 0
This happens even when the code is called on catalogs with >1 galaxy and a non-zero declination range.
Needs a `theory_dtype` and a `mocks_dtype` in `Corrfunc/__init__.py`, and corresponding `sed` within the Makefiles in the two `python_bindings` directories. Similar to the ones I have already in place for installing the headers.
Currently the codes simply exit without any regard to cleaning up the memory. There is a simple way to register an `atexit` function that cleans up on `exit()`. However, the C interface only allows the function pointer passed to `atexit` to be `void (*fp)(void)`, which means typically all the pointers would need to be global. I see (at least) two ways of implementing the cleanup: an `atexit` function that simply loops over the allocated `nelements` in the pointer-to-pointer array and does the appropriate cleanup.
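A minimal sketch of that approach (the registry and wrapper names are assumptions, not existing Corrfunc code): keep the live allocations in file-scope globals so that the argument-less `atexit` handler can reach them.

```c
#include <stdlib.h>

/* hypothetical global registry -- atexit handlers take no arguments,
   so the pointers have to be reachable from file scope */
static void **allocated_ptrs = NULL;
static size_t nelements = 0;

static void cleanup_on_exit(void)
{
    for (size_t i = 0; i < nelements; i++) {
        free(allocated_ptrs[i]);
    }
    free(allocated_ptrs);
    allocated_ptrs = NULL;
    nelements = 0;
}

/* wrapper around malloc that records the pointer for the exit handler */
void *tracked_malloc(const size_t nbytes)
{
    static int registered = 0;
    if (!registered) {
        atexit(cleanup_on_exit);   /* register exactly once */
        registered = 1;
    }
    void *ptr = malloc(nbytes);
    if (ptr != NULL) {
        void **tmp = realloc(allocated_ptrs, (nelements + 1) * sizeof(*tmp));
        if (tmp != NULL) {
            allocated_ptrs = tmp;
            allocated_ptrs[nelements++] = ptr;
        }
    }
    return ptr;
}
```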
There is virtually no error-handling in the C extensions for python.

Compile all of the variants so that any possible version of the code can be invoked at runtime. This behaviour would also require a wrapper over all API functions; in particular, the `DOUBLE_PREC` and `PERIODIC` compile-time options need to be removed. The python layer would check the type of the supplied arrays + keywords and call the appropriate function -- or, better yet, call the underlying C wrapper, which then resolves the correct routine.
In a reasonably likely scenario, the compiler might be capable of supporting AVX instructions but the CPU is not. In such cases, there will be a runtime error: "Illegal instruction". I have encountered this already on the Travis OSX workers, and there is a manual fix in the `common.mk` file.
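A minimal sketch of a run-time guard (the kernel names are hypothetical; the actual fix lives in `common.mk`): ask the CPU what it supports before calling an AVX-compiled kernel, assuming the gcc/clang builtins are available.

```c
#include <stdio.h>

static void countpairs_avx(void)      { puts("AVX kernel"); }       /* hypothetical */
static void countpairs_fallback(void) { puts("fallback kernel"); }  /* hypothetical */

int main(void)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) {
        countpairs_avx();        /* safe: the CPU really has AVX */
    } else {
        countpairs_fallback();   /* compiler supports AVX, CPU does not */
    }
    return 0;
}
```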
Currently, `python setup.py build` just runs `make` under the hood but ignores command-line arguments. For instance, if the compiler is set in the command-line arguments, then that compiler should be used. Should be as straightforward as calling `make argv`, once `argv` has been parsed.
Headers like `defs.h` and `function_precision.h` need to be copied to appropriate locations. Should be straightforward to specify in `MANIFEST.in`. The other option is to recursively read the library header, e.g., `countpairs.h`, and include all of the non-standard included files (and the includes from the includes, and so on).
Modularize the codes such that the highest available instruction set is selected automatically at run-time. Will require some parts of `pairwise` to be ported over.
I think this happens since the io/utils files are specified in each of the required Makefiles, rather than being built once. Multiple threads try to create, e.g., `io.o`, and end up corrupting the object file. Probably use `make -j4` just to be on the safe side.
Conda has a system of relative paths -- so building with conda python on OSX results in a runtime error. I have outlined some of the fixes in the FAQ. The best option, which works most of the time, is to set `DYLD_FALLBACK_LIBRARY_PATH`.

The solution would be to have a conda package, so users can do `conda install Corrfunc` and conda will handle the relative-paths issue.
Seems to go away when I disable `LINK_IN_RA` and `LINK_IN_DEC` (so only `LINK_IN_CZ` is enabled). Haven't figured out what causes it. Quite possibly the same bug as astropy/halotools#218.
HT @duncandc - I had forgotten that this issue needed to be fixed.
All of the python and numpy checks run on each include of the `common.mk` file. The solution is to create a phony target that is a dependency for the `python_bindings` Makefiles. This way, python checks will only be performed while building the `python_bindings` library.

`make clean` or `make distclean` should absolutely not check any dependencies. So, the install requirements need to be split up, with a separate path for building/installing the C library as well.
There is no point constructing all of the lattices for (say) `N <= 100`. All functions need a brute-force algorithm for really low particle numbers.
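A minimal sketch of such a fallback (assumed names and threshold, not existing Corrfunc code): for tiny catalogs, skip the gridlink step entirely and histogram every pair directly.

```c
#include <stdint.h>

#define NPTS_BRUTE_FORCE 100   /* hypothetical threshold below which to skip the lattice */

/* Count pairs per separation bin; rupp_sqr holds the squared upper bin edges. */
void brute_force_npairs(const double *x, const double *y, const double *z,
                        const int64_t np,
                        const double *rupp_sqr, const int nbins,
                        uint64_t *npairs)
{
    for (int64_t i = 0; i < np; i++) {
        for (int64_t j = i + 1; j < np; j++) {
            const double dx = x[i] - x[j];
            const double dy = y[i] - y[j];
            const double dz = z[i] - z[j];
            const double r2 = dx*dx + dy*dy + dz*dz;
            if (r2 >= rupp_sqr[nbins - 1]) continue;   /* beyond the last bin */
            for (int kbin = nbins - 1; kbin >= 1; kbin--) {
                if (r2 >= rupp_sqr[kbin - 1]) {
                    npairs[kbin]++;
                    break;
                }
            }
        }
    }
}
```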
Not sure what kind (Hilbert / H-order / Z-order) would work best for correlation functions. Might be best to add it to the v2.0 branch.
There is a lot of repetition of rules in the Makefiles. One easy way to fix that would be to have a common naming scheme and then add the rules into `common.mk`. The `include common.mk` line would have to be moved to the end of the Makefiles, after all the sources and types of targets have been assigned.
Currently, all the `count*driver` functions dispatch to the function compiled with the highest instruction set (e.g., when `AVX` and `SSE` are both available, only the `AVX` function will be called). By using varargs, this should be implementable reasonably easily.
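A minimal sketch of what a user-selectable dispatch could look like (the enum, table, and kernel names are assumptions, not the actual driver code): keep a table of kernels ordered by instruction set and never exceed what the CPU supports.

```c
#include <stdio.h>

typedef enum { ISA_FALLBACK = 0, ISA_SSE42 = 1, ISA_AVX = 2 } isa_t;
typedef void (*countpairs_func_t)(void);

static void countpairs_fallback(void) { puts("fallback kernel"); }
static void countpairs_sse42(void)    { puts("SSE4.2 kernel"); }
static void countpairs_avx(void)      { puts("AVX kernel"); }

/* one entry per instruction set, ordered lowest -> highest */
static countpairs_func_t dispatch_table[] = {
    countpairs_fallback, countpairs_sse42, countpairs_avx
};

static countpairs_func_t resolve_kernel(const isa_t requested, const isa_t highest_available)
{
    /* honour a lower request, but never exceed what the CPU supports */
    const isa_t chosen = (requested < highest_available) ? requested : highest_available;
    return dispatch_table[chosen];
}

int main(void)
{
    countpairs_func_t kernel = resolve_kernel(ISA_SSE42, ISA_AVX);
    kernel();   /* runs the SSE4.2 kernel even though AVX is available */
    return 0;
}
```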
Create a `setup.py` such that the usual workflow of `pip install` or `conda install` works. Ideally, the package should be called `Corrfunc` and contain submodules for the theory and mocks routines.
* Corrfunc
> __init__.py
> xi_theory
>> countpairs
>> countpairs_rp_pi
>> countpairs_xi
>> countpairs_wp
>> countspheres_vpf
> xi_mocks
>> countpairs_rp_pi_mocks
>> countpairs_theta_mocks
>> countspheres_vpf_mocks
For the calculations that involve converting from redshift (or `cz`) to co-moving distance, the conversion depends on the cosmology. Therefore, the relevant codes `DDrppi_mocks` and `vpf_mocks` should either let the user pass in the co-moving distances directly or let the user specify the cosmology. Currently, cosmology has to be implemented directly in the code itself -- in `utils/cosmology_params.c` under the function `init_cosmology`. This change will also remove the need for the globals declared/defined in `utils/cosmology_params.[ch]`.
Both solutions should be implemented. The first one (user does the calculation) is easy: just check the flag `is_comoving_dist` in `struct config_options` defined in `utils/defs.h`.
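A minimal sketch of that check (the struct layout and helper routine are assumptions based on the names above):

```c
#include <stdint.h>

double comoving_distance(double cz);   /* hypothetical conversion helper */

struct config_options {
    int is_comoving_dist;   /* 1 => the "cz" array already holds co-moving distances */
    /* ... other runtime options ... */
};

static void convert_cz_if_needed(double *cz, const int64_t np,
                                 const struct config_options *options)
{
    if (options->is_comoving_dist) {
        return;   /* user supplied distances directly -- nothing to do */
    }
    for (int64_t i = 0; i < np; i++) {
        cz[i] = comoving_distance(cz[i]);
    }
}
```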
The second one (cosmology specified at runtime) will need a little more attention. There is no machinery in the command-line executables to handle cosmology, and I really would like to keep the python API identical to the command-line one.
Running the codes produces some details about the data-set + progress info. Add the ability in the Makefile to suppress all such output.

The ability to do so already exists for `utils/gridlink.c` -- if the compilation option `-DSILENT` is present, then the info output is suppressed. This just needs to be propagated to the individual functions in the `countpairs*.c` files. All the calls to progressbar + variables used by progressbar just need to be within `#ifndef SILENT`. Ideally, the `progressbar.c` dependency should be removed if the `-DSILENT` option is in effect.
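Roughly the intended pattern (the progressbar call signatures here are assumptions):

```c
#include <stdint.h>
#ifndef SILENT
#include "progressbar.h"
#endif

void count_cells(const int64_t totncells)
{
#ifndef SILENT
    int interrupted = 0;
    int64_t numdone = 0;
    init_my_progressbar(totncells, &interrupted);
#endif

    for (int64_t icell = 0; icell < totncells; icell++) {
#ifndef SILENT
        my_progressbar(numdone, &interrupted);
        numdone++;
#endif
        /* ... actual pair-counting for this cell ... */
    }

#ifndef SILENT
    finish_myprogressbar(&interrupted);
#endif
}
```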
Thanks to @andrew-zentner for pointing this out. Affects a very low number of particles (obviously). `wp` and `xi` are affected.
RTD builds somehow broke and now the API page does not display any more. Wiping and will rebuild in ~2 days. H/T @aphearin.
When the compiler is not set in the command-line, common.mk uses the default compiler (typically "cc").
Seems that if the C function is running in the main python thread, then it is cumbersome to capture Ctrl-C. The easiest work-around might be to spawn two python threads and run the calculations in the background thread while the main thread continues to run. If a KeyboardInterrupt (or some other exception) is raised, then the main thread can send a SIGINT to the C function.
Since v2.0.0 is the first release with docs, I need to create a stable release and functioning docs.
Creating a separate issue for conda. I am having serious trouble creating a conda installable package -- probably because I do all of the symbol resolution + relative paths myself.
The repo has ballooned to 200 MB - needs to be aggressively cleaned. Looks like this might have a solution but requires rewriting the entire history and rebasing all clones.
Sometimes make -j4 will result in a crash while compiling the codes in the python_bindings library. Most of the crashes are related to the vpf library not being built yet (usually the vpf library is the one that is building simultaneously but has not finished yet).
The codes would be so much faster on the GPU. Unfortunately, I hardly know anything about GPUs
Reported by @andrew-zentner. Make crashes with the error:
make: *** No rule to make target `|', needed by `install'. Stop.
$ make --version
GNU Make version 3.79.1, by Richard Stallman and Roland McGrath.
Built for i386-apple-darwin8.5.1
Mac OS 10.7.5
The implementation of `assign_ngb_cells` is very sub-optimal and requires allocating a `totncells^2 * 8` bytes array. For small rmax, `totncells = NLATMAX * NLATMAX * NLATMAX = 10^6`. Therefore, the code wants to allocate a `10^12 * 8` bytes ~ 7.3 TiB array. Not only is this wasteful, performance is severely compromised: `assign_ngb_cells` takes ~3 seconds while the actual pair-counting takes only ~0.5 s.

This issue came up while trying to compare to the `range_search` routine in @mlpack.
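One possible direction, as a minimal sketch (names and sizes are assumptions): store a short, fixed-size list of neighbour indices per cell instead of a `totncells x totncells` matrix, so the memory scales as O(totncells).

```c
#include <stdint.h>
#include <stdlib.h>

/* 3x3x3 stencil, assuming a bin-refine factor of 1 */
#define MAX_NGB_PER_CELL 27

struct cell_ngb {
    int64_t num_ngb;                       /* how many neighbours were assigned */
    int64_t ngb_index[MAX_NGB_PER_CELL];   /* indices of the neighbouring cells */
};

/* O(totncells) memory instead of the current O(totncells^2) array */
struct cell_ngb *alloc_ngb_cells(const int64_t totncells)
{
    return calloc(totncells, sizeof(struct cell_ngb));
}
```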
Missed the factor of 4.0/3.0 in the volume normalization
Travis CI seems to never run -- reporting a "missing config" -- whenever I add `os: osx` to the `.travis.yml` file.
The README provides a link to the following gist, in which timing comparisons are made to several publicly available codes providing pair-counters: https://gist.github.com/manodeep/cffd9a5d77510e43ccf0.
These timings no longer reflect the version of halotools that is up on pip, v0.4. The halotools function called in this gist is no longer in the repo. Its equivalent is `npairs_3d`, which can be imported as follows:

from halotools.mock_observables.pair_counters import npairs_3d

The API of `npairs_3d` and `npairs` is identical, but the performance has improved qualitatively.
For Npts = (8e4, 1e5, 5e5, 1e6), the quoted timings are times = (2.109, 2.821, 51.567, 203.456). However, when I use v0.4, I get times = (0.247, 0.351, 6.62, 26.3), about 8x faster than the quoted timings. Note that the v0.4 halotools timings are only ~20% slower than the quoted numbers for "Corrfunc naive", and range from 35%-2.5x slower than "Corrfunc AVX".
Of course, the exact comparison can only be made properly on the same machine, and probably Corrfunc has sped up since these timings, too. So those timings should be updated before more can be said. However, in the Corrfunc README, it is claimed that Corrfunc is "at least an order of magnitude faster than all existing public codes". In light of the above, that no longer seems like a fair claim to make. Please consider revisiting this claim after updating the timings.
All the codes currently create a duplicate data-set for the particles (inside each cell). This can probably be avoided by creating another temporary index for the particles that contains the cell-id, and then sorting the particles on cell-id (and freeing the temporary index). Probably will lead to better cache-behaviour as well.
The fix requires implementing a new cellarray struct that keeps track of the start/end locations for particles per cell. Then a new gridlink has to be coded up to assign the temporary index, sort the particles based on the index (using SGLIB to simultaneously sort the X/Y/Z arrays) and then another loop to find the start and end indices for particles in a cell.
The cell struct should probably contain just the start and end indices -- just need to be careful about off-by-one errors in the for loops inside the relevant `countpairs_*` functions.
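A minimal sketch under those assumptions (names are hypothetical; this is not the final gridlink code):

```c
#include <stdint.h>

struct cellarray_index {
    int64_t start;   /* index of the first particle belonging to this cell        */
    int64_t end;     /* one past the last particle: half-open range [start, end)  */
};

/* Assumes the particles (and hence cell_id) have already been sorted by cell-id,
   so each cell occupies one contiguous chunk of the X/Y/Z arrays. */
void fill_cell_ranges(const int64_t *cell_id, const int64_t np,
                      struct cellarray_index *cells, const int64_t totncells)
{
    for (int64_t icell = 0; icell < totncells; icell++) {
        cells[icell].start = -1;   /* -1/-1 marks an empty cell */
        cells[icell].end   = -1;
    }
    for (int64_t i = 0; i < np; i++) {
        const int64_t icell = cell_id[i];
        if (cells[icell].start == -1) {
            cells[icell].start = i;    /* first particle seen for this cell */
        }
        cells[icell].end = i + 1;      /* one past the last particle so far */
    }
}
```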
OpenMP support is available with clang >= 3.7 -- but need to account for the variety of disguised compilers on Macs (for instance, where `gcc` is actually `clang`, `clang` is Apple clang and not the LLVM version, macports vs brew clang, ...and things I do not know about).

Fix to be implemented in `common.mk`.
The implementation is fairly straightforward but the memory requirements are quite large! So, the distance separation needs to be a fairly small fraction of the domain.
All of the APIs return a heap-allocated `struct result`. This has potential for a memory leak, in case the user does not free the memory afterwards. I am probably doing that from the python interfaces. An easy fix would be to replace those with stack-allocated structs; this would require replacing all `results->` with `results.`.
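A minimal sketch contrasting the two interfaces (the struct layout and function names are hypothetical): returning the struct by value means the caller never has to remember to free the top-level object.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int nbins;
    double   *rupp;     /* the per-bin arrays still live on the heap */
    uint64_t *npairs;
} results_countpairs;

/* current style: the struct itself is heap-allocated and the caller
   must remember to free() it (plus its members) */
results_countpairs *countpairs_heap(const int nbins);

/* proposed style: return by value; only the internal arrays need freeing */
results_countpairs countpairs_value(const int nbins)
{
    results_countpairs results;
    results.nbins  = nbins;
    results.rupp   = calloc(nbins, sizeof(*results.rupp));
    results.npairs = calloc(nbins, sizeof(*results.npairs));
    /* ... fill the counts, writing results.npairs[i] instead of results->npairs[i] ... */
    return results;   /* copied out by value -- no free() for the struct itself */
}
```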
I have encountered two cases where python is set via an alias. There is essentially no foolproof way of solving this alias issue -- since aliases are not available in non-interactive shells (such as the ones run by `make` with `$(shell ...)`):

PYTHON_VERSION_FULL := $(wordlist 2,4,$(subst ., ,$(shell python --version 2>&1)))

However, the user needs to be at least warned about this scenario, and a potential fix suggested. One way to solve it is to run `alias | grep python`, and then check if the RHS contains `python` as the last word. In that case, the user has to replace all instances of python with this RHS.

Or, I could just see if there is an alias, define a variable `PYTHONCOMMAND` appropriately, and then use `PYTHONCOMMAND` exclusively in place of `python`.
Once the code compiles, the next steps are unclear. Thanks to @aphearin
The `*.ff` data files under the `data` directory are all written on little-endian systems. `ftread.c` needs to be modified to always convert from little-endian to host order before using the data read in from disk.
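A minimal sketch of such a conversion (not the actual `ftread.c` code): byte-swap a buffer of 4-byte little-endian items in place, only when the host is big-endian.

```c
#include <stdint.h>
#include <stddef.h>

static int host_is_little_endian(void)
{
    const uint32_t one = 1;
    return *(const uint8_t *) &one == 1;
}

/* Swap a buffer of 4-byte little-endian items in place to host order. */
static void le32_buffer_to_host(void *buf, const size_t nitems)
{
    if (host_is_little_endian()) {
        return;   /* already in host order -- the common case */
    }
    uint8_t *p = buf;
    for (size_t i = 0; i < nitems; i++, p += 4) {
        uint8_t tmp;
        tmp = p[0]; p[0] = p[3]; p[3] = tmp;
        tmp = p[1]; p[1] = p[2]; p[2] = tmp;
    }
}
```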
Easy to replace. But it would probably change the output of `DDrppi_mocks` and `vpf_mocks`, and the correct result for the tests would have to be updated.
The call to `set_cosmo_dist` should also then pass the max. `cz` required. This removes the need for the macro `MAX_REDSHIFT_FOR_COSMO_DIST` set in `set_cosmo_dist.h`.
I was using corrfunc and I ran into this error:
linklist> ERROR: nlattice = 2 is so small that with periodic wrapping the same cells will be counted twice ....exiting
Which was a bit hard to parse. I eventually pieced together that the binfile I had written contained a bin edge that was too large relative to the size of my box. It would be helpful if the error message could say this, rather than sending me to the source code.
v2.0 needs to have an additional thin `struct` wrapper for passing extra quantities (e.g., weights for particles). This will break the v1 API, since all `countpairs*` routines will need to accept this extra input arg.
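A minimal sketch of what such a wrapper could look like (the field names and the cap on the number of weight arrays are assumptions):

```c
#include <stdint.h>

#define MAX_WEIGHT_ARRAYS 8   /* hypothetical cap */

/* Thin wrapper passed alongside the existing x/y/z arrays; adding new
   per-particle quantities later only changes this struct, not every
   countpairs* signature again. */
struct extra_options {
    int64_t num_weights;                 /* how many weight arrays are supplied (may be 0) */
    double *weights[MAX_WEIGHT_ARRAYS];  /* one array per weight, each of length np */
};
```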
I have no idea how this is happening. Must be a race-condition -- tests pass cleanly (10+ times) on my laptop.
Should be trivial with `pandoc`. Need to adjust `setup.py` to no longer run the conversion while creating the source distribution.
The flag `-Wa,-q` causes the compile to break. This flag is supposed to use the clang assembler when gcc is the compiler. However, when clang is the underlying compiler even though gcc is invoked, the compilation will crash. Thanks to @aphearin for pointing this out.

The fix is to check for clang being the underlying compiler even when `CC=gcc`. Needs to be implemented in `common.mk`.
Looking at `countpairs.c` under `xi_theory/xi_of_r`, the quadruple for loop is in the wrong order. It should be `i`, then `j`, then `ii` and then `jj`.
The C extension module is only valid for python2 and will fail for python3