
Corrfunc


Description

This repo contains a suite of codes to calculate correlation functions and other clustering statistics for simulated galaxies in a cosmological box (co-moving XYZ) and for observed galaxies with on-sky positions (RA, DEC, CZ). Read the documentation on corrfunc.rtfd.io.

Why Should You Use It

  1. Fast Theory pair-counting is ~7x faster than SciPy's cKDTree, and at least 2x faster than all existing public codes.
  2. OpenMP Parallel All pair-counting codes can be run in parallel (with strong-scaling efficiency >~ 95% up to 10 cores).
  3. Python Extensions Python extensions allow you to do the compute-heavy bits in C while retaining all of the user-friendliness of Python.
  4. Weights All correlation functions support arbitrary, user-specified weights for individual points.
  5. Modular The code is written in a modular fashion and is easily extensible to compute arbitrary clustering statistics.
  6. Future-proof As newer instruction sets become available, the codes will be updated to use the latest CPU features.

If you use the codes for your analysis, please star this repo -- that helps us keep track of the number of users.

Benchmark against Existing Codes

Please see this gist for some benchmarks with current codes. If you have a pair-counter that you would like to compare, please add in a corresponding function and update the timings.

Installation

Pre-requisites

  1. make >= 3.80
  2. An OpenMP-capable compiler such as icc, gcc >= 4.6, or clang >= 3.7. If one is not available, please disable the USE_OMP option in theory.options and mocks.options. On an HPC cluster, consult the cluster documentation for how to load a compiler (often module load gcc or similar). If you are using Corrfunc with Anaconda Python, then conda install gcc (MAC/linux) should work. On MAC, (sudo) port install gcc5 is also an option.
  3. gsl >= 2.4. On an HPC cluster, consult the cluster documentation (often module load gsl will work). With Anaconda Python, use conda install -c conda-forge gsl (MAC/linux). On MAC, you can use (sudo) port install gsl if necessary.
  4. python >= 2.7 or python>=3.4 for compiling the CPython extensions.
  5. numpy>=1.7 for compiling the CPython extensions.
$ git clone https://github.com/manodeep/Corrfunc.git
$ cd Corrfunc
$ make
$ make install
$ python -m pip install . [--user]

$ make tests  # run the C tests
$ python -m pip install pytest
$ python -m pytest  # run the Python tests

Assuming you have gcc in your PATH, make and make install should compile and install the C libraries + Python extensions within the source directory. If you would like to install the CPython extensions in your environment, then python -m pip install . [--user] should be sufficient. If you are primarily interested in the Python interface, you can condense all of the steps by using python -m pip install . [--user] --install-option="CC=yourcompiler" after git clone [...] and cd Corrfunc.

Compilation Notes

  • If Python and/or numpy are not available, then the CPython extensions will not be compiled.
  • make install simply copies files into the lib/bin/include sub-directories. You do not need root permissions.
  • Default compiler on MAC is set to clang, if you want to specify a different compiler, you will have to call make CC=yourcompiler, make install CC=yourcompiler, make tests CC=yourcompiler etc. If you want to permanently change the default compiler, then please edit the common.mk file in the base directory.
  • If you are directly using python -m pip install . [--user] --install-option="CC=yourcompiler", please run a make distclean beforehand (especially if switching compilers)
  • Please note that Corrfunc is compiled with optimizations for the architecture it is compiled on; that is, it uses gcc -march=native or similar. For this reason, please try to compile Corrfunc on the architecture it will be run on (usually this is only a concern in heterogeneous compute environments, like an HPC cluster with multiple node types). In many cases, you can compile on a more capable architecture (e.g. with AVX-512 support) and then run on a less capable architecture (e.g. with only AVX2), because the runtime dispatch will select the appropriate kernel. However, the non-kernel elements of Corrfunc may still emit AVX-512 instructions due to -march=native. If an Illegal instruction error occurs, you'll need to recompile on the target architecture.

Installation notes

If compilation went smoothly, please run make tests to ensure the code is working correctly. Depending on the hardware and compilation options, the tests might take more than a few minutes. Note that the tests are exhaustive and not traditional unit tests.

For Python tests, please run python -m pip install pytest and python -m pytest from the Corrfunc root dir.

While we have tried to ensure that the package compiles and runs out of the box, cross-platform compatibility turns out to be incredibly hard. If you run into any issues during compilation and you have all of the pre-requisites, please see the FAQ or email the Corrfunc mailing list. Also, feel free to create a new issue with the Installation label.

Method 2: pip installation

The Python package is directly installable via python -m pip install Corrfunc. However, in that case you will lose the ability to recompile the code. This is usually fine if you are only using the Python interface and are on a single machine, like a laptop. For usage on a cluster or another environment with multiple CPU architectures, you may find the Source Installation method above more useful, in case you need to compile for a different architecture later.

Testing a pip-installed Corrfunc

You can check that a pip-installed Corrfunc is working with:

$ python -m pytest --pyargs Corrfunc

The pip installation does not include all of the test data contained in the main repo, since it would total over 100 MB, and the tests that generate data on the fly are similarly exhaustive. pytest will mark tests for which the data files are not available as "skipped". If you would like to run the data-based tests, please use the Source Installation method.

OpenMP on OSX

Automatically detecting OpenMP support from the compiler and the runtime is a bit tricky. If you run into any issues compiling (or running) with OpenMP, please refer to the FAQ for potential solutions.

Clustering Measures on simulated galaxies

Input data

The input galaxies (or any discrete distribution of points) are derived from a simulation. For instance, the galaxies could be the result of a Halo Occupation Distribution (HOD) model, a Subhalo Abundance Matching (SHAM) model, a Semi-Empirical Model (SEM), or a Semi-Analytic Model (SAM). The input set of points can also be the dark matter halos, or the dark matter particles, from a cosmological simulation. The input set of points is expected to have positions specified in Cartesian XYZ.

Types of available clustering statistics

All codes that work on cosmological boxes with co-moving positions are located in the theory directory. The various clustering measures are:

  1. DD -- Measures auto/cross-correlations between two boxes. The boxes do not need to be cubes.
  2. xi -- Measures 3-d auto-correlation in a cubic cosmological box. Assumes PERIODIC boundary conditions.
  3. wp -- Measures the projected auto-correlation function, wp(rp), in a cubic cosmological box. Assumes PERIODIC boundary conditions.
  4. DDrppi -- Measures the auto/cross correlation function between two boxes. The boxes do not need to be cubes.
  5. DDsmu -- Measures the auto/cross correlation function between two boxes. The boxes do not need to be cubes.
  6. vpf -- Measures the void probability function + counts-in-cells.
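
To make concrete what these pair counters compute, here is a deliberately naive, pure-Python reference for DD-style counting in radial separation bins. This is only an illustrative sketch, not Corrfunc code: it is O(N1*N2), ignores periodic boundaries, and the function name is made up.

```python
import math
import random
from bisect import bisect_right

def brute_force_dd(points1, points2, rbin_edges):
    """Count pairs between two 3-D point sets in separation bins.

    A naive O(N1*N2) reference; Corrfunc does the same counting with
    grid-based neighbour search and vectorized SIMD kernels."""
    counts = [0] * (len(rbin_edges) - 1)
    for (x1, y1, z1) in points1:
        for (x2, y2, z2) in points2:
            r = math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)
            # Index of the bin containing r; pairs outside all bins are dropped
            i = bisect_right(rbin_edges, r) - 1
            if 0 <= i < len(counts):
                counts[i] += 1
    return counts

random.seed(42)
pts1 = [(random.random(), random.random(), random.random()) for _ in range(100)]
pts2 = [(random.random(), random.random(), random.random()) for _ in range(100)]
edges = [0.1, 0.2, 0.3, 0.4, 0.5]
print(brute_force_dd(pts1, pts2, edges))
```

The real codes replace the double loop with a cell-list search, which is why they only visit pairs that can possibly fall inside the maximum bin edge.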

Clustering measures on observed galaxies

Input data

The input galaxies are typically observed galaxies coming from a large-scale galaxy survey. In addition, simulated galaxies that have been projected onto the sky (i.e., where observational systematics have been incorporated and on-sky positions have been generated) can also be used. We generically refer to both these kinds of galaxies as "mocks".

The input galaxies are expected to have positions specified in spherical co-ordinates with at least right ascension (RA) and declination (DEC). For spatial correlation functions, an approximate "co-moving" distance (speed of light multiplied by redshift, CZ) is also required.
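
As an illustrative sketch (not Corrfunc's internal code), here is how such sky coordinates map to approximate Cartesian positions. The cz/H0 distance used here is a low-redshift simplification; Corrfunc's mocks routines do a proper cosmological conversion internally (or accept comoving distances directly via COMOVING_DIST).

```python
import math

def sky_to_cartesian(ra_deg, dec_deg, cz_kms, h0=100.0):
    """Convert (RA, DEC, CZ) on the sky to approximate comoving XYZ.

    Uses the low-redshift approximation D ~ cz / H0; with h0 = 100
    km/s/Mpc the distances come out in Mpc/h.  Illustration only."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    d = cz_kms / h0
    x = d * math.cos(dec) * math.cos(ra)
    y = d * math.cos(dec) * math.sin(ra)
    z = d * math.sin(dec)
    return x, y, z

x, y, z = sky_to_cartesian(45.0, 30.0, 3000.0)
print(x, y, z)
```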

Types of available clustering statistics

All codes that work on mock catalogs (RA, DEC, CZ) are located in the mocks directory. The various clustering measures are:

  1. DDrppi_mocks -- The standard auto/cross correlation between two data sets. The outputs, DD, DR and RR can be combined using wprp to produce the Landy-Szalay estimator for wp(rp).
  2. DDsmu_mocks -- The standard auto/cross correlation between two data sets. The outputs, DD, DR and RR can be combined using the Python utility convert_3d_counts_to_cf to produce the Landy-Szalay estimator for xi(s, mu).
  3. DDtheta_mocks -- Computes the angular correlation function between two data sets. The outputs from DDtheta_mocks need to be combined with wtheta to get the full omega(theta).
  4. vpf_mocks -- Computes the void probability function on mocks.
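
Conceptually, combining the DD, DR and RR counts boils down to the Landy-Szalay estimator, xi = (DD - 2DR + RR) / RR with suitably normalized counts. Here is a hedged pure-Python sketch with made-up counts; this mirrors the arithmetic, not the convert_3d_counts_to_cf implementation itself.

```python
def landy_szalay(dd, dr, rr, nd, nr):
    """Landy-Szalay estimator from raw pair counts per bin.

    dd, dr, rr: per-bin raw pair counts; nd, nr: numbers of data and
    random points.  Each count is normalized by its total number of
    possible pairs before combining."""
    xi = []
    for dd_i, dr_i, rr_i in zip(dd, dr, rr):
        fdd = dd_i / (nd * nd)
        fdr = dr_i / (nd * nr)
        frr = rr_i / (nr * nr)
        xi.append((fdd - 2.0 * fdr + frr) / frr)
    return xi

# Hypothetical counts in 3 bins, constructed so the data have a 50%
# pair excess over randoms; the estimator then returns ~0.5 per bin
rr = [480.0, 1900.0, 4000.0]
dr = [0.5 * c for c in rr]
dd = [0.375 * c for c in rr]
print(landy_szalay(dd, dr, rr, nd=100, nr=200))  # ~ [0.5, 0.5, 0.5]
```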

Science options

If you plan to use the command-line, then you will have to specify the code runtime options at compile-time. For theory routines, these options are in the file theory.options while for the mocks, these options are in file mocks.options.

Note All options can be specified at runtime if you use the Python interface or the static libraries. Each of the following Makefile options has a corresponding entry for the runtime libraries.

Theory (in theory.options)

  1. PERIODIC (ignored in case of wp/xi) -- switches periodic boundary conditions on/off. Enabled by default.
  2. OUTPUT_RPAVG -- switches on output of <rp> in each rp bin. Can be a massive performance hit (~ 2.2x in case of wp). Disabled by default.

Mocks (in mocks.options)

  1. OUTPUT_RPAVG -- switches on output of <rp> in each rp bin for DDrppi_mocks. Enabled by default.
  2. OUTPUT_THETAAVG -- switches on output of <theta> in each theta bin. Can be extremely slow (~5x) depending on compiler and CPU capabilities. Disabled by default.
  3. LINK_IN_DEC -- creates binning in declination for DDtheta_mocks. Please check that for your desired \theta limits, this binning does not produce incorrect results (due to numerical precision). Generally speaking, if your \thetamax (the max. \theta to consider pairs within) is too small (probably less than 1 degree), then you should check with and without this option. Errors are typically at the sub-percent level.
  4. LINK_IN_RA -- creates binning in RA once binning in DEC has been enabled for DDtheta_mocks. Same numerical issues as LINK_IN_DEC
  5. FAST_ACOS -- Relevant only when OUTPUT_THETAAVG is enabled for DDtheta_mocks. Disabled by default. An arccos is required to calculate <\theta>. In absence of vectorized arccos (intel compiler, icc provides one via intel Short Vector Math Library), this calculation is extremely slow. However, we can approximate arccos using polynomials (with Remez Algorithm). The approximations are taken from implementations released by Geometric Tools. Depending on the level of accuracy desired, this implementation of fast acos can be tweaked in the file utils/fast_acos.h. An alternate, less accurate implementation is already present in that file. Please check that the loss of precision is not important for your use-case.
  6. COMOVING_DIST -- Currently there is no support in Corrfunc for different cosmologies. However, mocks routines like DDrppi_mocks and vpf_mocks require cosmology parameters to convert between redshift and co-moving distance. Both DDrppi_mocks and vpf_mocks expect to receive a redshift array as input; with this option enabled, however, the redshift array is assumed to already contain co-moving distances. So, if you have redshifts and want to use an arbitrary cosmology, convert the redshifts into co-moving distances, enable this option, and pass the co-moving distance array into the routines.
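
As an illustration of the conversion step you would do yourself with COMOVING_DIST enabled, here is a minimal flat-LCDM comoving-distance integral in pure Python. The cosmology parameters and the function name are assumptions for this sketch, not Corrfunc's internal implementation.

```python
import math

C_KMS = 299792.458  # speed of light in km/s

def comoving_distance(z, omega_m=0.3, omega_l=0.7, h0=100.0, nstep=10000):
    """Line-of-sight comoving distance in flat LCDM,
    D_C = c * integral_0^z dz' / H(z'), via the trapezoid rule.
    With h0 = 100 km/s/Mpc the result is in Mpc/h."""
    if z <= 0.0:
        return 0.0
    dz = z / nstep
    total = 0.0
    for i in range(nstep + 1):
        zi = i * dz
        f = 1.0 / (h0 * math.sqrt(omega_m * (1.0 + zi) ** 3 + omega_l))
        # Trapezoid rule: half weight at the endpoints
        total += (0.5 if i in (0, nstep) else 1.0) * f
    return C_KMS * total * dz

# Convert redshifts to comoving distances before calling, e.g., DDrppi_mocks
print([comoving_distance(z) for z in (0.05, 0.1, 0.3)])
```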

Common Code options for both Mocks and Theory

  1. DOUBLE_PREC -- switches on calculations in double precision. Disabled by default in the theory routines and enabled by default in the mocks routines.
  2. USE_OMP -- uses OpenMP parallelization. Scaling is great for DD (close to perfect scaling up to 12 threads in our tests) and okay (runtime becomes constant at ~6-8 threads in our tests) for DDrppi and wp. Enabled by default. The Makefile will compare the CC variable with known OpenMP-enabled compilers and set compile options accordingly. Set in common.mk by default.
  3. ENABLE_MIN_SEP_OPT -- uses some further optimisations based on the minimum separation between pairs of cells. Enabled by default.
  4. COPY_PARTICLES -- whether or not to create a copy of the particle positions (and weights, if supplied). Enabled by default (copies of the particle arrays are created)
  5. FAST_DIVIDE -- Disabled by default. Divisions are slow but required for DDrppi_mocks(r_p,\pi), DDsmu_mocks(s, \mu) and DD(s, \mu). Enabling this option replaces the divisions with a reciprocal estimate followed by a Newton-Raphson step. The code will run ~20% faster at the expense of some numerical precision. Please check that the loss of precision is not important for your use-case.
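
The reciprocal-plus-Newton-Raphson trick behind FAST_DIVIDE can be sketched in a few lines of Python; the real implementation uses SIMD intrinsics in C, and this only illustrates the refinement step.

```python
def fast_reciprocal_step(x, y0):
    """One Newton-Raphson refinement of an approximate reciprocal y0 ~ 1/x.

    Newton's iteration for f(y) = 1/y - x gives y1 = y0 * (2 - x * y0);
    each step roughly doubles the number of correct digits.  In the SIMD
    kernels the initial estimate would come from a hardware instruction;
    the crude 0.3 below is just for illustration."""
    return y0 * (2.0 - x * y0)

x = 3.0
y0 = 0.3                         # crude estimate of 1/3
y1 = fast_reciprocal_step(x, y0)
y2 = fast_reciprocal_step(x, y1)
print(y1, y2)                    # each step gets closer to 1/3
```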

Optimization for your architecture

  1. The values of bin_refine_factor and/or zbin_refine_factor in the countpairs_*.c files control the cache misses and, consequently, the runtime. In trial-and-error testing, Manodeep has found that values larger than 3 are generally slower for theory routines but can be faster for mocks. Some different combination of 1 or 2 for (z)bin_refine_factor might be faster on your platform.
  2. If you are using the angular correlation function and need thetaavg, you might benefit from using the INTEL MKL library. The vectorized trigonometric functions provided by MKL can provide significant speedup.

Running the codes

Read the documentation on corrfunc.rtfd.io.

Using the command-line interface

Navigate to the correct directory. Make sure that the options set in theory.options or mocks.options in the root directory are what you want; if not, edit those two files (and possibly common.mk), and recompile. Then, you can use the command-line executables in each individual subdirectory corresponding to the clustering measure you are interested in. For example, if you want to compute the full 3-D correlation function, \xi(r), run the executable theory/xi/xi. If you run the executables without any arguments, the program will output a message with all the required arguments.

Calling from C

Look under the run_correlations.c and run_correlations_mocks.c to see examples of calling the C API directly. If you run the executables, run_correlations and run_correlations_mocks, the output will also show how to call the command-line interface for the various clustering measures.

Calling from Python

If all went well, the codes can be directly called from python. Please see call_correlation_functions.py and call_correlation_functions_mocks.py for examples on how to use the CPython extensions directly. Here are a few examples:

from __future__ import print_function
import os.path as path
import numpy as np
import Corrfunc
from Corrfunc.theory import wp

# Setup the problem for wp
boxsize = 500.0
pimax = 40.0
nthreads = 4

# Create a fake data-set.
Npts = 100000
x = np.float32(np.random.random(Npts))
y = np.float32(np.random.random(Npts))
z = np.float32(np.random.random(Npts))
x *= boxsize
y *= boxsize
z *= boxsize

# Setup the bins
rmin = 0.1
rmax = 20.0
nbins = 20

# Create the bins
rbins = np.logspace(np.log10(rmin), np.log10(rmax), nbins + 1)

# Call wp
wp_results = wp(boxsize, pimax, nthreads, rbins, x, y, z, verbose=True, output_rpavg=True)

# Print the results
print("#############################################################################")
print("##       rmin           rmax            rpavg             wp            npairs")
print("#############################################################################")
print(wp_results)

Author & Maintainers

Corrfunc was designed and implemented by Manodeep Sinha, with contributions from Lehman Garrison, Nick Hand, and Arnaud de Mattia. Corrfunc is currently maintained by Manodeep Sinha and Lehman Garrison.

Citing

If you use Corrfunc for research, please cite using the MNRAS code paper with the following bibtex entry:

@ARTICLE{2020MNRAS.491.3022S,
    author = {{Sinha}, Manodeep and {Garrison}, Lehman H.},
    title = "{CORRFUNC - a suite of blazing fast correlation functions on
    the CPU}",
    journal = {\mnras},
    keywords = {methods: numerical, galaxies: general, galaxies:
    haloes, dark matter, large-scale structure of Universe, cosmology:
    theory},
    year = "2020",
    month = "Jan",
    volume = {491},
    number = {2},
    pages = {3022-3041},
    doi = {10.1093/mnras/stz3157},
    adsurl =
    {https://ui.adsabs.harvard.edu/abs/2020MNRAS.491.3022S},
    adsnote = {Provided by the SAO/NASA
    Astrophysics Data System}
}

If you are using Corrfunc v2.3.0 or later, and you benefit from the enhanced vectorised kernels, then please additionally cite this paper:

@InProceedings{10.1007/978-981-13-7729-7_1,
    author="Sinha, Manodeep and Garrison, Lehman",
    editor="Majumdar, Amit and Arora, Ritu",
    title="CORRFUNC: Blazing Fast Correlation Functions with AVX512F SIMD Intrinsics",
    booktitle="Software Challenges to Exascale Computing",
    year="2019",
    publisher="Springer Singapore",
    address="Singapore",
    pages="3--20",
    isbn="978-981-13-7729-7",
    url={https://doi.org/10.1007/978-981-13-7729-7_1}
}

Mailing list

If you have questions or comments about the package, please post them to the mailing list: https://groups.google.com/forum/#!forum/corrfunc

LICENSE

Corrfunc is released under the MIT license. Basically, do what you want with the code, including using it in commercial applications.

Contributors

aphearin, cbyrohl, christopher-bradshaw, dependabot[bot], gbeltzmo, gitter-badger, lapereznyc, lgarrison, manodeep, misharash, nickhand

Issues

Runtime errors with conda python osx

Conda has a system of relative paths, so building with conda Python on OSX results in a runtime error. I have outlined some of the fixes in the FAQ. The best option, which works most of the time, is to set DYLD_FALLBACK_LIBRARY_PATH.

The real solution would be to have a conda package, so that users can run "conda install Corrfunc" and conda will handle the relative-paths issue.

Allow cosmology to be specified for mock spatial calculations

For the calculations that involve converting from redshift (or cz) to co-moving distance, the conversion depends on the cosmology. Therefore, the relevant codes DDrppi_mocks and vpf_mocks should either let the user pass in the co-moving distances directly or let the user specify the cosmology. Currently, cosmology has to be implemented directly in the code itself -- in utils/cosmology_params.c under the function init_cosmology. This change will also remove the need for the globals declared/defined under utils/cosmology_params.[ch].

Both solutions should be implemented. The first one (user does the calculation) is easy: just check the flag is_comoving_dist in struct config_options, defined in utils/defs.h.

The second one (cosmology specified at runtime) will need a little more attention. There is no machinery at the command-line executables to handle cosmology. And I really would like to keep the python API identical to the command-line.

Makefile crashes when running with make -j

I think this happens since the io/utils files are specified in each of the required Makefiles, rather than being built once. Multiple jobs try to create, e.g., io.o, and end up corrupting the object file. For now, use at most make -j4 to be on the safe side.

If bin edge is too large, throw an informative error.

I was using corrfunc and I ran into this error:

linklist> ERROR: nlattice = 2 is so small that with periodic wrapping the same cells will be counted twice ....exiting

This was a bit hard to parse. I eventually pieced together that the binfile I had written contained a bin edge that was too large relative to the size of my box. It would be helpful if the error message could say this, rather than sending me to the source code.

Pass on setup.py command-line args to make

Currently, python setup.py build just runs make under the hood but ignores command-line arguments. For instance, if the compiler is set via a command-line argument, then that compiler should be used. This should be as straightforward as calling make with argv, once argv has been parsed.

Replace integration in `set_cosmo_dist` with gsl integration

Easy to replace. But would probably change the output of DDrppi_mocks and vpf_mocks and the correct result for the tests would have to be updated.

The call to set_cosmo_dist should then also pass the maximum cz required; this removes the need for the macro MAX_REDSHIFT_FOR_COSMO_DIST set in set_cosmo_dist.h.

Duplicating the particles in memory

All the codes currently create a duplicate data-set for the particles (inside each cell). This can probably be avoided by creating another temporary index for the particles that contains the cell-id, and then sorting the particles on cell-id (and freeing the temporary index). Probably will lead to better cache-behaviour as well.

The fix requires implementing a new cellarray struct that keeps track of the start/end locations for particles per cell. Then a new gridlink has to be coded up to assign the temporary index, sort the particles based on the index (using SGLIB to simultaneously sort the X/Y/Z arrays) and then another loop to find the start and end indices for particles in a cell.

cellstruct should probably contain just the start and end indices; just be careful about off-by-one errors in the for loops inside the relevant countpairs_* functions.
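
The proposed scheme can be sketched in Python (illustrative only; the actual fix would live in the C gridlink code, with numpy-style argsort/searchsorted or SGLIB playing the roles that sorted/bisect play here):

```python
import random
from bisect import bisect_left, bisect_right

random.seed(0)
npts, ncells = 20, 4
xs = [random.random() for _ in range(npts)]

# Temporary index: which (1-D, toy) cell each particle falls in
cell_id = [min(int(x * ncells), ncells - 1) for x in xs]

# Sort particles by cell id instead of duplicating them per cell
order = sorted(range(npts), key=lambda i: cell_id[i])
xs_sorted = [xs[i] for i in order]
cell_sorted = [cell_id[i] for i in order]

# Start/end index of each cell's particles in the sorted arrays;
# the new cellarray struct would store just these two integers per cell
starts = [bisect_left(cell_sorted, c) for c in range(ncells)]
ends = [bisect_right(cell_sorted, c) for c in range(ncells)]

for c in range(ncells):
    assert all(cid == c for cid in cell_sorted[starts[c]:ends[c]])
print(list(zip(starts, ends)))
```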

Update gist displaying timing comparisons

The README provides a link to the following gist, in which timing comparisons are made to several publicly available codes providing pair-counters: https://gist.github.com/manodeep/cffd9a5d77510e43ccf0.

These timings no longer reflect the version of halotools that is up on pip, v0.4. The halotools function called in this gist is no longer in the repo. Its equivalent is npairs_3d, which can be imported as follows:

from halotools.mock_observables.pair_counters import npairs_3d

The API of npairs_3d and npairs is identical, but the performance has improved qualitatively.
For Npts = (8e4, 1e5, 5e5, 1e6), the quoted timings are times = (2.109, 2.821, 51.567, 203.456). However, when I use v0.4, I get times = (0.247, 0.351, 6.62, 26.3), about 8x faster than the quoted timings. Note that the v0.4 halotools timings are only ~20% slower than the quoted numbers for "Corrfunc naive", and range from 35%-2.5x slower than "Corrfunc AVX".

Of course, the exact comparison can only be made properly on the same machine, and probably Corrfunc has sped up since these timings, too. So those timings should be updated before more can be said. However, in the Corrfunc README, it is claimed that Corrfunc is "at least an order of magnitude faster than all existing public codes". In light of the above, that no longer seems like a fair claim to make. Please consider revisiting this claim after updating the timings.

Travis build fails at times with a parallel build

Sometimes make -j4 will result in a crash while compiling the codes in the python_bindings library. Most of the crashes are related to the vpf library not being built yet (usually the vpf library is the one that is building simultaneously but has not finished yet).

Compilation issues with gcc (where clang is masquerading as gcc)

The flag -Wa,-q causes compilation to break. This flag is supposed to select the clang assembler when gcc is the compiler. However, when clang is the underlying compiler even though gcc is invoked, the compilation will crash. Thanks to @aphearin for pointing this out.

The fix is to check for clang being the underlying compiler even when CC=gcc. Needs to be implemented in common.mk.

Avoid resource leak under Ctrl-C

Currently the codes simply exit without any regard to cleaning up the memory. There is a simple way to register an atexit function that cleans up on exit(). However, the C interface only allows the function pointer passed to atexit to be void (*fp)(void), which means typically all the pointers would need to be global. I see (at least) two ways of implementing the cleanup:

  1. Call a different function within atexit. That function keeps a static pointer and has to be called beforehand to register the pointer(s) to be freed.
  2. Keep a global pointer to an array of pointers and their corresponding cleanup function pointers + arguments. Create an atexit function that simply loops over the allocated nelements in the pointer to pointer array and does the appropriate cleanup.

common.mk runs python/numpy checks from every include directory

All of the python and numpy checks run on each include of the common.mk file. The solution is to create a phony target that is a dependency for the python_bindings Makefiles. This way, python checks will only be performed while building the python_bindings library.

make clean or make distclean should absolutely not check any dependencies. So, the install requirements need to be split up, with separate handling for building/installing the C library as well.

Loop blocking is incorrect

Looking at countpairs.c under xi_theory/xi_of_r, the quadruple for loop is in the wrong order. It should be i, then j, then ii and then jj.

Enable openmp with clang

OpenMP support is available with clang >= 3.7 -- but need to account for the variety of disguised compilers on Macs (for instance, where gcc is actually clang, clang is Apple clang and not the LLVM version, macports vs brew clang ...and things I do not know about)

Fix to be implemented in common.mk

Performance regression for small rmax

The implementation of assign_ngb_cells is very sub-optimal and requires allocating a totncells^2 * 8 byte array. For small rmax, totncells = NLATMAX * NLATMAX * NLATMAX = 10^6. Therefore, the code wants to allocate a 10^12 * 8 byte ~ 7.3 TiB array. Not only is this wasteful, but performance is also severely compromised: assign_ngb_cells takes ~3 seconds while the actual pair-counting takes only ~0.5 s.

This issue came up while trying to compare to the range_search routine in @mlpack.

Python set via an alias

I have encountered two cases where python is set via an alias. There is essentially no foolproof way of solving this alias issue, since aliases are not available in non-interactive shells (such as the ones run by make with $(shell)).

PYTHON_VERSION_FULL := $(wordlist 2,4,$(subst ., ,$(shell python --version 2>&1)))

However, the user should at least be warned about this scenario, with a potential fix suggested. One way to solve it is to run alias | grep python, and then check whether the RHS contains python as the last word. In that case, the user has to replace all instances of python with this RHS.

Or, I could just see if there is an alias and define a variable PYTHONCOMMAND appropriately and then use PYTHONCOMMAND exclusively in place of python.

Include all headers required during python install

Headers like defs.h and function_precision.h need to be copied to appropriate locations. Should be straightforward to specify in MANIFEST.in. The other option is to recursively read the library header, e.g., countpairs.h, and include all of the non-standard included files (and the includes from the includes, and so on).
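
The recursive-include option can be sketched with a small pure-Python helper (hypothetical; local_includes is not part of Corrfunc's build system, and the throwaway headers below are created just for the demo):

```python
import re
import tempfile
from pathlib import Path

def local_includes(header, seen=None):
    """Recursively collect quoted (non-system) #include names of a header.

    Angle-bracket includes (<stdio.h> etc.) are deliberately ignored,
    and quoted includes are resolved relative to the including header's
    directory."""
    seen = set() if seen is None else seen
    pattern = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.M)
    for name in pattern.findall(Path(header).read_text()):
        if name not in seen:
            seen.add(name)
            child = Path(header).parent / name
            if child.exists():
                local_includes(child, seen)
    return seen

# Demo with a throwaway header hierarchy
d = tempfile.mkdtemp()
Path(d, "countpairs.h").write_text('#include "defs.h"\n#include <stdio.h>\n')
Path(d, "defs.h").write_text('#include "function_precision.h"\n')
Path(d, "function_precision.h").write_text('/* no further includes */\n')
print(sorted(local_includes(Path(d, "countpairs.h"))))
```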

Compatibility with Big-Endian systems

The *.ff data files under the data directory are all written on little-endian systems. ftread.c needs to be modified to always convert from little-endian to host byte order before using the data read in from disk.

Reorganize Makefiles

There is a lot of repetition of rules in the Makefiles. One easy way to fix that would be to have a common naming scheme and then add the rules into common.mk. The include common.mk line would have to moved to the end of Makefiles, after all the sources and types of targets have been assigned.

Suppress all informational output

Running the codes produces some details about the data-set + progress info. Add ability in Makefile to suppress all such output.

Ability to do so already exists for utils/gridlink.c -- if the compilation option -DSILENT is present, then the info output is suppressed. Just needs to be propagated to the individual functions in countpairs*.c files. All the calls to progressbar + variables used by progressbar just need to be within #ifndef SILENT. Ideally, the progressbar.c dependency should be removed if the -DSILENT option is in effect.

Thanks to @andrew-zentner for pointing this out.

Reduce repo size

The repo has ballooned to 200 MB - needs to be aggressively cleaned. Looks like this might have a solution but requires rewriting the entire history and rebasing all clones.

Make a conda installable python package

Creating a separate issue for conda. I am having serious trouble creating a conda installable package -- probably because I do all of the symbol resolution + relative paths myself.

Order only attributes crashing make

Reported by @andrew-zentner. Make crashes with error:

make: *** No rule to make target `|', needed by `install'.  Stop.

$ make --version
GNU Make version 3.79.1, by Richard Stallman and Roland McGrath.
Built for i386-apple-darwin8.5.1

Mac OS 10.7.5

Make a pip/conda installable python package

Create a setup.py such that the usual workflow of pip install or conda install works. Ideally, the package should be called Corrfunc and contain submodules for the theory and mocks routines.

* Corrfunc
    > __init__.py
    > xi_theory
        >> countpairs
        >> countpairs_rp_pi
        >> countpairs_xi
        >> countpairs_wp
        >> countspheres_vpf
    > xi_mocks
        >> countpairs_rp_pi_mocks
        >> countpairs_theta_mocks
        >> countspheres_vpf_mocks

Carry the compiled datatype through to python

Needs a theory_dtype and a mocks_dtype in Corrfunc/__init__.py and corresponding sed within the Makefiles in the two python_bindings directories. Similar to the ones I have already in place for installing the headers.

Remove most flags in Makefile

That way, any possible version of the code can be invoked at runtime. This behaviour would also require a wrapper over all API functions. Particularly, DOUBLE_PREC and PERIODIC options need to be removed.

The python layer would check the type of the supplied arrays + keywords and call the appropriate function. Or better yet, call the underlying C wrapper which then resolves the correct routine.

Ctrl-C does not abort when C extension is running under python

It seems that if the C function is running in the main python thread, then it is cumbersome to capture Ctrl-C. The easiest work-around might be to spawn two python threads and run the calculation in the background thread while the main thread continues to run. If a KeyboardInterrupt (or some other exception) is raised, the main thread can send a SIGINT to the C function.

Avoid possible memory-leak with returned result

All of the APIs return a heap-allocated results struct. This is a potential memory leak if the user does not free the memory afterwards (the python interfaces probably do free it already). An easy fix would be to replace those with stack-allocated structs; this would require replacing all `results->` with `results.`.
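The proposed change can be sketched as follows, with an illustrative struct (not Corrfunc's actual results layout): return the struct by value so the caller never has to remember a free().

```c
/* Sketch: return the results struct by value instead of a heap
 * pointer, so the caller cannot leak it. Fields are illustrative. */
typedef struct {
    int nbins;
    double rmin, rmax;
} results_countpairs;

/* Before: returned a malloc'ed struct the caller had to free().
 * After: returned by value; every `results->x` becomes `results.x`. */
static results_countpairs countpairs_sketch(int nbins, double rmin, double rmax)
{
    results_countpairs results;  /* stack-allocated, copied out on return */
    results.nbins = nbins;
    results.rmin = rmin;
    results.rmax = rmax;
    return results;
}
```

Note that any dynamically sized arrays inside the real results structs (the per-bin counts, for instance) would still need heap allocation and an explicit free; returning by value only removes the leak for the struct itself.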

Check if CPU supports AVX at compile time

In a reasonably likely scenario, the compiler might support AVX instructions while the CPU does not. In such cases, there will be a runtime error: "Illegal instruction".

I have already encountered this on the Travis OSX workers, and there is a manual fix in the common.mk file.

Add code to return pairwise indices

The implementation is fairly straightforward, but the memory requirements are quite large! So, the distance separation needs to be a fairly small fraction of the domain.

Convert Readme to rst format

Should be trivial with pandoc. setup.py needs to be adjusted so that it no longer runs the conversion while creating the source distribution.

Declination grid binning fails for countpairs_rp_pi_mocks.

When calling countpairs_rp_pi_mocks with lower redshift bin edges close to 0.1 or below, Corrfunc returns the error:

Error in file: ../../utils/gridlink_mocks.c func: gridlink3D line: 593 with expression `idec >=0 && idec < ngrid_dec[iz]'

Declination index for particle position = -1 must be within [0, 0) for cz-bin = 0

even when called on catalogs with >1 galaxy, and whose declination range is non-zero.

Add python3 support

The C extension module is only valid for python2 and will fail under python3.

Add travis CI for OSX

Travis CI never seems to run, reporting a "missing config", whenever I add os: osx to the .travis.yml file.

Allow runtime function selection

Currently, all the count*driver functions dispatch to the function compiled with the highest available instruction set (e.g., when AVX and SSE are both available, only the AVX function will be called). Using varargs, this should be reasonably easy to implement.
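One way to sketch runtime selection uses a function pointer chosen once at startup (a simpler alternative to varargs); the kernel names and dummy bodies here are illustrative, not the real Corrfunc kernels.

```c
#include <stdint.h>

/* Sketch: pick the best available kernel at runtime through a
 * function pointer, instead of hard-wiring the highest instruction
 * set at compile time. All bodies are identical dummies here. */
typedef uint64_t (*countpairs_func_t)(int n);

static uint64_t countpairs_fallback(int n) { return (uint64_t)n * (n - 1) / 2; }
static uint64_t countpairs_sse(int n)      { return (uint64_t)n * (n - 1) / 2; }
static uint64_t countpairs_avx(int n)      { return (uint64_t)n * (n - 1) / 2; }

/* Called once; the flags would come from runtime CPU detection. */
static countpairs_func_t select_kernel(int have_sse, int have_avx)
{
    if (have_avx) return countpairs_avx;
    if (have_sse) return countpairs_sse;
    return countpairs_fallback;
}
```

The driver then calls through the selected pointer, so a user could also force a specific kernel (e.g. the fallback) for testing or benchmarking.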

Add GPU support

The codes would be so much faster on the GPU. Unfortunately, I hardly know anything about GPUs.

API change

v2.0 needs an additional thin struct wrapper for passing extra quantities (e.g., weights for particles). This will break the v1 API, since all countpairs* routines will need to accept this extra input argument.
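A hypothetical sketch of such a thin wrapper struct; the field names and the toy routine are illustrative, not the planned v2 API.

```c
#include <stddef.h>

/* Sketch: a thin "extra options" struct passed to every countpairs*
 * routine, so future fields can be added without breaking the API
 * again. Field names are illustrative. */
typedef struct {
    const double *weights;   /* per-particle weights, or NULL for unweighted */
    size_t nweights;
    /* future extra quantities go here without changing signatures */
} extra_options;

/* Toy v2-style routine: old positional args plus one extra struct
 * pointer. Here it just accumulates the (weighted) particle count. */
static double countpairs_v2_sketch(size_t npts, const extra_options *extra)
{
    double wsum = 0.0;
    for (size_t i = 0; i < npts; i++) {
        const double w = (extra != NULL && extra->weights != NULL)
                             ? extra->weights[i] : 1.0;
        wsum += w;
    }
    return wsum;
}
```

Passing NULL (or a zeroed struct) recovers the old unweighted behaviour, which keeps the migration from v1 mechanical: callers add one argument and nothing else changes.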
