
nbodykit's Introduction

nbodykit: a massively parallel large-scale structure toolkit


nbodykit is an open source project and Python package providing a set of algorithms useful in the analysis of cosmological datasets from N-body simulations and large-scale structure surveys.

Driven by the optimism regarding the abundance and availability of large-scale computing resources in the future, the development of nbodykit distinguishes itself from other similar software packages (e.g., nbodyshop, pynbody, yt, xi) by focusing on:

  • a unified treatment of simulation and observational datasets by insulating algorithms from data containers
  • reducing wall-clock time by scaling to thousands of cores
  • deployment and availability on large, super-computing facilities
  • an interactive user interface that performs as well in a Jupyter notebook as on super-computing machines

All algorithms are parallel and run with Message Passing Interface (MPI).

Build Status

We perform integrated tests of the code, including all built-in algorithms, in a miniconda environment for Python 2.7, 3.5, and 3.6.


Documentation

The official documentation is hosted on ReadTheDocs at http://nbodykit.readthedocs.org/.

Cookbook Recipes

Users can dive right into an interactive cookbook of example recipes using binder. We've compiled a set of Jupyter notebooks to help users learn nbodykit by example; just click the launch button below to get started!

binder

Users can also view a static version of the cookbook recipes in the documentation.

Installation

We recommend using the Anaconda distribution of Python. To obtain the dependencies and install the package on OSX or Linux, use

$ conda install -c bccp nbodykit

We are considering support for Windows, but this depends on the status of mpi4py.

Using nbodykit on NERSC

On the Cori and Edison machines at NERSC, we maintain a nightly conda build of the latest stable release of nbodykit. See the documentation for using nbodykit on NERSC for more details.

Bumping to a new version

  1. git pull - confirm that the master branch is up-to-date
  2. Edit the changelog (CHANGES.rst) - make sure to include all issues resolved since the last version (git add ... -> git commit -m "Update Changelog" -> git push)
  3. Edit version.py -> git push ("bump version to ...")
  4. Go to https://travis-ci.org/bccp/nbodykit and make sure the build passed without any problems.
  5. Go to the bccp/conda-channel-bccp repo and do "Restart build"
  6. git tag 0.3.? -> git push --tags
  7. Bump to a development version (0.3.?dev0)

Acknowledgement

The work on nbodykit is supported by the Berkeley Center for Cosmological Physics, University of California Berkeley, by the National Energy Research Scientific Computing Center of the Lawrence Berkeley National Lab via the allocation m3035, and by users and contributors.

nbodykit's People

Contributors

adematti, akrolewski, eelregit, eickenberg, modichirag, nickhand, rainwoodman, ruilan-zh, sjforeman, svonhausegger, ybh0822


nbodykit's Issues

Error message of failed tests are hard to parse.

It is hard to see what command the pipeline has run, making it hard to reproduce the failed test case or to debug.

[yfeng1@waterfall nbodykit]$ py.test -x nbodykit 
======================================================================================= test session starts =======================================================================================
platform linux2 -- Python 2.7.11 -- py-1.4.30 -- pytest-2.7.3
rootdir: /home/yfeng1/source/nbodykit, inifile: setup.cfg
plugins: pipeline, cov
collected 93 items 

nbodykit/test/test_batch.py .F

============================================================================================ FAILURES =============================================================================================
____________________________________________________________________________________ TestStdin.test_exit_code _____________________________________________________________________________________

self = <nbodykit.test.test_batch.TestStdin testMethod=test_exit_code>

    def test_exit_code(self):
>       asserts.test_exit_code(self)

/home/yfeng1/source/nbodykit/nbodykit/test/test_batch.py:35: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <nbodykit.test.test_batch.TestStdin testMethod=test_exit_code>

    def test_exit_code(self):
        """
        Assert that the exit code is equal to 0
        """
>       assert self.run_fixture.exit_code == 0
E       AssertionError: assert 127 == 0
E        +  where 127 = RunClass1(run_id=RunClass1_78d72c5f-e0f8-4531-82d9-cb2b99f6b83c, ...).exit_code
E        +    where RunClass1(run_id=RunClass1_78d72c5f-e0f8-4531-82d9-cb2b99f6b83c, ...) = <nbodykit.test.test_batch.TestStdin testMethod=test_exit_code>.run_fixture

/home/yfeng1/source/nbodykit/nbodykit/test/asserts.py:25: AssertionError
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=============================================================================== 1 failed, 1 passed in 0.91 seconds ================================================================================

query selection on vector quantities, i.e., magnitudes

@rainwoodman, do you have a thought on how to do a selection when the column is a vector? The MBII painter can now read the AB magnitudes, but it reads them into a vector of 5 quantities for u, g, r, i, z, and if you say something like "-select= magnitude < -15" the code will crash.

I'm not sure there's a simple way to deal with this. We can solve it by reading the magnitude data into separate columns, i.e., 'magnitude_u', 'magnitude_r', etc. I don't really like that solution, but I don't see any others at the moment.
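As a minimal numpy sketch of the split-column workaround (the data layout and column names here are hypothetical, not nbodykit's actual containers):

```python
import numpy as np

# toy catalog with a 5-component magnitude vector column (hypothetical layout)
data = np.zeros(4, dtype=[('Position', ('f8', 3)), ('magnitude', ('f8', 5))])
data['magnitude'][:, 0] = [-14.0, -16.5, -15.2, -13.1]  # u band

# a query like "magnitude < -15" is ambiguous on a vector column;
# splitting it into scalar per-band columns makes the selection well-defined
bands = ['u', 'g', 'r', 'i', 'z']
columns = {'magnitude_%s' % b: data['magnitude'][:, i]
           for i, b in enumerate(bands)}
mask = columns['magnitude_u'] < -15   # selects 2 of the 4 toy objects
```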

completeness weights + FKP weights in Bianchi algorithm

We need to clarify what the "Weight" column used by the FKP painter is. There are actually two different weights:

  1. completeness weights, which can be applied to the "data" or "randoms" field when painting
  2. FKP weights, which are designed to weight the field difference: w_fkp * [n_data - alpha * n_ran]
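A toy numpy sketch of how the two weights enter the painted field (the cell values and the P0 parameter are illustrative, not nbodykit's actual implementation):

```python
import numpy as np

# painted number densities on a few toy cells (completeness weights are
# assumed to have been applied per-object while painting)
n_data = np.array([2.0, 3.0, 1.0])
n_ran = np.array([1.2, 1.1, 0.7])

# alpha normalizes the randoms field to the data field
alpha = n_data.sum() / n_ran.sum()

# FKP weights multiply the *difference* of the two fields
P0 = 1e4                               # hypothetical fiducial power
w_fkp = 1.0 / (1.0 + n_data * P0)
F = w_fkp * (n_data - alpha * n_ran)
```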

& breaks command line parsing of power.py

The choice of & as the optional-argument prefix in input file names is inconvenient, since in bash an unquoted & ends the command line and puts the command in the background.

It needs to be escaped as \& in bash. Maybe we should use a non-special character instead, such as '-' or '/'.

PaintGrid fails with nprocs > 2

It seems the partitioning of the input dataset is broken. When nprocs > 2 I get this error from pmesh

File "/Users/nhand/Research/Programs/nbodykit/bin/nbkit.py", line 164, in main
result = alg.run()
File "/Users/nhand/Research/Programs/nbodykit/nbodykit/core/algorithms/PaintGrid.py", line 82, in run
real2.unsort(ds[start:end])
File "/Users/nhand/anaconda/lib/python2.7/site-packages/pmesh/pm.py", line 119, in unsort
assert len(flatiter) == self.size

Particle Species

To deal with multiple particle species (with different mass) the DataSource object needs to be able to return 'mass' correctly.

support grid interlacing

This would be a nice feature, especially to note in the paper in comparison to Roman's work

It seems like this would require modifying the ParticleMesh class to support it? @rainwoodman

Deprecate positional arguments in Plugin __init__

@nickhand I just realized this backtrace is totally incomprehensible. It looks like if we disable positional arguments in Plugin __init__, then at least we can print out which keywords are expected and which were provided.


    main()
  File "/dev/shm/local/bin/nbkit.py", line 100, in main
    params, extra = Algorithm.parse_known_yaml(alg_name, stream)
  File "/dev/shm/local/lib/python2.7/site-packages/nbodykit/extensionpoints.py", line 544, in parse_known_yaml
    return ReadConfigFile(stream, klass.schema)
  File "/dev/shm/local/lib/python2.7/site-packages/nbodykit/utils/config.py", line 104, in ReadConfigFile
    fill_namespace(ns, schema[name], extra, missing)
  File "/dev/shm/local/lib/python2.7/site-packages/nbodykit/utils/config.py", line 50, in fill_namespace
    fill_namespace(subns, arg[k], subconfig, missing)
  File "/dev/shm/local/lib/python2.7/site-packages/nbodykit/utils/config.py", line 41, in fill_namespace
    raise ConfigurationError("unable to cast '%s' value: %s" %(arg.name, str(e)))
nbodykit.utils.config.ConfigurationError: unable to cast 'field.DataSource' value: failure to parse plugin:

Traceback (most recent call last):
  File "/dev/shm/local/lib/python2.7/site-packages/nbodykit/extensionpoints.py", line 208, in from_config
    return cls.create(plugin_name, use_schema=True, **kwargs)
  File "/dev/shm/local/lib/python2.7/site-packages/nbodykit/extensionpoints.py", line 175, in create
    toret = klass(**kwargs)
  File "/dev/shm/local/lib/python2.7/site-packages/nbodykit/utils/config.py", line 369, in wrapper
    raise TypeError(msg)
TypeError: error initializing __init__ for 'FOFDataSource':

Traceback (most recent call last):
  File "/dev/shm/local/lib/python2.7/site-packages/nbodykit/utils/config.py", line 363, in wrapper
    return init(self, *args, **kwargs)
TypeError: __init__() takes at least 3 arguments (4 given)

Pipeline / Filter

A filter takes a source object and provides a data source interface

DataSource -- source
DataSink -- Algorithm

Can we extend the config file such that it can specify a chain like this:

FastPMSim -> FOFFilter -> Zheng2013HOD -> BrightnessCut -> FFTPowerSpectrum

without nesting deeply?
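As a sketch of what a flat chain could look like internally (the plugin names are from above; the run_chain helper and stage interface are hypothetical), each stage simply consumes the previous stage's output through the DataSource interface:

```python
# hypothetical flat chain: each stage consumes the previous stage's output,
# so the config file could list them in order instead of nesting
chain = ['FastPMSim', 'FOFFilter', 'Zheng2013HOD',
         'BrightnessCut', 'FFTPowerSpectrum']

def run_chain(chain, stages):
    """Fold the chain: feed each stage's output into the next stage."""
    result = None
    for name in chain:
        result = stages[name](result)
    return result

# toy stages standing in for the real plugins: each just records its name
stages = {name: (lambda prev, n=name: (n, prev)) for name in chain}
out = run_chain(chain, stages)   # outermost element is the final algorithm
```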

theory-based DataSource mocks + power spectrum dependency

The code for the Gaussian, lognormal, and Zel'dovich mocks themselves has been done for a while -- the issue is that they all start from an input linear power spectrum. The user could pass this in to the DataSource, but the cleanest solution is to use our existing cosmology framework and generate it from that.

I think I can actually extract a basic version of my python wrapper of CLASS to do this; it would be a useful community tool either way. We could then include this external package like an optional dependency for certain DataSource classes, similar to halotools.

If I did this, I should probably make the CLASS wrapper interface well with the astropy cosmology class, since it could be more broadly used, i.e., something like an astropy-affiliated package.

Thoughts, @rainwoodman?

cic crashes with mode='raise' and all particles within box

if you set mode='raise' in ParticleMesh.paint and run any of the nbodykit power examples (outside of BianchiFFTPower), the paint will crash, despite all particles being in the box. There seems to be a possible issue in _cic.py where particles that are close to the outer box size are placed outside the box

see:
https://github.com/rainwoodman/pmesh/blob/master/pmesh/_cic.py#L90

@rainwoodman, is this related to the fact that when using CIC, particles at the edges get distributed to edge +/- 1 cell, and need to be wrapped properly? If so, I would have thought this issue would also show up for particles near 0, but I don't see any evidence of that
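A 1-D toy version of the suspected failure mode (this is not pmesh's actual code): a particle just inside BoxSize gets a right-hand CIC neighbour outside the mesh unless the neighbour index is wrapped periodically:

```python
import numpy as np

def cic_paint_1d(pos, boxsize, nmesh):
    """Toy 1-D cloud-in-cell assignment with explicit periodic wrapping."""
    mesh = np.zeros(nmesh)
    x = pos / boxsize * nmesh
    left = np.floor(x).astype(int)
    frac = x - left
    for l, f in zip(left, frac):
        mesh[l % nmesh] += 1.0 - f
        # without the modulo, l + 1 == nmesh here is exactly the
        # out-of-box index that would make mode='raise' fail
        mesh[(l + 1) % nmesh] += f
    return mesh

mesh = cic_paint_1d(np.array([9.9]), boxsize=10.0, nmesh=10)
```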

Case Sensitive

Nick, could you take a look at how to make the config parser case-insensitive on 'names'? I couldn't find a place to start.

Painting momentum

Marcel and Doyeon requested calculating the momentum power spectrum. We can add an option to existing plugins for momentum.

Shall not write to .nbodykit

I find it strange that the data files for testing are downloaded to ~/.nbodykit. Is there a good reason for doing so?

If it is just testing data then wouldn't it make more sense to keep them in either a temporary directory or in the source code tree?

formalizing DataSource columns

I think we need to formalize the process by which we define readable columns for each DataSource. By default, all readers must return 'Position', but some also return 'Weight', 'Mass', etc., plus any columns named in the input data. Right now RaDecRedshift returns 'Position' as (ra, dec, z), which should really be named 'AngularPosition' or something similar; it should also be able to return Ra, Dec, and Redshift separately.

Any good ideas on how to structure this, @rainwoodman ?

We could have each DataSource define a list of preset columns, i.e., Position, Ra, Dec, Redshift, with an associated function that returns that column, given the data array
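A bare-bones sketch of that idea (the class and function names here are hypothetical): each DataSource registers one function per preset column, so algorithms can introspect which columns are readable before touching the data:

```python
import numpy as np

class ColumnRegistry:
    """Sketch: DataSources declare preset columns as registered functions."""
    columns = {}

    @classmethod
    def register(cls, name):
        def wrapper(func):
            cls.columns[name] = func
            return func
        return wrapper

class RaDecRedshift(ColumnRegistry):
    columns = {}   # own registry, independent of other DataSources

@RaDecRedshift.register('Ra')
def ra(data):
    return data[:, 0]

@RaDecRedshift.register('Dec')
def dec(data):
    return data[:, 1]

@RaDecRedshift.register('Redshift')
def redshift(data):
    return data[:, 2]

# algorithms can list the available columns, then pull the one they need
data = np.array([[150.1, 2.2, 0.5]])
ra_vals = RaDecRedshift.columns['Ra'](data)
```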

subsample.py

This will be a tool that generates the equivalent of @martinjameswhite 's pm.c subsample file.

The idea is to replace 2lpt_ic.c and pm.c with fastPM + nbodykit for the current analysis chain @melissajoseph is using.

Then we can explore QPM sampling vs. stepping. qpm_calc_xi is still a limiting factor in the chain.

measure real and imaginary components of power spectrum

for my velocity correlator measurements, I need to be able to measure the real and imaginary components of the power spectrum -- I'm not sure of the best way to do this, i.e., another flag to change the estimator, or perhaps we always measure the real + imaginary parts (the imaginary part will usually be zero)

The data fields are definitely real, which guarantees that the negative k_z modes will simply be the complex conjugates of the positive-frequency modes. We currently only compute the real part of the power spectrum, so we only need to multiply by a factor of 2.

Things get more complicated if the power spectrum is purely imaginary, as in the case of , which will have a mu^1 dependency. It's trivial to add another estimator for the imaginary part of the power spectrum from the two input fields: just compute c1.imag * c2.real - c1.real * c2.imag
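The identity is easy to check with numpy on a toy pair of real fields (nothing here is nbodykit code):

```python
import numpy as np

rng = np.random.RandomState(42)
f1 = rng.normal(size=16)
f2 = np.roll(f1, 1)          # shifted copy -> genuinely complex cross power

c1 = np.fft.rfft(f1)
c2 = np.fft.rfft(f2)

# the two estimators, mode by mode
P_real = c1.real * c2.real + c1.imag * c2.imag
P_imag = c1.imag * c2.real - c1.real * c2.imag

# consistency with the direct complex product c1 * conj(c2)
assert np.allclose(P_real + 1j * P_imag, c1 * np.conj(c2))
# for an auto spectrum the imaginary part vanishes identically
assert np.allclose(c1.imag * c1.real - c1.real * c1.imag, 0.0)
```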

This would complicate the code more (for perhaps a unique use-case), but on the other hand, the generalization might actually make the code easier to understand.

Thoughts, @rainwoodman ?

remove auto assign?

I think in theory this is nice, but there are some potential downsides:

  1. complicates things for users by hiding code
  2. as we implement subclasses of plugin base classes and the code becomes more complicated, I think it will hinder development

I think we should still verify the schema for each class, but just not do the automatic setting of attributes.

Also, if we want to have abstract attributes, I think we should add a check during the schema verification process, rather than using the abc.abstractproperty decorator (which requires property setter/getter functions in the subclasses)
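A minimal sketch of checking abstract attributes during schema verification instead of via abc.abstractproperty (all names here are hypothetical):

```python
class PluginSchema:
    """Hypothetical schema that records which attributes a plugin must define."""

    def __init__(self, required):
        self.required = list(required)

    def verify(self, plugin):
        # enforce abstract attributes at verification time, so subclasses
        # can use plain attributes instead of property getters/setters
        missing = [a for a in self.required if not hasattr(plugin, a)]
        if missing:
            raise TypeError("plugin '%s' is missing required attributes: %s"
                            % (type(plugin).__name__, ", ".join(missing)))

class GoodPlugin:
    plugin_name = "good"
    schema_version = 1

schema = PluginSchema(['plugin_name', 'schema_version'])
schema.verify(GoodPlugin())   # passes silently
```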

Thoughts, @rainwoodman?

Pick a format for the paper draft.

I really liked the .rst format used in scipy_proceedings. But if we aim for MNRAS, we may need to patch their build system a bit to use it.

The good thing about rst is we can probably share quite a bit of text between the online document and the paper.

Is it worth the effort?

@nickhand

Pandas still failing on my laptop, though it passes on travis!?

The error message is just like before. This is strange. I checked that I am using 0.0.15 of pmesh, which has a test case asserting the fix. Also, why is the parameter using AnisotropicCIC to correct for a TSC paintbrush?

                assert_array_almost_equal(this.attrs[name], ref.attrs[name])

        except Exception as e:
>           raise _make_exc(self, str(e))
E           AssertionError: 
E           Not equal to tolerance rtol=0.01, atol=1e-05
E           
E           (mismatch 100.0%)
E            x: array([[           nan+nanj,            nan+nanj,            nan+nanj,
E                              nan+nanj,       0.000000 +0.j],
E                  [  63138.386719 +0.j,            nan+nanj,   32610.351562 +0.j,...
E            y: array([[          nan+nanj,           nan+nanj,           nan+nanj,
E                             nan+nanj,      0.000000 +0.j],
E                  [ 63163.355469 +0.j,           nan+nanj,  32710.017578 +0.j,... 
E           Cmdline
E            mpirun -n 2 python /home/yfeng1/source/nbodykit/bin/nbkit.py FFTPower /home/yfeng1/source/nbodykit/examples/power/test_pandas_hdf.params
E           stderr:
E           [ 000000.46 ]   0:waterfall 08-13 02:53  FFTPower        INFO     importing done
E           [ 000000.65 ]   0:waterfall 08-13 02:53  Pandas          INFO     total number of objects selected is 50000 / 50000
E           [ 000000.67 ]   0:waterfall 08-13 02:53  DefaultPainter  INFO     Mean = 0.00298023
E           [ 000000.67 ]   0:waterfall 08-13 02:53  measurestats    INFO     Painting done
E           [ 000000.81 ]   0:waterfall 08-13 02:53  measurestats    INFO     r2c done
E           [ 000001.44 ]   0:waterfall 08-13 02:53  FFTPower        INFO     measurement done; saving result to /home/yfeng1/source/nbodykit/examples/output/test_power_pandas_hdf.dat

difference between N objects read and N from layout.exchange in paint

  • when running with more than one process, the allreduce of the length of the position array returned by layout.exchange in the paint() function is not equal to the total number of objects read
  • this was also the case before the paint/read change, but we didn't notice because the old Painters returned the number of objects read, not the allreduce answer

I'll note that when running with only one process, the two numbers described above agree. Also, it seems that the power spectrum doesn't change when running with 1 process or several processes.

this seems like a problem, but maybe not? @rainwoodman, any idea what's going on?

shot noise removal on cross

Shouldn't the shot noise be zero in the case of cross power spectrum rather than

shotnoise = pm.BoxSize ** 3 / (1.0 * Ntot1 * Ntot2) ** 0.5
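A sketch of the proposed behaviour (the helper and its signature are hypothetical, not nbodykit's API): self-pairs only contribute to an auto spectrum, so a cross spectrum of two independent tracers would get zero shot noise:

```python
def shotnoise(boxsize, Ntot1, Ntot2=None):
    """Poisson shot noise: V/N for an auto spectrum, 0 for a cross spectrum."""
    V = boxsize ** 3
    if Ntot2 is None:        # auto spectrum: self-pairs contribute V / N
        return V / Ntot1
    return 0.0               # cross spectrum: independent tracers, no self-pairs

auto = shotnoise(1000.0, 1e6)        # V/N = 1e9 / 1e6
cross = shotnoise(1000.0, 1e6, 2e6)
```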

transfer "chain" as plugins or attribute of painter?

to properly implement a MomentumPainter using power.py, we need to be able to specify transfer functions via the command line

they seem to be directly tied to the painter, i.e., a density painter should basically always apply [NormalizeDC, RemoveDC, AnisotropicCIC], while a momentum or velocity painter only needs AnisotropicCIC.

Thoughts on how to best implement this, @rainwoodman? We could make them plugins that could be set via the command line, or just define a set "transfer_chain" for each painter

domain decomposition crash in pmesh when using power-parallel.py

This is a weird bug I ran into when running power-parallel.py. It seems to depend on the total number of processors available, as I can change the number of cpus and avoid this. I would guess it has something to do with splitting the domain weirdly over the processors, but I haven't investigated much. It doesn't happen if I run with fewer than the maximum available number of processors.

  File "/dev/shm/local/lib/python2.7/site-packages/pmesh/particlemesh.py", line 205, in decompose
    layout = pm.decompose(position)
    transform=self.transform0)
  File "/dev/shm/local/lib/python2.7/site-packages/pmesh/domain.py", line 323, in decompose
    return default_paint(field, pm)
    sil[j, s] = self._digitize(tmp - smoothing, self.grid[j]) - 1
  File "/dev/shm/local/lib/python2.7/site-packages/pmesh/domain.py", line 251, in _digitize
    layout = pm.decompose(position)
    return numpy.digitize(data, bins)
ValueError: The bins must be monotonically increasing or decreasing

what about getting rid of ':' in the commandline?

After introducing the mini selection language, the command line for painters needs to be wrapped in quotes all the time anyway. So maybe we can simplify things by getting rid of the colons.

If we do this change we want to do it early.
What do you think?
@nickhand

File Formats Plugins

Several new top-level scripts currently rely on files.Snapshot to read the files. I will convert this to a set of file-format plugins. Then maybe we should look into converting painters to use these file-format plugins as well.

increased usage of astropy?

There could be a few advantages to using more of the tools provided by astropy:

  • they include the six package for 2 to 3 compatibility, which could clean up some of our code
  • they do also provide some versioning tools, etc, but they are perhaps overly complicated
  • seems to be stable now and catching on in the community, which could potentially open up the code to a wider audience, i.e. "astropy-affiliated"?
  • we are already using it for the cosmology class, so might as well take advantage of anything else we want
  • would help with #197

thoughts, @rainwoodman?

Default Transfer functions.

The default transfer is no transfer.

Shall we modify this such that when -transfer is not given in the yaml file, the transfer is
[NormalizeDC, RemoveDC, AnisotropicCIC]

And to specify 'no transfer', one would use

  • transfer: []
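The proposed defaulting logic could be as simple as (the helper name and config shape are hypothetical):

```python
DEFAULT_TRANSFER = ['NormalizeDC', 'RemoveDC', 'AnisotropicCIC']

def resolve_transfer(config):
    # key absent -> default density chain; explicit `transfer: []` -> none
    return config.get('transfer', DEFAULT_TRANSFER)

assert resolve_transfer({}) == DEFAULT_TRANSFER
assert resolve_transfer({'transfer': []}) == []
assert resolve_transfer({'transfer': ['AnisotropicCIC']}) == ['AnisotropicCIC']
```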
