
nbodykit's Introduction

nbodykit: a massively parallel large-scale structure toolkit


nbodykit is an open source project and Python package providing a set of algorithms useful in the analysis of cosmological datasets from N-body simulations and large-scale structure surveys.

Driven by the optimism regarding the abundance and availability of large-scale computing resources in the future, the development of nbodykit distinguishes itself from other similar software packages (e.g., nbodyshop, pynbody, yt, xi) by focusing on:

  • a unified treatment of simulation and observational datasets by insulating algorithms from data containers
  • reducing wall-clock time by scaling to thousands of cores
  • deployment and availability on large, super-computing facilities
  • an interactive user interface that performs as well in a Jupyter notebook as on super-computing machines

All algorithms are parallel and run with Message Passing Interface (MPI).

Build Status

We perform integrated tests of the code, including all built-in algorithms, in a miniconda environment for Python 2.7, 3.5, and 3.6.


Documentation

The official documentation is hosted on ReadTheDocs at http://nbodykit.readthedocs.org/.

Cookbook Recipes

Users can dive right into an interactive cookbook of example recipes using binder. We've compiled a set of Jupyter notebooks to help users learn nbodykit by example; just click the launch button below to get started!

binder

Users can also view a static version of the cookbook recipes in the documentation.

Installation

We recommend using the Anaconda distribution of Python. To obtain the dependencies and install the package on OSX or Linux, use

$ conda install -c bccp nbodykit

We are considering support for Windows, but this depends on the status of mpi4py.

Using nbodykit on NERSC

On the Cori and Edison machines at NERSC, we maintain a nightly conda build of the latest stable release of nbodykit. See the documentation for using nbodykit on NERSC for more details.

Bumping to a new version

  1. git pull - confirm that the master branch is up-to-date
  2. Edit the changelog (CHANGES.rst) - make sure to include all issues resolved since the last version (git add ... -> git commit -m "Update Changelog" -> git push)
  3. Edit version.py -> git push ("bump version to ...")
  4. Go to https://travis-ci.org/bccp/nbodykit and make sure the build passed without any problems.
  5. Go to the bccp/conda-channel-bccp repo and do "Restart build"
  6. git tag 0.3.? -> git push --tags
  7. Bump to a development version (0.3.?dev0)

Acknowledgement

The work on nbodykit is supported by the Berkeley Center for Cosmological Physics, University of California Berkeley, by the National Energy Research Scientific Computing Center of the Lawrence Berkeley National Lab via the allocation m3035, and by users and contributors.

nbodykit's People

Contributors

adematti, akrolewski, eelregit, eickenberg, modichirag, nickhand, rainwoodman, ruilan-zh, sjforeman, svonhausegger, ybh0822


nbodykit's Issues

Error message of failed tests are hard to parse.

It is hard to see what command the pipeline has run, making it hard to reproduce the failed test case or to debug.

[yfeng1@waterfall nbodykit]$ py.test -x nbodykit 
======================================================================================= test session starts =======================================================================================
platform linux2 -- Python 2.7.11 -- py-1.4.30 -- pytest-2.7.3
rootdir: /home/yfeng1/source/nbodykit, inifile: setup.cfg
plugins: pipeline, cov
collected 93 items 

nbodykit/test/test_batch.py .F

============================================================================================ FAILURES =============================================================================================
____________________________________________________________________________________ TestStdin.test_exit_code _____________________________________________________________________________________

self = <nbodykit.test.test_batch.TestStdin testMethod=test_exit_code>

    def test_exit_code(self):
>       asserts.test_exit_code(self)

/home/yfeng1/source/nbodykit/nbodykit/test/test_batch.py:35: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <nbodykit.test.test_batch.TestStdin testMethod=test_exit_code>

    def test_exit_code(self):
        """
        Assert that the exit code is equal to 0
        """
>       assert self.run_fixture.exit_code == 0
E       AssertionError: assert 127 == 0
E        +  where 127 = RunClass1(run_id=RunClass1_78d72c5f-e0f8-4531-82d9-cb2b99f6b83c, ...).exit_code
E        +    where RunClass1(run_id=RunClass1_78d72c5f-e0f8-4531-82d9-cb2b99f6b83c, ...) = <nbodykit.test.test_batch.TestStdin testMethod=test_exit_code>.run_fixture

/home/yfeng1/source/nbodykit/nbodykit/test/asserts.py:25: AssertionError
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=============================================================================== 1 failed, 1 passed in 0.91 seconds ================================================================================

query selection on vector quantities, i.e., magnitudes

@rainwoodman, do you have a thought on how to do a selection when the column is a vector? The MBII painter can now read the AB magnitudes, but it reads them into a vector of 5 quantities for u, g, r, i, z, and if you say something like "-select= magnitude < -15" the code will crash.

I'm not sure there's a simple way to deal with this. We can solve it by reading the magnitude data into separate columns, i.e., 'magnitude_u', 'magnitude_r', etc. I don't really like that solution, but I don't see any others at the moment.
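As a minimal numpy sketch of the split-column workaround (the data layout and column names here are hypothetical, not nbodykit's actual containers):

```python
import numpy as np

# toy catalog with a 5-component magnitude vector column (hypothetical layout)
data = np.zeros(4, dtype=[('Position', ('f8', 3)), ('magnitude', ('f8', 5))])
data['magnitude'][:, 0] = [-14.0, -16.5, -15.2, -13.1]  # u band

# a query like "magnitude < -15" is ambiguous on a vector column;
# splitting it into scalar per-band columns makes the selection well-defined
bands = ['u', 'g', 'r', 'i', 'z']
columns = {'magnitude_%s' % b: data['magnitude'][:, i]
           for i, b in enumerate(bands)}
mask = columns['magnitude_u'] < -15   # selects 2 of the 4 toy objects
```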

completeness weights + FKP weights in Bianchi algorithm

We need to clarify what the "Weight" column used by the FKP painter is. There are actually two different weights:

  1. completeness weights, which can be applied to the "data" or "randoms" field when painting
  2. FKP weights, which are designed to weight the field difference: w_fkp * [n_data - alpha * n_ran]
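A toy numpy sketch of how the two weights enter the painted field (the cell values and the P0 parameter are illustrative, not nbodykit's actual implementation):

```python
import numpy as np

# painted number densities on a few toy cells (completeness weights are
# assumed to have been applied per-object while painting)
n_data = np.array([2.0, 3.0, 1.0])
n_ran = np.array([1.2, 1.1, 0.7])

# alpha normalizes the randoms field to the data field
alpha = n_data.sum() / n_ran.sum()

# FKP weights multiply the *difference* of the two fields
P0 = 1e4                               # hypothetical fiducial power
w_fkp = 1.0 / (1.0 + n_data * P0)
F = w_fkp * (n_data - alpha * n_ran)
```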

& breaks command line parsing of power.py

The choice of & as the optional-argument prefix in input file names is inconvenient, since in bash an unquoted & ends the command line and puts the command in the background.

It needs to be escaped as \& in bash. Maybe we should use a non-special character instead, such as '-' or '/'.

PaintGrid fails with nprocs > 2

It seems the partitioning of the input dataset is broken. When nprocs > 2 I get this error from pmesh

File "/Users/nhand/Research/Programs/nbodykit/bin/nbkit.py", line 164, in main
result = alg.run()
File "/Users/nhand/Research/Programs/nbodykit/nbodykit/core/algorithms/PaintGrid.py", line 82, in run
real2.unsort(ds[start:end])
File "/Users/nhand/anaconda/lib/python2.7/site-packages/pmesh/pm.py", line 119, in unsort
assert len(flatiter) == self.size

Particle Species

To deal with multiple particle species (with different mass) the DataSource object needs to be able to return 'mass' correctly.

support grid interlacing

This would be a nice feature, especially to note in the paper in comparison to Roman's work

It seems like this would require modifying the ParticleMesh class to support it? @rainwoodman

Deprecate positional arguments in Plugin __init__

@nickhand I just realized this backtrace is totally incomprehensible. It looks like if we disable positional arguments in Plugin __init__, then at least we can print out which keywords are expected and which were provided.


    main()
  File "/dev/shm/local/bin/nbkit.py", line 100, in main
    params, extra = Algorithm.parse_known_yaml(alg_name, stream)
  File "/dev/shm/local/lib/python2.7/site-packages/nbodykit/extensionpoints.py", line 544, in parse_known_yaml
    return ReadConfigFile(stream, klass.schema)
  File "/dev/shm/local/lib/python2.7/site-packages/nbodykit/utils/config.py", line 104, in ReadConfigFile
    fill_namespace(ns, schema[name], extra, missing)
  File "/dev/shm/local/lib/python2.7/site-packages/nbodykit/utils/config.py", line 50, in fill_namespace
    fill_namespace(subns, arg[k], subconfig, missing)
  File "/dev/shm/local/lib/python2.7/site-packages/nbodykit/utils/config.py", line 41, in fill_namespace
    raise ConfigurationError("unable to cast '%s' value: %s" %(arg.name, str(e)))
nbodykit.utils.config.ConfigurationError: unable to cast 'field.DataSource' value: failure to parse plugin:

Traceback (most recent call last):
  File "/dev/shm/local/lib/python2.7/site-packages/nbodykit/extensionpoints.py", line 208, in from_config
    return cls.create(plugin_name, use_schema=True, **kwargs)
  File "/dev/shm/local/lib/python2.7/site-packages/nbodykit/extensionpoints.py", line 175, in create
    toret = klass(**kwargs)
  File "/dev/shm/local/lib/python2.7/site-packages/nbodykit/utils/config.py", line 369, in wrapper
    raise TypeError(msg)
TypeError: error initializing __init__ for 'FOFDataSource':

Traceback (most recent call last):
  File "/dev/shm/local/lib/python2.7/site-packages/nbodykit/utils/config.py", line 363, in wrapper
    return init(self, *args, **kwargs)
TypeError: __init__() takes at least 3 arguments (4 given)

Pipeline / Filter

A filter takes a source object and provides a data source interface

DataSource -- source
DataSink -- Algorithm

Can we extend the config file such that it can specify a chain like this:

FastPMSim -> FOFFilter -> Zheng2013HOD -> BrightnessCut -> FFTPowerSpectrum

without nesting deeply?
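As a sketch of what a flat chain could look like internally (the plugin names are from above; the run_chain helper and stage interface are hypothetical), each stage simply consumes the previous stage's output through the DataSource interface:

```python
# hypothetical flat chain: each stage consumes the previous stage's output,
# so the config file could list them in order instead of nesting
chain = ['FastPMSim', 'FOFFilter', 'Zheng2013HOD',
         'BrightnessCut', 'FFTPowerSpectrum']

def run_chain(chain, stages):
    """Fold the chain: feed each stage's output into the next stage."""
    result = None
    for name in chain:
        result = stages[name](result)
    return result

# toy stages standing in for the real plugins: each just records its name
stages = {name: (lambda prev, n=name: (n, prev)) for name in chain}
out = run_chain(chain, stages)   # outermost element is the final algorithm
```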

theory-based DataSource mocks + power spectrum dependency

The code for the Gaussian, lognormal, and Zel'dovich mocks themselves has been done for a while -- the issue is that they all start from an input linear power spectrum. The user could pass this in to the DataSource, but the cleanest solution is to use our existing cosmology framework and generate it from that.

I think I can actually extract a basic version of my python wrapper of CLASS to do this; it would be a useful community tool either way. We could then include this external package like an optional dependency for certain DataSource classes, similar to halotools.

If I did this, I should probably make the CLASS wrapper interface well with the astropy cosmology class, since it could be more broadly used, i.e., something like an astropy-affiliated package.

Thoughts, @rainwoodman?

cic crashes with mode='raise' and all particles within box

if you set mode='raise' in ParticleMesh.paint and run any of the nbodykit power examples (outside of BianchiFFTPower), the paint will crash, despite all particles being in the box. There seems to be a possible issue in _cic.py where particles that are close to the outer box size are placed outside the box

see:
https://github.com/rainwoodman/pmesh/blob/master/pmesh/_cic.py#L90

@rainwoodman, is this related to the fact that when using CIC, particles at the edges get distributed to edge +/- 1 cell, and need to be wrapped properly? If so, I would have thought this issue would also show up for particles near 0, but I don't see any evidence of that
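A 1-D toy version of the suspected failure mode (this is not pmesh's actual code): a particle just inside BoxSize gets a right-hand CIC neighbour outside the mesh unless the neighbour index is wrapped periodically:

```python
import numpy as np

def cic_paint_1d(pos, boxsize, nmesh):
    """Toy 1-D cloud-in-cell assignment with explicit periodic wrapping."""
    mesh = np.zeros(nmesh)
    x = pos / boxsize * nmesh
    left = np.floor(x).astype(int)
    frac = x - left
    for l, f in zip(left, frac):
        mesh[l % nmesh] += 1.0 - f
        # without the modulo, l + 1 == nmesh here is exactly the
        # out-of-box index that would make mode='raise' fail
        mesh[(l + 1) % nmesh] += f
    return mesh

mesh = cic_paint_1d(np.array([9.9]), boxsize=10.0, nmesh=10)
```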

Case Sensitive

Nick, could you take a look at how to make the config parser case-insensitive on 'names'? I couldn't find a place to start.

Painting momentum

Marcel and Doyeon requested calculating the momentum power spectrum. We can add an option to existing plugins for momentum.

Shall not write to .nbodykit

I find it strange that the data files for testing are downloaded to ~/.nbodykit. Is there a good reason for doing so?

If it is just testing data then wouldn't it make more sense to keep them in either a temporary directory or in the source code tree?

formalizing DataSource columns

I think we need to formalize the process by which we define readable columns for each DataSource. By default, all readers must return 'Position', but some also return 'Weight', 'Mass', etc., plus any columns named in the input data. Right now RaDecRedshift returns 'Position' as (ra, dec, z), which should really be named 'AngularPosition' or something similar; it should also be able to return Ra, Dec, and Redshift separately.

Any good ideas on how to structure this, @rainwoodman ?

We could have each DataSource define a list of preset columns, i.e., Position, Ra, Dec, Redshift, with an associated function that returns that column, given the data array
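A bare-bones sketch of that idea (the class and function names here are hypothetical): each DataSource registers one function per preset column, so algorithms can introspect which columns are readable before touching the data:

```python
import numpy as np

class ColumnRegistry:
    """Sketch: DataSources declare preset columns as registered functions."""
    columns = {}

    @classmethod
    def register(cls, name):
        def wrapper(func):
            cls.columns[name] = func
            return func
        return wrapper

class RaDecRedshift(ColumnRegistry):
    columns = {}   # own registry, independent of other DataSources

@RaDecRedshift.register('Ra')
def ra(data):
    return data[:, 0]

@RaDecRedshift.register('Dec')
def dec(data):
    return data[:, 1]

@RaDecRedshift.register('Redshift')
def redshift(data):
    return data[:, 2]

# algorithms can list the available columns, then pull the one they need
data = np.array([[150.1, 2.2, 0.5]])
ra_vals = RaDecRedshift.columns['Ra'](data)
```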

subsample.py

This will be a tool that generates the equivalent of @martinjameswhite 's pm.c subsample file.

The idea is to replace 2lpt_ic.c and pm.c with fastPM + nbodykit for the current analysis chain @melissajoseph is using.

Then we can explore QPM sampling vs. stepping. qpm_calc_xi is still a limiting factor in the chain.

measure real and imaginary components of power spectrum

for my velocity correlator measurements, I need to be able to measure the real and imaginary components of the power spectrum -- I'm not sure of the best way to do this, i.e., another flag to change the estimator, or perhaps we always measure the real + imaginary parts (the imaginary part will usually be zero)

The data fields are definitely real, which guarantees that the negative k_z modes will simply be the complex conjugates of the positive-frequency modes. We currently only compute the real part of the power spectrum, so we only need to multiply by a factor of 2.

Things get more complicated if the power spectrum is purely imaginary, as in the case of , which will have a mu^1 dependency. It's trivial to add another estimator for the imaginary part of the power spectrum from the two input fields: just compute c1.imag * c2.real - c1.real * c2.imag
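The identity is easy to check with numpy on a toy pair of real fields (nothing here is nbodykit code):

```python
import numpy as np

rng = np.random.RandomState(42)
f1 = rng.normal(size=16)
f2 = np.roll(f1, 1)          # shifted copy -> genuinely complex cross power

c1 = np.fft.rfft(f1)
c2 = np.fft.rfft(f2)

# the two estimators, mode by mode
P_real = c1.real * c2.real + c1.imag * c2.imag
P_imag = c1.imag * c2.real - c1.real * c2.imag

# consistency with the direct complex product c1 * conj(c2)
assert np.allclose(P_real + 1j * P_imag, c1 * np.conj(c2))
# for an auto spectrum the imaginary part vanishes identically
assert np.allclose(c1.imag * c1.real - c1.real * c1.imag, 0.0)
```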

This would complicate the code more (for perhaps a unique use-case), but on the other hand, the generalization might actually make the code easier to understand.

Thoughts, @rainwoodman ?

remove auto assign?

I think in theory this is nice, but there are some potential downsides:

  1. complicates things for users by hiding code
  2. as we implement subclasses of plugin base classes and the code becomes more complicated, I think it will hinder development

I think we should still verify the schema for each class, but just not do the automatic setting of attributes.

Also, if we want to have abstract attributes, I think we should add a check during the schema verification process, rather than using the abc.abstractproperty decorator (which requires property setter/getter functions in the subclasses)
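A minimal sketch of checking abstract attributes during schema verification instead of via abc.abstractproperty (all names here are hypothetical):

```python
class PluginSchema:
    """Hypothetical schema that records which attributes a plugin must define."""

    def __init__(self, required):
        self.required = list(required)

    def verify(self, plugin):
        # enforce abstract attributes at verification time, so subclasses
        # can use plain attributes instead of property getters/setters
        missing = [a for a in self.required if not hasattr(plugin, a)]
        if missing:
            raise TypeError("plugin '%s' is missing required attributes: %s"
                            % (type(plugin).__name__, ", ".join(missing)))

class GoodPlugin:
    plugin_name = "good"
    schema_version = 1

schema = PluginSchema(['plugin_name', 'schema_version'])
schema.verify(GoodPlugin())   # passes silently
```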

Thoughts, @rainwoodman?

Pick a format for the paper draft.

I really liked the .rst format used in scipy_proceedings. But if we aim for MNRAS, we may need to patch their build system a bit to use it.

The good thing about rst is we can probably share quite a bit of text between the online document and the paper.

Is it worth the effort?

@nickhand

Pandas still failing on my laptop, though it passes on travis!?

The error message is just like before. This is strange. I checked that I am using 0.0.15 of pmesh, which has a test case asserting the fix. Also, why is the parameter using AnisotropicCIC to correct for a TSC paintbrush?

                assert_array_almost_equal(this.attrs[name], ref.attrs[name])

        except Exception as e:
>           raise _make_exc(self, str(e))
E           AssertionError: 
E           Not equal to tolerance rtol=0.01, atol=1e-05
E           
E           (mismatch 100.0%)
E            x: array([[           nan+nanj,            nan+nanj,            nan+nanj,
E                              nan+nanj,       0.000000 +0.j],
E                  [  63138.386719 +0.j,            nan+nanj,   32610.351562 +0.j,...
E            y: array([[          nan+nanj,           nan+nanj,           nan+nanj,
E                             nan+nanj,      0.000000 +0.j],
E                  [ 63163.355469 +0.j,           nan+nanj,  32710.017578 +0.j,... 
E           Cmdline
E            mpirun -n 2 python /home/yfeng1/source/nbodykit/bin/nbkit.py FFTPower /home/yfeng1/source/nbodykit/examples/power/test_pandas_hdf.params
E           stderr:
E           [ 000000.46 ]   0:waterfall 08-13 02:53  FFTPower        INFO     importing done
E           [ 000000.65 ]   0:waterfall 08-13 02:53  Pandas          INFO     total number of objects selected is 50000 / 50000
E           [ 000000.67 ]   0:waterfall 08-13 02:53  DefaultPainter  INFO     Mean = 0.00298023
E           [ 000000.67 ]   0:waterfall 08-13 02:53  measurestats    INFO     Painting done
E           [ 000000.81 ]   0:waterfall 08-13 02:53  measurestats    INFO     r2c done
E           [ 000001.44 ]   0:waterfall 08-13 02:53  FFTPower        INFO     measurement done; saving result to /home/yfeng1/source/nbodykit/examples/output/test_power_pandas_hdf.dat

difference between N objects read and N from layout.exchange in paint

  • when running with more than one process, the allreduce of the length of the position array returned by layout.exchange in the paint() function is not equal to the total number of objects read
  • this was also the case before the paint/read change, but we didn't notice because the old Painters returned the number of objects read, not the allreduce answer

I'll note that when running with only one process, the two numbers described above agree. Also, it seems that the power spectrum doesn't change when running with 1 process or several processes.

this seems like a problem, but maybe not? @rainwoodman, any idea what's going on?

shot noise removal on cross

Shouldn't the shot noise be zero in the case of cross power spectrum rather than

shotnoise = pm.BoxSize ** 3 / (1.0 * Ntot1 * Ntot2) ** 0.5
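A sketch of the proposed behaviour (the helper and its signature are hypothetical, not nbodykit's API): self-pairs only contribute to an auto spectrum, so a cross spectrum of two independent tracers would get zero shot noise:

```python
def shotnoise(boxsize, Ntot1, Ntot2=None):
    """Poisson shot noise: V/N for an auto spectrum, 0 for a cross spectrum."""
    V = boxsize ** 3
    if Ntot2 is None:        # auto spectrum: self-pairs contribute V / N
        return V / Ntot1
    return 0.0               # cross spectrum: independent tracers, no self-pairs

auto = shotnoise(1000.0, 1e6)        # V/N = 1e9 / 1e6
cross = shotnoise(1000.0, 1e6, 2e6)
```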

transfer "chain" as plugins or attribute of painter?

to properly implement a MomentumPainter using power.py, we need to be able to specify transfer functions via the command line

they seem to be directly tied to the painter, i.e., a density painter should basically always apply [NormalizeDC, RemoveDC, AnisotropicCIC], while a momentum or velocity painter only needs AnisotropicCIC.

Thoughts on how to best implement this, @rainwoodman? We could make them plugins that could be set via the command line, or just define a set "transfer_chain" for each painter

domain decomposition crash in pmesh when using power-parallel.py

This is a weird bug I ran into when running power-parallel.py. It seems to depend on the total number of processors available, as I can change the number of cpus and avoid this. I would guess it has something to do with splitting the domain weirdly over the processors, but I haven't investigated much. It doesn't happen if I run with fewer than the maximum available number of processors.

  File "/dev/shm/local/lib/python2.7/site-packages/pmesh/particlemesh.py", line 205, in decompose
    layout = pm.decompose(position)
    transform=self.transform0)
  File "/dev/shm/local/lib/python2.7/site-packages/pmesh/domain.py", line 323, in decompose
    return default_paint(field, pm)
    sil[j, s] = self._digitize(tmp - smoothing, self.grid[j]) - 1
  File "/dev/shm/local/lib/python2.7/site-packages/pmesh/domain.py", line 251, in _digitize
    layout = pm.decompose(position)
    return numpy.digitize(data, bins)
ValueError: The bins must be monotonically increasing or decreasing

what about getting rid of ':' in the commandline?

After introducing the mini selection language, the command line for painters needs to be wrapped in quotes all the time anyway. So maybe we can simplify things by getting rid of the colons.

If we do this change we want to do it early.
What do you think?
@nickhand

File Formats Plugins

Several new top-level scripts currently rely on files.Snapshot to read the files. I will convert this to a set of file-format plugins. Then maybe we should look into converting painters to use these file-format plugins as well.

increased usage of astropy?

There could be a few advantages to using more of the tools provided by astropy:

  • they include the six package for 2 to 3 compatibility, which could clean up some of our code
  • they do also provide some versioning tools, etc, but they are perhaps overly complicated
  • seems to be stable now and catching on in the community, which could potentially open up the code to a wider audience, i.e. "astropy-affiliated"?
  • we are already using it for the cosmology class, so might as well take advantage of anything else we want
  • would help with #197

thoughts, @rainwoodman?

Default Transfer functions.

The default transfer is no transfer.

Shall we modify this such that when -transfer is not given in the yaml file, the transfer is
[NormalizeDC, RemoveDC, AnisotropicCIC]

And to specify 'no transfer', one would use

  • transfer: []
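The proposed defaulting logic could be as simple as (the helper name and config shape are hypothetical):

```python
DEFAULT_TRANSFER = ['NormalizeDC', 'RemoveDC', 'AnisotropicCIC']

def resolve_transfer(config):
    # key absent -> default density chain; explicit `transfer: []` -> none
    return config.get('transfer', DEFAULT_TRANSFER)

assert resolve_transfer({}) == DEFAULT_TRANSFER
assert resolve_transfer({'transfer': []}) == []
assert resolve_transfer({'transfer': ['AnisotropicCIC']}) == ['AnisotropicCIC']
```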
