
baobab's Introduction

Baobab

[Badges: Travis CI build status, documentation status, Coveralls coverage]

Training data generator for hierarchically modeling strong lenses with Bayesian neural networks

The baobab package can generate images of strongly-lensed systems, given some configurable prior distributions over the parameters of the lens and light profiles as well as configurable assumptions about the instrument and observation conditions. It supports prior distributions ranging from artificially simple to empirical.

A major use case for baobab is the generation of training and test sets for hierarchical inference with Bayesian neural networks (BNNs). The idea is that baobab generates the training and test sets using different priors. A BNN trained on the training set learns not only the parameters of individual lens systems but also, implicitly, the hyperparameters describing the training-set population (the training prior). Such hierarchical inference is crucial in scenarios where the training and test priors differ, so that techniques such as importance weighting can be employed to bridge the gap in the BNN response.
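As a concrete (and heavily simplified) sketch of that reweighting idea, suppose the training and test priors over a single parameter are both known one-dimensional Gaussians; none of the names below come from baobab's API.

```python
import numpy as np
from scipy import stats

# Hypothetical 1D example: the BNN was trained under a wide "interim" prior,
# but the test population follows a narrower, shifted one.
train_prior = stats.norm(loc=0.0, scale=2.0)  # training prior
test_prior = stats.norm(loc=0.5, scale=1.0)   # test (target) prior

# Posterior samples for one lens, as a BNN might emit them.
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.3, scale=0.5, size=10_000)

# Importance weights divide out the training prior and apply the test prior.
weights = test_prior.pdf(samples) / train_prior.pdf(samples)
weights /= weights.sum()

# Posterior mean reweighted to the test population.
reweighted_mean = np.sum(weights * samples)
```

In the full hierarchical setting the test prior is itself inferred rather than fixed, but the ratio-of-priors weight is the same basic ingredient.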

Installation

  1. You'll need a Fortran compiler and the Fortran-compiled fastell4py, which you can get on a Debian system by running
$ sudo apt-get install gfortran
$ git clone https://github.com/sibirrer/fastell4py.git <desired location>
$ cd <desired location>
$ python setup.py install --user
  2. Virtual environments are strongly recommended, to prevent dependencies with conflicting versions. Create a conda virtual environment and activate it:
$ conda create -n baobab python=3.6 -y
$ conda activate baobab
  3. Now do one of the following.

Option 3(a): clone the repo (please do this if you'd like to contribute to the development).

$ git clone https://github.com/jiwoncpark/baobab.git
$ cd baobab
$ pip install -e . -r requirements.txt

Option 3(b): pip install the release version (only recommended if you're a user).

$ pip install baobab
  4. (Optional) To run the notebooks, add the Jupyter kernel.
$ python -m ipykernel install --user --name baobab --display-name "Python (baobab)"
  5. (Optional) To enable online data augmentation for machine learning, install the relevant dependencies.
$ pip install torch torchvision
$ pip install tensorflow-gpu

Usage

  1. Choose your favorite config file among the templates in the configs directory and copy it to a directory of your choice, e.g.
$ mkdir my_config_collection
$ cp baobab/configs/tdlmc_diagonal_config.py my_config_collection/my_config.py
  2. Customize it! You might want to change the name field first to something recognizable. Pay special attention to the components field, which determines which components of the lensed system (e.g. lens light, AGN light) are sampled from the relevant priors and rendered in the image.
  3. Generate the training set, e.g. continuing with the example in step 1,
$ generate my_config_collection/my_config.py

Although the n_data (training-set size) value is specified in the config file, you may choose to override it on the command line, as in

$ generate my_config_collection/my_config.py 100
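For reference, a customized config might contain entries like the following. Only the name, components, and n_data fields are mentioned in this README; the component labels and plain module-level layout shown here are assumptions for illustration, not the exact template schema.

```python
# Illustrative excerpt of my_config.py (field names partly assumed).
name = 'my_training_set'  # something recognizable
n_data = 200000           # training-set size; can be overridden on the command line

# Which components of the lensed system are sampled and rendered,
# e.g. lens light and AGN light as mentioned above.
components = ['lens_mass', 'src_light', 'lens_light', 'agn_light']
```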

Feedback

Please message @jiwoncpark with any questions.

There is a continually updated document that details our BNN prior choice, written and maintained by Ji Won.

Attribution

baobab heavily uses lenstronomy, a multi-purpose package for modeling and simulating strongly-lensed systems (see source). When you use baobab for your project, please cite both lenstronomy (Birrer & Amara 2018) and Park et al. 2019 (in prep).

baobab's People

Contributors: jiwoncpark, swagnercarena


baobab's Issues

Create docs page for user-defined config entries

Right now, they're in comment form in various *_config.py templates, but we'd like the user to be able to define the config in-script (right above the line that runs generate) without having to look at the templates.

Make Travis CI pass

I spent too long trying to diagnose the 139 segfault in Travis without success, so I'll be excluding the TF tests from the test suite for now.

Here is what I've learned so far.

Disk space: not the issue
Running df -h in Travis, I see that the dependencies together take up ~14GB of space in the Python image (~9GB in the minimal one). This is within the Travis allocation of ~18GB for Python images, so switching to the language-agnostic minimal image (setting language: minimal) wouldn't have helped. The minimal image also only ships Xenial, which only allows Python 3.5. See the Travis environment overview for details.

Updating Python modules known to cause segfaults: not the issue
SQLite and SciPy are two modules known to segfault upon version conflicts, but explicitly updating them didn't help.

Culprit: TF noise tests
When I ran each test script manually, the baobab/tests/test_data_augmentation/test_noise_tf.py script exited with a segfault. Apparently this is a known issue for TF v2.0. Possible causes:

  • some commands, like tf.stack, are known to do this
  • a memory leak leading Travis to exceed the allocated 7.5GB

Travis updates are underway to address these issues.

Side changes

  • Silenced apt-get and pip to prevent exceeding the logging maximum in Travis
  • Switched from nose to unittest for test discovery, as the former is unmaintained
  • Forced TF to use CPU at the TF test class init stage, instead of in one of the tests

generate.py now crashes if AGN light is not used

After the magnification fixes, generate.py no longer runs for configs without AGN light. The code returns the following error:

File "/Users/sebwagner/Documents/Grad_School/Research/Phil/baobab/baobab/sim_utils/image_utils.py", line 128, in generate_image
    unlensed_total_flux = get_unlensed_total_flux_numerical(self.kwargs_src_light, self.kwargs_unlensed_unmagnified_amp_ps, self.unlensed_image_model)
AttributeError: 'Imager' object has no attribute 'kwargs_unlensed_unmagnified_amp_ps'

Setting it to None in the init does not solve the issue either, because the Imager class in image_utils now always has an unlensed point-source model (even if there is no lensed point-source model). I was able to hack together a workaround by changing the Imager init to this:

def __init__(self, components, lens_mass_model, src_light_model, lens_light_model=None, ps_model=None, kwargs_numerics={'supersampling_factor': 1}, min_magnification=0.0, for_cosmography=False):
    self.components = components
    self.kwargs_numerics = kwargs_numerics
    self.lens_mass_model = lens_mass_model
    self.src_light_model = src_light_model
    self.lens_light_model = lens_light_model
    self.ps_model = ps_model
    if ps_model is not None:
        self.unlensed_ps_model = PointSource(point_source_type_list=['SOURCE_POSITION'], fixed_magnification_list=[False])
    else:
        self.unlensed_ps_model = None
    self.lens_eq_solver = LensEquationSolver(self.lens_mass_model)
    self.min_magnification = min_magnification
    self.for_cosmography = for_cosmography
    self.kwargs_unlensed_unmagnified_amp_ps = None  # avoids the AttributeError above
    self.img_features = {}  # stores image metadata; updated for each lens

Make a more flexible Imager class

Don't hardcode the number of lenses/sources; wrap more generally around lenstronomy. For now, I want to use this as the simulator for my BNN tutorials. I'll make a separate issue later for integration into baobab.

Pip package for v0.1

Basic checklist

  • include data files
  • set up entry point for script generate.py
  • complete dependencies list

Connect to GalSim

Accommodate GalSim's features as per PR #48, but include modules for deconvolution and flux rescaling.

Vectorized Sampling

baobab currently only supports drawing individual samples. For some applications, it would be useful to sample in a vectorized fashion.
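As a sketch of what vectorized sampling could look like (the parameter names and distributions are illustrative, not baobab's actual sampling interface):

```python
import numpy as np

rng = np.random.default_rng(42)

# Current pattern: one sample per call, looped over the training-set size.
def sample_one(rng):
    return {'theta_E': rng.lognormal(0.0, 0.1),
            'gamma': rng.normal(2.0, 0.1)}

looped = [sample_one(rng) for _ in range(1000)]

# Vectorized pattern: draw the whole batch at once, one column per parameter.
def sample_batch(rng, n):
    return {'theta_E': rng.lognormal(0.0, 0.1, size=n),
            'gamma': rng.normal(2.0, 0.1, size=n)}

batch = sample_batch(rng, 1000)
```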

Sample populations using SkyPy

The KCorrector class accepts the SED, dust model, and input/output filter throughputs, and numerically integrates to get the K-corrected apparent magnitude.
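The numerical integration might be sketched as follows, assuming simple array inputs on a uniform wavelength grid. This is not the actual KCorrector interface, and dust attenuation and redshifting are assumed to have been applied to the SED beforehand.

```python
import numpy as np

def k_corrected_app_mag(app_mag, wavelengths, sed, throughput_in, throughput_out):
    """Illustrative K-correction on a uniform wavelength grid: compare the
    SED-weighted flux through the input and output filters."""
    dw = wavelengths[1] - wavelengths[0]  # uniform grid spacing assumed
    flux_in = np.sum(sed * throughput_in) * dw
    flux_out = np.sum(sed * throughput_out) * dw
    # Magnitude shift from the flux ratio between the two filters.
    return app_mag - 2.5 * np.log10(flux_out / flux_in)

# Toy inputs: a Gaussian SED and two box filters.
w = np.linspace(4000.0, 9000.0, 500)
sed = np.exp(-0.5 * ((w - 5500.0) / 800.0) ** 2)
t_in = ((w > 4500.0) & (w < 6000.0)).astype(float)
t_out = ((w > 6000.0) & (w < 7500.0)).astype(float)
m_out = k_corrected_app_mag(20.0, w, sed, t_in, t_out)
```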

Clean up `generate.py`

Move some utility functions to a separate module. Cleaning up will be necessary for readability as we enable multi-plane deflectors, multi-plane sources, more profiles...

Index from 0?

The code defaults to indexing from 1. Wouldn't it be better to index from 0?

Enable "no noise" option

In generating the training set, we sometimes want to NOT add any noise, so that we can add it online during training. Now you can do this by setting cfg.image.add_noise = False in your user config file. Note that this is different from setting cfg.observation.background_noise = 0.0, because background noise only includes sky and readout noise, not CCD noise -- Poisson noise would still be added. The current hack is another if statement in sim_utils/image_utils.py -- I'll wrap this in a class so things are set upon instantiation. @swagnercarena
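The setting described above would look like this in a user config file; the cfg stand-in below exists only to make the fragment self-contained, since in a real config the attributes live on the object defined by the template.

```python
from types import SimpleNamespace

# Bare stand-in for the template's config object (illustration only).
cfg = SimpleNamespace(image=SimpleNamespace(), observation=SimpleNamespace())

# Disable all noise at generation time, so noise can be added online during training.
cfg.image.add_noise = False

# Note: NOT equivalent -- this zeroes only sky + readout noise,
# so Poisson (CCD) noise would still be added:
# cfg.observation.background_noise = 0.0
```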

Migrate some of the user-defined config entries to BaobabConfig

The user-defined config should be kept minimal and in dictionary form. When it is ingested by BaobabConfig, there should be various validity checks and defaulting. Later, for reproducibility, the user can read their config back in using BaobabConfig to see the exact specifications that were passed to generate.py.
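The ingestion step might be sketched as below; the required and default field names are illustrative, not the actual BaobabConfig schema.

```python
# Sketch of BaobabConfig-style ingestion: validate a minimal user-supplied
# dictionary and fill in defaults. Field names are illustrative.
REQUIRED = {'name', 'components'}
DEFAULTS = {'n_data': 1000, 'seed': 0}

def ingest(user_cfg):
    missing = REQUIRED - user_cfg.keys()
    if missing:
        raise ValueError(f'Missing required config entries: {sorted(missing)}')
    cfg = dict(DEFAULTS)   # start from the defaults...
    cfg.update(user_cfg)   # ...and let the user override them
    return cfg

cfg = ingest({'name': 'demo', 'components': ['lens_mass', 'src_light']})
```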

Bug in eval_lognorm

If a lower bound less than 0 is passed in, the CDF calculation crashes and returns NaN.
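A minimal sketch of a fix, assuming eval_lognorm evaluates a truncated lognormal CDF (the actual signature in distributions.py may differ): clip the lower bound to 0, since the lognormal has no support below it, which guards any implementation that takes log(lower) directly.

```python
import numpy as np
from scipy import stats

def eval_lognorm_cdf(x, mu, sigma, lower=0.0):
    # Clip the lower truncation bound to 0; a negative bound is outside
    # the lognormal's support and can otherwise produce NaN.
    lower = max(lower, 0.0)
    dist = stats.lognorm(s=sigma, scale=np.exp(mu))
    cdf_low = dist.cdf(lower)
    # CDF truncated to [lower, inf), renormalized.
    return (dist.cdf(x) - cdf_low) / (1.0 - cdf_low)
```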

noise_tf fails with placeholder shapes / doesn't have a good interface with baobab config files.

Two small issues with noise_tf:

  1. return tf.random.normal(img.shape)*self.get_noise_sigma2(img)**0.5

This line will cause a TensorFlow error if the image tensor passed in has placeholder dimensions, which makes the code incompatible with tf_datasets. A simple change to:

return tf.random.normal(tf.shape(img))*self.get_noise_sigma2(img)**0.5

should resolve this issue.

  2. There is currently no way to map from a baobab config to the inputs required for NoiseModelTF. A function that converts the portions of the baobab config relevant to NoiseModelTF into a dictionary should resolve this issue:

def get_noise_kwargs(baobab_cfg):
	"""
	Return the noise kwargs defined in the baobab config

	Parameters
	----------
		baobab_cfg (BaobabConfig): A BaobabConfig object containing the desired
			noise parameters.

	Returns
	-------
		(dict): A dict containing the noise kwargs to be passed to the noise
			model.
	"""
	# Go through the baobab config and pull out the noise kwargs one by one.
	noise_kwargs = {}
	noise_kwargs.update(baobab_cfg.instrument)
	noise_kwargs.update(baobab_cfg.bandpass)
	noise_kwargs.update(baobab_cfg.observation)
	return noise_kwargs

Look into distributions in magnification

Look at the distributions in magnification vs. the source positions, and modify the source position prior and/or apply a magnification cut. In any case, we want to save the magnification and image positions as metadata, in case we want to predict those.

Set up Travis CI

Tests to include:

  • accessing data files, e.g. PSF maps
  • running generate.py for each config file

Make MCMC-friendly changes to `distributions.py`

Sebastian's hierarchical reweighting could benefit from changes to the way PDF evaluation is set up in distributions.py.

  • Enable log PDF evaluations
  • Separate out lognormal from normal PDF evaluation
  • Add an importable dictionary of hyperparameters for each distribution in distributions.py
  • Explore numba options for scipy functions
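The first two checklist items might look like the sketch below (function names are illustrative); working in log space avoids the underflow that makes products of plain PDF values unusable in MCMC.

```python
import numpy as np
from scipy import stats

# Direct log-PDF evaluation, with normal and lognormal kept separate.
def eval_normal_logpdf(x, mu, sigma):
    return stats.norm.logpdf(x, loc=mu, scale=sigma)

def eval_lognormal_logpdf(x, mu, sigma):
    # scipy's lognorm parameterization: s = sigma, scale = exp(mu)
    return stats.lognorm.logpdf(x, s=sigma, scale=np.exp(mu))
```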

Implement and test CovBNNPrior

The user specifies the list of parameters to be modeled as a multivariate Gaussian, a Boolean is_log vector (marking marginally lognormal parameters), a mean vector, and the covariance matrix. The remaining parameters default to diagonal.
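The sampling step could be sketched as follows; mean, cov, and is_log mirror the user-specified quantities described above, while the function name and numeric values are illustrative.

```python
import numpy as np

def sample_cov_prior(mean, cov, is_log, rng):
    z = rng.multivariate_normal(mean, cov)  # jointly Gaussian draw
    # Exponentiate the entries flagged as marginally lognormal.
    return np.where(is_log, np.exp(z), z)

rng = np.random.default_rng(7)
mean = np.array([0.0, 2.0])
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])
is_log = np.array([True, False])  # first parameter marginally lognormal
sample = sample_cov_prior(mean, cov, is_log, rng)
```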

Subclass for non-diagonal BNN priors

Class structure I have in mind:
BNNPrior (has definitions of distributions and array manipulations mapped from cfg.bnn_omega used by the subclasses, never itself instantiated)
|----SmoothBNNPrior
|----|----DiagonalBNNPrior (what we have now with independent params, keep config structure)
|----EmpiricalBNNPrior (implements fundamental plane relation, selection cuts, etc.)

NaNs output in some images

When running the attached config file, train_diagonal.txt (converted to txt to be able to attach it), some of the generated images contain NaNs. I attached X_0001650.npy
as an example. During image generation, the following error is output:

/anaconda3/lib/python3.6/site-packages/lenstronomy/SimulationAPI/observation_api.py:198: RuntimeWarning: invalid value encountered in sqrt
  noise = np.sqrt(variance) / self.exposure_time
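A possible guard, sketched with the variable names from the warning above: clip the variance to be non-negative before taking the square root, so no NaN pixels are produced. Whether a clip or an upstream fix of the negative variance is the right solution is a separate question.

```python
import numpy as np

def safe_noise(variance, exposure_time):
    # Clip negative variances to 0 before sqrt to avoid NaN pixels.
    return np.sqrt(np.clip(variance, 0.0, None)) / exposure_time

noise = safe_noise(np.array([4.0, -1.0, 0.0]), exposure_time=2.0)
```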

