hagabbar / vitamin_c

This will be the first official public release of the VItamin code base. VItamin is a Python package for producing fast gravitational-wave posterior samples.

License: GNU General Public License v3.0

Python 59.95% Makefile 0.27% CSS 3.91% JavaScript 15.21% HTML 20.27% Batchfile 0.33% Shell 0.07%
machine-learning gravitational-waves conditional-variational-autoencoder data-science gravity physics ligo vitamin virgo black-hole

vitamin_c's Introduction


⭐ Star us on GitHub, it helps!

Welcome to VItamin_B, a Python toolkit for producing fast gravitational wave posterior samples.

This repository is the official implementation of Bayesian Parameter Estimation using Conditional Variational Autoencoders for Gravitational Wave Astronomy.

Hunter Gabbard, Chris Messenger, Ik Siong Heng, Francesco Tonolini, Roderick Murray-Smith

Official Documentation can be found at https://hagabbar.github.io/vitamin_c.

Check out our Blog (to be made), Paper and Interactive Demo.

Note: This repository is a work in progress. No official release of code just yet.

Requirements

VItamin requires Python 3.6. You can set up a Python 3.6 virtual environment as follows:

virtualenv -p python3.6 myenv
source myenv/bin/activate
pip install --upgrade pip

Optionally, install basemap and geos in order to produce sky plots of results.

For installing basemap:

  • Install geos-3.3.3 from source
  • Once geos is installed, install basemap using pip install git+https://github.com/matplotlib/basemap.git

Install VItamin using pip:

pip install vitamin-b

Training

To train an example model from the paper, try out the demo.

Full model definitions are given in the models directory. Data is generated by gen_benchmark_pe.py.

Results

We train using a network derived from first principles:

We track the performance of the model during training via loss curves:

Finally, after training we produce posteriors and other diagnostic tests comparing our approach with 4 other independent methods:

Posterior example:

KL-Divergence between posteriors:

PP Tests:

vitamin_c's People

Contributors

hagabbar

vitamin_c's Issues

Wrong location for params files.

In run_vitamin the default location is set to "./params.txt" etc., but given your current directory structure it should be "./params_files/params.txt".

I would further suggest to use
params = os.path.join(os.getcwd(), 'params_files', 'params.txt')
instead of
params = "./params_files/params.txt"
as it is more readable and less system-dependent.
(Same is true for bounds and fixed_vals.)

corner params label variable repeated

Describe the solution you'd like
Why are there corner_parnames and cornercorner_parnames? These should probably be handled in the code and not specified by the user. All the user should have to define is rand_pars and inf_pars.

Turn off test data posterior loading during training.

Is your feature request related to a problem? Please describe.
The user should have the option to turn off loading of the test-sample posteriors during training, and instead only plot the VItamin prediction corner plots as a figure of merit.

Compatibility documentation

Describe the solution you'd like
Some people do have Python 3.7, so you need to say that the installation will not work for them and that they shouldn't even try, unless you are able to fix it. You certainly need to say that it won't work on Python 2.7.

You should also add that network training is only sensible with a good GPU. Testing is accelerated by a GPU, but on a CPU it is still fast.

json file comments

Is your feature request related to a problem? Please describe.
Would be nice if I could add comments to the json configuration files.

Describe the solution you'd like
Solution could be to use jsonnet (or some other json alternative with comments enabled).
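A lighter-weight alternative to switching formats would be to strip comment lines before handing the text to the JSON parser. A minimal sketch (the helper name and config keys are made up for illustration; note a naive regex like this would also mangle "//" inside string values):

```python
import json
import re

def loads_with_comments(text):
    """Parse JSON after stripping //-style line comments.

    Hypothetical helper, not part of VItamin; shown only as one
    alternative to adopting jsonnet.
    """
    stripped = re.sub(r'^\s*//.*$', '', text, flags=re.MULTILINE)
    return json.loads(stripped)

config_text = """
{
    // number of posterior samples to draw
    "n_samples": 8000,
    "det": ["H1", "L1", "V1"]
}
"""
config = loads_with_comments(config_text)
```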

CVAE_model.py - von mises reconstruction loss

Why is the reconstruction loss summed on line 581 over the 1st and 2nd elements of the 2nd dimension?

It looks like this is a hard-coded assumption that there are 2 von Mises parameters. I think at present that is true with phi_jl and phi_12, but it isn't always the case.

Just sum it over the 2nd dimension.
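A NumPy sketch of the difference (shapes and the log_prob array are illustrative, not taken from CVAE_model.py):

```python
import numpy as np

# Hypothetical per-parameter von Mises log-likelihood terms with shape
# (batch_size, n_vonmises_pars); here a third angle is added to show
# where the hard-coded version breaks down.
np.random.seed(0)
batch_size, n_vonmises_pars = 4, 3
log_prob = np.random.randn(batch_size, n_vonmises_pars)

# Hard-coded version: assumes exactly two von Mises parameters.
loss_hardcoded = log_prob[:, 0] + log_prob[:, 1]

# General version: sum over the whole 2nd dimension, whatever its size.
loss_general = np.sum(log_prob, axis=1)
```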

Generation of bilby posteriors optional

Describe the solution you'd like
The generation of bilby posteriors needs to be optional and the default should be to NOT generate bilby posteriors. We are trying to show that things are fast.

Time to run in quickstart

Describe the solution you'd like
You should add how long each step is likely to take in the quickstart guide.

One test sample bug

Describe the bug
Code returns reshaping error when only using one test sample during both training and testing of model.

To Reproduce
(myenv) [hunter.gabbard@dgx1 vitamin_b]$ python run_vitamin.py --train True
05:18 bilby INFO : Running bilby version: 0.5.5:
module 'basemap' is not installed
Skyplotting functionality is automatically disabled.

... mass_1 will be inferred
... mass_2 will be inferred
... luminosity_distance will be inferred
... geocent_time will be inferred
... theta_jn will be inferred
... ra will be inferred
... dec will be inferred
Traceback (most recent call last):
File "run_vitamin.py", line 1249, in <module>
train(params,bounds,fixed_vals)
File "run_vitamin.py", line 610, in train
x_data_test, y_data_test_noisefree, y_data_test,_,snrs_test = load_data(params,bounds,fixed_vals,params['test_set_dir'],params['inf_pars'],load_condor=True)
File "run_vitamin.py", line 283, in load_data
data['x_data'][:,i]=(data['x_data'][:,i] - bounds[par_min]) / (bounds[par_max] - bounds[par_min])


run return fails when doPE=False

When using run from the gen_benchmark_pe module with training=True and do_pe=False, the function returns a 3-tuple instead of a 4-tuple. It needs to also return snr.

h5py writing files python3.7

Describe the bug
h5py doesn't have permission to write files in python3.7.

To Reproduce
So I ran the gen_train() command and it certainly does try to generate 1000 signals but I get this error at the end.

Made waveform 999/1000
Generated: ./training_sets_3det_9par_256Hz/tset_tot-1000_split-1000/data_1000-1000.h5py ...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/chrism/vitamin_b/vitamin_b/run_vitamin.py", line 461, in gen_train
hf = h5py.File('%s/data_%d-%d.h5py' % (params['train_set_dir'],(i+params['tset_split']),params['tot_dataset_size']), 'w')
File "/home/chrism/vitamin_b/myenv/lib/python3.7/site-packages/h5py/_hl/files.py", line 394, in __init__
swmr=swmr)
File "/home/chrism/vitamin_b/myenv/lib/python3.7/site-packages/h5py/_hl/files.py", line 176, in make_fid
fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 105, in h5py.h5f.create
OSError: Unable to create file (unable to lock file, errno = 5, error message = 'Input/output error')

Something hard-coded won't allow a general user to run this.

gen_train noisy time series unnecessary

Is your feature request related to a problem? Please describe.
It is unnecessary to include the noisy time series in the saved training samples.

Describe the solution you'd like
Remove the saving of these time series. This would effectively reduce the RAM requirements by half.

batchnorm in q network

The name given to the hidden layer batchnorm is not correct. It is being labelled with the same name as used in the convolutional layer batchnorm.

If this is a trainable parameter (which I'm unsure about) then this will be a bug.

Spin parameters

Describe the bug
I tried customising the params, fixed_vals and bounds files. I copied your examples from the guide and added spin magnitudes as parameters.

However, when generating training data it just ignores those files even though I add the full path as stated, and I get this error:

To Reproduce
run_vitamin.gen_train('/home/chris.messenger/vitamin_b/new_params.txt','/home/chris.messenger/vitamin_b/new_bounds.txt','/home/chris.messenger/vitamin_b/new_fixed_vals.txt')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/chris.messenger/vitamin_b/vitamin_b/run_vitamin.py", line 427, in gen_train
params = eval(data)
File "<string>", line 1
make_corner_plots = True, # if True, make corner plots
^
SyntaxError: invalid syntax

Expected behavior
To not have this error when copying the params files from the documentation.

test set files with .resume in filename

Describe the bug
Any files other than .h5py files in the test set posterior directory will crash the program.

Will need to either automatically delete those files (easiest) or ignore them somehow.
To Reproduce

ValueError Traceback (most recent call last)
in <module>()
----> 1 run_vitamin.train(params='params_files_1kHz/params.txt',bounds='params_files_1kHz/bounds.txt',fixed_vals='params_files_1kHz/fixed_vals.txt')

/home/hunter.gabbard/CBC/public_VItamin/provided_models/myenv/lib/python3.6/site-packages/vitamin_b/run_vitamin.py in train(params, bounds, fixed_vals, resume_training)
624 dataLocations = ["%s" % input_dir]
625
--> 626 filenames = sorted(os.listdir(dataLocations[0]), key=lambda x: int(x.split('.')[0].split('_')[-1]))
627 if len(filenames) < num_finished_post:
628 sampler_loc = i + str(j+1)

/home/hunter.gabbard/CBC/public_VItamin/provided_models/myenv/lib/python3.6/site-packages/vitamin_b/run_vitamin.py in <lambda>(x)
    624 dataLocations = ["%s" % input_dir]
625
--> 626 filenames = sorted(os.listdir(dataLocations[0]), key=lambda x: int(x.split('.')[0].split('_')[-1]))
627 if len(filenames) < num_finished_post:
628 sampler_loc = i + str(j+1)

ValueError: invalid literal for int() with base 10: 'resume'
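A minimal sketch of the "ignore" approach: filter the directory listing down to .h5py files before sorting by the trailing index. The helper name and the filenames below are made up for illustration:

```python
import os
import tempfile

def h5py_filenames(directory):
    """Sorted .h5py files only; .resume and other files are skipped."""
    names = [f for f in os.listdir(directory) if f.endswith('.h5py')]
    return sorted(names, key=lambda x: int(x.split('.')[0].split('_')[-1]))

# Simulate a test set directory containing a stray .resume file:
tmp = tempfile.mkdtemp()
for name in ['all_4_samplers_2.h5py', 'all_4_samplers_1.h5py',
             'all_4_samplers_1.h5py.resume']:
    open(os.path.join(tmp, name), 'w').close()

filenames = h5py_filenames(tmp)  # .resume file no longer crashes the sort
```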

Move detector list out of fixed vals


Importance sampling

Is your feature request related to a problem? Please describe.
Build importance sampling into the code.

Training screen output

Describe the solution you'd like
Can you sort out the information that's output to screen when training (actually for all of the steps)?

run_vitamin fails

python run_vitamin.py --gen_train True --train True fails to execute. I used the current master branch to run this command and used the test_sets provided by you. The only thing I changed in params is tot_dataset_size to 5000, so that it generates a bit quicker. I get the output:

17:10 bilby INFO    : Running bilby version: 0.5.5:
module 'basemap' is not installed
Skyplotting functionality is automatically disabled.

... Making training set

Generated: ./training_sets_3det_9par_256Hz/tset_tot-100000_split-1000/data_1000-5000.h5py ...
Generated: ./training_sets_3det_9par_256Hz/tset_tot-100000_split-1000/data_2000-5000.h5py ...
Generated: ./training_sets_3det_9par_256Hz/tset_tot-100000_split-1000/data_3000-5000.h5py ...
Generated: ./training_sets_3det_9par_256Hz/tset_tot-100000_split-1000/data_4000-5000.h5py ...
Generated: ./training_sets_3det_9par_256Hz/tset_tot-100000_split-1000/data_5000-5000.h5py ...

... mass_1 will be inferred
... mass_2 will be inferred
... luminosity_distance will be inferred
... geocent_time will be inferred
... theta_jn will be inferred
... ra will be inferred
... dec will be inferred


... Loading test sample -> ./test_sets/all_4_samplers/test_dynesty1/all_4_samplers_0.h5py

... Loading test sample -> ./test_sets/all_4_samplers/test_dynesty1/all_4_samplers_1.h5py
Traceback (most recent call last):
  File "run_vitamin.py", line 1227, in <module>
    train(params,bounds,fixed_vals)
  File "run_vitamin.py", line 706, in train
    XS_all = np.vstack((XS_all,np.expand_dims(XS[:params['n_samples'],:], axis=0)))
  File "<__array_function__ internals>", line 6, in vstack
  File "/home/marlin/environments/vitaminb/lib/python3.6/site-packages/numpy/core/shape_base.py", line 283, in vstack
    return _nx.concatenate(arrs, 0)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 262 and the array at index 1 has size 1713

Not sure what this bug is caused by.

Print plot location to screen

Can you let the user know where the plots have been saved? Both in the on-screen output while the code is running and in the guide.

PSD customize

Describe the solution you'd like
We can specify detectors, but you should also be able to specify the specific PSD. Right now it's hard-coded as the advanced design PSD for LIGO and Virgo.

Params file get overwritten

Describe the bug
Every time vitamin is imported, the params file is overwritten.


Import error for Geos

Don't know if this is a special case, but I have Basemap installed but not working, because Geos failed. In that case I don't get a ModuleNotFoundError but an ImportError. I would therefore suggest replacing line 28 in __init__.py of the main folder,
except ModuleNotFoundError:
by
except (ModuleNotFoundError, ImportError):

Remove unnecessary params in params file

Describe the solution you'd like
I also notice that the params file contains a lot of things it shouldn't contain. For example, you can drop the boost params and the weighted_pars.

Corner parameter labels

Describe the bug
The cornercorner variable used for corner plot labels is a bit broken. It needs to be in the right order, I believe. Will need to check whether this is the case.

y normscale hardcoding

Describe the solution you'd like
Also, y_normscale looks strange and can probably be hard-coded rather than an option.

Cannot run on custom data

When trying to train the CVAE on data I created on my own, I get the error posted below.
Note that I'm trying to run the code in Python 3.7, which is officially not supported. However, I got the same error when trying to run the code locally in Python 3.6. On this local machine, on the other hand, I do not have access to a graphics card and was running on the CPU, which is not supported for training if I remember correctly.

As it is a reshaping error, I guess it is due to the data format I'm using, and that the training data isn't quite in the correct shape. I deduced that your code expects the training data to be of shape (number training samples, number detectors, number samples per timeseries). My custom training data contains only the keys ['rand_pars', 'snrs', 'x_data', 'y_data_noisefree', 'y_data_noisy', 'y_normscale'] with respective shapes [(9,), (1000,3), (1000,1,9), (1000, 3, 256), (1000, 3, 256), ()].
For the test data I deduced that the code expects single samples, and thus the data to be of shape (number detectors, number samples per timeseries). My test set therefore contains the same keys as the training set, which have the respective shapes [(9,), (3,), (1,9), (3, 256), (3, 256), ()].

The full error message:

WARNING:tensorflow:AutoGraph could not transform <function train.<locals>.truncnorm at 0x14c61ddb7c80> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: Bad argument number for Name: 4, expecting 3
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert

... Training Inference Model

Traceback (most recent call last):
  File "/work/marlin.schaefer/envs/vitamin3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/work/marlin.schaefer/envs/vitamin3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/work/marlin.schaefer/envs/vitamin3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 128 values, but the requested shape has 64
	 [[{{node Reshape}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/work/marlin.schaefer/projects/collab_glasgow/vitamin_b/vitamin_b/run_vitamin.py", line 1471, in <module>
    train(params,bounds,fixed_vals)
  File "/work/marlin.schaefer/projects/collab_glasgow/vitamin_b/vitamin_b/run_vitamin.py", line 888, in train
    XS_all,snrs_test) 
  File "/work/marlin.schaefer/projects/collab_glasgow/vitamin_b/vitamin_b/models/CVAE_model.py", line 734, in train
    session.run(minimize, feed_dict={bs_ph:bs, x_ph:next_x_data, y_ph:next_y_data, ramp:rmp}) 
  File "/work/marlin.schaefer/envs/vitamin3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 958, in run
    run_metadata_ptr)
  File "/work/marlin.schaefer/envs/vitamin3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1181, in _run
    feed_dict_tensor, options, run_metadata)
  File "/work/marlin.schaefer/envs/vitamin3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/work/marlin.schaefer/envs/vitamin3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 128 values, but the requested shape has 64
	 [[node Reshape (defined at /projects/collab_glasgow/vitamin_b/vitamin_b/models/CVAE_model.py:618) ]]

Original stack trace for 'Reshape':
  File "/projects/collab_glasgow/vitamin_b/vitamin_b/run_vitamin.py", line 1471, in <module>
    train(params,bounds,fixed_vals)
  File "/projects/collab_glasgow/vitamin_b/vitamin_b/run_vitamin.py", line 888, in train
    XS_all,snrs_test)
  File "/projects/collab_glasgow/vitamin_b/vitamin_b/models/CVAE_model.py", line 618, in train
    con = tf.reshape(tf.math.reciprocal(temp_var_r2_sky),[bs_ph])   # modelling wrapped scale output as log variance - only 1 concentration parameter for all sky
  File "/envs/vitamin3/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py", line 193, in reshape
    result = gen_array_ops.reshape(tensor, shape, name)
  File "/envs/vitamin3/lib/python3.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 8087, in reshape
    "Reshape", tensor=tensor, shape=shape, name=name)
  File "/envs/vitamin3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 744, in _apply_op_helper
    attrs=attr_protos, op_def=op_def)
  File "/envs/vitamin3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3327, in _create_op_internal
    op_def=op_def)
  File "/envs/vitamin3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1791, in __init__
    self._traceback = tf_stack.extract_stack()

Documentation params file description

Describe the solution you'd like
On the website, the comments next to the params in the example params file wrap around the page and look strange.

Training NaNs

Describe the bug
During training, if the model has been running a long time it will occasionally return NaNs. I suspect this is related to the SMALL_CONSTANT variable in CVAE_model.py.

hardcoded number of samples per file

I think that the train_file_idx variable is generated assuming that there are 1000 time series stored per training data file.

This may be correct (now) but should be fixed to be general.
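One way to generalise this is to derive the per-file count from the data file names themselves (the generated files are named like data_1000-5000.h5py, i.e. samples-so-far followed by total). The helper below is a sketch, not the project's code:

```python
def samples_per_file(first_filename):
    """Read the per-file sample count from a name like 'data_500-5000.h5py'.

    Hypothetical helper: the first number in the first chunk's filename
    is the number of samples stored per file.
    """
    stem = first_filename.split('.')[0]           # 'data_500-5000'
    return int(stem.split('_')[-1].split('-')[0])

n_per_file = samples_per_file('data_500-5000.h5py')
# train_file_idx can then be built from n_per_file rather than a
# hard-coded 1000:
train_file_idx = [i // n_per_file for i in range(5000)]
```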

Suppress bilby output

Describe the solution you'd like
Can you try to suppress the bilby output to the screen? It looks strange and actually slows the generation down, since it takes longer to print to the screen than it does to generate the signals.

r1 weights and biases naming redundancy

The convolutional weight and bias names have a historical '1' appended to them, which is not required. This is also the case in the q and r2 networks.

Plus, there is a lot of commented-out code that can be deleted where the weights are created.

Add a box with spin

Be careful about the bounds. Can't go too high (e.g. 0.8 or so).

Priors might not have to be uniform.

Angles are going to have geometric priors.

Amplitude might be an issue.

Look in LIGO papers to find out how each of these parameters is distributed.

move make_params_files up to main directory

Should make_params_files.py be moved to its parent directory, as it tries to create the params_files directory it is stored in? Or should it simply not try to create that directory?
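The second option (don't crash on an existing directory) can be done with exist_ok, so the call is safe whether the script sits inside or above params_files. A sketch with temporary, made-up paths:

```python
import os
import tempfile

# Create params_files only if it is missing; the call is then harmless
# even when make_params_files.py already lives inside that directory.
base = tempfile.mkdtemp()
params_dir = os.path.join(base, 'params_files')
os.makedirs(params_dir, exist_ok=True)  # no error if already present
os.makedirs(params_dir, exist_ok=True)  # idempotent on a second call
```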

CVAE_model.py repeated/overwritten calculation

Lines 618 to 621 seem to do a similar thing twice, with the second result overwriting the first.

Both put the r2 means and scales back into a single object, but one version uses the sky parameters and the other doesn't.

In CVAE_model.py the truncated normal reconstruction loss is ugly

I thought that I'd changed this to be less ugly so that you can just call TruncatedNormal once over the appropriate parameters and then evaluate the log_prob and sum over the dimensions (as done in the von Mises case).

I think it would be sensible to do that here.
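A dependency-free sketch of the tidier form: evaluate one truncated-normal log-density over all truncated parameters and sum over the parameter dimension, mirroring the von Mises case. The values and the [0, 1] bounds are illustrative, not taken from CVAE_model.py; in TensorFlow this would be a single tfp.distributions.TruncatedNormal followed by a reduce_sum over the parameter axis:

```python
import math

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncnorm_logpdf(x, loc, scale, low=0.0, high=1.0):
    """Log-density of a normal truncated to [low, high]."""
    z = (x - loc) / scale
    norm = (std_normal_cdf((high - loc) / scale)
            - std_normal_cdf((low - loc) / scale))
    return (-0.5 * z * z - math.log(scale)
            - 0.5 * math.log(2.0 * math.pi) - math.log(norm))

# One batch item with two truncated parameters (illustrative values):
locs, scales, xs = [0.2, 0.5], [0.1, 0.2], [0.25, 0.4]
recon_loss = -sum(truncnorm_logpdf(x, m, s)
                  for x, m, s in zip(xs, locs, scales))
```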

Can't install geos on some machines

Describe the bug
Geos will not install on some machines due to a known bug within geos 3.3.3.

You can get around this bug by the following solution: https://askubuntu.com/questions/465550/can-not-compile-without-isnan-function-or-macro-when-trying-to-compile-geos-on

The whole basemap and geos installation takes so long, and we only use it because ligo.skymap was such a pain to use. We should get someone to help us just use ligo.skymap so we can ditch basemap and geos.

reference geocent time is still static

Describe the bug
Reference time is still fixed.

Have the code NOT deal in RA and Dec, but rather work in an Earth-centric sky coordinate system (e.g. latitude and longitude on the Earth).

To compute that, take the GPS time and sky position and convert them to the location above the Earth using lal.

Latitude and longitude can then be converted back to RA and Dec.
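The mapping reduces to rotating right ascension by the Greenwich Mean Sidereal Time; declination maps directly to latitude. In the sketch below the GMST is just an assumed input in radians (in practice it would come from lal, e.g. lal.GreenwichMeanSiderealTime(gps_time)):

```python
import math

TWO_PI = 2.0 * math.pi

def radec_to_earth_fixed(ra, dec, gmst):
    """Rotate RA into Earth-fixed longitude; declination is latitude."""
    lon = (ra - gmst) % TWO_PI
    return lon, dec

def earth_fixed_to_radec(lon, lat, gmst):
    """Inverse rotation back to RA/Dec."""
    ra = (lon + gmst) % TWO_PI
    return ra, lat

# Round trip with illustrative values (radians):
ra, dec, gmst = 1.3, -0.4, 0.9
lon, lat = radec_to_earth_fixed(ra, dec, gmst)
ra2, dec2 = earth_fixed_to_radec(lon, lat, gmst)
```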

Stop using bilby for data generation

Describe the solution you'd like
We currently use bilby for making the data and generating comparison posteriors. Ultimately we don't want to be reliant on bilby at all. The data generation shouldn't be done using bilby in the future - we only did it so that we could compare directly. At some point we need to move to just using lalsimulation. The job of comparing with bilby should also be separate from vitamin in the future.
