b-shields / edbo

Experimental Design via Bayesian Optimization

License: MIT License

Python 15.13% Jupyter Notebook 84.83% Shell 0.04%

edbo's Issues

RuntimeWarning: invalid value encountered in sqrt stdev = np.sqrt(model.variance(domain)) +1e-6

Dear all,
In the demo notebook "edbo_demo_and_simulations.ipynb", I have repeatedly observed that the RuntimeWarning in the title occurs as batch_size or the number of iterations grows.
In addition, the plotted figure looks odd: some points lie off the curve.
How can I fix this?
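
NumPy emits this warning when `np.sqrt` receives a negative (or NaN) input, so one possible cause is a predicted variance that comes out slightly negative through round-off. The following is a minimal sketch of a guard, not a fix taken from edbo itself; the `variance` array is made-up data and `safe_stdev` is a hypothetical helper name.

```python
import numpy as np

def safe_stdev(variance, jitter=1e-6):
    """Clamp negative variances to zero before taking the square root."""
    variance = np.asarray(variance, dtype=float)
    return np.sqrt(np.clip(variance, 0.0, None)) + jitter

# Hypothetical variances; the tiny negative entry mimics round-off error.
variance = np.array([0.25, 1e-9, -1e-12])
stdev = safe_stdev(variance)
```

If the negative values are large rather than tiny, clamping would only hide a deeper numerical problem in the model fit.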

Non stationary kernel

Does EDBO support non-stationary kernels, such as the linear kernel?
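
For readers unfamiliar with the distinction, here is a small sketch in plain NumPy. It only illustrates why a linear kernel is non-stationary while an RBF kernel is stationary; it does not show whether edbo accepts such a kernel, and both function names are hypothetical.

```python
import numpy as np

def rbf_kernel(x, y, lengthscale=1.0):
    """Stationary: depends only on the difference x - y."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-0.5 * np.dot(d, d) / lengthscale**2))

def linear_kernel(x, y):
    """Non-stationary: depends on where x and y sit, not just on x - y."""
    return float(np.dot(x, y))

x, y = np.array([1.0, 2.0]), np.array([1.5, 2.5])
shift = np.array([10.0, 10.0])

# Shifting both inputs by the same amount leaves the RBF value unchanged,
# but changes the linear kernel value.
rbf_same = np.isclose(rbf_kernel(x, y), rbf_kernel(x + shift, y + shift))
linear_same = np.isclose(linear_kernel(x, y), linear_kernel(x + shift, y + shift))
```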

Rdkit installation

According to the author (Greg Landrum), I believe the preferred way to install RDKit is
conda install -c conda-forge rdkit
which has the advantage that you can use Python versions 3.8 and higher as well.

Regression ability of gaussian process

Hi Ben,

I am using EDBO to study the results of one type of experiment. The parameter space has four dimensions. I find that the Gaussian process in EDBO has very poor regression performance even on the training set (4 dimensions, around 200 data points in total), whereas a neural network learns the same dataset well. I think the Gaussian process fails because my dataset is highly discontinuous, with many delta-function-like jumps between 0 and some finite value, and around 1/3 of the data points are zeros. Do you think it is reasonable that a Gaussian process cannot perform well on such a four-dimensional dataset? Thank you very much!

Decoding back to categorical values?

The code doesn't seem to do any conversion from continuous to categorical space, only in the other direction (e.g. SMILES >> DFT). That is, it seems to assume that the continuous space to which the categoricals are encoded is the 'real' space, and does all the work in that.
If we were to use this tool to generate a batch of real-world experiments to be carried out, how should we decode the continuous vectors back to actual ligands etc.?
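
If the candidates form a finite set of encoded rows, decoding can be a reverse lookup against the descriptor table. The sketch below assumes exactly that; the `descriptors` table and the `decode` helper are hypothetical and not part of edbo. Matching on nearest distance rather than exact equality tolerates small round-off in the vectors.

```python
# Hypothetical descriptor table: ligand label -> encoded feature vector.
descriptors = {
    "ligand_A": (0.12, 1.50),
    "ligand_B": (0.80, 0.30),
    "ligand_C": (0.45, 0.95),
}

def decode(vector, table):
    """Return the label whose descriptor vector is closest to `vector`."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(table[label], vector))
    return min(table, key=sq_dist)

exact = decode((0.80, 0.30), descriptors)   # matches a row exactly
nearby = decode((0.50, 1.00), descriptors)  # falls back to the nearest row
```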

Consider updating README with links to your examples

I've been reading through the paper and got sucked into reading the peer review content as well 😅. What a journey, and nice work!

https://github.com/b-shields/edbo/blob/master/examples/deoxyfluorination_optimization/optimization.ipynb
https://github.com/b-shields/edbo/blob/master/examples/mitsunobu_optimization/optimization.ipynb

I think including links to the above in the README would help with visibility.

Can we put constraints on the search space?

Hello, I can see in the examples that the chemical space is built as a hypercube. In my case, however, I cannot reach every point in this hypercube; I need to impose some constraints (e.g. the sum of the coordinate values over all axes must be less than 1).

Just wondering if we are allowed to do that using edbo, thanks!
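
One generic way to express such a constraint on a discrete space is to enumerate the full grid and drop the infeasible points before optimization. This is a minimal sketch under that assumption, using only the standard library; the `levels` values are made up, and whether edbo accepts a pre-filtered space this way is not confirmed here. The sketch treats the bound as inclusive (sum at most 1).

```python
from itertools import product

# Hypothetical fractions for three components, each on the same grid.
levels = [0.0, 0.25, 0.5, 0.75, 1.0]
full_grid = list(product(levels, repeat=3))

# Keep only the points that satisfy the constraint sum(coords) <= 1.
feasible = [point for point in full_grid if sum(point) <= 1.0]
```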

Passing empty strings as components

Hello,

I have been using edbo to optimize some sol-gel reactions in the lab, with some promising first results. I have encountered one issue, however: it doesn't seem possible to pass empty strings as reaction components.
I'm interested in doing this because I would like to compare the same reaction with and without ligands for stabilization. Thus I would like to pass a list of the form " 'ligands' : [' ', 'ligand_1', 'ligand_2', ... , 'ligand_n'] ".
Would there be any way to achieve a similar result, or would it be better to pass the surface functional group (i.e. OH- in the case of hydroxide nanoparticles) as the ligand when no ligand is added?

I hope this question makes sense; if anything was unclear, please feel free to ask for further clarification.

Best,
Fabien
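
One possible workaround, sketched here as an assumption rather than as documented edbo behaviour, is to replace the blank entry with an explicit placeholder level before encoding. The label "no_ligand" and the `one_hot` helper are hypothetical.

```python
def one_hot(levels):
    """Map each categorical level to a one-hot vector."""
    return {
        level: [1 if i == j else 0 for j in range(len(levels))]
        for i, level in enumerate(levels)
    }

raw_ligands = [" ", "ligand_1", "ligand_2"]
# Replace blank entries with an explicit, hypothetical "no_ligand" label.
ligands = [name.strip() or "no_ligand" for name in raw_ligands]
encoding = one_hot(ligands)
```

With one-hot encoding the placeholder is just another level; with descriptor-based encoding it would still need a descriptor row of its own.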

convert the maximum value into the minimum value

Thank you very much for your work. I have a question: edbo helps us find a maximum, but I want to find a minimum instead. What do I need to change? I would appreciate some specific suggestions. Thank you very much.
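
A general trick that works with any maximizer is to negate the objective, since minimizing f is the same as maximizing -f. The sketch below illustrates this with a stand-in maximizer; `best_index` and the `impurity` values are hypothetical, and no edbo-specific option is implied.

```python
def best_index(values):
    """Stand-in for a maximizer: return the index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical measurements where smaller is better (e.g. impurity in %).
impurity = [4.2, 1.3, 2.8]

# Negate so that the smallest impurity becomes the largest objective.
objective = [-value for value in impurity]
winner = best_index(objective)
```

Remember to flip the sign back when reporting the best value found.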

Input continuous parameter

Hi, thank you for the great tool. This is very useful !!

In the example code, there are only discrete or categorical parameters (with or without descriptors), and I cannot find how to use continuous parameters that are not restricted to grid points.

Is it possible to use continuous parameters in this tool?

I also wonder whether the candidate grid points can differ from the conditions of past experiments. For example, if 1.3, 1.5, and 1.7 equivalents of a reagent were used in past experiments, but we want to set 1.2, 1.4, and 1.6 equivalents as candidate values for the next experiment. (I tried this and it seems to work, but I'd like to confirm that the tool is intended to handle such cases.)

conda error for GPU integration?

Not sure if this is required for GPU or not (I am still at the install phase), but
conda install cudatookit=10.1, torchvision -c pytorch
gives the following error:
CondaValueError: invalid package specification: cudatookit=10.1,

Is this actually necessary, considering that
conda install -c pytorch pytorch=1.3.1
seems to install cudatoolkit already?

This is part of the output for me from that install command:

The following packages will be downloaded:

package                    |            build
---------------------------|-----------------
cffi-1.14.4                |   py37hcd4344a_0         243 KB
cudatoolkit-10.1.243       |       h74a9793_0       300.3 MB
ninja-1.10.2               |   py37h6d14046_0         246 KB
pytorch-1.3.1              |py3.7_cuda101_cudnn7_0       479.7 MB  pytorch
------------------------------------------------------------
                                       Total:       780.5 MB

The following NEW packages will be INSTALLED:

cffi pkgs/main/win-64::cffi-1.14.4-py37hcd4344a_0
cudatoolkit pkgs/main/win-64::cudatoolkit-10.1.243-h74a9793_0
ninja pkgs/main/win-64::ninja-1.10.2-py37h6d14046_0
pycparser pkgs/main/noarch::pycparser-2.20-py_2
pytorch pytorch/win-64::pytorch-1.3.1-py3.7_cuda101_cudnn7_0

Importing unindexed external results

(Posted here at Ben's request from private correspondence. Thanks, Ben!)

I have a question about importing external results: I've looked through all the example notebooks and bro.py but I'm still unable to figure out how to import a .csv file containing existing results, either one with the experimental index numbers or ideally one without them. Specifically, here's what I'm trying to do:

1. Use the BO_express module to easily encode some components with Mordred from the SMILES strings (e.g., ligand, base, solvent, while other variables use numeric encoding).
2. Specify an external initialization (init_method='external') so that I can include pre-existing data from earlier screening (e.g., a ligand screen with all other variables held constant at levels which are included in the search space).
3. Populate a .csv file with data from (e.g.) the external ligand screen in the same format as the "init" or "round0" files, but ideally not requiring the experiment index numbers, since the design is created after the ligand screen was run.
4. Import the existing results .csv file into BO and use this to initialize the first round of screening.
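
The index-matching part of this workflow can be sketched independently of edbo: look up each external result in the design by its conditions to recover the experiment index. Everything below is hypothetical illustration data, and the sketch does not show edbo's own import API.

```python
import csv
import io

# Hypothetical design: experiment index -> condition tuple.
design = {
    0: ("ligand_A", "base_1", 25),
    1: ("ligand_A", "base_2", 25),
    2: ("ligand_B", "base_1", 25),
}

# Hypothetical external results with no index column.
external_csv = io.StringIO(
    "ligand,base,temperature,yield\n"
    "ligand_B,base_1,25,71.5\n"
    "ligand_A,base_2,25,12.0\n"
)

# Reverse lookup from conditions to design index.
index_of = {conditions: idx for idx, conditions in design.items()}

indexed_results = {}
for row in csv.DictReader(external_csv):
    key = (row["ligand"], row["base"], int(row["temperature"]))
    indexed_results[index_of[key]] = float(row["yield"])
```

A result whose conditions are missing from the design raises a `KeyError` here, which flags rows that fall outside the search space.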

Where is the dft descriptor extractor?

Hi. Thank you for such great code and such a great paper. I have read the paper and I'm trying to use edbo in my project.

However, I cannot find a way to extract the descriptor CSV without autoqchem.
Uploading a molecule's Gaussian log file to autoqchem is certainly useful, but my molecule is confidential, so I could not get permission to upload it to a public site.

I hope someone can tell me how to get descriptors from a Gaussian log file without autoqchem.

Thank you.
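
As a starting point for local parsing, a Gaussian log file can be read as plain text. The sketch below extracts only the final SCF energy from lines of the form `SCF Done:  E(...) = ...`; it is not a replacement for the full autoqchem descriptor set, and the log excerpt and the helper name `last_scf_energy` are made up for illustration.

```python
import re

# Matches lines such as " SCF Done:  E(RB3LYP) =  -76.4089533236     A.U. after ..."
SCF_PATTERN = re.compile(r"SCF Done:\s+E\(([^)]+)\)\s+=\s+(-?\d+\.\d+)")

def last_scf_energy(log_text):
    """Return (method, energy in Hartree) from the final 'SCF Done' line."""
    matches = SCF_PATTERN.findall(log_text)
    if not matches:
        raise ValueError("no 'SCF Done' line found")
    method, energy = matches[-1]
    return method, float(energy)

# Made-up excerpt in the style of a Gaussian log file.
sample_log = """
 SCF Done:  E(RB3LYP) =  -76.4080000000     A.U. after   10 cycles
 SCF Done:  E(RB3LYP) =  -76.4089533236     A.U. after    8 cycles
"""
method, energy = last_scf_energy(sample_log)
```

Other descriptors would need their own patterns, checked against the log files produced by the Gaussian version in use.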