ur-whitelab / exmol

Explainer for black box models that predict molecule properties

Home Page: https://ur-whitelab.github.io/exmol/

License: MIT License

Python 38.51% Jupyter Notebook 61.49%

exmol's Issues

Skunk

Refactor out so we don't have code duplication with skunk

While messing around with CODEX, I noticed it wants to compute ECFP4 fingerprints using a different method and this gives slightly different similarities. @geemi725 could you double-check the ECFP4 implementation we have is correct, or is the CODEX one correct?

pypi figures

@hgandhi2411 you need to follow the style of the other figures in README file to get them to render on pypi homepage.

Dataclass Print

Can we override print method in Example to print nicer?
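
One possible approach, sketched with a stand-in class: dataclasses generate `__repr__` automatically but do not define `__str__`, so overriding `__str__` changes what `print` shows while keeping the full repr for debugging. The fields below (`smiles`, `yhat`, `similarity`) are assumptions for illustration and may not match the real `Example` class.

```python
from dataclasses import dataclass


@dataclass
class Example:
    # Hypothetical fields; the real exmol Example may have different ones.
    smiles: str
    yhat: float
    similarity: float

    def __str__(self) -> str:
        # print() calls __str__; the generated __repr__ is left untouched.
        return (
            f"Example({self.smiles}, f(x)={self.yhat:.3f}, "
            f"sim={self.similarity:.2f})"
        )


e = Example(smiles="CCO", yhat=0.12345, similarity=1.0)
print(e)        # compact, human-readable form
print(repr(e))  # full dataclass repr with every field
```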

Docs clean-up

Change log for v0.3
return value for zinced
Notes about ZINC requiring internet

Add feature to use multiple base molecules for ECFP descriptors

Would like to exclude ECFP substructures with 1 or 2 atoms, since they are not that useful.

Remove P from alphabet

Badges

@geemi725 can you add a pypi and github (link to repo) badge to top of markdown? Thanks!

Separate explain and space

Maybe we should have two functions:

all_exps = cs.explore(...)
exps  = cs.explain(all_exps)

cs.plot_space(exps, all_exps)
cs.plot_exp(exps)

Sanitizing SMILES removes chirality information

On this line of sample_space(), chirality information of origin_smiles is removed. The output is then unsuitable as input to a chirality-aware ML model, e.g. to distinguish L vs. D amino acids which are important in models of binding affinity. Could the option to skip this sanitization step be provided to the user?

PS: Great code base and beautiful visualizations! We're finding it very useful in explaining our Gaussian Process models. The future of SAR ←→ ML looks exciting.

num_samples consistency

num_samples is counter-intuitive because it is multiplied by number of mutations. Should change it to be what is expected.

Add table of contents

Move from rdkit-pypi to rdkit

Error while plotting counterfactuals using plot_cf()

plot_cf() function errors out with the following error. This behavior is also consistent across all notebooks in paper/.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-b6c8ed26216e> in <module>
      1 fkw = {"figsize": (8, 6)}
      2 mpl.rc("axes", titlesize=12)
----> 3 exmol.plot_cf(exps, figure_kwargs=fkw, mol_size=(450, 400), nrows=1)
      4 
      5 plt.savefig("rf-simple.png", dpi=180)

/gpfs/fs2/scratch/hgandhi/exmol/exmol/exmol.py in plot_cf(exps, fig, figure_kwargs, mol_size, mol_fontsize, nrows, ncols)
    682         title += f"\nf(x) = {e.yhat:.3f}"
    683         axs[i].set_title(title)
--> 684         axs[i].imshow(np.asarray(img), gid=f"rdkit-img-{i}")
    685         axs[i].axis("off")
    686     for j in range(i, C * R):

~/.local/lib/python3.7/site-packages/matplotlib/__init__.py in inner(ax, data, *args, **kwargs)
   1359     def inner(ax, *args, data=None, **kwargs):
   1360         if data is None:
-> 1361             return func(ax, *map(sanitize_sequence, args), **kwargs)
   1362 
   1363         bound = new_sig.bind(ax, *args, **kwargs)

~/.local/lib/python3.7/site-packages/matplotlib/axes/_axes.py in imshow(self, X, cmap, norm, aspect, interpolation, alpha, vmin, vmax, origin, extent, filternorm, filterrad, resample, url, **kwargs)
   5607                               resample=resample, **kwargs)
   5608 
-> 5609         im.set_data(X)
   5610         im.set_alpha(alpha)
   5611         if im.get_clip_path() is None:

~/.local/lib/python3.7/site-packages/matplotlib/image.py in set_data(self, A)
    699                 not np.can_cast(self._A.dtype, float, "same_kind")):
    700             raise TypeError("Image data of dtype {} cannot be converted to "
--> 701                             "float".format(self._A.dtype))
    702 
    703         if self._A.ndim == 3 and self._A.shape[-1] == 1:

TypeError: Image data of dtype <U14622 cannot be converted to float

Type Annotation Check

Need to actually use mypy to check type annotations.

TypeError: object of type 'DatasetV1Adapter' has no len()

Hi,

I'm trying to replicate the existing notebooks on solubility.

----> 2 N = len(data)
3 split = int(0.1 * N)
4 test_data = data.take(split).batch(config.batch_size)
5 nontest = data.skip(split)

TypeError: object of type 'DatasetV1Adapter' has no len()

Can you please let me know how to fix this?

Thank you in advance.

Use better tool for displaying whole space

https://twitter.com/ssiddhant_/status/1427335039281352737?s=20

molecule attribution depiction in README

ECFP plot is not returning svg

Error after installation

Hi,

First of all, thank you for your work! I have a problem with your library: when I run "import exmol", I get the error "No module named 'dataclasses'".

I installed it with: pip install exmol...

Thanks!

Custom Alphabet

Some of the counterfactuals use metals, ions, carbocations. We should have a "basic" alphabet that has no radicals/metals.

Docs Improvements

Fix headings in LIME notebooks
Fix github button at the top
New citation

rdkit install

Instead of making rdkit-pypi a dependency, we could give start-up message on import fail?
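
A minimal sketch of the start-up message idea, assuming a small helper that is called at import time. The helper name `require` and the hint text are hypothetical and not part of exmol; the standard-library `json` module stands in for rdkit so the sketch runs anywhere.

```python
import importlib


def require(module_name: str, hint: str):
    """Import a module, or raise ImportError carrying an installation hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(f"Could not import '{module_name}'. {hint}") from err


# At package import time this might look like (hint text is illustrative):
# rdkit = require("rdkit", "Please install rdkit before using exmol.")

# Demonstration with a module that is always present:
json_module = require("json", "This ships with the standard library.")
```

Chaining with `from err` keeps the original traceback visible, so the user sees both the friendly hint and the underlying failure.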

API Docs Misaligned

We must have broken something in API sphinx config because argument descriptions aren't cooperating with types. See here

SELFIES 2.0

Unit tests
Paper models
bump to 1.0, since not backwards compatible(?)

Add NOTICE file

per terms in STONED license

Add quiet mode

Sometimes I do not want lots of progress bars.

Output pngs from `plot_descriptors` for notebook display

rectify stoned randint

Original stoned code uses np.random.randint(a,b) which excludes b.
random.randint(a,b) includes b.
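
The off-by-one can be seen with the standard library alone. In this sketch `random.randrange(a, b)` is used as the stdlib call with the same half-open convention as `np.random.randint(a, b)`; whether the fix should use `randrange` or `randint(a, b - 1)` is left open.

```python
import random

random.seed(0)  # fixed seed so the demonstration is repeatable
a, b = 0, 3

# random.randint(a, b) draws from the closed interval [a, b].
closed = {random.randint(a, b) for _ in range(2000)}

# random.randrange(a, b) draws from the half-open interval [a, b),
# the same convention as np.random.randint(a, b).
half_open = {random.randrange(a, b) for _ in range(2000)}

print(sorted(closed))     # contains b
print(sorted(half_open))  # never contains b
```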

Easier usage of explain

Working through some examples, I've noted the following things:

Descriptor type should have a default - maybe MACCS since the plots will show-up
Maybe we should only save SVGs, rather than return unless prompted
We should do string comparison for descriptor types using lowercase strings, so that classic and Classic and ecfp are valid.
We probably shouldn't save without a filename - it is unexpected
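
The default and the case-insensitive comparison from the list above could be sketched as follows. The helper name and the set of accepted names are assumptions drawn from the names mentioned in this issue, not the actual exmol API.

```python
# Names mentioned in this issue; the real list of descriptor types may differ.
VALID_DESCRIPTOR_TYPES = ("classic", "ecfp", "maccs")


def normalize_descriptor_type(descriptor_type: str = "MACCS") -> str:
    """Lowercase the descriptor type and check it against the known names."""
    key = descriptor_type.strip().lower()
    if key not in VALID_DESCRIPTOR_TYPES:
        valid = ", ".join(VALID_DESCRIPTOR_TYPES)
        raise ValueError(
            f"Unknown descriptor type '{descriptor_type}'; expected one of: {valid}"
        )
    return key


print(normalize_descriptor_type("Classic"))
print(normalize_descriptor_type("ecfp"))
print(normalize_descriptor_type())  # falls back to the default
```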

Replace [Nop] token with [nop] in solubility example

Make padding token be consistent with Selfies. Also [Nop] is invalid in SELFIES(?)

Release 0.5.0 on pypi

Are you planning to release 0.5.0 on pypi? I am maintaining the conda package of exmol and I would like to bump it to 0.5.0. See https://github.com/conda-forge/exmol-feedstock

Thanks!

Add sphinx docs

Update GCN predictor function

Add ["Nop"] token in solubility example

Currently molecules are padded with the zeroth token, which is not necessarily the ['nop'] token.

Paper notebooks

Hi @geemi725 I modified the plotting code a little to enable composing the plots into more complex figures (rather than having it always create a new figure). This broke some of the paper plotting functions where colorbars are. Can you take a look? See CI log

Improve README

Need to use PNGs for the README because SVGs do not render on sphinx (do they work on pypi?).
Should make example draw the actual structures shown in README

Simplify model notation

Wrap model functions in lambda x,y: model(x) if we detect it only takes 1 argument.
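
One way the detection might work is with `inspect.signature`, counting the positional parameters that have no default. This is a sketch under that assumption; `wrap_model` is a hypothetical name, and callables without an inspectable signature would need separate handling.

```python
import inspect


def wrap_model(model):
    """Return a two-argument callable, wrapping one-argument models."""
    required = [
        p
        for p in inspect.signature(model).parameters.values()
        if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
        and p.default is p.empty
    ]
    if len(required) == 1:
        # The model only looks at its first input; ignore the second one.
        return lambda x, y: model(x)
    return model


def one_arg_model(x):
    return len(x)


def two_arg_model(x, y):
    return (x, y)


print(wrap_model(one_arg_model)("CCO", None))
print(wrap_model(two_arg_model) is two_arg_model)
```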

The module 'exmol' has no attribute 'lime_explain'

In the notebook RF-lime.ipynb, the command

exmol.lime_explain(space, descriptor_type=descriptor_type)

gives an error: module 'exmol' has no attribute 'lime_explain'

Please, let me know how to fix this error. Thanks.

STONED Selfies Results

Many of the structures are still a bit unreasonable. Can we use a ZINC query with Tanimoto similarity as a replacement when valid CFs are essential?

chemed not in pubchem

If the starting molecule is not in pubchem, it just dies.

Add GNN-based example

Generate vectorized images for the paper

Use insert_svg to generate new figs.

Paper CI

@geemi725 Can you fix the CI errors with paper job? Check the output on main.

BBB Example

The blood-brain barrier notebook calls the dataset "tox" frequently, which can be confusing. Should change name to be BBB

Target molecule frequently on the edge of sample space visualization

In your example provided in the code, the target molecule is on the edge of the sampled distribution (in the PCA plot). I also find this happens very frequently with my experiments on my model. I think this suggests that the sampling produces molecules that are not evenly distributed around the target. I just want to verify that this is a property of the STONED sampling algorithm, and not an artifact of the visualization code (which it does not seem to be). I've attached an example of my own, for both "narrow" and "medium" presets.

preset="narrow", nmols=10

preset="medium", nmols=10

Add notebooks to docs

Use same approach as maxent

Make specifying models easier

Maybe take in label and model function and category of model (e.g., classification or regression)

e = cs.class_evaluator(model_fxn)
cs.explain(model_fxn, x, e)

Improve README figs

Current RF-simple figure has unwanted white space.

Custom Surrogate Models

Can we enable custom surrogate model features? Just add descriptors? Add docs on this too - @hgandhi2411 thoughts?