
exmol's People

Contributors

aditis44, eltociear, geemi725, hgandhi2411, maclandrol, navneeth3005, whitead


exmol's Issues

Skunk

Refactor out so we don't have code duplication with skunk

CODEX Example

While messing around with CODEX, I noticed it wants to compute ECFP4 fingerprints using a different method, and this gives slightly different similarities. @geemi725 could you double-check whether the ECFP4 implementation we have is correct, or whether the CODEX one is?

image

Docs clean-up

  • Change log for v0.3
  • return value for zinced
  • Notes about ZINC requiring internet

Badges

@geemi725 can you add a pypi and github (link to repo) badge to top of markdown? Thanks!

Separate explain and space

Maybe we should have two functions:

all_exps = cs.explore(...)
exps  = cs.explain(all_exps)

cs.plot_space(exps, all_exps)
cs.plot_exp(exps)

Sanitizing SMILES removes chirality information

On this line of sample_space(), chirality information of origin_smiles is removed. The output is then unsuitable as input to a chirality-aware ML model, e.g. to distinguish L vs. D amino acids which are important in models of binding affinity. Could the option to skip this sanitization step be provided to the user?

PS: Great code base and beautiful visualizations! We're finding it very useful in explaining our Gaussian Process models. The future of SAR ←→ ML looks exciting.

num_samples consistency

num_samples is counter-intuitive because it is multiplied by the number of mutations, so the total number of samples is larger than the value passed in. We should change it so that it means what users expect.

Error while plotting counterfactuals using plot_cf()

The plot_cf() function fails with the following error. The behavior is consistent across all notebooks in paper/.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-b6c8ed26216e> in <module>
      1 fkw = {"figsize": (8, 6)}
      2 mpl.rc("axes", titlesize=12)
----> 3 exmol.plot_cf(exps, figure_kwargs=fkw, mol_size=(450, 400), nrows=1)
      4 
      5 plt.savefig("rf-simple.png", dpi=180)

/gpfs/fs2/scratch/hgandhi/exmol/exmol/exmol.py in plot_cf(exps, fig, figure_kwargs, mol_size, mol_fontsize, nrows, ncols)
    682         title += f"\nf(x) = {e.yhat:.3f}"
    683         axs[i].set_title(title)
--> 684         axs[i].imshow(np.asarray(img), gid=f"rdkit-img-{i}")
    685         axs[i].axis("off")
    686     for j in range(i, C * R):

~/.local/lib/python3.7/site-packages/matplotlib/__init__.py in inner(ax, data, *args, **kwargs)
   1359     def inner(ax, *args, data=None, **kwargs):
   1360         if data is None:
-> 1361             return func(ax, *map(sanitize_sequence, args), **kwargs)
   1362 
   1363         bound = new_sig.bind(ax, *args, **kwargs)

~/.local/lib/python3.7/site-packages/matplotlib/axes/_axes.py in imshow(self, X, cmap, norm, aspect, interpolation, alpha, vmin, vmax, origin, extent, filternorm, filterrad, resample, url, **kwargs)
   5607                               resample=resample, **kwargs)
   5608 
-> 5609         im.set_data(X)
   5610         im.set_alpha(alpha)
   5611         if im.get_clip_path() is None:

~/.local/lib/python3.7/site-packages/matplotlib/image.py in set_data(self, A)
    699                 not np.can_cast(self._A.dtype, float, "same_kind")):
    700             raise TypeError("Image data of dtype {} cannot be converted to "
--> 701                             "float".format(self._A.dtype))
    702 
    703         if self._A.ndim == 3 and self._A.shape[-1] == 1:

TypeError: Image data of dtype <U14622 cannot be converted to float

TypeError: object of type 'DatasetV1Adapter' has no len()

Hi,

I'm trying to replicate the existing notebooks on solubility.

----> 2 N = len(data)
3 split = int(0.1 * N)
4 test_data = data.take(split).batch(config.batch_size)
5 nontest = data.skip(split)

TypeError: object of type 'DatasetV1Adapter' has no len()

Could you please let me know how to fix this?

Thank you in advance.
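
One possible workaround, sketched under assumptions: the error says the dataset object does not implement len(), so the element count can be obtained by iterating over it once instead. The helper name dataset_length and the NoLenDataset stand-in class are hypothetical and used only to keep the example self-contained; with a real dataset this requires that it is finite and can be iterated directly.

```python
def dataset_length(data):
    """Return the number of elements in `data`, even if it has no __len__."""
    try:
        return len(data)
    except TypeError:
        # Fall back to one full pass over the data. This assumes the
        # dataset is finite and supports plain Python iteration.
        return sum(1 for _ in data)


class NoLenDataset:
    """Hypothetical stand-in for an iterable dataset without len()."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)


data = NoLenDataset(range(50))
N = dataset_length(data)   # replaces N = len(data)
split = int(0.1 * N)
print(N, split)
```

Counting by iteration reads the whole dataset once, so it may be slow for large datasets.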

Error after installation

Hi,

First of all, thank you for your work! I am having a problem with your library: when I run "import exmol", I get the error "No module named 'dataclasses'".

I installed it with: pip install exmol...

Thanks!
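
A likely cause, offered as an assumption since the report does not state the Python version: the dataclasses module only joined the standard library in Python 3.7, so this import error typically appears on Python 3.6, where a backport is available from PyPI under the name dataclasses. The helper name dataclasses_hint below is hypothetical.

```python
import sys


def dataclasses_hint(version_info=sys.version_info):
    """Suggest a fix for "No module named 'dataclasses'" by Python version."""
    if tuple(version_info[:2]) >= (3, 7):
        return "dataclasses is in the standard library on this Python version"
    # On Python 3.6 the module is only available as a PyPI backport.
    return "Python < 3.7: upgrade Python, or try `pip install dataclasses`"


print(dataclasses_hint())
```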

Custom Alphabet

Some of the counterfactuals use metals, ions, carbocations. We should have a "basic" alphabet that has no radicals/metals.
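
A rough sketch of what filtering down to a "basic" alphabet could look like, working on SELFIES-style token strings only. Everything here is an assumption for illustration: the allowed element set, the helper name is_basic_token, and the rule that any charged token is rejected. It does not detect radicals and is not exmol's implementation.

```python
import re

# Assumed set of "basic" organic elements; adjust as needed.
ALLOWED_ELEMENTS = {"B", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"}

# Element symbol, optional explicit hydrogens, optional charge.
ATOM_RE = re.compile(r"([A-Z][a-z]?)(H\d*)?([+-]\d*)?")


def is_basic_token(token):
    """Return True for structural tokens and uncharged allowed-element atoms."""
    body = token.strip("[]").lstrip("=#/\\")  # drop brackets and bond prefix
    if body.startswith(("Branch", "Ring")):
        return True  # structural tokens carry no atom
    match = ATOM_RE.fullmatch(body)
    if match is None:
        return False
    element, _hydrogens, charge = match.groups()
    return element in ALLOWED_ELEMENTS and charge is None


tokens = ["[C]", "[=O]", "[N+1]", "[Fe]", "[Branch1]",
          "[Cl]", "[O-1]", "[Ring1]", "[Na+1]"]
basic = [t for t in tokens if is_basic_token(t)]
print(basic)
```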

Docs Improvements

  • Fix headings in LIME notebooks
  • Fix github button at the top
  • New citation

rdkit install

Instead of making rdkit-pypi a dependency, we could give a start-up message when the import fails?
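
A minimal sketch of the idea, assuming a small hypothetical helper called require that wraps the import and re-raises with an installation hint. The example imports json so that it runs anywhere; in the package the same pattern would presumably be applied to rdkit.

```python
import importlib


def require(module_name, hint):
    """Import a module, raising a friendlier ImportError if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name} is required but could not be imported. {hint}"
        ) from err


# Works for an installed module ...
json = require("json", "It ships with Python.")

# ... and gives an actionable message for a missing one.
try:
    require("not_a_real_module_xyz", "Try `pip install rdkit-pypi`.")
except ImportError as err:
    message = str(err)
    print(message)
```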

API Docs Misaligned

We must have broken something in API sphinx config because argument descriptions aren't cooperating with types. See here

image

SELFIES 2.0

  • Unit tests
  • Paper models
  • bump to 1.0, since not backwards compatible(?)

rectify stoned randint

Original stoned code uses np.random.randint(a,b) which excludes b.
random.randint(a,b) includes b.
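
The difference can be seen with the standard library alone. In this sketch random.randrange(a, b) stands in for np.random.randint(a, b), since both exclude the upper bound; the seed and bounds are arbitrary choices for illustration.

```python
import random

a, b = 0, 3
rng = random.Random(0)  # seeded so the run is repeatable

# random.randint(a, b) draws from a..b, including b.
inclusive = {rng.randint(a, b) for _ in range(1000)}

# random.randrange(a, b) draws from a..b-1, like np.random.randint(a, b).
exclusive = {rng.randrange(a, b) for _ in range(1000)}

print(sorted(inclusive), sorted(exclusive))
```

To reproduce the original STONED behavior with the random module, either random.randrange(a, b) or random.randint(a, b - 1) should work.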

Easier usage of explain

Working through some examples, I've noted the following things:

  1. Descriptor type should have a default - maybe MACCS since the plots will show up
  2. Maybe we should only save SVGs, rather than return unless prompted
  3. We should do string comparison for descriptor types using lowercase strings, so that classic and Classic and ecfp are valid.
  4. We probably shouldn't save without a filename - it is unexpected
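
Points 1 and 3 could be handled together by a small normalization step. This is only a sketch: the function name normalize_descriptor_type and the tuple of valid names are assumptions based on the descriptor types mentioned above, not the actual exmol API.

```python
# Descriptor names mentioned in this issue, stored in lowercase.
VALID_DESCRIPTOR_TYPES = ("classic", "ecfp", "maccs")


def normalize_descriptor_type(descriptor_type="MACCS"):
    """Lowercase the descriptor type and check it against the valid names."""
    key = descriptor_type.strip().lower()
    if key not in VALID_DESCRIPTOR_TYPES:
        raise ValueError(
            f"Unknown descriptor type {descriptor_type!r}; "
            f"expected one of {VALID_DESCRIPTOR_TYPES}"
        )
    return key


print(normalize_descriptor_type("Classic"))
print(normalize_descriptor_type())
```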

Paper notebooks

Hi @geemi725 I modified the plotting code a little to enable composing the plots into more complex figures (rather than having it always create a new figure). This broke some of the paper plotting functions where colorbars are. Can you take a look? See CI log

Improve README

  • Need to use PNGs for README because SVGs do not render on sphinx (do they work on pypi?).
  • Should make example draw the actual structures shown in README

The module 'exmol' has no attribute 'lime_explain'

In the notebook RF-lime.ipynb, the command

exmol.lime_explain(space, descriptor_type=descriptor_type)

gives the error: module 'exmol' has no attribute 'lime_explain'

Please, let me know how to fix this error. Thanks.

STONED Selfies Results

Many of the structures are still a bit unreasonable. Can we use a ZINC query with Tanimoto similarity as a replacement when valid CFs are essential?

Paper CI

@geemi725 Can you fix the CI errors with paper job? Check the output on main.

BBB Example

The blood-brain barrier notebook calls the dataset "tox" frequently, which can be confusing. Should change name to be BBB

Target molecule frequently on the edge of sample space visualization

In your example provided in the code, the target molecule is on the edge of the sampled distribution (in the PCA plot). I also find this happens very frequently with my experiments on my model. I think this suggests that the sampling produces molecules that are not evenly distributed around the target. I just want to verify that this is a property of the STONED sampling algorithm, and not an artifact of the visualization code (which it does not seem to be). I've attached an example of my own, for both "narrow" and "medium" presets.

preset="narrow", nmols=10

explain_narrow_0 05_10

preset="medium", nmols=10

explain_medium_0 05_10

Make specifying models easier

Maybe take in label and model function and category of model (e.g., classification or regression)

e = cs.class_evaluator(model_fxn)
cs.explain(model_fxn, x, e)
