mso's Introduction

Molecular Swarm Optimization (MSO)

Implementation of the method proposed in the paper "Efficient Multi-Objective Molecular Optimization in a Continuous Latent Space" by Robin Winter, Floriane Montanari, Andreas Steffen, Hans Briem, Frank Noé and Djork-Arné Clevert [1].

Dependencies

Installing

cd mso
pip install .

Getting Started

As a first simple experiment, we will optimize a query molecule with respect to the quantitative estimate of drug-likeness (QED, Bickerton et al.). We start the optimization from a simple benzene molecule, which has a QED score of 0.31.

from mso.optimizer import BasePSOptimizer
from mso.objectives.scoring import ScoringFunction
from mso.objectives.mol_functions import qed_score
from cddd.inference import InferenceModel
# The CDDD inference model encodes/decodes molecular SMILES strings to/from the CDDD space.
# You might need to specify the path to the pretrained model (e.g. default_model).
infer_model = InferenceModel()
init_smiles = "c1ccccc1"  # SMILES representation of benzene
# Wrap the drug-likeness score inside a ScoringFunction instance.
scoring_functions = [ScoringFunction(func=qed_score, name="qed", is_mol_func=True)]

After loading the packages and defining the inference model, starting point and objective function, we can create an instance of the particle swarm optimizer. Here we use a single swarm with 200 particles.

opt = BasePSOptimizer.from_query(
    init_smiles=init_smiles,
    num_part=200,
    num_swarms=1,
    inference_model=infer_model,
    scoring_functions=scoring_functions)

Now we can run the optimization for a few steps.

opt.run(20)

The best results are summarized in opt.best_solutions. The optimization history (the best solution at each step in each swarm) is summarized in opt.best_fitness_history. Most of the time, the optimizer should find a solution with a score higher than 0.8 after only a few steps.
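To make the roles of particles, steps and the fitness history more concrete, here is a minimal sketch of a generic particle swarm optimizer on a toy objective. It is not the mso implementation: the function name toy_pso, the hyperparameters and the two-dimensional search space are assumptions chosen for illustration, and the objective is a simple function rather than a molecular score.

```python
import random


def toy_pso(objective, dim=2, num_part=20, steps=30, seed=0):
    """Maximize `objective` with a basic particle swarm.

    Returns (best_position, best_score, history), where history holds the
    best score found so far after initialization and after every step.
    """
    rng = random.Random(seed)
    # Assumed, commonly used hyperparameters (not taken from mso).
    inertia, c_personal, c_global = 0.7, 1.5, 1.5

    # Random start positions, zero start velocities.
    x = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(num_part)]
    v = [[0.0] * dim for _ in range(num_part)]

    # Personal bests and the swarm-wide best.
    p_best = [row[:] for row in x]
    p_score = [objective(row) for row in x]
    g_idx = max(range(num_part), key=p_score.__getitem__)
    g_best, g_score = p_best[g_idx][:], p_score[g_idx]
    history = [g_score]

    for _ in range(steps):
        for i in range(num_part):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull each particle toward its own best and the swarm best.
                v[i][d] = (inertia * v[i][d]
                           + c_personal * r1 * (p_best[i][d] - x[i][d])
                           + c_global * r2 * (g_best[d] - x[i][d]))
                x[i][d] += v[i][d]
            score = objective(x[i])
            if score > p_score[i]:
                p_best[i], p_score[i] = x[i][:], score
                if score > g_score:
                    g_best, g_score = x[i][:], score
        history.append(g_score)
    return g_best, g_score, history


def neg_sphere(point):
    """Toy objective with its maximum (0) at the origin."""
    return -sum(c * c for c in point)


best_x, best_score, history = toy_pso(neg_sphere)
print(best_score, history[0], history[-1])
```

The history list plays a role loosely comparable to a best-fitness history: it can only stay level or improve, because the swarm-wide best is replaced only by a better score.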

Desirability Scaling

Often, the goal is not to maximize a function as much as possible but to keep a molecular property within a certain range. To account for this, the ScoringFunction class can rescale the output of an objective function with respect to a desirability curve. To demonstrate this functionality, here we optimize the number of heavy atoms in a molecule. We would like to generate molecules that have a certain number (or range) of heavy atoms. In this case, generated molecules should have between 20 and 25 heavy atoms. To achieve this, we define a desirability curve that has its peak in this range and assigns lower scores below and above:

from mso.objectives.mol_functions import heavy_atom_count
hac_desirability = [
    {"x": 0, "y": 0},
    {"x": 5, "y": 0.1},
    {"x": 15, "y": 0.9},
    {"x": 20, "y": 1.0},
    {"x": 25, "y": 1.0},
    {"x": 30, "y": 0.9},
    {"x": 40, "y": 0.1},
    {"x": 45, "y": 0.0},
]
scoring_functions = [ScoringFunction(heavy_atom_count, "hac", desirability=hac_desirability, is_mol_func=True)]

The resulting curve looks like this:

And indeed, running the optimizer for a few steps results in molecules with the optimal number of heavy atoms.
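As an illustration of how such a list of points can act as a curve, the following sketch rescales a raw value by interpolating linearly between neighboring points and clamping outside the listed range. The helper desirability_score is hypothetical, and linear interpolation is an assumption; the README does not state how ScoringFunction evaluates the curve internally.

```python
def desirability_score(value, curve):
    """Rescale `value` using a piecewise-linear curve given as {"x", "y"} points.

    Assumes all x values in `curve` are distinct. Values outside the listed
    range receive the score of the nearest end point.
    """
    points = sorted((p["x"], p["y"]) for p in curve)
    if value <= points[0][0]:
        return points[0][1]
    if value >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= value <= x1:
            # Linear interpolation between the two neighboring points.
            return y0 + (y1 - y0) * (value - x0) / (x1 - x0)


hac_desirability = [
    {"x": 0, "y": 0}, {"x": 5, "y": 0.1}, {"x": 15, "y": 0.9},
    {"x": 20, "y": 1.0}, {"x": 25, "y": 1.0}, {"x": 30, "y": 0.9},
    {"x": 40, "y": 0.1}, {"x": 45, "y": 0.0},
]

for count in (6, 22, 35):
    print(count, round(desirability_score(count, hac_desirability), 2))
```

Under this assumption, any heavy atom count between 20 and 25 receives the full score of 1.0, and the score falls off on both sides of that plateau.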

Multi-Objective Optimization

To optimize multiple objective functions at the same time, they can be appended to the same list.

scoring_functions = [ScoringFunction(heavy_atom_count, "hac", desirability=hac_desirability, is_mol_func=True), ScoringFunction(qed_score, "qed", is_mol_func=True)]

Optionally, an individual weight can be assigned to each scoring function to balance their importance.
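One way to picture the effect of such weights is a weighted average over the individual scores. The sketch below is only an assumption about how weights could be combined; the function combined_fitness is hypothetical and is not part of mso, and the README does not describe the exact aggregation used by the optimizer.

```python
def combined_fitness(scores, weights):
    """Weighted average of per-objective scores, each already scaled to [0, 1].

    `scores` and `weights` are dicts keyed by the scoring function name.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must use the same names")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


scores = {"hac": 1.0, "qed": 0.5}
print(combined_fitness(scores, {"hac": 1, "qed": 1}))  # equal importance
print(combined_fitness(scores, {"hac": 1, "qed": 3}))  # QED counts three times as much
```

With a larger weight on "qed", a molecule with a mediocre QED score is pulled further down overall, even if its heavy atom count is ideal.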

Constrained Optimization

Sometimes it is of interest to constrain the optimization to a certain region of chemical space. This can be done, for example, by applying a substructure constraint. In this example, we again optimize for QED and a defined range of heavy atoms but penalize solutions that contain a benzene substructure. Moreover, to avoid generating large macrocycles, we penalize those as well. The necessary functions are included in the mol_functions module:

from functools import partial
from rdkit import Chem  # needed for Chem.MolFromSmiles below
from mso.objectives.mol_functions import substructure_match_score, penalize_macrocycles
# Use partial to bind the additional argument (the substructure to match).
substructure_match_score = partial(substructure_match_score, query=Chem.MolFromSmiles("c1ccccc1"))
# Invert the resulting score to penalize a match.
miss_match_desirability = [{"x": 0, "y": 1}, {"x": 1, "y": 0}]
scoring_functions = [
    ScoringFunction(heavy_atom_count, "hac", desirability=hac_desirability, is_mol_func=True),
    ScoringFunction(qed_score, "qed", is_mol_func=True),
    ScoringFunction(substructure_match_score, "miss_match", desirability=miss_match_desirability, is_mol_func=True),
    ScoringFunction(penalize_macrocycles, "macro", is_mol_func=True)
]
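The two ideas used above, binding an extra argument with partial and inverting a 0/1 match score, can be tried without RDKit. In this sketch, contains_fragment is a hypothetical toy stand-in that checks plain string containment; that is not a real substructure match and is used only to keep the example self-contained.

```python
from functools import partial


def contains_fragment(text, query):
    """Toy stand-in for a match function: 1 if `query` occurs in `text`, else 0.

    Plain string containment is NOT a chemical substructure match.
    """
    return int(query in text)


# Bind the extra argument so the result takes a single input,
# which is the shape a one-argument scoring function needs.
has_benzene_text = partial(contains_fragment, query="c1ccccc1")


def invert_match(match):
    """Same effect as the two-point curve (0 -> 1, 1 -> 0) on a 0/1 score."""
    return 1 - match


for smiles in ("Cc1ccccc1", "CCO"):
    match = has_benzene_text(smiles)
    print(smiles, match, invert_match(match))
```

After inversion, inputs that match receive a score of 0 and inputs that do not match receive a score of 1, so a match lowers the overall fitness.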


Writing your own Scoring Function

The ScoringFunction class can wrap any function that has one of the following properties:

  • Takes an RDKit mol object as input and returns a number as the score.
  • Takes the CDDD positions of the particles in a swarm as input [num_particles, num_dim] and returns an array of scores [num_particles].

For examples, see the modules mso.objectives.mol_functions and mso.objectives.emb_functions.
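As a rough sketch of the two function shapes, the example below defines one function of each kind. Both function names and their scoring logic are made up for illustration; the first relies only on the RDKit Mol method GetNumHeavyAtoms, and the second uses plain Python lists where real code would more likely work with a NumPy array.

```python
import math
from functools import partial


def small_molecule_score(mol, max_atoms=30):
    """Mol-style function: takes one RDKit mol object, returns one number.

    Hypothetical objective: 1.0 if the molecule has at most `max_atoms`
    heavy atoms, otherwise 0.0.
    """
    return 1.0 if mol.GetNumHeavyAtoms() <= max_atoms else 0.0


def distance_to_reference(positions, reference):
    """Embedding-style function: takes all particle positions at once.

    `positions` has shape [num_particles, num_dim]; the result holds one
    score per particle (here the Euclidean distance to `reference`).
    """
    return [math.dist(row, reference) for row in positions]


# Bind the reference point so the function takes only the positions.
distance_to_origin = partial(distance_to_reference, reference=[0.0, 0.0])
print(distance_to_origin([[0.0, 0.0], [3.0, 4.0]]))
```

A function of the first kind would be wrapped with is_mol_func=True and one of the second kind with is_mol_func=False, matching the usage shown elsewhere in this README.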

References

[1] Chemical Science, 2019, DOI: 10.1039/C9SC01928F https://pubs.rsc.org/en/content/articlelanding/2019/SC/C9SC01928F#!divAbstract

mso's Issues

failed to load reward_penalized_log_p score

There seems to be an import cdddswarm.data.sascorer here which throws an exception and prevents the penalized logp reward from loading. networkx does not throw the exception as is suggested. It looks like this should just be from mso.data import sascorer

sklearn error [fixed], and bace calculation error

I'm trying to use the bace_score_512 and created it as the only scoring function: ScoringFunction(func=bace_score_512, name="BACE", is_mol_func=False)

However, I'm receiving this error:

Traceback (most recent call last):
  File "test.py", line 33, in <module>
    swarms, EGFR = opt.run(100)
  File "C:\Users\nicka\anaconda3\envs\cddd\lib\site-packages\mso\optimizer.py", line 119, in run
    self.update_fitness(swarm)
  File "C:\Users\nicka\anaconda3\envs\cddd\lib\site-packages\mso\optimizer.py", line 51, in update_fitness
    unscaled_scores, scaled_scores, desirability_scores = scoring_function(swarm.x)
  File "C:\Users\nicka\anaconda3\envs\cddd\lib\site-packages\mso\objectives\scoring.py", line 89, in __call__
    unscaled_scores = self.func(input)
  File "C:\Users\nicka\anaconda3\envs\cddd\lib\site-packages\sklearn\svm\_base.py", line 317, in predict
    return predict(X)
  File "C:\Users\nicka\anaconda3\envs\cddd\lib\site-packages\sklearn\svm\_base.py", line 335, in _dense_predict
    X, self.support_, self.support_vectors_, self._n_support,
AttributeError: 'SVR' object has no attribute '_n_support'

What version of scikit-learn was used? I installed cddd first as instructed, downloaded the inference model, then installed mso, and now I'm receiving this error perhaps due to incorrect versioning.

EDIT: Correct version is provided in the comment below.

possible redundancy in _next_step_and_evaluate

Is there a reason why this cyclic conversion takes place (swarm.x -> swarm.smiles -> swarm.x) instead of updating swarm.x directly? In other words, is line 70 really necessary?

mso/mso/optimizer.py

Lines 68 to 71 in 992b46d

smiles = self.infer_model.emb_to_seq(swarm.x)
swarm.smiles = smiles
swarm.x = self.infer_model.seq_to_emb(swarm.smiles)
swarm = self.update_fitness(swarm)

Bug with SA scorer

The sascorer in the following code does not seem to be imported:

def sa_score(mol):
    """
    Synthetic acceptability score as proposed by Ertel et al..
    """
    try:
        score = sascorer.calculateScore(mol)
    except:
        score = 0
    return score

Given that all exceptions are caught, this fails silently and sa_score always returns a score of 0.
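The silent failure can be reproduced without mso. In the sketch below, both functions refer to a name sascorer that is deliberately never imported; the function names are hypothetical, and the choice of ValueError as the only expected scoring error is an assumption made for illustration.

```python
def sa_score_silent(mol):
    """Mirrors the reported pattern: a bare except hides the missing import."""
    try:
        score = sascorer.calculateScore(mol)  # NameError: sascorer is not defined
    except:  # catches everything, including the NameError
        score = 0
    return score


def sa_score_loud(mol):
    """Catches only an (assumed) scoring error, so a missing import surfaces."""
    try:
        score = sascorer.calculateScore(mol)
    except ValueError:  # hypothetical: the only failure we expect from scoring
        score = 0
    return score


print(sa_score_silent(None))  # the NameError is swallowed and 0 is returned
try:
    sa_score_loud(None)
except NameError as err:
    print("import problem surfaced:", err)
```

Narrowing the except clause does not fix the missing import, but it turns a silently wrong score into an error that points at the cause.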
