Code Monkey home page Code Monkey logo

magnet's Introduction

MAGNet: Motif-Agnostic Generation of Molecules from Shapes

Recent advances in machine learning for molecules exhibit great potential for facilitating drug discovery from in silico predictions. Most models for molecule generation rely on the decomposition of molecules into frequently occurring substructures (motifs), from which they generate novel compounds. While motif representations greatly aid in learning molecular distributions, such methods struggle to represent substructures beyond their known motif set. To alleviate this issue and increase flexibility across datasets, we propose MAGNet, a graph-based model that generates abstract shapes before allocating atom and bond types. To this end, we introduce a novel factorisation of the molecules' data distribution that accounts for the molecules' global context and facilitates learning adequate assignments of atoms and bonds onto shapes. Despite the added complexity of shape abstractions, MAGNet outperforms most other graph-based approaches on standard benchmarks. Importantly, we demonstrate that MAGNet's improved expressivity leads to molecules with more topologically distinct structures and, at the same time, diverse atom and bond assignments.

If you use this repository for your research, please cite:

@article{hetzel2023magnet,
  title={MAGNet: Motif-Agnostic Generation of Molecules from Shapes},
  author={Hetzel, Leon and Sommer, Johanna and Rieck, Bastian and Theis, Fabian and G{\"u}nnemann, Stephan},
  journal={arXiv preprint arXiv:2305.19303},
  year={2023}
}

Molecule Generation Baselines

Besides the MAGNet code, we provide the same interface, package structure and conda environment for all evaluated baselines in hopes of facilitating ease of reproducibility.

Supported Molecule Generation Models:

This repository does not reimplement the models but only provides a wrapper / unified interface to access them. We would thus like to thank the authors of the aforementioned papers for making their code public.

Setup

As a first step, please create the conda environment with

mamba env create -f config/env.yaml
conda activate baselines
pip install -e .

Currently, there are severe problems with installing torchdyn. For the time being, you can do this via

pip install -c config/constraints.txt git+https://github.com/DiffEqML/torchdyn.git

Either you have cloned the repository with the command git clone xyz --recurse-submodules and have the code for required resources, or you can get neccessary repositories by interating through the resources/ folder with the commands

git submodule init
git submodule update
pip install -e .

Further, please adjust the path configuration at baselines/global_utils.py according to your file structure as well as W&B config.

General Structure

We have currently structured the repository s.t. wrappers exist for the individual baselines. We provide the following functionality:

Preprocessing: Before running any training or inference, you should execute experiments/preprocessing.py for any models you would like to work with. In case no ZINC smiles files are provided, this will download and store them autmatically and then extract a vocabulary and preprocess datapoints where neccessary.

Training: After running the preprocessing, you can start training with default hyperparameters for ZINC with experiments/training.py. Where possible, this will log metrics to Weights&Biases and save checkpoints of the trained models.

Reconstruction: With the file experiments/reconstruction.py, you can perform reconstruction on a given set of SMILES.

GuacaMol Distribution Learning Benchmarks: experiments/guacamol_distribution.py executes the distribution learning benchmarks for any trained model by sampling a sufficient amount of molecules and evaluating them.

GuacaMol Goal-Directed Benchmark: experiments/guacamol_goal_directed.py executes the goal-directed benchmarks for models that were not previously trained for any optimization task. This can be done either via MSO or gradient ascent, please refer to the script's arguments for more details and hyperparameters. Please note that the hyperparameters provided for defaults are intended for debugging, to conduct the benchmark properly these have to be adjusted (e.g. the number of MSO steps would have to be increased).

Interpolation: experiments/interpolation.ipynb performs interpolation in the latent space as well as plotting of the trajectories.

magnet's People

Contributors

johannasommer avatar mxmstrmn avatar

Stargazers

John Rachwan avatar  avatar  avatar Tom Wollschläger avatar Mohamed Amine Ketata avatar Nicholas Gao avatar Tobias Schmidt avatar Sirine Ayadi avatar  avatar  avatar  avatar

Watchers

Aleksandar Bojchevski avatar Dominik Fuchsgruber avatar  avatar

magnet's Issues

The data is missing

Hi,

Thank you for sharing the code. I want to run it but I found the filefold data is empty. Could you please provide the data?

Thank you very much!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.