

REINVENT

Molecular De Novo design using Recurrent Neural Networks and Reinforcement Learning

Searching chemical space as described in:

Molecular De Novo Design through Deep Reinforcement Learning

Video demonstrating an Agent trained to generate analogues of Celecoxib

Notes

The current version is a PyTorch implementation that differs in several ways from the original implementation described in the paper. This version works better in most situations and is better documented, but to reproduce the results from the paper, refer to Release v1.0.1.

Differences from the implementation in the paper:

  • Written in PyTorch/Python 3.6 rather than TensorFlow/Python 2.7.
  • SMILES are encoded as token indices rather than as one-hot vectors of those indices. An embedding matrix then transforms each token index into a feature vector.
  • Scores are in the range (0,1).
  • A regularizer that penalizes high values of total episodic likelihood is included.
  • Sequences are only considered once, i.e., if the same sequence is generated twice in a batch, only the first instance contributes to the loss.
  • Together, these changes make the algorithm more robust against local minima, which means much higher values of sigma can be used if needed.
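
The token-index encoding in the second bullet can be illustrated in plain Python. This is a minimal sketch, not REINVENT's code: the toy vocabulary `VOCAB`, the size `EMBED_DIM` and the helper names are hypothetical. It shows that selecting a row of an embedding matrix by token index gives the same vector as multiplying a one-hot vector by that matrix, which is the lookup that `torch.nn.Embedding` performs in PyTorch.

```python
import random

# Hypothetical toy vocabulary; REINVENT builds its own from the data (data/Voc).
VOCAB = ["EOS", "GO", "C", "O", "N", "(", ")", "=", "1"]
TOKEN_TO_INDEX = {tok: i for i, tok in enumerate(VOCAB)}
EMBED_DIM = 4  # hypothetical feature-vector size

# Randomly initialised embedding matrix: one row of EMBED_DIM features per token.
rng = random.Random(0)
embedding = [[rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in VOCAB]


def encode(tokens):
    """Encode a token list as integer indices (no one-hot vectors are built)."""
    return [TOKEN_TO_INDEX[t] for t in tokens]


def embed(indices):
    """Embedding lookup: pick the matrix row belonging to each index."""
    return [embedding[i] for i in indices]


def one_hot_times_matrix(index):
    """The same vector computed the expensive way: one-hot row times the matrix."""
    one_hot = [1.0 if j == index else 0.0 for j in range(len(VOCAB))]
    return [
        sum(one_hot[j] * embedding[j][k] for j in range(len(VOCAB)))
        for k in range(EMBED_DIM)
    ]


indices = encode(["C", "C", "O"])
vectors = embed(indices)
```

Storing indices instead of one-hot vectors keeps each sequence as a short list of integers, and the lookup avoids a multiplication dominated by zeros.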

Requirements

This package requires:

  • Python 3.6
  • PyTorch 0.1.12
  • RDKit
  • Scikit-Learn (for QSAR scoring function)
  • tqdm (for training Prior)

Usage

To train a Prior starting with a SMILES file called mols.smi:

  • First, filter the SMILES and construct a vocabulary from the remaining sequences with ./data_structs.py mols.smi. This generates data/mols_filtered.smi and data/Voc. A filtered file containing around 1.1 million SMILES, together with the corresponding Voc, is included in "data".

  • Then use ./train_prior.py to train the Prior. A pretrained Prior is included.
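
The vocabulary step can be pictured as collecting every token that occurs in the filtered SMILES. This is an illustration under assumptions, not data_structs.py itself: the regular expression, the special tokens `EOS` and `GO`, and the function names are hypothetical, and the filtering criteria are not covered here.

```python
import re

# Hypothetical tokenizer: bracket atoms ([nH], [C@@H], ...) and the two-letter
# halogens Cl and Br stay single tokens; every other character is its own token.
TOKEN_PATTERN = re.compile(r"(\[[^\[\]]+\]|Cl|Br|.)")


def tokenize(smiles):
    """Split one SMILES string into tokens."""
    return TOKEN_PATTERN.findall(smiles)


def build_vocabulary(smiles_list):
    """Collect every distinct token seen in the (already filtered) SMILES."""
    tokens = set()
    for smi in smiles_list:
        tokens.update(tokenize(smi))
    # Hypothetical special tokens marking end and start of a sequence.
    return ["EOS", "GO"] + sorted(tokens)
```

Keeping Cl and Br as single tokens prevents them from being read as a carbon followed by a stray "l", or as boron followed by "r".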

To train an Agent using our Prior, use the main.py script. For example:

  • ./main.py --scoring-function activity_model --num-steps 1000
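
As described in the paper, the Agent is trained toward an augmented likelihood: the Prior's log-likelihood plus sigma times the score. The sketch below uses plain floats instead of PyTorch tensors and also reflects two of the differences listed under Notes: repeated sequences in a batch contribute only once, and a penalty on high likelihoods is added. The helper names are hypothetical, and the penalty form `-1/log P` and its weight `reg_weight` are assumptions, not values taken from main.py.

```python
def unique_first(sequences):
    """Indices of the first occurrence of each distinct sequence in a batch."""
    seen = set()
    keep = []
    for i, seq in enumerate(sequences):
        if seq not in seen:
            seen.add(seq)
            keep.append(i)
    return keep


def agent_loss(sequences, agent_loglik, prior_loglik, scores, sigma, reg_weight=0.0):
    """Mean squared gap between augmented and Agent log-likelihoods.

    augmented = prior_loglik + sigma * score, with scores assumed in (0, 1).
    Only the first instance of a repeated sequence is counted.
    """
    keep = unique_first(sequences)
    losses = []
    for i in keep:
        augmented = prior_loglik[i] + sigma * scores[i]
        losses.append((augmented - agent_loglik[i]) ** 2)
    loss = sum(losses) / len(losses)
    # Hypothetical regulariser: -1/log P grows as the (negative) log-likelihood
    # approaches 0, i.e. as the Agent assigns a sequence very high probability.
    penalty = sum(-1.0 / agent_loglik[i] for i in keep) / len(keep)
    return loss + reg_weight * penalty
```

For a batch `["CCO", "CCN", "CCO"]`, only the first "CCO" enters the mean, so a sequence the Agent happens to sample repeatedly does not get extra weight in the update.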

Training can be visualized with the Vizard Bokeh app. vizard_logger.py logs information such as generated structures, average score, and network weights (by default to data/logs).


Contributors

  • marcusolivecrona
  • getmolmap