ProtLGN

Protein Engineering with Lightweight Graph Denoising Neural Networks

About The Project

ProtLGN pre-trains equivariant graph neural networks on wild-type proteins with an amino-acid (AA) type denoising task, so that the model learns the joint distribution of the recovered AA types (red).
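
As an illustration of what an AA-type denoising task can look like, the sketch below corrupts a random fraction of residues in a wild-type sequence and records which positions a model would be asked to recover. It is a minimal sketch under our own assumptions: the helper name `corrupt_aa_types`, the uniform replacement scheme and the 15% default noise rate are hypothetical and not taken from ProtLGN's code. In a graph model the AA types would typically be node features rather than a plain string.

```python
import random
from typing import List, Optional, Tuple

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def corrupt_aa_types(
    seq: str, noise_rate: float = 0.15, seed: Optional[int] = None
) -> Tuple[str, List[int]]:
    """Hypothetical helper, not ProtLGN's code: replace a random fraction of
    residues with a different, uniformly chosen amino acid.

    Returns the corrupted sequence and the sorted indices that were changed;
    a denoising model would be trained to recover seq[i] at those indices.
    The 0.15 default is illustrative only.
    """
    if not seq:
        return seq, []
    rng = random.Random(seed)
    residues = list(seq)
    # At least one noisy site, at most every site.
    n_noisy = min(len(residues), max(1, round(noise_rate * len(residues))))
    positions = sorted(rng.sample(range(len(residues)), n_noisy))
    for i in positions:
        # Never "replace" a residue with itself, so every chosen site is noisy.
        choices = [aa for aa in AMINO_ACIDS if aa != residues[i]]
        residues[i] = rng.choice(choices)
    return "".join(residues), positions
```

For example, `corrupt_aa_types("MKTAYIAKQRQISFVKSHFSRQ", noise_rate=0.2, seed=0)` returns a sequence of the same length with four positions changed, plus those four indices as training targets.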

For a protein to be mutated, the predicted AA-type probabilities indicate the fitness scores of the associated mutations (blue).
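
One common way to turn per-residue probabilities into a mutation score is the log-odds ratio between the mutant and wild-type amino acid at the mutated site. The sketch below assumes that convention; whether ProtLGN uses exactly this formula is not stated here, and both the input format (one dict of probabilities per residue) and the name `mutation_score` are hypothetical.

```python
import math
from typing import Dict, List


def mutation_score(probs: List[Dict[str, float]], pos: int, wt: str, mut: str) -> float:
    """Hypothetical helper: log-odds score log p(mut) - log p(wt) at 0-based index pos.

    probs[i] maps one-letter amino-acid codes to the model's predicted
    probability at residue i (an assumed input format). A positive score
    means the model prefers the mutant residue over the wild type.
    """
    site = probs[pos]
    return math.log(site[mut]) - math.log(site[wt])


# Toy two-residue example with made-up probabilities.
probs = [{"A": 0.5, "V": 0.25}, {"G": 0.1, "D": 0.8}]
print(mutation_score(probs, 0, "A", "V"))  # negative: V is less likely than wild-type A
print(mutation_score(probs, 1, "G", "D"))  # positive: D is more likely than wild-type G
```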

With additional mutation measurements from wet-lab biochemical assays, the pre-trained model can be updated to better fit the specific protein and its function (green).

Getting Started

Please follow these simple example steps to get started! 😊

Prerequisites

See requirements.txt for details.

Pre-train ProtLGN

Step 1: get raw dataset

We use the CATH 4.2 dataset, which you can download from https://www.cathdb.info/.

cd <your dir>
wget 
mkdir -p data/cath_k10/raw

Step 2: build graph dataset

See script/build_cath_dataset.sh.
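
The directory name cath_k10 suggests, though this is only our assumption, that each protein is turned into a k-nearest-neighbour residue graph with k = 10. A minimal sketch of that construction from per-residue coordinates (e.g. Cα atoms) follows; `knn_edges` is a hypothetical helper, and the actual script may use different atoms, cutoffs or edge features.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def knn_edges(coords: Sequence[Coord], k: int = 10) -> List[Tuple[int, int]]:
    """Hypothetical helper: directed edges (i, j) from each residue i to its
    k nearest residues j by Euclidean distance.

    coords holds one 3D point per residue. Self-loops are excluded; if fewer
    than k other residues exist, all of them are used. Ties are broken by
    the smaller residue index.
    """
    edges: List[Tuple[int, int]] = []
    for i, ci in enumerate(coords):
        # Sort (distance, index) pairs for every other residue.
        dists = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges
```

The brute-force O(n²) search keeps the example dependency-free; for large structures a spatial index such as a k-d tree is the usual choice.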

Step 3: run pre-train

See run_pretrain.sh.

Zero-shot prediction for mutant sequences

You can use your own checkpoint for zero-shot inference.

Step 1: Prepare mutant dataset

Expected directory layout:

|—— eval_dataset
|——|—— DATASET
|——|——|—— Protein1
|——|——|——|—— Protein1.tsv (DMS file)
|——|——|——|—— Protein1.pdb (pdb file)
|——|——|——|—— Protein1.fasta (sequence)
|——|——|—— Protein2
|——|——|——|...

See script/build_mutant_dataset.sh.
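
Before running the build script, it can help to check that every protein folder follows the layout above. The helper below is a hypothetical sketch, not part of the repository, that reports which of the .tsv, .pdb and .fasta files are missing for each protein folder under a DATASET directory.

```python
from pathlib import Path
from typing import Dict, List

# One file per suffix is expected, named after its protein folder.
REQUIRED_SUFFIXES = (".tsv", ".pdb", ".fasta")


def find_missing_files(dataset_dir: str) -> Dict[str, List[str]]:
    """Hypothetical helper: map each protein folder under dataset_dir to the
    expected files it lacks, following DATASET/<Protein>/<Protein>.{tsv,pdb,fasta}.

    Non-directory entries are ignored. An empty dict means every protein
    folder is complete.
    """
    missing: Dict[str, List[str]] = {}
    for protein_dir in sorted(Path(dataset_dir).iterdir()):
        if not protein_dir.is_dir():
            continue
        name = protein_dir.name
        absent = [
            name + suffix
            for suffix in REQUIRED_SUFFIXES
            if not (protein_dir / (name + suffix)).is_file()
        ]
        if absent:
            missing[name] = absent
    return missing
```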

Step 2: Zero-shot

See script/mutant_predict.sh.
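
Zero-shot scoring of a DMS file comes down to parsing each mutant and combining per-site scores. The sketch below assumes mutants are written as A23V (wild-type letter, 1-based position, mutant letter), with ":" joining multiple substitutions as in many public DMS datasets, and that effects add up across sites. The column format of ProtLGN's .tsv files, the helper names `parse_mutant` and `score_mutant`, and the additive log-odds rule are all assumptions, not the contents of script/mutant_predict.sh.

```python
import math
import re
from typing import Dict, List, Tuple

# One substitution: wild-type letter, 1-based position, mutant letter (assumed notation).
_MUT_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutant(mutant: str) -> List[Tuple[str, int, str]]:
    """Hypothetical helper: parse 'A23V' or 'A23V:G45D' into (wt, 0-based pos, mut) tuples."""
    subs = []
    for token in mutant.split(":"):
        m = _MUT_RE.match(token.strip())
        if m is None:
            raise ValueError(f"unrecognised mutation: {token!r}")
        subs.append((m.group(1), int(m.group(2)) - 1, m.group(3)))
    return subs


def score_mutant(mutant: str, seq: str, probs: List[Dict[str, float]]) -> float:
    """Hypothetical helper: sum log p(mut) - log p(wt) over all substitutions.

    Checks each stated wild-type letter against seq so that off-by-one
    numbering errors are caught instead of silently scored.
    """
    total = 0.0
    for wt, pos, mt in parse_mutant(mutant):
        if seq[pos] != wt:
            raise ValueError(
                f"{mutant}: sequence has {seq[pos]} at position {pos + 1}, not {wt}"
            )
        total += math.log(probs[pos][mt]) - math.log(probs[pos][wt])
    return total
```

Ranking all mutants in a .tsv file by this score gives a zero-shot fitness ordering that can be compared with the measured DMS values.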

Citation

Please cite our paper:

@article {Zhou2023ProtLGN,
	author = {Bingxin Zhou and Lirong Zheng and Banghao Wu and Yang Tan and Outongyi Lv and Kai Yi and Guisheng Fan and Liang Hong},
	title = {Protein Engineering with Lightweight Graph Denoising Neural Networks},
	elocation-id = {2023.11.05.565665},
	year = {2023},
	doi = {10.1101/2023.11.05.565665},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2023/11/05/2023.11.05.565665},
	eprint = {https://www.biorxiv.org/content/early/2023/11/05/2023.11.05.565665.full.pdf},
	journal = {bioRxiv}
}

License

Distributed under the MIT License. See LICENSE.txt for more information.

Contributors

tyang816, bzho3923