ProtLGN

Protein Engineering with Lightweight Graph Denoising Neural Networks

About The Project

ProtLGN is pre-trained on wild-type proteins with an amino acid (AA) type denoising task. It uses equivariant graph neural networks to derive the joint distribution of the recovered AA types (red).

For a protein to be mutated, the predicted probabilities indicate the fitness scores of the associated mutations (blue).

With additional mutation evaluations from wet-lab biochemical assays, the pre-trained model can be updated to better fit the specific protein and its functionality (green).
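To make the link between predicted probabilities and fitness scores concrete, here is a minimal sketch. It assumes a log-odds convention (mutant probability against wild-type probability at the same site), which is common for zero-shot scoring but is not confirmed here as ProtLGN's exact formula. The function name `mutation_score` and the probability values are hypothetical.

```python
import math

def mutation_score(probs, wild_type, mutant):
    """Log-odds of the mutant residue versus the wild-type residue at one site.

    Assumption: this log-odds form is illustrative, not taken from the ProtLGN code.
    """
    return math.log(probs[mutant]) - math.log(probs[wild_type])

# Hypothetical predicted distribution over a few AA types at one position.
site_probs = {"A": 0.50, "G": 0.25, "V": 0.20, "L": 0.05}

score_g = mutation_score(site_probs, wild_type="A", mutant="G")
score_l = mutation_score(site_probs, wild_type="A", mutant="L")

# A higher score means the model considers the substitution more plausible.
print(f"A->G: {score_g:.3f}, A->L: {score_l:.3f}")
```

Under this convention the wild-type residue scores zero, and substitutions the model finds less likely than the wild type score below zero.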

Getting Started

Please follow these simple steps to get started! 😊

Prerequisites

See requirements.txt for details.

Pre-train ProtLGN

Step 1: get raw dataset

We use the CATH 4.2 dataset, which you can download from https://www.cathdb.info/.

cd <your dir>
wget 
mkdir -p data/cath_k10/raw

Step 2: build graph dataset

See script/build_cath_dataset.sh.
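For readers unfamiliar with graph construction from protein structures, the sketch below shows one common approach: connecting each residue to its k nearest neighbours in 3D space. This is an assumption based on the `cath_k10` directory name, which suggests k = 10. The script above remains the authoritative implementation, and `knn_edges` and the coordinates are hypothetical.

```python
import math

def knn_edges(coords, k):
    """Return directed edges (i, j) linking each node i to its k nearest neighbours j."""
    edges = []
    for i, ci in enumerate(coords):
        # Sort all other nodes by Euclidean distance to node i.
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(ci, coords[j]),
        )
        edges.extend((i, j) for j in others[:k])
    return edges

# Toy C-alpha coordinates (in angstroms) for four residues; the values are made up.
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (0.0, 10.0, 0.0)]

# k=2 keeps the toy example readable; the directory name hints at k=10 in practice.
edges = knn_edges(ca_coords, k=2)
print(edges)
```

This brute-force version is quadratic in the number of residues, which is fine for illustration. A real pipeline would typically use a spatial index or a library routine.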

Step 3: run pre-train

See run_pretrain.sh.

Zero-shot prediction for mutant sequences

You can use your own checkpoint for zero-shot inference.

Step 1: Prepare mutant dataset

Data map:

|—— eval_dataset
|——|—— DATASET
|——|——|—— Protein1
|——|——|——|—— Protein1.tsv (DMS file)
|——|——|——|—— Protein1.pdb (pdb file)
|——|——|——|—— Protein1.fasta (sequence)
|——|——|—— Protein2
|——|——|——|...
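Before building the mutant dataset, it can help to confirm that every protein folder follows the layout above. The following is a minimal sketch, not part of the ProtLGN repository. It assumes each file is named after its folder, as in the data map, and `missing_files` is a hypothetical helper.

```python
from pathlib import Path

# File types expected in each protein folder, per the data map.
REQUIRED_SUFFIXES = (".tsv", ".pdb", ".fasta")

def missing_files(dataset_dir):
    """Map each protein folder name to the expected files it lacks."""
    problems = {}
    for protein_dir in sorted(Path(dataset_dir).iterdir()):
        if not protein_dir.is_dir():
            continue
        missing = [
            protein_dir.name + suffix
            for suffix in REQUIRED_SUFFIXES
            if not (protein_dir / (protein_dir.name + suffix)).is_file()
        ]
        if missing:
            problems[protein_dir.name] = missing
    return problems
```

Calling `missing_files("eval_dataset/DATASET")` returns an empty dictionary when every protein folder is complete, and otherwise lists the missing file names for each folder.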

See script/build_mutant_dataset.sh.

Step 2: Zero-shot

See script/mutant_predict.sh.

Citation

Please cite our paper:

@article {Zhou2023ProtLGN,
	author = {Bingxin Zhou and Lirong Zheng and Banghao Wu and Yang Tan and Outongyi Lv and Kai Yi and Guisheng Fan and Liang Hong},
	title = {Protein Engineering with Lightweight Graph Denoising Neural Networks},
	elocation-id = {2023.11.05.565665},
	year = {2023},
	doi = {10.1101/2023.11.05.565665},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2023/11/05/2023.11.05.565665},
	eprint = {https://www.biorxiv.org/content/early/2023/11/05/2023.11.05.565665.full.pdf},
	journal = {bioRxiv}
}

License

Distributed under the MIT License. See LICENSE.txt for more information.
