ProtLGN is pre-trained on wild-type proteins with an amino acid (AA) type denoising task, using equivariant graph neural networks to derive the joint distribution of the recovered AA types (red).
For a protein to be mutated, the predicted probabilities provide fitness scores for the associated mutations (blue).
With additional mutation evaluations from wet biochemical assessments, the pre-trained model can be updated to better fit the specific protein and protein functionality (green).
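As a rough illustration of how predicted probabilities can be turned into a mutation score, the sketch below uses a log-odds ratio between the mutant and wild-type residue at the mutated position. This is a common zero-shot scoring convention and is an assumption here, not necessarily the exact formula used by ProtLGN. The names `score_mutation`, `AMINO_ACIDS`, and the `probs` layout (one row of 20 probabilities per residue) are hypothetical.

```python
import math

# Assumed ordering of the 20 standard amino acids in each probability row.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def score_mutation(probs, wild_type_seq, mutation):
    """Score a single substitution such as 'A2G' (1-indexed position).

    probs: list of rows, one per residue, each holding 20 probabilities
           in AMINO_ACIDS order (hypothetical model output).
    Returns log p(mutant) - log p(wild type) at the mutated position.
    """
    wt, pos, mt = mutation[0], int(mutation[1:-1]), mutation[-1]
    if wild_type_seq[pos - 1] != wt:
        raise ValueError(f"{mutation}: wild-type residue does not match sequence")
    row = probs[pos - 1]
    p_wt = row[AMINO_ACIDS.index(wt)]
    p_mt = row[AMINO_ACIDS.index(mt)]
    return math.log(p_mt) - math.log(p_wt)


if __name__ == "__main__":
    # Toy example: uniform probabilities over a two-residue sequence.
    toy_probs = [[1 / 20] * 20 for _ in range(2)]
    print(score_mutation(toy_probs, "AC", "A1G"))
```

Under this convention a positive score means the model considers the mutant residue more likely than the wild type at that position.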
Please follow these simple example steps to get started! 😊
See requirements.txt for more detail.
We use the CATH 4.2 dataset, which you can download from https://www.cathdb.info/.
cd <your dir>
wget
mkdir -p data/cath_k10/raw
see script/build_cath_dataset.sh
see run_pretrain.sh
You can use your own checkpoint for zero-shot inference.
Data map:
|—— eval_dataset
|——|—— DATASET
|——|——|—— Protein1
|——|——|——|—— Protein1.tsv (DMS file)
|——|——|——|—— Protein1.pdb (pdb file)
|——|——|——|—— Protein1.fasta (sequence)
|——|——|—— Protein2
|——|——|——|...
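Before building the mutant dataset, it can help to check that every protein folder follows the data map above. The following is a minimal sketch that assumes each folder holds files named after the folder itself (`Protein1.tsv`, `Protein1.pdb`, `Protein1.fasta`), as the map suggests. The function name `find_incomplete_proteins` is hypothetical and not part of the repository.

```python
from pathlib import Path

# File extensions expected in every protein folder, per the data map.
REQUIRED_SUFFIXES = (".tsv", ".pdb", ".fasta")


def find_incomplete_proteins(dataset_dir):
    """Return {protein_name: [missing file names]} for incomplete folders."""
    missing = {}
    for protein_dir in sorted(Path(dataset_dir).iterdir()):
        if not protein_dir.is_dir():
            continue  # ignore stray files next to the protein folders
        absent = [
            protein_dir.name + suffix
            for suffix in REQUIRED_SUFFIXES
            if not (protein_dir / (protein_dir.name + suffix)).is_file()
        ]
        if absent:
            missing[protein_dir.name] = absent
    return missing


if __name__ == "__main__":
    # Example path taken from the data map; adjust to your own layout.
    dataset = Path("eval_dataset/DATASET")
    if dataset.is_dir():
        for name, files in find_incomplete_proteins(dataset).items():
            print(f"{name}: missing {', '.join(files)}")
```

An empty result means every protein folder contains its DMS, PDB, and FASTA files.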
see script/build_mutant_dataset.sh
see script/mutant_predict.sh
Please cite our paper:
@article {Zhou2023ProtLGN,
author = {Bingxin Zhou and Lirong Zheng and Banghao Wu and Yang Tan and Outongyi Lv and Kai Yi and Guisheng Fan and Liang Hong},
title = {Protein Engineering with Lightweight Graph Denoising Neural Networks},
elocation-id = {2023.11.05.565665},
year = {2023},
doi = {10.1101/2023.11.05.565665},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2023/11/05/2023.11.05.565665},
eprint = {https://www.biorxiv.org/content/early/2023/11/05/2023.11.05.565665.full.pdf},
journal = {bioRxiv}
}
Distributed under the MIT License. See LICENSE.txt for more information.