
MASSA

Implementation of paper:

Hu, F., Hu, Y., Zhang, W., Huang, H., Pan, Y., & Yin, P. (2023). A Multimodal Protein Representation Framework for Quantifying Transferability Across Biochemical Downstream Tasks. Advanced Science, 2301223. https://doi.org/10.1002/advs.202301223

Python >= 3.7.12

Install

scipy-1.7.3, numpy-1.21.5, pandas-1.3.0, scikit-learn-0.24.1, torch-1.10.1, torch_geometric-2.0.3
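The pins above can be collected into a requirements file; a minimal sketch (the PyPI distribution names, e.g. `torch-geometric`, are assumed — install PyTorch and PyTorch Geometric per their official instructions if wheel resolution fails):

```text
# requirements.txt (hypothetical; versions taken from the list above)
scipy==1.7.3
numpy==1.21.5
pandas==1.3.0
scikit-learn==0.24.1
torch==1.10.1
torch-geometric==2.0.3
```

Install with `pip install -r requirements.txt`.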

Data

The data can be downloaded from the links below. If you have any questions, please contact [email protected].

  • Pretrain dataset: https://drive.google.com/file/d/1xHUs0B9VuKviBzj-k-203p4a9vEoo1RW/view?usp=sharing
  • Downstream dataset: https://drive.google.com/file/d/10yywJNTQ9Z30B_4uyNfQhnXQdhhdjK3W/view?usp=sharing
  • GNN-PPI data: https://drive.google.com/file/d/1YSXNsTJo-Cdxo08cHLb6ghd6noJJ4y73/view?usp=sharing
  • GNN-PPI pretrained embedding: https://drive.google.com/file/d/1sq2VQGAMWmWg02hqhyWju2xuiJ-oHbq0/view?usp=sharing

Checkpoint

The pre-trained model checkpoint can be downloaded from the link below. If you have any questions, please contact [email protected].

https://drive.google.com/file/d/1NVxB00THWxKdTZkLM7T6xdQJM_3TFMVr/view?usp=sharing

Usage

You can clone this repo and run the demo tasks on your own machine.

  • Pre-train model.
cd Multimodal_pretrain/
python src_v0/main.py
  • Fine-tune on downstream tasks using pre-trained models (downstream tasks: stability, fluorescence, remote homology, secondary structure, PDBbind, kinase).
# For example
cd Multimodal_downstream/
python src_stability/main.py
  • Fine-tune on GNN-PPI using the pre-trained embedding.
cd Multimodal_downstream/GNN-PPI/
python src_v0/run.py
  • Guidance for hyperparameter selection.

You can select the hyperparameters of the Performer encoder based on your data and task:

| Hyperparameter | Description | Default | Arbitrary range |
| --- | --- | --- | --- |
| seq_dim | Size of sequence embedding vector | 768 | — |
| seq_hid_dim | Size of hidden embedding in sequence encoder | 512 | [128, 256, 512] |
| seq_encoder_layer_num | Number of sequence encoder layers | 3 | [3, 4, 5] |
| struc_hid_dim | Size of hidden embedding in structure encoder | 512 | [128, 256, 512] |
| struc_encoder_layer_num | Number of structure encoder layers | 2 | [2, 4, 6] |
| go_input_dim | Size of GO-term embedding vector | 64 | — |
| go_dim | Size of hidden embedding in GO-term encoder | 128 | [128, 256, 512] |
| go_n_heads | Number of attention heads in GO-term encoder | 4 | [4, 8, 16] |
| go_n_layers | Number of GO-term encoder layers | 3 | [3, 4, 5] |
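One way to expose these hyperparameters is through a command-line parser whose defaults mirror the table; the sketch below is hypothetical (the actual flag names and wiring in `src_v0/main.py` may differ):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI mirroring the hyperparameter table above;
    # the repo's real entry point may name these flags differently.
    p = argparse.ArgumentParser(description="Performer encoder hyperparameters")
    p.add_argument("--seq_dim", type=int, default=768,
                   help="Size of sequence embedding vector")
    p.add_argument("--seq_hid_dim", type=int, default=512,
                   choices=[128, 256, 512],
                   help="Size of hidden embedding in sequence encoder")
    p.add_argument("--seq_encoder_layer_num", type=int, default=3,
                   choices=[3, 4, 5],
                   help="Number of sequence encoder layers")
    p.add_argument("--struc_hid_dim", type=int, default=512,
                   choices=[128, 256, 512],
                   help="Size of hidden embedding in structure encoder")
    p.add_argument("--struc_encoder_layer_num", type=int, default=2,
                   choices=[2, 4, 6],
                   help="Number of structure encoder layers")
    p.add_argument("--go_input_dim", type=int, default=64,
                   help="Size of GO-term embedding vector")
    p.add_argument("--go_dim", type=int, default=128,
                   choices=[128, 256, 512],
                   help="Size of hidden embedding in GO-term encoder")
    p.add_argument("--go_n_heads", type=int, default=4,
                   choices=[4, 8, 16],
                   help="Number of attention heads in GO-term encoder")
    p.add_argument("--go_n_layers", type=int, default=3,
                   choices=[3, 4, 5],
                   help="Number of GO-term encoder layers")
    return p

# Parsing an empty argv yields the table's default configuration.
args = build_parser().parse_args([])
```

A run such as `python src_v0/main.py --seq_hid_dim 256 --go_n_heads 8` would then select values from the ranges in the table.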
