SEMA

SEMA (Spatial Epitope Modelling with Artificial intelligence) is a tool for conformational B-cell epitope prediction from the primary protein sequence or the tertiary structure. It combines a sequence-based approach (SEMA-1D) and a structure-based approach (SEMA-3D). The SEMA-1D model is based on an ensemble of ESM-1v transformer protein language models. The SEMA-3D model is based on an ensemble of ESM-IF1 inverse folding models. Both models were fine-tuned to predict the propensity of each antigen amino acid (AA) residue to interact with the Fab regions of immunoglobulins. SEMA provides an interpretable score indicating the log-scaled expected number of contacts with antibody residues.

SEMA is also available via a web interface.

Disclaimer:

This code is provided under the MIT License.

Data

The entire dataset, with the contact number per residue, can be downloaded from the link.
The dataset contains the following columns:

  • pdb_id: identifier in the PDB database
  • resi_pos: the residue position in the PDB structure
  • resi_name: amino acid 3-letter name
  • res_aa: amino acid one-letter symbol
  • anigen_chain: name of the antigen chain in the PDB structure
  • fab_chains: names of the antibody chains in the PDB structure
  • contact_number_R1=i_R2=j: contact number, calculated as the number of antibody residues in contact with any atom of the antigen residue within the distance radius R1. Residues whose nearest antibody contact lies between R1 and R2 have a contact number of zero. Residues located outside the R2 distance radius have the value '-100'.
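The labelling rule described for the contact number column can be sketched in a few lines of Python. This is a minimal illustration, not the code from the dataset_generation directory: the function name contact_number_label, its input (the minimum distance from one antigen residue to each antibody residue, assumed to be in the same units as R1 and R2), and the inclusive treatment of the R1 and R2 boundaries are all assumptions.

```python
from typing import Sequence

MASK_VALUE = -100  # value the dataset uses for residues outside the R2 radius


def contact_number_label(distances: Sequence[float], r1: float, r2: float) -> int:
    """Label one antigen residue from its distances to antibody residues.

    distances is a hypothetical input: the minimum atom-to-atom distance from
    this antigen residue to each antibody residue, in the same units as r1/r2.
    """
    if r1 > r2:
        raise ValueError("r1 must not exceed r2")
    # Count antibody residues within R1 (boundary assumed inclusive).
    contacts = sum(1 for d in distances if d <= r1)
    if contacts > 0:
        return contacts
    # Nothing within R1, but some antibody residue within R2: label zero.
    if any(d <= r2 for d in distances):
        return 0
    # Nothing within R2 either: masked value.
    return MASK_VALUE
```

With R1 = 8.0 and R2 = 16.0, a residue with antibody distances [5.0, 7.5, 12.0] would get the label 2, one with [10.0, 20.0] would get 0, and one with [30.0] would get -100.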

You can generate your own dataset with different R1 and R2 using scripts in the dataset_generation directory.

The data directory contains example training and test sets and an example PDB file for SEMA-3D inference.

Environment creation

Python 3.8 is required.

virtualenv sema_env
source sema_env/bin/activate
pip install -r requirements.txt

Training the models

To train the models, use the Jupyter notebooks SEMA-1D_finetuning or SEMA-3D_finetuning.

To train the SEMA-3D model, you also need to download additional data:
a pickle file with a dataset of tertiary structures, or the original set of processed PDB files.

Inference

Prepare your model weights or download ours:

  1. Create the models directory: mkdir models
  2. Go into the directory: cd models
  3. Download weights for SEMA-1D:
wget -O sema_1d_ft_cn_atom_r1_8.0_r2_16.0_0.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_1d_ft_cn_atom_r1_8.0_r2_16.0_0.pth 
wget -O sema_1d_ft_cn_atom_r1_8.0_r2_16.0_1.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_1d_ft_cn_atom_r1_8.0_r2_16.0_1.pth
wget -O sema_1d_ft_cn_atom_r1_8.0_r2_16.0_2.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_1d_ft_cn_atom_r1_8.0_r2_16.0_2.pth
wget -O sema_1d_ft_cn_atom_r1_8.0_r2_16.0_3.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_1d_ft_cn_atom_r1_8.0_r2_16.0_3.pth
wget -O sema_1d_ft_cn_atom_r1_8.0_r2_16.0_4.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_1d_ft_cn_atom_r1_8.0_r2_16.0_4.pth

or SEMA-3D:

wget -O sema_3d_cn_atom_r1_8.0_r2_18.0_0.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_3d_cn_atom_r1_8.0_r2_18.0_0.pt
wget -O sema_3d_cn_atom_r1_8.0_r2_18.0_1.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_3d_cn_atom_r1_8.0_r2_18.0_1.pt 
wget -O sema_3d_cn_atom_r1_8.0_r2_18.0_2.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_3d_cn_atom_r1_8.0_r2_18.0_2.pt 
wget -O sema_3d_cn_atom_r1_8.0_r2_18.0_3.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_3d_cn_atom_r1_8.0_r2_18.0_3.pt 
wget -O sema_3d_cn_atom_r1_8.0_r2_18.0_4.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_3d_cn_atom_r1_8.0_r2_18.0_4.pt
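The ten wget commands above follow one naming pattern, so the same downloads can be scripted. The sketch below is only a convenience, assuming the pattern holds for all five ensemble members; the names weight_files and download_weights are hypothetical, and the remote extensions (.pth for SEMA-1D, .pt for SEMA-3D) are copied from the commands as written.

```python
import urllib.request
from pathlib import Path
from typing import List, Tuple

BASE_URL = (
    "https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/"
)

# (file stem, remote extension) per model, copied from the wget commands.
MODELS = {
    "1d": ("sema_1d_ft_cn_atom_r1_8.0_r2_16.0", ".pth"),
    "3d": ("sema_3d_cn_atom_r1_8.0_r2_18.0", ".pt"),
}
ENSEMBLE_SIZE = 5  # members are numbered 0..4 in the commands


def weight_files(model: str) -> List[Tuple[str, str]]:
    """Return (local file name, remote URL) pairs for one model's ensemble."""
    stem, remote_ext = MODELS[model]
    return [
        (f"{stem}_{i}.pth", f"{BASE_URL}{stem}_{i}{remote_ext}")
        for i in range(ENSEMBLE_SIZE)
    ]


def download_weights(model: str, target_dir: str = "models") -> None:
    """Download every ensemble member into target_dir, skipping existing files."""
    out_dir = Path(target_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for local_name, url in weight_files(model):
        destination = out_dir / local_name
        if destination.exists():
            continue  # already downloaded
        urllib.request.urlretrieve(url, str(destination))
```

Calling download_weights("1d") or download_weights("3d") from the repository root would then fill the models directory in one step.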

Next, run inference using the Jupyter notebooks SEMA-1D_inference or SEMA-3D_inference.
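Both models are described as ensembles, and five weight files are downloaded per model. How the notebooks combine the members is not stated here; one common choice, assumed in the sketch below, is to average the per-residue scores across members. The function name average_ensemble and the list-of-lists input are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence


def average_ensemble(member_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue scores over ensemble members (assumed combination rule).

    member_scores holds one sequence of per-residue scores per ensemble member.
    """
    if not member_scores:
        raise ValueError("at least one ensemble member is required")
    length = len(member_scores[0])
    if any(len(scores) != length for scores in member_scores):
        raise ValueError("all members must score the same number of residues")
    # zip(*...) groups the scores for each residue position across members.
    return [fmean(position) for position in zip(*member_scores)]
```

For two members scoring two residues, average_ensemble([[1.0, 2.0], [3.0, 4.0]]) gives [2.0, 3.0].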

Contributors

airi0salnikov, tanishai, tishashkova
