DeepAIR

python >3.8.13

DeepAIR: a deep-learning framework for effective integration of sequence and 3D structure to enable adaptive immune receptor analysis

Structural-docking-based binding between the adaptive immune receptors (AIRs), including T cell receptor (TCR) and B cell receptor (BCR), and the antigens is one of the most fundamental processes in adaptive immunity. However, current methods for predicting AIR-antigen binding largely rely on sequence-derived features of AIR. In this study, we present a deep-learning framework, termed DeepAIR, for the accurate prediction of AIR-antigen binding by integrating both sequence-derived and structure-derived features of AIRs. DeepAIR consists of three feature encoders, including a trainable-embedding-layer-based gene encoder, a transformer-based sequence encoder, and a pre-trained AlphaFold2-based structure encoder. DeepAIR deploys a gating-based attention mechanism to extract important features from the three encoders, and a tensor fusion mechanism to integrate obtained features for multiple tasks, including the prediction of AIR-antigen binding affinity, AIR-antigen binding reactivity, and the classification of the immune repertoire. We systematically evaluated the performance of DeepAIR on multiple datasets. DeepAIR shows outstanding prediction performance in terms of AUC (area under the ROC curve) in predicting the binding reactivity to various antigens, as well as the classification of immune repertoire for nasopharyngeal carcinoma (NPC) and inflammatory bowel disease (IBD). We anticipate that DeepAIR can serve as a useful tool for characterizing and profiling antigen-binding AIRs, thereby informing the design of personalized immunotherapy.

Flowchart of DeepAIR. DeepAIR has three major processing stages, including multi-channel feature extraction, multimodal feature fusion, and task-specific prediction. At the multi-channel feature extraction stage, three feature encoders are involved and used to extract informative features from the gene, sequence, and structure inputs. Then the resulting features produced by three different encoders are further integrated via a gating-based attention mechanism as well as the tensor fusion at the multimodal feature fusion stage to generate a comprehensive representation. Finally, at the task-specific prediction stage, specifically designed prediction layers are utilized to map the obtained representations to the output results.
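
To make the multimodal feature fusion stage more concrete, here is a minimal plain-Python sketch of the two ideas named above: a gate that re-weights each encoder's features, and tensor fusion, taken here to be the flattened outer product of the feature vectors after a constant 1 is prepended to each (a common formulation). This is not DeepAIR's implementation. The function names, the scalar sigmoid gate, the toy vectors, and the weights are all illustrative assumptions.

```python
import math
from itertools import product


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated(features, gate_weights):
    """Scale each feature by a gate in (0, 1) computed from that feature.

    Assumption: one scalar gate weight per feature. Real gating layers
    typically use learned weight matrices instead.
    """
    return [f * sigmoid(w * f) for f, w in zip(features, gate_weights)]


def tensor_fusion(*vectors):
    """Flattened outer product of the vectors, each prefixed with 1.0.

    Prefixing 1.0 keeps the single-encoder and pairwise terms in the
    result alongside the three-way interaction terms.
    """
    extended = [[1.0] + list(v) for v in vectors]
    return [math.prod(combo) for combo in product(*extended)]


# Toy encoder outputs (hypothetical values and sizes).
gene = [0.5, -1.0]
sequence = [1.0, 2.0, 0.0]
structure = [0.2, 0.4, 0.6, 0.8]

fused = tensor_fusion(
    gated(gene, [1.0, 1.0]),
    gated(sequence, [0.5, 0.5, 0.5]),
    gated(structure, [2.0, 2.0, 2.0, 2.0]),
)
print(len(fused))  # (2 + 1) * (3 + 1) * (4 + 1) = 60
```

The fused vector grows multiplicatively with the encoder output sizes, which is why such a fusion step is usually applied to compact representations.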

System requirements

Hardware requirements

The DeepAIR package requires only a standard computer with enough RAM and an NVIDIA GPU.

Software requirements

OS requirements

This tool is supported on Linux. It has been tested on the following systems:

  • CentOS Linux release 8.2.2.2004
  • Ubuntu 18.04.5 LTS

Python dependencies

DeepAIR mainly depends on the Python scientific stack.

  • The important packages include:
    umap-learn                   0.5.1
    scikit-learn                 0.23.2
    tensorflow-gpu               2.7.0
    biopython                    1.76    
    huggingface-hub              0.2.1
    matplotlib                   3.5.1
    numpy                        1.19.5
    pandas                       1.4.2
    tokenizers                   0.12.1
    transformers                 4.19.4
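
One way to check an existing environment against the versions listed above is to compare them with what is actually installed. The sketch below uses the standard-library importlib.metadata module (Python 3.8+). The helper names are hypothetical, and the script is not part of DeepAIR.

```python
from importlib import metadata

# Versions copied from the dependency list above.
PINNED = {
    "umap-learn": "0.5.1",
    "scikit-learn": "0.23.2",
    "tensorflow-gpu": "2.7.0",
    "biopython": "1.76",
    "huggingface-hub": "0.2.1",
    "matplotlib": "3.5.1",
    "numpy": "1.19.5",
    "pandas": "1.4.2",
    "tokenizers": "0.12.1",
    "transformers": "4.19.4",
}


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose version differs to (wanted, found)."""
    mismatches = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{package}: expected {wanted}, found {found or 'not installed'}")
```

A different version is not necessarily a problem; the list only records the versions the tool was developed with.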

Pretrained model requirements

Download ProtBert-BFD model

ProtBert-BFD is a BERT-based model pretrained on a large corpus of protein sequences in a self-supervised fashion. It is available from https://huggingface.co/Rostlab/prot_bert_bfd/tree/main

The downloaded model should be stored as:
./ProtTrans/prot_bert_bfd/
    config.json
    special_tokens_map.json
    tf_model.h5
    tokenizer_config.json
    vocab.txt
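
Before running DeepAIR, it may help to confirm that the download is complete. The following sketch checks the folder for the five files listed above. The function name is hypothetical, and the check only tests that the files exist, not that their contents are valid.

```python
from pathlib import Path

# File names taken from the layout listed above.
REQUIRED_FILES = (
    "config.json",
    "special_tokens_map.json",
    "tf_model.h5",
    "tokenizer_config.json",
    "vocab.txt",
)


def missing_model_files(model_dir):
    """Return the required file names that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("./ProtTrans/prot_bert_bfd")
    print("missing:", missing if missing else "none")
```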

Install guide

For docker users

1-Pull the Docker image from Docker Hub (optional).

docker pull deepair1/deepair:latest
  • To drop the username prefix from the image name, retag the image:
docker tag deepair1/deepair:latest deepair:latest
docker rmi deepair1/deepair:latest

2-Download the Docker image file deepair.tar from GoogleDrive or OneDrive (optional).

  • Load the Docker image from the downloaded file:
docker load < deepair.tar

3-Start the Docker container and run the examples:

docker run --name deepair --gpus all -it --rm  deepair:latest /bin/bash
  • If your machine has multiple GPUs and you want to use only one:
docker run --name deepair --gpus '"device=0"' -it --rm  deepair:latest /bin/bash
  • All required files are included in the Docker image, so the tests can be run directly:
python ./maincode/DeepAIR_BRP.py
python ./maincode/DeepAIR_BAP.py
python ./maincode/DeepAIR_MIL.py

For conda users

1-Configure the environment.

git clone https://github.com/TencentAILabHealthcare/DeepAIR.git 
cd ./DeepAIR
conda create -n deepair python=3.8
conda activate deepair
pip install umap-learn==0.5.1 tensorflow-gpu==2.7.0 scikit-learn==0.23.2 biopython==1.76 huggingface-hub==0.2.1 matplotlib==3.5.1 numpy==1.19.5 pandas==1.4.2 tokenizers==0.12.1 transformers==4.19.4 seaborn==0.10.1
conda install cuda -c nvidia
conda deactivate

2-Download the pretrained model and run the tests:

conda activate deepair 
CUDA_VISIBLE_DEVICES=0 python ./maincode/DeepAIR_BRP.py
CUDA_VISIBLE_DEVICES=0 python ./maincode/DeepAIR_BAP.py
CUDA_VISIBLE_DEVICES=0 python ./maincode/DeepAIR_MIL.py
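
The three test commands above can also be launched from a small Python wrapper that sets CUDA_VISIBLE_DEVICES for the child processes. This is only a convenience sketch: the helper names are hypothetical, it uses the current interpreter via sys.executable, and it defaults to a dry run that prints the commands instead of executing them.

```python
import os
import subprocess
import sys

# Script paths taken from the commands above.
SCRIPTS = [
    "./maincode/DeepAIR_BRP.py",
    "./maincode/DeepAIR_BAP.py",
    "./maincode/DeepAIR_MIL.py",
]


def single_gpu_env(gpu_index, base_env=None):
    """Copy the environment and expose only one GPU to child processes."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def run_all(gpu_index=0, dry_run=True):
    """Run (or, in a dry run, just print) each test script in turn."""
    env = single_gpu_env(gpu_index)
    commands = [[sys.executable, script] for script in SCRIPTS]
    for command in commands:
        if dry_run:
            print("would run:", " ".join(command))
        else:
            # check=True stops at the first script that fails.
            subprocess.run(command, env=env, check=True)
    return commands


if __name__ == "__main__":
    run_all(gpu_index=0, dry_run=True)
```

Call run_all(dry_run=False) from the DeepAIR repository root, with the deepair environment active, to actually execute the scripts.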

Config file

(1) Edit the ./maincode/config.py file, which provides the paths to the DeepAIR models. (A ready-to-use file is provided as the default example.)

Running

(1) For binding reactivity prediction (BRP) (Classification)

python ./maincode/DeepAIR_BRP.py  \
    --input_data_file  \ # path to the input table
    --result_folder  \ # folder to save the results
    --epitope  \ # selected epitope(s) for the evaluation: a single epitope such as "--epitope A1101_AVFDRKSDAK_EBNA-3B_EBV"
                 # or a list of epitopes such as "--epitope A1101_AVFDRKSDAK_EBNA-3B_EBV A0201_GILGFVFTL_Flu-MP_Influenza"
                 # (default = None)
    --AF2_feature_folder  \ # folder containing the AF2 (AlphaFold2) features
    --transformer_model_folder  # folder containing the pretrained BERT model
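
When the options above are filled in from a script, the argument list can be assembled programmatically, which avoids quoting mistakes when several epitopes are passed. The sketch below uses only the flags listed above. The function name and all example paths are hypothetical placeholders, not files shipped with DeepAIR.

```python
import shlex


def build_brp_command(input_data_file, result_folder, af2_feature_folder,
                      transformer_model_folder, epitopes=None):
    """Assemble the DeepAIR_BRP.py argument list from the documented flags."""
    command = [
        "python", "./maincode/DeepAIR_BRP.py",
        "--input_data_file", input_data_file,
        "--result_folder", result_folder,
    ]
    if epitopes:
        # One or more epitope names follow a single --epitope flag.
        command += ["--epitope", *epitopes]
    command += [
        "--AF2_feature_folder", af2_feature_folder,
        "--transformer_model_folder", transformer_model_folder,
    ]
    return command


# Hypothetical paths; replace them with your own.
command = build_brp_command(
    input_data_file="./my_input.csv",
    result_folder="./my_results",
    af2_feature_folder="./my_af2_features",
    transformer_model_folder="./ProtTrans/prot_bert_bfd",
    epitopes=["A1101_AVFDRKSDAK_EBNA-3B_EBV", "A0201_GILGFVFTL_Flu-MP_Influenza"],
)
print(shlex.join(command))
```

The resulting list can be passed directly to subprocess.run, or printed with shlex.join and pasted into a shell.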

(2) For binding affinity prediction (BAP) (Regression)

python ./maincode/DeepAIR_BAP.py  \
    --input_data_file  \ # path to the input table
    --result_folder  \ # folder to save the results
    --epitope  \ # selected epitope(s) for the evaluation: a single epitope such as "--epitope A1101_AVFDRKSDAK_EBNA-3B_EBV"
                 # or a list of epitopes such as "--epitope A1101_AVFDRKSDAK_EBNA-3B_EBV A0201_GILGFVFTL_Flu-MP_Influenza"
                 # (default = None)
    --AF2_feature_folder  \ # folder containing the AF2 (AlphaFold2) features
    --transformer_model_folder  # folder containing the pretrained BERT model

(3) For immune repertoire classification (Multiple instance learning (MIL))

python ./maincode/DeepAIR_MIL.py  \
    --input_data_file  \ # path to the input table (an immune repertoire of a subject)
    --result_folder  \ # folder to save the results
    --AF2_feature_folder  \ # folder containing the AF2 (AlphaFold2) features
    --transformer_model_folder  \ # folder containing the pretrained BERT model
    --task  # one of 'IBD_BCR' (inflammatory bowel disease (BCR)), 'IBD_TCR' (inflammatory bowel disease (TCR)), 'NPC_BCR' (nasopharyngeal carcinoma (BCR)), or 'NPC_TCR' (nasopharyngeal carcinoma (TCR))
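
Because --task accepts only the four values listed above, a wrapper script can validate the value before launching a run. The sketch below is one way to do that. The names MIL_TASKS and describe_task are hypothetical, and nothing here reflects how DeepAIR_MIL.py itself parses its arguments.

```python
# Task names and meanings copied from the option description above.
MIL_TASKS = {
    "IBD_BCR": ("inflammatory bowel disease", "BCR"),
    "IBD_TCR": ("inflammatory bowel disease", "TCR"),
    "NPC_BCR": ("nasopharyngeal carcinoma", "BCR"),
    "NPC_TCR": ("nasopharyngeal carcinoma", "TCR"),
}


def describe_task(task):
    """Return a readable description, or raise ValueError for unknown tasks."""
    if task not in MIL_TASKS:
        valid = ", ".join(sorted(MIL_TASKS))
        raise ValueError(f"unknown task {task!r}; expected one of: {valid}")
    disease, receptor = MIL_TASKS[task]
    return f"{disease} ({receptor})"


print(describe_task("NPC_TCR"))
```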

Running examples

(1) For binding reactivity prediction (BRP) (Classification)

python ./maincode/DeepAIR_BRP.py 

(2) For binding affinity prediction (BAP) (Regression)

python ./maincode/DeepAIR_BAP.py 

(3) For immune repertoire classification (Multiple instance learning (MIL))

python ./maincode/DeepAIR_MIL.py 

Time cost

Typical install time on a "normal" desktop computer is about 30 minutes.

The expected run time for inference on a "normal" desktop computer is about 1 second per sample.

Dataset:

Example data are given in ./data

Disclaimer

This tool is for research purposes only and is not approved for clinical use.

This is not an official Tencent product.

Copyright

This tool was developed at Tencent AI Lab.

The copyright holder for this project is Tencent AI Lab.

All rights reserved.

Citation

Zhao Y, He B, Li C, Xu Z, Su X, Rossjohn J, Song J, Yao J. DeepAIR: a deep-learning framework for effective integration of sequence and 3D structure to enable adaptive immune receptor analysis. bioRxiv.
