deepligand's Introduction

DeepLigand

Data

The 5-fold cross-validation split used in the paper can be downloaded from here. The DeepLigand model provided in this repository was trained on all five folds combined.

Environment setup

Prerequisites

  • R > 3.3
  • CUDA 8.0 with cudnn 5.1
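Before creating the environment, the R requirement can be checked from Python. This is an illustrative sketch, not part of DeepLigand. It assumes R is on PATH and that "R --version" prints a line of the form "R version X.Y.Z". It also reads "R > 3.3" as 3.4 or later, which is an interpretation.

```python
import re
import shutil
import subprocess


def parse_r_version(text):
    """Return (major, minor, patch) from `R --version` output, or None if not found."""
    match = re.search(r"R version (\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def r_meets_requirement(version, minimum=(3, 3)):
    """Treat 'R > 3.3' as major.minor strictly above `minimum` (i.e. 3.4 or later)."""
    return version is not None and version[:2] > minimum


def installed_r_version():
    """Run `R --version` if R is on PATH and return the parsed version, else None."""
    if shutil.which("R") is None:
        return None
    result = subprocess.run(["R", "--version"], stdout=subprocess.PIPE,
                            universal_newlines=True)
    return parse_r_version(result.stdout)


if __name__ == "__main__":
    version = installed_r_version()
    print("R version:", version,
          "OK" if r_meets_requirement(version) else "does not meet R > 3.3")
```

If 3.3.x patch releases turn out to be acceptable, change the comparison in r_meets_requirement accordingly.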

Conda environment

With the above prerequisites installed, create and activate a Conda environment containing all the necessary Python packages:

conda env create -f environment.yml
source activate deepligand
python update_bilm.py
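After activating the environment, a quick sanity check is to ask Python's import system whether the key packages resolve. The module names in the usage lines are assumptions inferred from this page, not an official list: torch (pytorch appears in environment.yml), h5py (used in the prediction example), and bilm (from update_bilm.py).

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed module names: torch (pytorch in environment.yml), h5py, bilm
    missing = missing_packages(["torch", "h5py", "bilm"])
    print("missing:", ", ".join(missing) if missing else "none")
```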

To deactivate this environment:

source deactivate

Preprocess

python preprocess.py -f $INFILE -o $OUTDIR
  • INFILE: a file of MHC-peptide pairs to predict on (example). The names of the supported MHC alleles are listed in the first column of this file.
  • OUTDIR: output directory
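Because the Issues below report failures with 10-mer input, it may be worth checking peptide lengths and residues in INFILE before preprocessing. This is a hypothetical helper. The column layout it assumes (one whitespace-separated pair per line, MHC name first, then peptide) may not match the real file, so compare it against the example file and adjust mhc_col, pep_col, and skip_header as needed.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def read_pairs(path, mhc_col=0, pep_col=1, skip_header=False):
    """Hypothetical reader: one whitespace-separated MHC/peptide pair per line."""
    pairs = []
    with open(path) as handle:
        if skip_header:
            next(handle, None)
        for line in handle:
            fields = line.split()
            if len(fields) > max(mhc_col, pep_col):
                pairs.append((fields[mhc_col], fields[pep_col]))
    return pairs


def summarize_pairs(pairs):
    """Count peptide lengths and collect peptides containing non-standard residues."""
    lengths = Counter(len(pep) for _, pep in pairs)
    nonstandard = [pep for _, pep in pairs if not set(pep.upper()) <= STANDARD_AA]
    return lengths, nonstandard
```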

Predict

python main.py -p $OUTDIR/test.h5.batch -o $OUTDIR/prediction 
  • OUTDIR: output directory
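The preprocess and predict steps can also be chained from Python. This sketch only rebuilds the two command lines shown above and runs them with subprocess. It assumes the script is run from the repository root inside the activated deepligand environment, so sys.executable is that environment's Python. The function names are hypothetical.

```python
import os
import subprocess
import sys


def deepligand_commands(infile, outdir):
    """Rebuild the preprocess and predict commands from the README."""
    preprocess = [sys.executable, "preprocess.py", "-f", infile, "-o", outdir]
    predict = [sys.executable, "main.py",
               "-p", os.path.join(outdir, "test.h5.batch"),
               "-o", os.path.join(outdir, "prediction")]
    return preprocess, predict


def run_pipeline(infile, outdir, dry_run=False):
    """Print each command and, unless dry_run is set, run it and stop on failure."""
    for cmd in deepligand_commands(infile, outdir):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling run_pipeline(infile, outdir, dry_run=True) only prints the two commands, which is a cheap way to check the paths before a long run.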

The resulting predictions are saved in batches as HDF5 datasets under $OUTDIR/prediction. Below is an example of accessing the dataset in the first batch:

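import os
import h5py
outdir = os.environ['OUTDIR']  # exported shell variable; '$OUTDIR' is not expanded inside a Python string
with h5py.File(os.path.join(outdir, 'prediction', 'h5.batch1'), 'r') as f:
    pred = f['pred'][()]  # 3-column array, one row per MHC-peptide pair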
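Prediction batches are numbered, so reading all of them needs a numeric rather than alphabetical sort; otherwise h5.batch10 would come before h5.batch2. This is a sketch. It assumes every batch file is named h5.batch<N>, like the example above, and that each file stores its rows in a dataset called pred.

```python
import os
import re


def sort_batch_names(names, prefix="h5.batch"):
    """Keep names of the form <prefix><N> and sort them by N (1, 2, ..., 10)."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)")
    numbered = []
    for name in names:
        match = pattern.fullmatch(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def load_all_predictions(pred_dir):
    """Stack the 'pred' dataset from every batch into one array, in batch-number order."""
    # Imported here so sort_batch_names can be used without h5py/NumPy installed.
    import h5py
    import numpy as np

    arrays = []
    for name in sort_batch_names(os.listdir(pred_dir)):
        with h5py.File(os.path.join(pred_dir, name), "r") as f:
            arrays.append(f["pred"][()])
    if not arrays:
        raise FileNotFoundError("no h5.batch<N> files in " + pred_dir)
    return np.concatenate(arrays, axis=0)
```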

The dataset (pred) has three columns. The first two columns are the predicted mean (1st column) and variance (2nd column) of the binding affinity between the input peptide and MHC allele. The third column is the predicted probability that the input peptide is a natural ligand of the input MHC allele.
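As an illustration of reading one row, the sketch below converts the variance to a standard deviation and forms an approximate interval around the mean. It assumes the mean and variance can be treated as a Gaussian predictive distribution, so z = 1.96 gives roughly 95% coverage. The 0.5 probability cutoff is an arbitrary example, not a DeepLigand setting. The affinity values are on whatever scale the model was trained on.

```python
import math


def interpret_prediction(row, z=1.96, ligand_threshold=0.5):
    """Summarize one `pred` row: (affinity mean, affinity variance, ligand probability).

    Assumes a Gaussian predictive distribution, so mean +/- z * sd with z = 1.96
    is an approximate 95% interval. The 0.5 cutoff is illustrative only.
    """
    mean, var, prob = (float(x) for x in row)  # also accepts NumPy scalars
    sd = math.sqrt(var)
    return {
        "mean": mean,
        "sd": sd,
        "interval": (mean - z * sd, mean + z * sd),
        "likely_ligand": prob >= ligand_threshold,
    }
```

With pred loaded as above, [interpret_prediction(r) for r in pred] summarizes every pair in the batch.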

deepligand's People

Contributors

haoyangz


deepligand's Issues

Error when using a 10-mer peptide as input

Hi, my name is Jong hui Hong.

I'm trying to predict the binding between my peptide sequences and MHC. It works well when the peptide length is 9, but when the peptide length is 10 it throws an error like this:

0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/awork10-3/DeepLigand/elmo_embed.py", line 72, in <module>
    context_ids = batcher.batch_sentences(tokenized_context, max_length=args.max_len)
TypeError: batch_sentences() got an unexpected keyword argument 'max_length'
data embedding
running python /awork10-3/DeepLigand/embed_plusrelation_elmo_massspec.py --mhcfile /awork10-3/DeepLigand/output_x/test.mhc --pepfile /awork10-3/DeepLigand/output_x/test.pep.padded --labelfile /awork10-3/DeepLigand/output_x/test.label --relationfile /awork10-3/DeepLigand/output_x/test.relation --masslabelfile /awork10-3/DeepLigand/output_x/test.masslabel --elmodir /awork10-3/DeepLigand/output_x/test.pep.token --elmotag elmo_embeddingds_alltrain.epitope.elmo --mapper /awork10-3/DeepLigand/data/onehot_first20BLOSUM50 --outfileprefix /awork10-3/DeepLigand/output_x/test.h5.batch --expected_pep_len 9
Traceback (most recent call last):
  File "/awork10-3/DeepLigand/embed_plusrelation_elmo_massspec.py", line 107, in <module>
    embed_all(f1, f2, f3, f4, f5, args.elmodir, args.elmotag, mapper, args.outfileprefix)
  File "/awork10-3/DeepLigand/embed_plusrelation_elmo_massspec.py", line 52, in embed_all
    assert(exists(join(elmo_dir, 'batch'+str(elmo_cnt)+'.'+elmotag+'.hdf5')))
AssertionError

Looking at the training data and the example, I see no reason why 10-mer peptides should not work at all. Can somebody suggest what mistake I made?

Thanks in advance
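The first traceback reports that batch_sentences() does not accept max_length. This suggests the installed bilm package differs from what elmo_embed.py expects. The later AssertionError about a missing .hdf5 embedding file may simply follow from that first failure. The setup steps above include python update_bilm.py, so one plausible check is whether the bilm in the active environment exposes that keyword. This is a minimal diagnostic sketch, and the "from bilm import Batcher" import path is an assumption.

```python
import inspect


def accepts_kwarg(func, name):
    """True if `func` can be called with keyword argument `name`."""
    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):
        return False  # signature cannot be introspected (e.g. some builtins)
    if name in params and params[name].kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY):
        return True
    # A **kwargs catch-all also accepts the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


if __name__ == "__main__":
    try:
        from bilm import Batcher  # assumed import path of the bilm package
    except ImportError:
        print("bilm is not importable in this environment")
    else:
        print("batch_sentences accepts max_length:",
              accepts_kwarg(Batcher.batch_sentences, "max_length"))
```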

Inconsistent package versions in Prerequisites vs. yml

Hello,

I have a query regarding the CUDA and cuDNN versions. The prerequisite is "CUDA 8.0 with cudnn 5.1"; however, the yml file says otherwise:

  - cuda90=1.0=h6433d27_0
  - pytorch=0.3.1=py36_cuda9.0.176_cudnn7.0.5_2

I ignored the Prerequisites and created an environment from the yml file, which worked fine, and I was also able to run the preprocess.py step. However, when I ran main.py, I got this error:
AssertionError: Found no NVIDIA driver on your system.
Any help on this is appreciated.

Best
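The message "Found no NVIDIA driver on your system" is raised during PyTorch's CUDA initialization, so a first step is to see what the environment actually provides. This is a diagnostic sketch, not part of DeepLigand. It reports whether nvidia-smi is on PATH, which CUDA version the installed PyTorch build targets, and whether PyTorch can use a GPU.

```python
import shutil


def cuda_report():
    """Collect basic facts relevant to 'Found no NVIDIA driver on your system'."""
    report = {"nvidia_smi_on_path": shutil.which("nvidia-smi") is not None}
    try:
        import torch  # imported lazily so the report also works without PyTorch
    except ImportError:
        report["torch_installed"] = False
        return report
    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # CUDA version this PyTorch build was compiled against; None for CPU-only builds.
    report["torch_built_with_cuda"] = getattr(torch.version, "cuda", None)
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(key, "=", value)
```

If nvidia_smi_on_path is False, the problem is likely the host driver setup rather than the Conda environment.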

How to retrain ELMo on the MS data?

How can ELMo be retrained on the MS data? Also, could another pretrained model such as BERT be used to embed the peptide amino-acid sequences? I found it hard to fit the CBOW model on UniProt protein datasets.

Assertion Error in embed_plusrelation_elmo_massspec.py

Hi! I tried to run preprocess.py on some data, but I get this error:

num of batches 95
  0%| | 0/94 [00:00<?, ?it/s]data embedding

running python […]/embed_plusrelation_elmo_massspec.py […arguments…]

Traceback (most recent call last):
  File "./DeepLigand/embed_plusrelation_elmo_massspec.py", line 107, in <module>
    embed_all(f1, f2, f3, f4, f5, args.elmodir, args.elmotag, mapper, args.outfileprefix)
  File "./DeepLigand/embed_plusrelation_elmo_massspec.py", line 52, in embed_all
    assert(exists(join(elmo_dir, 'batch'+str(elmo_cnt)+'.'+elmotag+'.h5py')))
AssertionError

./test.pep.token contains all files from batch1 to batch94, but the files that should have been generated by elmo_embed.py seem to be missing. Do you have any idea why?

Thanks!
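The assertion in embed_all() checks for one embedding file per batch, so listing which of those files exist can show where elmo_embed.py stopped. This sketch rebuilds the path from the assert expression. The extension is a parameter because the two reports on this page differ (.hdf5 vs .h5py). Numbering batches from 1 is an assumption based on batch1 to batch94 above.

```python
import os


def missing_elmo_files(elmo_dir, elmotag, n_batches, ext="hdf5"):
    """List expected ELMo embedding files that do not exist.

    Mirrors the path checked in embed_all():
    join(elmo_dir, 'batch' + str(n) + '.' + elmotag + '.' + ext)
    """
    missing = []
    for n in range(1, n_batches + 1):
        path = os.path.join(elmo_dir, "batch{}.{}.{}".format(n, elmotag, ext))
        if not os.path.exists(path):
            missing.append(path)
    return missing
```

For example, missing_elmo_files("./test.pep.token", "elmo_embeddingds_alltrain.epitope.elmo", 94, ext="h5py") uses the tag from the first issue's command line. Substitute the tag your own run passed as --elmotag.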
