
prosit's People

Contributors

gessulat, llautenbacher, mlgill, tkschmidt


prosit's Issues

Error in building model

Hello,

I am facing an error while running "make server MODEL_SPECTRA=/home/csbmm/chinmaya/prosit-master/Fragmentation_Model/prosit1/ MODEL_IRT=/home/csbmm/chinmaya/prosit-master/iRT_Model/" on a CentOS 7 system. Please find the attached screenshot.

I understand that this error comes from the Makefile, which looks for a directory "/root/model_spectra/" that doesn't exist at all.

Does that mean we have to create that directory under root, or can we edit the Makefile with a new path to build the model?

Can someone help with this?

-- Chinmaya
(screenshot attached: Screenshot from 2019-12-12 17-32-32)

error message: Unknown Element in string: {sequence}. Found Elements: {x} / NameError: name 'x' is not defined

I'm submitting a small CSV file (~1 MB) but I'm getting this error message. I can't make sense of what the problem is. I searched my input file, and I don't have any sequence with "x" in it.

WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
Traceback (most recent call last):
  File "oktoberfest/grpc_predict_peptidelist.py", line 36, in <module>
    disable_progress_bar=True)
  File "/root/.pyenv/versions/3.6.0/src/prosit-grpc/prosit_grpc/predictPROSIT.py", line 153, in predict_to_hdf5
    models=[irt_model, intensity_model])
  File "/root/.pyenv/versions/3.6.0/src/prosit-grpc/prosit_grpc/predictPROSIT.py", line 123, in predict
    self.input.prepare_input(disable_progress_bar)
  File "/root/.pyenv/versions/3.6.0/src/prosit-grpc/prosit_grpc/inputPROSIT.py", line 15, in prepare_input
    self.sequences.prepare_sequences(flag_disable_progress_bar)
  File "/root/.pyenv/versions/3.6.0/src/prosit-grpc/prosit_grpc/inputPROSIT.py", line 143, in prepare_sequences
    self.character_to_array(flag_disable_progress_bar)
  File "/root/.pyenv/versions/3.6.0/src/prosit-grpc/prosit_grpc/inputPROSIT.py", line 118, in character_to_array
    total=len(self.character)):
  File "/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/tqdm/std.py", line 1107, in __iter__
    for obj in iterable:
  File "/root/.pyenv/versions/3.6.0/src/prosit-grpc/prosit_grpc/__utils__.py", line 88, in split_modstring
    raise ValueError(f"Unknown Element in string: {sequence}. Found Elements: {x}")
NameError: name 'x' is not defined
make: *** [grpc_predict] Error 8
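Incidentally, the NameError at the bottom masks the ValueError the code intended to raise: the f-string in the raise statement interpolates a name x that is not defined at that point in __utils__.py, so building the error message itself fails. Note that {x} here is a broken placeholder in the message, not the literal letter "x" in a sequence. A minimal sketch of the mechanism (illustrative, not the actual prosit-grpc code):

# When a name used inside an f-string is undefined, evaluating the
# f-string raises NameError before the ValueError is ever constructed.
def split_modstring(sequence):
    raise ValueError(f"Unknown Element in string: {sequence}. Found Elements: {x}")

split_modstring("PEPTIDE")  # -> NameError: name 'x' is not defined

The practical consequence: the input most likely does contain an element the parser cannot handle, but the message that would have named it is lost.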

An error occurred every time I tried to use Prosit to predict a spectral library, so I need your help!

Hi,
Thank you for your good work!
I have read the article "Generating high quality libraries for DIA MS with empirically corrected peptide predictions" and I'm very interested in this project. I want to predict a library as the article describes, but, maybe due to my poor computer setup, I have tried to run Prosit many times and always failed. So I want to ask for your help to run this .csv file for me and send me the results by email: [email protected]

Swissprot_human_20200929_reviewed_20375.fasta.trypsin.z3_nce33.zip

Could you please help me? I'm really sorry to trouble you.

I have uploaded the input file for prosit in the attachment.

Here is the error log:

Error log:

WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/distributed/config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  defaults = yaml.load(f)
Using TensorFlow backend.
Process Process-2:
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.6.0/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/root/.pyenv/versions/3.6.0/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/root/.pyenv/versions/3.6.0/src/converter/converter/spectronaut_conv/converter.py", line 148, in to_csv
    for spectrum in converted:
  File "/root/.pyenv/versions/3.6.0/src/converter/converter/spectronaut_conv/converter.py", line 136, in get_converted
    x = self.queue.get()
  File "<string>", line 2, in get
  File "/root/.pyenv/versions/3.6.0/lib/python3.6/multiprocessing/managers.py", line 757, in _callmethod
    kind, result = conn.recv()
  File "/root/.pyenv/versions/3.6.0/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/root/.pyenv/versions/3.6.0/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/root/.pyenv/versions/3.6.0/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Traceback (most recent call last):
  File "oktoberfest/convert.py", line 20, in <module>
    converter.iter_data()
  File "/root/.pyenv/versions/3.6.0/src/converter/converter/spectronaut_conv/converter.py", line 34, in iter_data
    conv.convert(pool)
  File "/root/.pyenv/versions/3.6.0/src/converter/converter/spectronaut_conv/converter.py", line 129, in convert
    self.queue.put(s)
  File "<string>", line 2, in put
  File "/root/.pyenv/versions/3.6.0/lib/python3.6/multiprocessing/managers.py", line 756, in _callmethod
    conn.send((self._id, methodname, args, kwds))
  File "/root/.pyenv/versions/3.6.0/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/root/.pyenv/versions/3.6.0/lib/python3.6/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/root/.pyenv/versions/3.6.0/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
make: *** [create_output] Error 9

iRT definition

Hi,

Thanks for your great work, but I have a question: is the definition of the iRT predicted by Prosit the same as in the paper "Using iRT, a normalized retention time for more targeted measurement of peptides" (doi: 10.1002/pmic.201100463)?

When I predicted the iRT of the iRT peptides with Prosit and the pre-trained model, I got:
LGGNEQVTR | -9.96795
GAGSSEPVTGLDAK | 24.7797
VEATFGVDESNAK | 37.5635
YILAGVENSK | 48.0256
TPVISGGPYEYR | 53.7346
TPVITGAPYEYR | 59.9253
DGLDAASYYAPVR | 72.0672
ADVTPADFSEWSK | 82.2792
GTFIIDPGGVIR | 108.194
GTFIIDPAAVIR | 126.755
LFLQFGAQGSPFLK | 136.645

But the reference gives:
LGGNEQVTR | −24.92
GAGSSEPVTGLDAK | 0
VEATFGVDESNAK | 12.39
YILAGVENSK | 19.79
TPVISGGPYEYR | 28.71
TPVITGAPYEYR | 33.38
DGLDAASYYAPVR | 42.26
ADVTPADFSEWSK | 54.62
GTFIIDPGGVIR | 70.52
GTFIIDPAAVIR | 87.23
LFLQFGAQGSPFLK | 100

There is a significant difference.
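For what it's worth, the two value sets above look roughly linearly related, which would suggest Prosit predicts on its own internal scale rather than the published iRT scale. A minimal sketch (assuming only the numbers listed above) that fits and checks a linear mapping with NumPy:

import numpy as np

# Reference iRT values and Prosit predictions for the 11 iRT peptides above.
reference = np.array([-24.92, 0.0, 12.39, 19.79, 28.71, 33.38,
                      42.26, 54.62, 70.52, 87.23, 100.0])
prosit = np.array([-9.96795, 24.7797, 37.5635, 48.0256, 53.7346,
                   59.9253, 72.0672, 82.2792, 108.194, 126.755, 136.645])

# Least-squares fit: prosit ~ slope * reference + intercept
slope, intercept = np.polyfit(reference, prosit, 1)
residuals = prosit - (slope * reference + intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, "
      f"max residual={np.abs(residuals).max():.2f}")

If the residuals are small, the two scales differ only by an affine calibration, which is easy to invert.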

Unsupported amino acids should ideally yield warnings, not crash the program

From a usability perspective, I would suggest that sequences with unsupported amino acids be skipped and warnings be issued. The current behavior simply has the program crash with a KeyError. The note on the main page about these amino acids is clear but having the user sanitize their input seems like much more effort than handling this through Prosit.

Here is my quick and dirty patch (just to get the tool running for me). Probably not the best way to handle this, but it reuses utils.peptide_parser() to avoid duplicating logic.

diff --git a/prosit/tensorize.py b/prosit/tensorize.py
index d4eefdb..d94127f 100644
--- a/prosit/tensorize.py
+++ b/prosit/tensorize.py
@@ -1,4 +1,5 @@
 import collections
+import logging
 import numpy as np

 from . import constants
@@ -45,7 +46,7 @@ def get_precursor_charge_onehot(charges):
     return array


-def get_sequence_integer(sequences):
+def get_sequence_integer(sequences):
     array = np.zeros([len(sequences), MAX_SEQUENCE], dtype=int)
     for i, sequence in enumerate(sequences):
         for j, s in enumerate(utils.peptide_parser(sequence)):
@@ -89,6 +90,14 @@ def csv(df):
     assert "modified_sequence" in df.columns
     assert "collision_energy" in df.columns
     assert "precursor_charge" in df.columns
+
+    n_all = df.shape[0]
+    df = df[df.modified_sequence.apply(is_supported_sequence)]
+    n_supported = df.shape[0]
+    if n_supported < n_all:
+        logging.warning(' {0} / {1} sequences were skipped as they '
+                        'contained unsupported amino acids'.format(n_all - n_supported, n_all))
+
     data = {
         "collision_energy_aligned_normed": get_numbers(df.collision_energy) / 100.0,
         "sequence_integer": get_sequence_integer(df.modified_sequence),
@@ -107,3 +116,12 @@ def csv(df):
     data["masses_pred"] = masses_pred

     return data
+
+
+def is_supported_sequence(seq: str) -> bool:
+    try:
+        _ = [ALPHABET[s] for s in utils.peptide_parser(seq)]
+        return True
+    except KeyError as e:
+        return False
+

PS Good work on developing Prosit! 😄

prosit prediction benchmark

Not an issue, just nice to know how Prosit performs during predictions.

Quick take home:

  • prosit requires TF, Keras, cuDNN and one GPU; predictions cannot be made on CPU only
  • increasing batch size for prediction to 10k does not affect run time
  • adding multiple GPUs will not improve prediction run times
  • time difference between a 1080Ti and a new 2080Ti is minimal
  • the GPU during predictions is mostly (90%) idle
  • only one single CPU core is used all the time during predictions
  • on a modern GPU/CPU, one million tryptic peptides takes around 15 minutes to process
  • predicted spectral output file size for one million peptides is 786 MB
  • predictions for 560,296 protein sequences and their 19 million tryptic peptides would take about 5 hours
  • the fastest consumer CPU for predictions would be an Intel Core i9-9900KF @ 3.60GHz

Used configuration:
CPU: Intel(R) Core(TM) i7-6900K CPU @ 3.20GHz (8 core, 16 threads)
GPU: NVIDIA GeForce RTX 2080Ti (11 GB memory)
Model: weight_32_0.10211.hdf5

Prosit prediction command for 1 million peptides:

~/prosit$ curl -F "peptides=@examples/prosit-1million.csv" http://127.0.0.1:5000/predict/ > msms-prosit-1million.csv
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  806M  100  785M  100 20.7M   872k  23571  0:15:22  0:15:22 --:--:--  181M

CPU htop, only one single core running at full load (screenshot)

GPU, first minute: full P2 power draw and utilization of all memory:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:02:00.0  On |                  N/A |
| 31%   55C    P2   239W / 250W |  10736MiB / 10988MiB |     87%      Default |
+-------------------------------+----------------------+----------------------+

GPU, from 2 to 15 minutes, no GPU utilization, only CPU single core:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:02:00.0  On |                  N/A |
| 29%   37C    P8     2W / 250W |  10736MiB / 10988MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

Prosit prediction command for 19 million peptides, takes 5:48h and creates a 21 Gbyte file:

~/prosit$ curl -F "peptides=@fasta/uniprot-sprot.csv" http://127.0.0.1:5000/predict/ > msms-uniprot-sprot-19mio.csv
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 20.7G  100 20.2G  100  484M  1017k  24293  5:48:37  5:48:36  0:00:01  103M
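For scripted benchmarking, the same endpoint can be driven from Python instead of curl. A minimal sketch (assuming the server started by make server is listening on port 5000, and the example peptide list shipped in the repo):

import requests

# Submit the peptide CSV as a multipart form upload, mirroring the
# curl -F "peptides=@..." calls above.
with open("examples/peptidelist.csv", "rb") as f:
    resp = requests.post("http://127.0.0.1:5000/predict/",
                         files={"peptides": f}, stream=True)
resp.raise_for_status()

# Stream the (potentially multi-GB) prediction CSV to disk in 1 MB chunks.
with open("msms_prediction.csv", "wb") as out:
    for chunk in resp.iter_content(chunk_size=1 << 20):
        out.write(chunk)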

Tobias

Include description of correct features (i.e. -D parameter) in Percolator

Hello,

I am curious about the parameters used in the web version of Prosit for the FDR calculation.

For your publication, I understand the pipeline sets up the flags -Y and -U... I was wondering why you did not also set up the -D flag to "Include description of correct features, i.e. features describing the difference between the observed and predicted isoelectric point, retention time and precursor mass"?

Wouldn't it improve the FDR calculation since Prosit is also using RT information as part of the prediction of fragment spectra and peptide to spectrum matching?

Maybe I am missing something regarding the way the algorithm works and I would love to better understand it.

Best wishes,
Miguel

prosit does not release memory

Hi,
after running a number of Prosit jobs and processing several fasta files, I observed that a large number of prosit docker instances still linger around. It gets to the point where other tools cannot be started anymore because there is no memory left. All the calculations finished successfully with no errors reported; the RAM is just never cleaned up.

What would be a possible quick solution to this problem (without stopping docker)?

Thanks
Tobias


handling of sequences with more than 30 amino acids

The current code as defined in prosit/prosit/constants.py only supports sequences with up to 30 amino acids.

A total of 81194 sequences exist in the human fasta file. Only 703 sequences are 30 amino acids or shorter; 80491 sequences are longer than 30 amino acids. Basically, 99.13% of the sequences in the human fasta file are not covered by the current prosit code version.

The error handling could be improved by stating that "only sequences with up to 30 amino acids are supported in prosit" instead of the rather cryptic error message "index 30 is out of bounds for axis 1".

Maybe it is even possible to just increase the sequence length in constants.py to 34475 letters, which is the longest sequence in the human fasta file? Not sure how that influences predictive accuracy or if that is possible.

Basically the current error message is not very descriptive for practitioners.

Traceback (most recent call last):
  File "jump.py", line 26, in <module>
    tensor = tensorize.peptidelist(df)
  File "/root/prosit/tensorize.py", line 56, in peptidelist
    "sequence_integer": get_sequence_integer(df.modified_sequence),
  File "/root/prosit/tensorize.py", line 42, in get_sequence_integer
    array[i, j] = constants.ALPHABET[s]
IndexError: index 30 is out of bounds for axis 1 with size 30

Example sequences with 29 and 30 letters:

NGGRDYALRGPEHPGSGGAPEPQGWIHFI
MNERREQLRAKRPEVPFPEITRMLGNEWS
DKRQNSSRFSASNNRELQKLPSLKGPPTL
MVKLTAELIEQAAQYTNAVRDRELDLRVS
MYVAESTRKTLLYHMEFSELTSRYIKIIN
MDFVAGAIGGVCGVAVGYPLDTVKGLLALP
MTGELEVKNMDMKPGSTLKITGSIADGTDG
MEPEFLYDLLQLPKGVEPPAEEELSKGVCP
MEDALFLRKSPPYIFSPIPFLGHAIAFGKS
MSDILRELLCVSEKAANIARACRQQEALFQ
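A pragmatic workaround is to drop over-length sequences with an explicit warning before tensorizing. A minimal sketch (assuming a pandas DataFrame df with a modified_sequence column and the MAX_SEQUENCE constant of 30 from prosit/constants.py; note that modification annotations such as M(ox) inflate the raw string length, so a parser-based residue count would be more precise):

import logging

MAX_SEQUENCE = 30  # mirrors prosit/constants.py

def filter_by_length(df, max_len=MAX_SEQUENCE):
    # Drop sequences longer than max_len instead of crashing with
    # "index 30 is out of bounds for axis 1 with size 30".
    mask = df.modified_sequence.str.len() <= max_len
    n_dropped = int((~mask).sum())
    if n_dropped:
        logging.warning("%d / %d sequences longer than %d amino acids were skipped",
                        n_dropped, len(df), max_len)
    return df[mask]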

Problems of training iRT prediction model from scratch using released data

Hi! Prosit is a fascinating method to predict both spectra and iRT. I am really interested in it.

In order to do further research based on the Prosit model, I tried to train the iRT prediction model from scratch using the released code and data. However, I ran into the following difficulties:

  1. I downloaded the iRT model file (model_irt_prediction.zip) from the given link https://figshare.com/projects/Prosit/35582. I found the loss function in config.yml is masked_spectral_distance, but the paper says "the mean squared error was used as loss function".

  2. I downloaded the iRT prediction data file (irt_PROSIT.hdf5) from https://figshare.com/projects/Prosit/35582. I found that "X_train" are iRT values whereas "Y_train" are peptide sequences. Most importantly, the numbers of training, validation, and holdout samples are 349136, 87455, and 169339 respectively, i.e. 57.6%, 14.4%, and 27.9%, but the paper says "The remaining data were split into 64% training data, 16% test data and 20% holdout data". Am I misunderstanding something?

  3. Although the trained weights for iRT prediction are released, I want to obtain results comparable to the released weights by training the model on the released data from scratch. I fixed the aforementioned problems, e.g. changing "masked_spectral_distance" to "mean_squared_error" in config.yml and exchanging "X_train" and "Y_train" in irt_PROSIT.hdf5, and then trained the iRT prediction model on the released data. However, for my trained weights, the loss values (mean squared error) on the validation and holdout datasets are 0.0229 and 0.0126, while for the released weights they are 0.0071 and 0.0054.

Can you give me some suggestions about the settings for training the iRT model and the data used?
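For reference, the dataset swap described in point 2 can be scripted with h5py. A minimal sketch (the split suffixes "train"/"val"/"holdout" are assumptions; adjust them to whatever keys irt_PROSIT.hdf5 actually contains):

import h5py

# Copy irt_PROSIT.hdf5 with each X_*/Y_* dataset pair swapped, so that
# X_* holds the model inputs (sequences) and Y_* the targets (iRT values).
with h5py.File("irt_PROSIT.hdf5", "r") as src, \
     h5py.File("irt_PROSIT_fixed.hdf5", "w") as dst:
    for split in ("train", "val", "holdout"):  # assumed key names
        x, y = f"X_{split}", f"Y_{split}"
        if x in src and y in src:
            dst.create_dataset(x, data=src[y][...])
            dst.create_dataset(y, data=src[x][...])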

Issues with the web server for predictions?

Hi,

I submitted a small CSV file (~20 000 precursors) for prediction of MS/MS and RT to the Prosit web server. The job is somehow still processing even though it has been > 36 hours since submission.

I just wonder if there are any issues with the server right now? I have never experienced these waiting times before, even when I have submitted CSV files with > 1 000 000 precursors for prediction.

Best,

Marc

Download of the human library not working

Hi,

I tried to download the .dlib library for human and apparently it does not work. I tried with several browsers (Safari, Firefox and Chrome).

Anything I'm doing wrong?

BTW, the yeast library download works.

Thanks for your help and have a great weekend.

Emmanuel

Error

Hi, I was trying to process some data and got this error:
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/distributed/config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  defaults = yaml.load(f)
Using TensorFlow backend.
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/context.py:23: UserWarning: The dask.set_options function has been deprecated. Please use dask.config.set instead
  warnings.warn("The dask.set_options function has been deprecated. "
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/dataframe/io/csv.py:326: ParserWarning: Both a converter and dtype were specified for column Reverse - only the converter will be used
  head = reader(BytesIO(b_sample), **kwargs)
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/dataframe/io/csv.py:64: ParserWarning: Both a converter and dtype were specified for column Reverse - only the converter will be used
  df = reader(bio, **kwargs)
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/distributed/config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  defaults = yaml.load(f)
Using TensorFlow backend.
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/context.py:23: UserWarning: The dask.set_options function has been deprecated. Please use dask.config.set instead
  warnings.warn("The dask.set_options function has been deprecated. "
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/dataframe/io/csv.py:326: ParserWarning: Both a converter and dtype were specified for column Reverse - only the converter will be used
  head = reader(BytesIO(b_sample), **kwargs)
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/dataframe/io/csv.py:64: ParserWarning: Both a converter and dtype were specified for column Reverse - only the converter will be used
  df = reader(bio, **kwargs)
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/dataframe/multi.py:421: FutureWarning: 'scan_number' is both an index level and a column label. Defaulting to column, but this will raise an ambiguity error in a future version
  indicator=indicator)
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/base.py:835: UserWarning: The get= keyword has been deprecated. Please use the scheduler= keyword instead with the name of the desired scheduler like 'threads' or 'processes'
  warnings.warn("The get= keyword has been deprecated. "
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/local.py:255: FutureWarning: 'scan_number' is both an index level and a column label. Defaulting to column, but this will raise an ambiguity error in a future version
  return func(*args2)
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/distributed/config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  defaults = yaml.load(f)
Using TensorFlow backend.
Traceback (most recent call last):
  File "oktoberfest/tensor.py", line 14, in <module>
    MASS_ANALYZER, comp=None
  File "/root/.pyenv/versions/3.6.0/src/pwyll/pwyll/tensorize.py", line 267, in prepare
    t = annotated(df, nlosses=1, z=3)
  File "/root/.pyenv/versions/3.6.0/src/pwyll/pwyll/tensorize.py", line 199, in annotated
    "sequence_integer": get_sequence_integer(df.modified_sequence),
  File "/root/.pyenv/versions/3.6.0/src/pwyll/pwyll/tensorize.py", line 43, in get_sequence_integer
    array[i, j] = ALPHABET[s]
KeyError: 'N(de)'
make: *** [tensor] Error 5

Thanks,

Yishai

Web site and singularity

I am really excited to get started with Prosit, but it will take some time to get this working on HPC (Singularity, no root access).

First question: any guidance on setting up with singularity?

Second question: on the website for calibration my file uploads are failing (both MSMS and Raw) -- I've disabled virus protection to no avail. When I open Chrome inspector I see an upload open but no data is transferred. I had no success in Firefox either. I have no issues with large file uploads to other sites (e.g. Dropbox). Any thoughts welcome.

Thanks for your help,

Andrew

Tensor("prediction_target:0", shape=(?, ?), dtype=float32) must be from the same graph as Tensor("prediction/BiasAdd:0", shape=(?, 1), dtype=float32)

Hi,

very interested in Prosit. Hope that you can help me.
Using Arch Linux:

zeth@master ~/P/prosit> pacman -Qs | grep nvidia
local/libnvidia-container-bin 1.0.2-1
local/libnvidia-container-tools-bin 1.0.2-1
local/nvidia 430.26-5
local/nvidia-container-runtime-bin 2.0.0+3.docker18.09.6-1
local/nvidia-container-runtime-hook-bin 1.4.0-1
local/nvidia-docker 2.0.3-4
local/nvidia-utils 430.26-1
local/opencl-nvidia 430.26-1
zeth@master ~/P/prosit> pacman -Qs | grep docker
local/docker 1:18.09.6-1
local/docker-compose 1.24.0-1
local/nvidia-container-runtime-bin 2.0.0+3.docker18.09.6-1
local/nvidia-docker 2.0.3-4
local/python-docker 4.0.2-1
local/python-docker-pycreds 0.4.0-1
    Python bindings for the docker credentials store API
local/python-dockerpty 0.4.1-4
    Python library to use the pseudo-tty of a docker container

I freshly cloned prosit and ran

zeth@master ~/P/prosit> make server MODEL=/root/model

Under /root/model is the model you shared on figshare.

When curling to the server:

zeth@master ~/P/prosit> curl -F "peptides=@examples/peptidelist.csv" http://127.0.0.1:5000/predict/

I get an internal server error of:

[2019-06-22 15:53:24,901] ERROR in app: Exception on /predict/ [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 2311, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1834, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1737, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 36, in reraise
    raise value
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1832, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1818, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/root/prosit/server.py", line 28, in predict
    result = prediction.predict(tensor, model, model_config)
  File "/root/prosit/prediction.py", line 14, in predict
    model.compile(optimizer="adam", loss="mse")
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/training.py", line 333, in compile
    sample_weight, mask)
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/training_utils.py", line 403, in weighted
    score_array = fn(y_true, y_pred)
  File "/usr/local/lib/python3.5/dist-packages/keras/losses.py", line 14, in mean_squared_error
    return K.mean(K.square(y_pred - y_true), axis=-1)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 848, in binary_op_wrapper
    with ops.name_scope(None, op_name, [x, y]) as name:
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 5770, in __enter__
    g = _get_graph_from_inputs(self._values)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 5428, in _get_graph_from_inputs
    _assert_same_graph(original_graph_element, graph_element)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 5364, in _assert_same_graph
    original_item))
ValueError: Tensor("prediction_target:0", shape=(?, ?), dtype=float32) must be from the same graph as Tensor("prediction/BiasAdd:0", shape=(?, 1), dtype=float32).

Do you have any idea?

Thanks!

Rescoring only allowing one RAW-file in the MAxQuant search

Hi,

as I understand it, when using the rescoring functionality of Prosit, the submitted msms.txt file must come from a MaxQuant search where only a single RAW file was run. I think that is a pity, because it is rarely useful to run a MaxQuant search with only one RAW file. I hope I don't sound too negative here; I just want to help out.

The rescoring function would be more useful if it could:
(1) Improve a standard database search where several RAW-files been searched in a quantitative workflow (like LFQ).
(2) Create a deeper and more accurate DDA-based spectral library. In this case, several RAW-files would be set up as fractions of a sample in a MaxQuant search, resulting in a single msms.txt file that could go into the PROSIT rescoring. This is what I would like to use the rescoring for, if possible.

I understand that problems can arise if one tries to rescore a MaxQuant search where RAW-files were acquired with different instrument methods or instruments. That could screw up the CE-calibration. But if all files were acquired on the same instrument, with the same instrument method I don't see why it would be a problem to include several RAW-files in the msms.txt. In many (if not most cases) the RAW-files used in a MaxQuant search are acquired by the same instrument method and instrument. So perhaps, one could just warn users of the rescoring function to not upload a msms.txt file where the RAW-files are not acquired by the same method and/or instrument.

Once again, I hope I didn't sound too pessimistic here. I think the idea of rescoring a MaxQuant search or any other search engine with PROSIT is superb.

An error occured. Status code: 2

Hello!

Can anyone help me? I cannot understand what Prosit is saying... I am using it directly on the site and I obtain this:
"WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. Using TensorFlow backend. /root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/distributed/config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details. defaults = yaml.load(f) Traceback (most recent call last): File "oktoberfest/tensor_peptidelist.py", line 23, in tensor = prosit.tensorize.peptidelist(df) File "/root/.pyenv/versions/3.6.0/src/prosit/prosit/tensorize.py", line 55, in peptidelist "collision_energy_aligned_normed": get_numbers(df.collision_energy) / 100., File "/root/.pyenv/versions/3.6.0/src/prosit/prosit/tensorize.py", line 27, in get_numbers a = numpy.array(vals).astype(dtype) ValueError: could not convert string to float: '28;2' make: *** [tensor_peptidelist] Error 5"

Thanks a lot for any help !

Rescoring Error: "need at least one array to concatenate make"

Hey,
I am trying to rescore a MaxQuant result and get the following error:

Traceback (most recent call last):
  File "oktoberfest/grpc_alignment.py", line 31, in <module>
    disable_progress_bar = True)
  File "/root/.pyenv/versions/3.6.0/src/prosit-grpc/prosit_grpc/predictPROSIT.py", line 133, in predict
    pred_object.predict(disable_progress_bar)
  File "/root/.pyenv/versions/3.6.0/src/prosit-grpc/prosit_grpc/PredObject.py", line 116, in predict
    disable_progress_bar=disable_progress_bar)
  File "/root/.pyenv/versions/3.6.0/src/prosit-grpc/prosit_grpc/PredObject.py", line 106, in send_requests
    predictions = np.vstack(predictions)
  File "<__array_function__ internals>", line 6, in vstack
  File "/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/numpy/core/shape_base.py", line 283, in vstack
    return _nx.concatenate(arrs, 0)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: need at least one array to concatenate
make: *** [calibration] Error 6

PTMs in spectral libraries

I'm interested in generating a theoretical spectral library for dimethyl-labeled samples: heavy, medium, and light. I can't see how to do this; is there a way to create these?

Thanks!

setup.py does not include the 'converters' package

Dear @gessulat :

I found that after installing prosit via 'python setup.py build/install', I cannot 'import prosit' because the converters package is missing from site-packages. Sorry, I do not know how to fix it...

Error when creating a Prosit predicted library

What happened: I uploaded a CSV file generated using EncyclopeDIA to create a predicted library. I got the error shown in the image below.

What should have happened: I expected a predicted library to be exported in the .CSV format.

Steps to reproduce: I can provide the file if needed. Basically, try to generate a library from the CSV file using the 2019 model and export it as a CSV file.
(screenshot of the error attached)

CE calibration and rescoring

Hi,
Are you going to share the code used for CE calibration and rescoring on your Prosit public server?
Thanks

Why do some entries have a parenthesis after the fragment charge state?

In some entries, I see a parenthesis after what I believe is the fragment charge number, e.g. y2^2), whereas in other entries I don't, e.g. b10^2/. Could you clarify what the parenthesis means and why it appears in some entries but not others? The full predicted spectrum from which the two examples above were pulled is shown below:

Name: AAEGADTTGATPK/3
MW: 397.19469168128995
Comment: Parent=397.19469168128995 Collision_energy=35.0 Mods=0 ModString=AAEGADTTGATPK///3 iRT=-19.559999465942383
Num peaks: 26
147.11281       0.32989058      "y1/0.0ppm"
72.04439        0.047566075     "b1/0.0ppm"
244.16557       1.0     "y2/0.0ppm"
122.586426      0.0008872726    "y2^2)/0.0ppm"
143.0815        0.6619359       "b2/0.0ppm"
345.21326       0.24960361      "y3/0.0ppm"
272.12408       0.062890545     "b3/0.0ppm"
416.25037       0.048209973     "y4/0.0ppm"
329.14557       0.08895395      "b4/0.0ppm"
165.07642       0.0023177553    "b4^2)/0.0ppm"
473.27182       0.16780965      "y5/0.0ppm"
237.13956       0.0058048028    "y5^2)/0.0ppm"
400.18268       0.02851274      "b5/0.0ppm"
574.3195        0.093369454     "y6/0.0ppm"
515.2096        0.035596747     "b6/0.0ppm"
258.10846       0.002202321     "b6^2)/0.0ppm"
675.3672        0.03287529      "y7/0.0ppm"
616.2573        0.0065132547    "b7/0.0ppm"
308.6323        0.003955793     "b7^2)/0.0ppm"
790.3941        0.007055764     "y8/0.0ppm"
359.15613       0.004619954     "b8^2)/0.0ppm"
387.66687       0.0025268008    "b9^2)/0.0ppm"
918.4527        0.003008747     "y10/0.0ppm"
423.18542       0.005045848     "b10^2/0.0ppm"
473.70926       0.0041979477    "b11^2/0.0ppm"
522.23566       0.002945542     "b12^2/0.0ppm"

Error while calibrating with msms.txt and raw file from orbitrap

Hello!!!

I am trying to create my library from some DDA runs I have searched using MaxQuant. I uploaded the msms.txt and one of my raw files (I don't know exactly which of my raw files I should pick...) and I got this error:

WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/distributed/config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  defaults = yaml.load(f)
Using TensorFlow backend.
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/context.py:23: UserWarning: The dask.set_options function has been deprecated. Please use dask.config.set instead
  warnings.warn("The dask.set_options function has been deprecated. "
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/dataframe/io/csv.py:326: ParserWarning: Both a converter and dtype were specified for column Reverse - only the converter will be used
  head = reader(BytesIO(b_sample), **kwargs)
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/dataframe/io/csv.py:64: ParserWarning: Both a converter and dtype were specified for column Reverse - only the converter will be used
  df = reader(bio, **kwargs)
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/distributed/config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  defaults = yaml.load(f)
Using TensorFlow backend.
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/context.py:23: UserWarning: The dask.set_options function has been deprecated. Please use dask.config.set instead
  warnings.warn("The dask.set_options function has been deprecated. "
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/dataframe/io/csv.py:326: ParserWarning: Both a converter and dtype were specified for column Reverse - only the converter will be used
  head = reader(BytesIO(b_sample), **kwargs)
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/dataframe/io/csv.py:64: ParserWarning: Both a converter and dtype were specified for column Reverse - only the converter will be used
  df = reader(bio, **kwargs)
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/dataframe/multi.py:421: FutureWarning: 'scan_number' is both an index level and a column label. Defaulting to column, but this will raise an ambiguity error in a future version
  indicator=indicator)
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/base.py:835: UserWarning: The get= keyword has been deprecated. Please use the scheduler= keyword instead with the name of the desired scheduler like 'threads' or 'processes'
  warnings.warn("The get= keyword has been deprecated. "
/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/dask/local.py:255: FutureWarning: 'scan_number' is both an index level and a column label. Defaulting to column, but this will raise an ambiguity error in a future version
  return func(*args2)
Traceback (most recent call last):
  File "oktoberfest/annotation.py", line 28, in <module>
    annotated = feynman.match.augment(merged_f, "yb", 6)
  File "/root/.pyenv/versions/3.6.0/src/feynman/feynman/match.py", line 111, in augment
    matches[i] = match(row, ion_types, charge_max)
  File "/root/.pyenv/versions/3.6.0/src/feynman/feynman/match.py", line 74, in match
    forward_sum, backward_sum = get_forward_backward(row.modified_sequence[1:-1])
  File "/root/.pyenv/versions/3.6.0/src/feynman/feynman/match.py", line 32, in get_forward_backward
    masses = [constants.AMINO_ACID[a] for a in amino_acids]
  File "/root/.pyenv/versions/3.6.0/src/feynman/feynman/match.py", line 32, in <listcomp>
    masses = [constants.AMINO_ACID[a] for a in amino_acids]
KeyError: 'p'
make: *** [annotation] Error 4

Please could you help me??

Thanks a lot!!
Trini

Protein inference - Protein FDR/PEP calculation

Dear developers,

I have been happy with the rescoring results from Prosit, starting from a MaxQuant msms.txt file. I get > 2K target hits at the PSM level after rescoring.

I was wondering what you would recommend for protein inference or protein FDR calculation, starting from the identified features after rescoring.

Right now I am just using unique peptides to get the identified proteins but I would like to be able to assign an FDR value at the protein level.

Many thanks in advance.
Miguel

R version

Hello, is there any plan to develop an R version?

handling of uncommon amino acids

The human fasta file has around 3447 occurrences of "X" or "uncommon amino acids".
They are not defined in prosit/prosit/constants.py

The code should either ignore them, define them, or use proper error handling when X is observed. Currently the code just breaks with KeyError: 'X'. Handling this would increase the robustness of the code.

Traceback (most recent call last):
  File "jump.py", line 26, in <module>
    tensor = tensorize.peptidelist(df)
  File "/root/prosit/tensorize.py", line 56, in peptidelist
    "sequence_integer": get_sequence_integer(df.modified_sequence),
  File "/root/prosit/tensorize.py", line 42, in get_sequence_integer
    array[i, j] = constants.ALPHABET[s]
KeyError: 'X'

Examples from the human fasta file:

XLAILLTFFHPFLVYRECRTWKESPSAIA
XEARRIKLYRETSIYHNETPDEDKINSYF
XDFRLGSESMTQRELNEKAGGICIAREGL
XMKLDLEDPNLDLNVFMSQEVLPAATSIL
XKDREVAEGGLPRAESPSPGLTPSRRSQF
XRFDVMVNGKGPRRQFPGGRGRGIGAGAI
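One hedged pre-processing option, upstream of Prosit itself, is to drop peptides containing letters outside the twenty standard amino acids before building the input CSV. A minimal sketch (for plain, unmodified peptide strings as produced by a FASTA digest; modified sequences such as M(ox) would need the parser-based check from the patch in the issue above instead):

import re

# The twenty standard amino acids; X, B, Z, U, ... are unsupported.
VALID = re.compile(r"^[ACDEFGHIKLMNPQRSTVWY]+$")

def drop_unsupported(peptides):
    kept, skipped = [], []
    for p in peptides:
        (kept if VALID.match(p) else skipped).append(p)
    if skipped:
        print(f"skipped {len(skipped)} peptides with uncommon amino acids, "
              f"e.g. {skipped[0]}")
    return kept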

HEK293 specific fasta file?

Hi,
Thank you for your good work!
Is there any HEK293-specific fasta file or .dlib-format spectral library file (predicted by Prosit)? I have noticed that there are some predicted libraries here, but I need HEK293-specific libraries.

Incorrect file extension in a Model iRT file on figshare

I do not know if this is the correct place for this, but the readme links to the figshare where the models are deposited. In the model_irt_prediction folder, the model.yaml file should be model.yml.

This is indicated in line 9 of the model.py script. When the server tries to load the model, it says model.yml does not exist. This is fixed by simply renaming the model file from model.yaml to model.yml.

Predict without Cysteine Carbamidomethylation

Hey all,
just started using the local version of Prosit - very exciting tool, thank you!
Is there a way to switch off the fixed Carbamidomethylation modification of Cysteines? I would like to use Prosit for some samples in which Cysteine was not treated with iodoacetamide.

Best wishes,
Julian

PS: I also noticed the precursor m/zs do not seem to account for the Carbamidomethylation. This has already been reported for the online version of the tool, I made a comment in the respective issue (#13).

Collision Energy Optimization

Dear Prosit team,

I have been trying to replicate a workflow for optimizing collision energies as suggested in this presentation:
https://skyline.ms/_webdav/home/software/Skyline/events/2019%20User%20Group%20Meeting%20at%20ASMS/%40files/Presentations/02-Schmidt.pdf

The main picture for this is:
(figure: prosit_figure)
The image suggests picking a collision energy of 28.

When I try to replicate this for an example peptide of mine, I get e.g.
(figure: prosit_figure_ALNEKLVNL)
This might tempt me to pick an energy of 28.

When I measure empirically though, and adjust the visualization a bit, the situation looks different:
(figure: ALNEKLVNL_nce_test)
Whereas the strongest fragments per spectrum seem similar, it reveals that the absolute fragment intensities also depend strongly on the NCE.

The fact that Prosit only returns relative intensities makes it hard to pick the best collision energy for my work. Do you think it would be possible for Prosit to return intensities on an arbitrary scale that is somewhat comparable between spectra?

Installing Prosit

Hello,

We have been very happy with our tests using Prosit on the proteomicsDB website but we would like to be able to scale-up our analysis capabilities by using Prosit locally.

We set up a Windows server (Windows Server 2019) with an NVIDIA Quadro P400 GPU (CUDA-enabled) and I am trying to set up Prosit on it, but it is proving to be something of a challenge so far. I have little experience with Docker and containers, so I would really appreciate any guidance you can offer.

So far, I have installed the native Windows Docker client, which seems to work nicely on Windows Server 2019, and I am currently in the process of installing nvidia-docker. I am not sure if I understand this correctly, but I believe it would be possible to install nvidia-docker with the Windows Docker client. Am I correct?

From the nvidia-docker wiki:
(screenshot attached)

I hope this is not terribly silly, but after looking at the dockerd documentation I am still completely lost on the next steps I should follow to install nvidia-docker.

It seems that it would be simpler if I were able to install Windows Subsystem for Linux 2 (WSL 2) and then install Docker and all the dependencies through the Linux subsystem, but it seems that WSL 2 (the distro version allowing Docker installations) is still not part of the standard Windows Server 2019 build, and it would be necessary to install a Windows Insider preview version.

I would really appreciate any guidance.

All the best,
Miguel

Figshare model.yaml vs model.yml naming convention

Hi,
the model_irt_prediction archive from figshare contains the file model.yaml; this file needs to be renamed to model.yml in order to run Prosit.

Source: https://figshare.com/articles/Prosit_-_Model_-_iRT/6965801
File: model_irt_prediction.zip (24.32 MB)


The wrong extension leads to the following error:

make server MODEL_SPECTRA=/home/user/prosit/prosit-msms/ MODEL_IRT=/home/user/prosit/prosit-iRT-old/
nvidia-docker build -qf Dockerfile -t prosit .
sha256:6e9b31cbdd7dae5f13dfa123305fed90ac8d1b39c15a10262e119cd7a68ed6b2
nvidia-docker run -it \
    -v "/home/user/prosit/prosit-msms/":/root/model_spectra/ \
    -v "/home/user/prosit/prosit-iRT-old/":/root/model_irt/ \
    -e CUDA_VISIBLE_DEVICES=0 \
    -p 5000:5000 \
    prosit python3 -m prosit.server
Using TensorFlow backend.
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/prosit/server.py", line 100, in <module>
    trained=True)
  File "/root/prosit/model.py", line 39, in load
    with open(model_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/root/model_irt/model.yml'
Makefile:17: recipe for target 'server' failed
make: *** [server] Error 1
user@nvidia-ngc-image-1-prosit-vm:~/prosit$

Reason: the source code defines the filename as model.yml, see prosit/model.py


Thank you.

Web site spectral library generation - cysteine residue mass

Hi,
I have generated a Prosit spectral library via https://www.proteomicsdb.org/prosit/ and compared the synthetic library (in spectronaut format) to a DDA/OpenSwathWorkflow-based counterpart (n ~5500 peptides).

For most peptides with modifications (oxidized methionine or carbamidomethylation), the PrecursorMz in the Prosit version is lower than in OpenSwath. The most common delta PrecursorMz of OpenSwath - Prosit, when accounting for peptide charge state, is 57.02. It appears as if the web service is not treating all C as carbamidomethylated C.

I am not sure if this is an issue or if I misunderstood something.

An example:

OpenSwath
ADC(UniMod:4)VQTLLLNQQR, PrecursorCharge 2
779.9039
ADC(UniMod:4)VQTLLLNQQR, PrecursorCharge 3
520.2717

Prosit
ADC[Carbamidomethyl (C)]VQTLLLNQQR, PrecursorCharge 2
751.3932
ADC[Carbamidomethyl (C)]VQTLLLNQQR, PrecursorCharge 3
501.2646
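For what it's worth, the per-charge deltas in this example are consistent with exactly one missing carbamidomethyl mass (57.021464 Da, UniMod:4) divided by the precursor charge. A quick arithmetic check using the numbers above:

# Carbamidomethylation (UniMod:4) adds 57.021464 Da per modified cysteine.
CAM = 57.021464

for z, openswath, prosit in [(2, 779.9039, 751.3932), (3, 520.2717, 501.2646)]:
    delta_mass = (openswath - prosit) * z
    print(f"z={z}: delta = {delta_mass:.4f} Da (expected ~{CAM})")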

https://www.proteomicsdb.org/prosit/task/E62AFBA12ECBF8F4FEAF3D0D530A31BF

Thanks,
Christofer

Predicted b/y-ions with ammonia or water loss

Hi,

I am currently comparing my confident experimental spectra with the corresponding predicted spectra from Prosit, for the same precursors. I noticed that the spectral library I downloaded from the web server does not provide y/b ions with ammonia or water losses. I just wonder if it would be possible to include these ions in the prediction, because it would definitely improve the similarity between experimental and predicted spectra.

I think the tool is amazing, but I just want to help you to make it even better :).

Best,

Marc

Predicted iRTs are concentrated within a very narrow range

I tried to use Prosit to predict iRTs. The reference proteomes I used contain human, E. coli, and yeast (Navarro dataset). The peptides I used for testing number ~5M, with lengths in [7, 30]. I used both Prosit and DeepRT(+) to predict the RT, with the default parameter settings.

However, I observed that the iRTs predicted by Prosit are concentrated in a very narrow range around 56. I am wondering if I was doing something wrong.

(figure: prosit_vs_deeprt)

Trouble downloading finished Prosit Data

What happened: I started a Prosit search by uploading a CSV file. The file processed to completion and I was able to download the finished zip file. However, after downloading 1GB of 1.4GB, the download said it was finished. I tried to unzip the file but I got an error saying the data was corrupted.

What should have happened: The whole zip file should have downloaded.

Steps to reproduce: Here is the link to the data: https://www.proteomicsdb.org/prosit/task/7B5DCF81B8D4F1945C9FE82B5473177F. Try downloading this data: the first gigabyte will download and then it will stop. I have tried to process this dataset twice now and am in the process of running it again. I can provide the CSV file if needed.

Error log 2

Hey !

I am back with some trouble using Prosit... How should I interpret this?
An error occurred. Status code: 2
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
Traceback (most recent call last):
  File "oktoberfest/grpc_predict_peptidelist.py", line 36, in <module>
    disable_progress_bar=True)
  File "/root/.pyenv/versions/3.6.0/src/prosit-grpc/prosit_grpc/predictPROSIT.py", line 153, in predict_to_hdf5
    models=[irt_model, intensity_model])
  File "/root/.pyenv/versions/3.6.0/src/prosit-grpc/prosit_grpc/predictPROSIT.py", line 123, in predict
    self.input.prepare_input(disable_progress_bar)
  File "/root/.pyenv/versions/3.6.0/src/prosit-grpc/prosit_grpc/inputPROSIT.py", line 15, in prepare_input
    self.sequences.prepare_sequences(flag_disable_progress_bar)
  File "/root/.pyenv/versions/3.6.0/src/prosit-grpc/prosit_grpc/inputPROSIT.py", line 143, in prepare_sequences
    self.character_to_array(flag_disable_progress_bar)
  File "/root/.pyenv/versions/3.6.0/src/prosit-grpc/prosit_grpc/inputPROSIT.py", line 118, in character_to_array
    total=len(self.character)):
  File "/root/.pyenv/versions/3.6.0/lib/python3.6/site-packages/tqdm/std.py", line 1107, in __iter__
    for obj in iterable:
  File "/root/.pyenv/versions/3.6.0/src/prosit-grpc/prosit_grpc/__utils__.py", line 88, in split_modstring
    raise ValueError(f"Unknown Element in string: {sequence}. Found Elements: {x}")
NameError: name 'x' is not defined
make: *** [grpc_predict] Error 8

Add initial build info

Hi,
some additional info could be added to the readme for the initial setup of prosit; it would help people who are not that familiar with make and Docker. Also, make build did not work; it probably has to be elevated with sudo:

sudo make build

which results in:

nvidia-docker build -qf Dockerfile -t prosit .
sha256:d224c2ac898b32662b6265ae7a37dd1872dd98defe4cf86c6fd3acbe7f006c2e
prosit@champagne:~/prosit$

prosit interactive mode works while server mode fails

Hello,
the current interactive prosit version works, while the prosit server version produces multiple errors when curling the peptide list to the server. So I think many of the errors mentioned in issues #7 and #4 are actually Flask errors, or maybe prosit Makefile or prosit code issues, or maybe related to passing Flask arguments.

Proof that prosit in interactive mode creates the predictions:

root@8e29fa711bd2:~# ls -l
total 24
drwxr-xr-x 2 root root 4096 May 30 18:27 data.hdf5
-rw-r--r-- 1 root root 1012 Jun 24 06:00 jump.py
drwx------ 2 1000 1000 4096 Jun 24 05:21 model
-rw-r--r-- 1 root root 2027 Jun 24 06:00 msms_prediction.csv
-rw-r--r-- 1 root root  114 Jun 24 06:00 peptidelist.csv
drwxr-xr-x 5 root root 4096 Jun 24 06:00 prosit
root@8e29fa711bd2:~# cat msms_prediction.csv
Intensities     Masses  Matches Modified Sequence       Charge
1.0;0.41259626;0.73062277;0.30424523;0.11640168;0.093716405;0.08098759;0.1923069;0.08597337;0.15234001;0.08258821;0.012585764;0.034173176;0.013465268;0.0030653037;0.0014310256;0.008246724;0.004191969;0.00061403663;0.0013039891;0.0022206986;0.0005943045  175.118952167;322.15434716699997;435.238411167;548.322475167;619.359589167;690.3967031669999;761.4338171669999;263.088246467;360.141010467;431.178124467;502.21523846699995;573.2523524669999;218.122843817;429.74692881699997;132.047761467;180.574143467;216.092700467;251.61125746699997;207.12471403366666;230.80375203366665;254.48279003366665;229.45032313366664  y1;y2;y3;y4;y5;y6;y7;b2;b3;b4;b5;b6;y3(2+);y8(2+);b2(2+);b3(2+);b4(2+);b5(2+);y5(3+);y6(3+);y7(3+);b7(3+)  MMPAAALIM(ox)R  3
0.031838626;0.05340379;0.0008666609;0.16591543;0.6199195;1.0;0.3755796;0.006878452;0.8553704;0.18602312;0.009526462;0.1253908;0.6669085;0.056322724;0.00025479423;0.0041989647     147.112804167;294.148199167;407.232263167;504.285027167;601.3377911670001;698.3905551670001;769.4276691670001;882.5117331670001;245.131825467;316.168939467;413.221703467;301.17253381700004;349.69891581700006;385.21747281700004;158.58810796699998;207.114489967   y1;y2;y3;y4;y5;y6;y7;y8;b2;b3;b4;y5(2+);y6(2+);y7(2+);b3(2+);b4(2+)        MLAPPPIM(ox)K   2
0.6478142;0.7623719;1.0;0.78580517;0.24467614;0.037683975;0.98502636;0.46840557;0.30539873;0.5721372;0.27308717;0.025923852;0.31577614;0.6946651;0.4260823;0.0029871135;0.016204849;0.0243927;0.0019134118;0.027638009;0.008317641;0.0010335244       175.118952167;419.20711116699994;516.259875167;613.312639167;710.3654031670001;132.047761467;288.148872467;359.185986467;472.27005046700003;585.354114467;698.4381784670001;811.5222424670001;258.633575817;307.159957817;355.68633981700003;180.096631467;236.63866346700001;293.180695467;349.722727467;59.044501700333335;237.45998536700003;44.68743813366666        y1;y3;y4;y5;y6;b1;b2;b3;b4;b5;b6;b7;y4(2+);y5(2+);y6(2+);b3(2+);b4(2+);b5(2+);b6(2+);y1(3+);y6(3+);b1(3+)  MRALLLIPPPPM(ox)R       6
root@8e29fa711bd2:~#

code

# Example for manual start of prosit
# runfile for sudo make jump 
# sudo make jump MODEL=/home/xxx/prosit/prosit1

# once inside container
# cp prosit/peptidelist.csv .
# cp prosit/jump.py .
# python jump.py


from prosit import constants
from prosit import maxquant
from prosit import alignment
from prosit import prediction
from prosit import tensorize
import pandas

from prosit import model as model_lib
model_dir = constants.MODEL_DIR
global model
global model_config
model, model_config = model_lib.load(model_dir, trained=True)

# read peptides
df = pandas.read_csv("peptidelist.csv")
tensor = tensorize.peptidelist(df)
result = prediction.predict(tensor, model, model_config)
df_pred = maxquant.convert_prediction(result)

# write files to prediction.csv
# path = "{}prediction.csv".format(model_dir)
path="msms_prediction.csv"
maxquant.write(df_pred, path)

# copy file "msms_prediction.csv" out of container
# find docker ID: sudo docker ps
# sudo docker cp cbfc30fce3f1:/root/msms_prediction.csv .

when running

sudo make server MODEL=/home/xxx/prosit/prosit1

and running in another instance

curl -F "peptides=@examples/peptidelist.csv" http://127.0.0.1:5000/predict/

the following error occurs

* Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
[2019-06-24 06:20:20,889] ERROR in app: Exception on /predict/ [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 2311, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1834, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1737, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 36, in reraise
    raise value
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1832, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1818, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/root/prosit/server.py", line 28, in predict
    result = prediction.predict(tensor, model, model_config)
  File "/root/prosit/prediction.py", line 14, in predict
    model.compile(optimizer="adam", loss="mse")
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/training.py", line 333, in compile
    sample_weight, mask)
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/training_utils.py", line 403, in weighted
    score_array = fn(y_true, y_pred)
  File "/usr/local/lib/python3.5/dist-packages/keras/losses.py", line 14, in mean_squared_error
    return K.mean(K.square(y_pred - y_true), axis=-1)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 848, in binary_op_wrapper
    with ops.name_scope(None, op_name, [x, y]) as name:
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 5770, in __enter__
    g = _get_graph_from_inputs(self._values)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 5428, in _get_graph_from_inputs
    _assert_same_graph(original_graph_element, graph_element)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 5364, in _assert_same_graph
    original_item))
ValueError: Tensor("out_target:0", shape=(?, ?), dtype=float32) must be from the same graph as Tensor("out/Reshape:0", shape=(?, ?), dtype=float32).
172.17.0.1 - - [24/Jun/2019 06:20:20] "POST /predict/ HTTP/1.1" 500 -

There is also a Keras YAML warning; maybe Flask cannot handle it, or Prosit does not contain the appropriate code to suppress it:

/usr/local/lib/python3.5/dist-packages/keras/engine/saving.py:349: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(yaml_string)
 * Serving Flask app "server" (lazy loading)
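For what it's worth, this "must be from the same graph" error is a known pattern when a Keras model is loaded once at startup but compiled or used from a Flask request thread, which operates on a different default TensorFlow graph. Below is a minimal sketch of the commonly cited workaround, not Prosit code (the load call mirrors the jump.py script above; compiling per request is what the traceback shows failing):

import tensorflow as tf
from prosit import model as model_lib

# Load and fully initialize the model once, in the main thread, and remember
# the graph it was built in.
model, model_config = model_lib.load("/root/model/", trained=True)
model.compile(optimizer="adam", loss="mse")  # compile here, not per request
model._make_predict_function()               # force-build the predict function
graph = tf.get_default_graph()

def predict_in_request(tensor):
    # Inside the Flask handler, re-enter the original graph before touching
    # the model; otherwise Keras creates ops in a fresh graph and raises the
    # "must be from the same graph" ValueError seen above.
    with graph.as_default():
        return model.predict(tensor)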

Peaks Missing in Resulting DLIBs and Intermediate File Size Differences

Greetings, I'm from Proteome Software (creators of the Scaffold software). We've deployed Prosit to AWS and built an onsite server to generate libraries. We've been doing testing and some benchmarking, and we noticed something we can't explain: the resulting *.spectronaut CSV files and the subsequent DLIBs generated using EncyclopeDIA are about half the size (in GB) of those generated from the same FASTAs on the Kuster lab public Prosit server.

We looked into the resulting DLIBs and did some spot-checking with EncyclopeDIA, and we noticed that peaks were missing from most of the spectra in our onsite-generated libraries. The total number of spectra in a library generated from the human UniProt reviewed FASTA was, however, the same for both libraries (onsite vs. public Prosit server). We also noticed that the fraction of missing peaks seems to increase toward the end of the list (e.g., spectrum 1 might be missing no peaks or just one, say b1, but spectrum 1000000 might be missing several, e.g., b1, b2, y3, y4, and y5).

Thoughts? Our goal is to replicate the public server results, if possible, so any clues as to what's happening would be greatly appreciated.
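One way to narrow this down before digging into the binary DLIBs is to diff the intermediate *.spectronaut CSVs from the two servers, since each fragment is one row there. A minimal sketch, assuming Spectronaut-style ModifiedPeptide and PrecursorCharge columns (the column and file names are assumptions; adjust them to the actual header):

import pandas as pd

def peak_counts(path):
    # Count fragment rows per precursor in one Prosit Spectronaut CSV.
    df = pd.read_csv(path)
    return df.groupby(["ModifiedPeptide", "PrecursorCharge"]).size()

onsite = peak_counts("onsite.spectronaut.csv")   # hypothetical file names
public = peak_counts("public.spectronaut.csv")
diff = (public - onsite).dropna()
print(diff[diff != 0].sort_values(ascending=False).head(20))

If the missing-peak fraction really grows toward the end of the file, the precursors with the largest differences should cluster at high row indices, which would point at something stateful (e.g. batching or memory) rather than at the model weights.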

We have deployed Prosit with the following software:

  • Ubuntu 18.04
  • Docker 19.03
  • Nvidia 440.36
  • CUDA 10.2
  • Current Prosit repo code (as of about a week ago)

We're using these models:

  • Prosit - Model - Fragmentation
  • Prosit - Model - iRT

Hardware:

  • Aorus X390 Aorus Pro Motherboard
  • Intel i9-9900K CPU
  • Crucial NVMe M.2 SSD (1 TB)
  • 64 GB RAM (2 x 32 GB) Crucial non-ECC un-buffered DDR4 RAM
  • Zotac GeForce GTX 1080 Ti GPU, 11 GB RAM

msp libraries building method

Dear @gessulat,

I found that the Prosit data is available in several places ^_^. The spectral library files (*.msp) are deposited in the Sync + Share.

But I cannot find a description of the process used to build the library. In addition, can the hdf5 training dataset deposited on figshare be derived from the msp libraries?

Thanks.
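For readers who want to inspect those files, here is a minimal sketch of a reader for the NIST-style msp text format (the Name:/Num peaks: layout is assumed; the exact header fields in the Prosit exports may differ):

def read_msp(path):
    # Yield (name, peaks) tuples from a NIST-style .msp file, where peaks is
    # a list of (mz, intensity) pairs; annotation columns, if any, are ignored.
    name, peaks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith("Name:"):
                if name is not None:
                    yield name, peaks
                name, peaks = line.split(":", 1)[1].strip(), []
            elif line and line[0].isdigit():
                mz, intensity = line.split()[:2]
                peaks.append((float(mz), float(intensity)))
    if name is not None:
        yield name, peaks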

Running Error

Error when trying to run a prediction; it seems like the graph is disconnected? I cloned the latest version and used your model from figshare.

File "/root/prosit/server.py", line 28, in predict
result = prediction.predict(tensor, model, model_config)
File "/root/prosit/prediction.py", line 14, in predict
model.compile(optimizer="adam", loss="mse")

[omitted.....]

ValueError: Tensor("out_target:0", shape=(?, ?), dtype=float32) must be from the same graph as Tensor("out/Reshape:0", shape=(?, ?), dtype=float32)

runtime issues with different docker and nvidia-docker2 versions

Hi,
I had to upgrade/downgrade docker and nvidia-docker because of missing images and versions for Ubuntu 16.04 and, currently, CUDA 10. For the current docker versions on https://download.docker.com/linux/ubuntu/dists/xenial/pool/edge/amd64/ there was no matching nvidia-docker container, so I pinned specific versions and installed docker 18.03 and NVIDIA Docker 2.0.3.

### get available versions
apt-cache madison nvidia-docker2 nvidia-container-runtime
nvidia-docker2 | 2.0.3+docker18.03.0-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64  Packages
nvidia-docker2 | 2.0.3+docker17.12.1-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64  Packages
nvidia-docker2 | 2.0.3+docker17.12.0-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64  Packages
nvidia-docker2 | 2.0.3+docker17.09.1-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64  Packages
nvidia-docker2 | 2.0.3+docker17.09.0-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64  Packages
nvidia-docker2 | 2.0.3+docker17.06.2-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64  Packages
nvidia-docker2 | 2.0.3+docker17.03.2-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64  Packages
nvidia-docker2 | 2.0.3+docker1.13.1-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64  Packages
nvidia-docker2 | 2.0.3+docker1.12.6-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64  Packages
nvidia-docker2 | 2.0.2+docker17.12.0-1 | https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64  Packages


nvidia-container-runtime | 2.0.0+docker17.12.1-1 | https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker17.12.0-1 | https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker17.09.1-1 | https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker17.09.0-1 | https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64  Packages
nvidia-container-runtime | 2.0.0+docker17.06.2-1 | https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64  Packages

Docker and nvidia-docker2 run fine.

Docker version 18.03.1-ce, build 9ee9f40

and

NVIDIA Docker: 2.0.3
Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   9ee9f40
 Built:        Thu Apr 26 07:17:20 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   9ee9f40
  Built:        Thu Apr 26 07:15:30 2018
  OS/Arch:      linux/amd64
  Experimental: false

The installation runs fine; I can also use nvidia-smi inside docker:

sudo docker run --runtime=nvidia --rm nvidia/cuda:10.0-base nvidia-smi
Thu May 30 03:07:56 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:02:00.0  On |                  N/A |
| 29%   36C    P8     3W / 250W |  10456MiB / 10988MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

I have downloaded the prosit1 model with config.yml, model.yml and weight_32_0.10211.hdf5
However, when I run make server MODEL=/home/xxx/prosit/prosit1/
the server starts and greets me, but uploading a file with curl breaks it:
curl -F "peptides=@examples/peptidelist.csv" http://127.0.0.1:5000/predict/

sudo make server MODEL=/home/xxx/prosit/prosit1
nvidia-docker build -qf Dockerfile -t prosit .
sha256:d224c2ac898b32662b6265ae7a37dd1872dd98defe4cf86c6fd3acbe7f006c2e
nvidia-docker run -it \
    -v "/home/xxx/prosit/prosit1":/root/model/ \
    -e CUDA_VISIBLE_DEVICES=0 \
    -p 5000:5000 \
    prosit python3 -m prosit.server
Using TensorFlow backend.
/root/prosit/model.py:38: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
/usr/local/lib/python3.5/dist-packages/keras/engine/saving.py:349: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(yaml_string)
 * Serving Flask app "server" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
[2019-05-30 03:00:20,819] ERROR in app: Exception on /predict/ [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 2311, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1834, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1737, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 36, in reraise
    raise value
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1832, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1818, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/root/prosit/server.py", line 28, in predict
    result = prediction.predict(tensor, model, model_config)
  File "/root/prosit/prediction.py", line 14, in predict
    model.compile(optimizer="adam", loss="mse")
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/training.py", line 333, in compile
    sample_weight, mask)
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/training_utils.py", line 403, in weighted
    score_array = fn(y_true, y_pred)
  File "/usr/local/lib/python3.5/dist-packages/keras/losses.py", line 14, in mean_squared_error
    return K.mean(K.square(y_pred - y_true), axis=-1)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 848, in binary_op_wrapper
    with ops.name_scope(None, op_name, [x, y]) as name:
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 5770, in __enter__
    g = _get_graph_from_inputs(self._values)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 5428, in _get_graph_from_inputs
    _assert_same_graph(original_graph_element, graph_element)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 5364, in _assert_same_graph
    original_item))
ValueError: Tensor("out_target:0", shape=(?, ?), dtype=float32) must be from the same graph as Tensor("out/Reshape:0", shape=(?, ?), dtype=float32).

Not sure how to debug this; could it be a Keras or TF incompatibility with CUDA 10?
However, I have Keras and TF running successfully with different versions outside docker.
Tobias
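Before suspecting CUDA, it may help to print the exact framework versions and the devices TensorFlow can actually see inside the container; a minimal sketch for the TF 1.x / Keras 2.x stack the image appears to ship (these calls exist in TF 1.x but were removed in TF 2.x):

import keras
import tensorflow as tf
from tensorflow.python.client import device_lib

# An empty GPU list here points at the CUDA/driver pairing rather than Keras.
print("keras", keras.__version__)
print("tensorflow", tf.__version__)
print("GPU available:", tf.test.is_gpu_available())
print([d.name for d in device_lib.list_local_devices()])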
