
depthcharge's Introduction


Depthcharge is a deep learning toolkit for building Transformer models to analyze mass spectrometry data.

About

Many deep learning tools have been developed for the analysis of mass spectra or mass spectrometry analytes, like peptides and small molecules. However, each one has had to reinvent the wheel.

Depthcharge aims to provide a flexible, but opinionated, framework for rapidly prototyping deep learning models for mass spectrometry data. Think of Depthcharge as a set of building blocks to get you started on a new deep learning project focused on mass spectrometry data. Depthcharge delivers these building blocks as PyTorch modules, which can be readily assembled into customized deep learning models for your task.
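The building-block pattern can be illustrated with plain PyTorch stand-ins (the `PeakEncoder` and `SpectrumClassifier` classes below are toy modules invented for this sketch, not actual depthcharge classes):

```python
import torch
from torch import nn


class PeakEncoder(nn.Module):
    """Toy stand-in for a reusable spectrum-encoder building block."""

    def __init__(self, d_model=32):
        super().__init__()
        self.proj = nn.Linear(2, d_model)  # (m/z, intensity) -> d_model

    def forward(self, peaks):
        # peaks: (batch, n_peaks, 2)
        return self.proj(peaks)


class SpectrumClassifier(nn.Module):
    """Assemble the encoder block with a small task-specific head."""

    def __init__(self, d_model=32, n_classes=3):
        super().__init__()
        self.encoder = PeakEncoder(d_model)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, peaks):
        emb = self.encoder(peaks).mean(dim=1)  # pool over peaks
        return self.head(emb)


model = SpectrumClassifier()
logits = model(torch.randn(4, 100, 2))  # 4 spectra, 100 peaks each
print(logits.shape)  # torch.Size([4, 3])
```

The same composition idea applies when the stand-ins are replaced by depthcharge's actual encoder and transformer modules.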

To learn more, visit our documentation.

depthcharge's People

Contributors

alfred-n, bercestedincer, bittremieux, justin-a-sanders, melihyilmaz, wfondrie



depthcharge's Issues

Error in reversing peptides with PeptideTokenizer.detokenize

Hi all,

Detokenizing peptides with modifications (e.g., PEP+79.996) from a reversed tokenized tensor, say [1,2,3], does not return the expected result. In the current implementation (https://github.com/wfondrie/depthcharge/blob/main/depthcharge/tokenizers/peptides.py#L201), the tokens [1,2,3] are first detokenized and joined to P+79.996EP before being reversed to PE699.97+P, which is then returned. I think they should first be detokenized without joining (i.e., join=False in detokenize()), then reversed, and finally joined.

Best,
Daniela
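A minimal sketch of the proposed order of operations (plain Python, not the actual detokenize() implementation; the token strings are taken from the example above):

```python
# Detokenized token strings, as returned with join=False.
tokens = ["P+79.996", "E", "P"]

# Current (buggy) order: join first, then reverse the string,
# which mangles the modification's mass text.
buggy = "".join(tokens)[::-1]      # "PE699.97+P"

# Proposed order: reverse the token list, then join, so the
# modification stays attached to its residue.
fixed = "".join(reversed(tokens))  # "PEP+79.996"

print(buggy, fixed)
```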

Problem with AnnotatedSpectrumDataset and n_workers>0

Hi there,
I started to use depthcharge and I really like it :)
However, I have a bit of a problem with the AnnotatedSpectrumDataset class from the latest depthcharge version. I am testing it with the function from https://github.com/wfondrie/depthcharge/blob/main/tests/unit_tests/test_data/test_loaders.py#L47, which works well. However, whenever I increase the number of workers, it gets stuck and the code never terminates.
Have you encountered problems like this? Do you have any idea why this may happen?
I am using Python 3.10.12 on a Linux machine. Here are some of my package versions:

torch==2.1.0
pytorch-lightning==1.9.5
pylance==0.8.16
pyteomics==4.6.3

Thanks a lot in advance!
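One general PyTorch consideration with IterableDataset and num_workers > 0 (not necessarily the bug here) is that each worker process iterates the dataset independently unless the dataset shards its items via get_worker_info(). A generic sketch with a toy dataset:

```python
from torch.utils.data import DataLoader, IterableDataset, get_worker_info


class ShardedIterable(IterableDataset):
    """Toy IterableDataset that splits its items across loader workers.

    Without sharding like this, every worker yields every item; and
    datasets holding non-picklable state (e.g., open file handles) can
    misbehave or hang when num_workers > 0.
    """

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        info = get_worker_info()
        if info is None:  # single-process data loading
            start, step = 0, 1
        else:  # each worker takes every num_workers-th item
            start, step = info.id, info.num_workers
        yield from range(start, self.n, step)


loader = DataLoader(ShardedIterable(8), batch_size=None, num_workers=2)
items = sorted(int(x) for x in loader)
print(items)  # every item appears exactly once across both workers
```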

Example code to read .mgf file

Hi, I would like to know how to read and preprocess an .mgf file using the package. Could you please help me by providing example code for that, whose output can then be passed to other package components, such as the encoders and transformers? Thank you!

Existing index doesn't know whether it's annotated

When trying to re-use an existing HDF5 index, I get the following error in hdf5.py on line 80:

AttributeError: 'AnnotatedSpectrumIndex' object has no attribute 'annotated'

Looking at the code, I don't know when the annotated attribute should be set. In fact, _handle is set to None on line 63, so I don't fully understand how this piece of code is supposed to work.

SpectrumDataset is an IterableDataset: how can I use shuffle in the DataLoader?

When I run this code:

import depthcharge as dc
import pyarrow as pa
from pyarrow import int32
from torch.utils.data import DataLoader

from depthcharge.data import CustomField, SpectrumDataset

mgf_file = ["20190118_Q2_MD_ColQ2-51_AlexanderBull_P15_Fluide_4microscans.mgf"]

parse_kwargs = {
    "progress": False,
    "preprocessing_fn": [
        dc.data.preprocessing.set_mz_range(min_mz=0),
        dc.data.preprocessing.filter_intensity(max_num_peaks=200),
        dc.data.preprocessing.scale_intensity(scaling="root"),
        dc.data.preprocessing.scale_to_unit_norm,
    ],
    "custom_fields": [
        # CustomField("Seq", lambda x: x["params"]["seq"], pa.string()),
        CustomField("RT", lambda x: x["params"]["rtinseconds"], pa.float64()),
        CustomField("charge", lambda x: x["params"]["charge"], pa.list_(int32())),
    ],
}

dataset = SpectrumDataset(mgf_file, batch_size=8, parse_kwargs=parse_kwargs)


loader = DataLoader(dataset, batch_size=None, shuffle=True)

for batch in loader:
    print(batch)

I get an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[26], line 38
     33 dataset = SpectrumDataset(mzml_file, batch_size=8, parse_kwargs=parse_kwargs)
     36 from torch.utils.data import DataLoader
---> 38 loader = DataLoader(dataset, batch_size=None,shuffle=True)
     40 for batch in loader:
     41     print(batch)

File ~/miniconda3/envs/ttt/lib/python3.11/site-packages/torch/utils/data/dataloader.py:308, in DataLoader.__init__(self, dataset, batch_size, shuffle, sampler, batch_sampler, num_workers, collate_fn, pin_memory, drop_last, timeout, worker_init_fn, multiprocessing_context, generator, prefetch_factor, persistent_workers, pin_memory_device)
--> 308     raise ValueError(
    309         f"DataLoader with IterableDataset: expected unspecified shuffle option, but got shuffle={shuffle}")
    311 if sampler is not None:
    312     # See NOTE [ Custom Samplers and IterableDataset ]
    313     raise ValueError(
    314         f"DataLoader with IterableDataset: expected unspecified sampler option, but got sampler={sampler}")
    315 elif batch_sampler is not None:
    316     # See NOTE [ Custom Samplers and IterableDataset ]
    317     raise ValueError(
    318         "DataLoader with IterableDataset: expected unspecified "
    319         f"batch_sampler option, but got batch_sampler={batch_sampler}")

ValueError: DataLoader with IterableDataset: expected unspecified shuffle option, but got shuffle=True

How can I shuffle the data when using a DataLoader?
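As the traceback shows, PyTorch forbids shuffle=True when the dataset is an IterableDataset, so shuffling has to happen inside the dataset itself. A common generic workaround (not depthcharge-specific; `ShuffleBuffer` is a hypothetical helper written for this sketch) is a bounded shuffle buffer wrapped around the iterable:

```python
import random

from torch.utils.data import DataLoader, IterableDataset


class ShuffleBuffer(IterableDataset):
    """Approximately shuffle an iterable via a bounded buffer."""

    def __init__(self, source, buffer_size=1024, seed=None):
        self.source = source
        self.buffer_size = buffer_size
        self.rng = random.Random(seed)

    def __iter__(self):
        buffer = []
        for item in self.source:
            buffer.append(item)
            if len(buffer) >= self.buffer_size:
                # Emit a random element once the buffer is full.
                yield buffer.pop(self.rng.randrange(len(buffer)))
        # Drain the remaining elements in random order.
        self.rng.shuffle(buffer)
        yield from buffer


# Toy source iterable standing in for a SpectrumDataset.
shuffled = ShuffleBuffer(range(10), buffer_size=4, seed=0)
loader = DataLoader(shuffled, batch_size=None)  # no shuffle= here
items = sorted(int(x) for x in loader)
print(items)  # every item appears exactly once, in shuffled order
```

A larger buffer_size gives a better approximation of a full shuffle at the cost of memory.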
