
mannlabs / alphatims


An open-source Python package for efficient access and visualization of Bruker TimsTOF raw data from the Mann Labs at the Max Planck Institute of Biochemistry.

Home Page: https://doi.org/10.1016/j.mcpro.2021.100149

License: Apache License 2.0

Python 96.65% Batchfile 0.51% Inno Setup 0.69% Shell 1.90% Dockerfile 0.25%
Topics: mass-spectrometry, dda, dia, ms, lc-tims-msms, tof, tims, python, cli, gui

alphatims's People

Contributors

dependabot-preview[bot], eugeniavoytik, georgwa, github-actions[bot], guoci, jalew188, sander-willems-bruker, straussmaximilian, swillems, yatagarasu50469


alphatims's Issues

Error during file import

Describe the bug
I get the TypeError pasted below when loading an older diaPASEF file.

To Reproduce

import alphatims.utils
import alphatims.bruker
import alphatims.plotting
import importlib

bruker_d_folder_name = "20190319_tims01_FlMe_SA_BSA_45min_diaPASEF_C1_01_5714.d"
data = alphatims.bruker.TimsTOF(bruker_d_folder_name)

Temporary download link: datashare

Expected behavior
Successful file import.

Logs

2021-05-28 20:50:55> Importing data from 20190319_tims01_FlMe_SA_BSA_45min_diaPASEF_C1_01_5714.d
2021-05-28 20:50:55> Reading frame metadata for 20190319_tims01_FlMe_SA_BSA_45min_diaPASEF_C1_01_5714.d
2021-05-28 20:50:57> Reading 25,301 frames with 241,304,631 detector strikes for 20190319_tims01_FlMe_SA_BSA_45min_diaPASEF_C1_01_5714.d
100%|███████████████████████████████████████████████████████████████████████████| 25301/25301 [01:41<00:00, 248.12it/s]
2021-05-28 20:52:39> Indexing 20190319_tims01_FlMe_SA_BSA_45min_diaPASEF_C1_01_5714.d...
2021-05-28 20:52:39> Opening handle for 20190319_tims01_FlMe_SA_BSA_45min_diaPASEF_C1_01_5714.d
2021-05-28 20:52:39> Fetching mobility values from 20190319_tims01_FlMe_SA_BSA_45min_diaPASEF_C1_01_5714.d
2021-05-28 20:52:39> Closing handle for 20190319_tims01_FlMe_SA_BSA_45min_diaPASEF_C1_01_5714.d
2021-05-28 20:52:39> Opening handle for 20190319_tims01_FlMe_SA_BSA_45min_diaPASEF_C1_01_5714.d
2021-05-28 20:52:39> Fetching mz values from 20190319_tims01_FlMe_SA_BSA_45min_diaPASEF_C1_01_5714.d
2021-05-28 20:52:40> Closing handle for 20190319_tims01_FlMe_SA_BSA_45min_diaPASEF_C1_01_5714.d

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-98bb86a635c5> in <module>
      1 # reload()
      2 bruker_d_folder_name = "20190319_tims01_FlMe_SA_BSA_45min_diaPASEF_C1_01_5714.d"
----> 3 data = alphatims.bruker.TimsTOF(bruker_d_folder_name)

d:\jupyter\alphatims\alphatims\alphatims\bruker.py in __init__(self, bruker_d_folder_name, mz_estimation_from_frame, mobility_estimation_from_frame, slice_as_dataframe)
    893         logging.info(f"Importing data from {bruker_d_folder_name}")
    894         if bruker_d_folder_name.endswith(".d"):
--> 895             self._import_data_from_d_folder(
    896                 bruker_d_folder_name,
    897                 mz_estimation_from_frame,

d:\jupyter\alphatims\alphatims\alphatims\bruker.py in _import_data_from_d_folder(self, bruker_d_folder_name, mz_estimation_from_frame, mobility_estimation_from_frame)
   1015                 tof_intercept + tof_slope * np.arange(self.tof_max_index)
   1016             )**2
-> 1017         self._parse_quad_indptr()
   1018         self._intensity_min_value = int(np.min(self.intensity_values))
   1019         self._intensity_max_value = int(np.max(self.intensity_values))

d:\jupyter\alphatims\alphatims\alphatims\bruker.py in _parse_quad_indptr(self)
   1667             ]
   1668         )
-> 1669         self._precursor_max_index = int(np.max(self.precursor_indices)) + 1
   1670 
   1671     def index_precursors(

<__array_function__ internals> in amax(*args, **kwargs)

~\AppData\Local\conda\conda\envs\alphatims\lib\site-packages\numpy\core\fromnumeric.py in amax(a, axis, out, keepdims, initial, where)
   2731     5
   2732     """
-> 2733     return _wrapreduction(a, np.maximum, 'max', axis, None, out,
   2734                           keepdims=keepdims, initial=initial, where=where)
   2735 

~\AppData\Local\conda\conda\envs\alphatims\lib\site-packages\numpy\core\fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
     85                 return reduction(axis=axis, out=out, **passkwargs)
     86 
---> 87     return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
     88 
     89 

TypeError: '>=' not supported between instances of 'int' and 'NoneType'

Version (please complete the following information):

  • Installation Type: Developer

2021-05-28 20:45:27> Platform information:
2021-05-28 20:45:27> system - Windows
2021-05-28 20:45:27> release - 10
2021-05-28 20:45:27> version - 10.0.19041
2021-05-28 20:45:27> machine - AMD64
2021-05-28 20:45:27> processor - Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
2021-05-28 20:45:27> cpu count - 8
2021-05-28 20:45:27> cpu frequency - 2901.00 Mhz
2021-05-28 20:45:27> ram - 25.9/31.8 Gb (available/total)

2021-05-28 20:45:27> Python information:
2021-05-28 20:45:27> alphatims - 0.2.7
2021-05-28 20:45:27> bokeh - 2.2.3
2021-05-28 20:45:27> click - 7.1.2
2021-05-28 20:45:27> datashader - 0.12.1
2021-05-28 20:45:27> h5py - 3.2.1
2021-05-28 20:45:27> hvplot - 0.7.1
2021-05-28 20:45:27> numba - 0.53.0
2021-05-28 20:45:27> pandas - 1.2.3
2021-05-28 20:45:27> psutil - 5.8.0
2021-05-28 20:45:27> python - 3.8.8
2021-05-28 20:45:27> python-lzf - 0.2.4
2021-05-28 20:45:27> pyzstd - 0.14.3
2021-05-28 20:45:27> selenium - 3.141.0
2021-05-28 20:45:27> tqdm - 4.59.0

Calibration

Calibration of mobility and mz values over the gradient is not performed yet. This could be done with e.g. lock masses such as 622.028960. Moreover, the initial mapping of indices to mz and mobility values sometimes has a large error.
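As an illustration of the idea (a sketch only, not alphatims' calibration code; `lockmass_correction` is a hypothetical helper), a per-frame correction can be derived from the observed position of the lock mass mentioned above:

```python
def lockmass_correction(mz_values, observed_lockmass, true_lockmass=622.028960):
    """Rescale m/z values so the observed lock mass matches its true value.

    A simple multiplicative correction; a real TOF calibration may need a
    square-root or polynomial model instead.
    """
    factor = true_lockmass / observed_lockmass
    return [mz * factor for mz in mz_values]
```

Applying this frame by frame would compensate for drift over the gradient, assuming the lock mass is detected in every frame.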

Frames dtype

When reading the frames from the SQL data, a dataframe with dtype=object is returned, making it impossible to save as HDF.
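A possible workaround (a sketch, not the fix applied in alphatims) is to coerce the object-dtype columns to numeric before writing to HDF, leaving genuinely textual columns such as Polarity untouched:

```python
import pandas as pd


def coerce_numeric(frames: pd.DataFrame) -> pd.DataFrame:
    """Convert object-dtype columns read from SQLite to numeric dtypes
    where possible; columns that cannot be parsed stay as-is."""
    result = frames.copy()
    for col in result.columns:
        if result[col].dtype == object:
            try:
                result[col] = pd.to_numeric(result[col])
            except (ValueError, TypeError):
                pass  # keep text columns (e.g. Polarity) untouched
    return result
```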

Slice simplification

Exactly 4 slices are currently expected to slice the data. This should be simplified to allow 0-4 slices/lists/tuples, e.g. to draw multiple XICs.
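One way to sketch this (hypothetical helper, not existing alphatims code) is to pad however many keys the user passes up to the fixed four dimensions:

```python
def normalize_keys(keys, n_dimensions=4):
    """Pad a tuple of 0 to n_dimensions slice keys with full slices so
    downstream code can always assume exactly n_dimensions entries."""
    keys = tuple(keys)
    if len(keys) > n_dimensions:
        raise ValueError(
            f"At most {n_dimensions} keys allowed, got {len(keys)}"
        )
    return keys + (slice(None),) * (n_dimensions - len(keys))
```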

Loading data that were collected without mobility acquisition

Describe the bug
I installed this Python package with pip on a Windows machine. I am able to reproduce the data access and analysis following "tutorial.ipynb", and I can also access my own ESI-TIMS-Q-TOF data. However, we also need to process data acquired without mobility (ESI-Q-TOF). A quick try using alphatims.bruker.TimsTOF() doesn't work; I have attached the error message below. Is there a way to access this type of data using alphatims, or could you recommend an alternative Python tool? Many thanks!

Logs

file1 = 'C:\\Users\\Lab Admin\\Desktop\\1_2 LG mix_5uM_1.d'
D = alphatims.bruker.TimsTOF(file1)
---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
File C:\Program Files\Python39\lib\site-packages\pandas\io\sql.py:2020, in SQLiteDatabase.execute(self, *args, **kwargs)
   2019 try:
-> 2020     cur.execute(*args, **kwargs)
   2021     return cur

OperationalError: no such table: GlobalMetaData

The above exception was the direct cause of the following exception:

DatabaseError                             Traceback (most recent call last)
Input In [10], in <cell line: 1>()
----> 1 D = alphatims.bruker.TimsTOF(file1)

File ~\AppData\Roaming\Python\Python39\site-packages\alphatims\bruker.py:985, in TimsTOF.__init__(self, bruker_d_folder_name, mz_estimation_from_frame, mobility_estimation_from_frame, slice_as_dataframe, use_calibrated_mz_values_as_default, use_hdf_if_available, mmap_detector_events)
    973 if mmap_detector_events and hdf_file_exists:
    974     raise IOError(
    975         f"Can only use mmap from .hdf files. "
    976         f"Since {bruker_hdf_file_name} already exists, "
   (...)
    983         "or remove the existing .hdf file."
    984     )
--> 985 self._import_data_from_d_folder(
    986     bruker_d_folder_name,
    987     mz_estimation_from_frame,
    988     mobility_estimation_from_frame,
    989 )
    990 if mmap_detector_events:
    991     self._import_data_from_hdf_file(
    992         bruker_d_folder_name,
    993         mmap_detector_events,
    994     )

File ~\AppData\Roaming\Python\Python39\site-packages\alphatims\bruker.py:1043, in TimsTOF._import_data_from_d_folder(self, bruker_d_folder_name, mz_estimation_from_frame, mobility_estimation_from_frame)
   1035 self._version = alphatims.__version__
   1036 self._zeroth_frame = True
   1037 (
   1038     self._acquisition_mode,
   1039     global_meta_data,
   1040     self._frames,
   1041     self._fragment_frames,
   1042     self._precursors,
-> 1043 ) = read_bruker_sql(bruker_d_folder_name, self._zeroth_frame)
   1044 self._meta_data = dict(
   1045     zip(global_meta_data.Key, global_meta_data.Value)
   1046 )
   1047 (
   1048     self._push_indptr,
   1049     self._tof_indices,
   (...)
   1055     int(self._meta_data["MaxNumPeaksPerScan"]),
   1056 )

File ~\AppData\Roaming\Python\Python39\site-packages\alphatims\bruker.py:181, in read_bruker_sql(bruker_d_folder_name, add_zeroth_frame, drop_polarity)
    177 logging.info(f"Reading frame metadata for {bruker_d_folder_name}")
    178 with sqlite3.connect(
    179     os.path.join(bruker_d_folder_name, "analysis.tdf")
    180 ) as sql_database_connection:
--> 181     global_meta_data = pd.read_sql_query(
    182         "SELECT * from GlobalMetaData",
    183         sql_database_connection
    184     )
    185     frames = pd.read_sql_query(
    186         "SELECT * FROM Frames",
    187         sql_database_connection
    188     )
    189     if 9 in frames.MsMsType.values:

File C:\Program Files\Python39\lib\site-packages\pandas\io\sql.py:399, in read_sql_query(sql, con, index_col, coerce_float, params, parse_dates, chunksize, dtype)
    341 """
    342 Read SQL query into a DataFrame.
    343 
   (...)
    396 parameter will be converted to UTC.
    397 """
    398 pandas_sql = pandasSQL_builder(con)
--> 399 return pandas_sql.read_query(
    400     sql,
    401     index_col=index_col,
    402     params=params,
    403     coerce_float=coerce_float,
    404     parse_dates=parse_dates,
    405     chunksize=chunksize,
    406     dtype=dtype,
    407 )

File C:\Program Files\Python39\lib\site-packages\pandas\io\sql.py:2080, in SQLiteDatabase.read_query(self, sql, index_col, coerce_float, params, parse_dates, chunksize, dtype)
   2068 def read_query(
   2069     self,
   2070     sql,
   (...)
   2076     dtype: DtypeArg | None = None,
   2077 ):
   2079     args = _convert_params(sql, params)
-> 2080     cursor = self.execute(*args)
   2081     columns = [col_desc[0] for col_desc in cursor.description]
   2083     if chunksize is not None:

File C:\Program Files\Python39\lib\site-packages\pandas\io\sql.py:2032, in SQLiteDatabase.execute(self, *args, **kwargs)
   2029     raise ex from inner_exc
   2031 ex = DatabaseError(f"Execution failed on sql '{args[0]}': {exc}")
-> 2032 raise ex from exc

DatabaseError: Execution failed on sql 'SELECT * from GlobalMetaData': no such table: GlobalMetaData

Version (please complete the following information):

  • Installation Type - Pip
    • Platform information
      • system - windows
      • release - Windows 11 Pro
      • version - 21H2
      • machine - x86_64
      • processor - Intel i7-9700
      • cpu count - 8
    • Python information:
      • alphatims version - 0.3.1
      • numpy - 1.22.5
      • numba - 0.55.1
      • pandas - 1.4.1

Error on loading the data

Describe the bug
Following nbs/tutorial.ipynb and pointing it to my data, this warning is thrown on loading:
WARNING: AlphaTims version none was used to initialize ..., while the current version of AlphaTims is 0.3.0.

Followed by this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
./alphatims/alphatims/bruker.py in convert_slice_key_to_int_array(data, key, dimension)
   2543     try:
-> 2544         iter(key)
   2545     except TypeError:

TypeError: 'slice' object is not iterable

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_547/2072517597.py in <module>
      1 reload()
      2 bruker_d_folder_name = '.../bruker_data.d/'
----> 3 data = alphatims.bruker.TimsTOF(bruker_d_folder_name)

./alphatims/alphatims/bruker.py in __init__(self, bruker_d_folder_name, mz_estimation_from_frame, mobility_estimation_from_frame, slice_as_dataframe, use_calibrated_mz_values_as_default)
    956         )
    957         # Precompile
--> 958         self[0, "raw"]
    959         logging.info(f"Succesfully imported data from {bruker_d_folder_name}")
    960 

./alphatims/alphatims/bruker.py in __getitem__(self, keys)
   1426         else:
   1427             as_dataframe = self.slice_as_dataframe
-> 1428         parsed_keys = parse_keys(self, keys)
   1429         raw_indices = filter_indices(
   1430             frame_slices=parsed_keys["frame_indices"],

./alphatims/alphatims/bruker.py in parse_keys(data, keys)
   2436             dimension_slices[
   2437                 dimension
-> 2438             ] = convert_slice_key_to_int_array(
   2439                 data,
   2440                 keys[i] if (i < len(keys)) else slice(None),

./alphatims/alphatims/bruker.py in convert_slice_key_to_int_array(data, key, dimension)
   2559                 if not isinstance(start, (np.inexact, float)):
   2560                     raise ValueError
-> 2561                 start = data.convert_to_indices(
   2562                     start,
   2563                     return_type=dimension

./alphatims/alphatims/bruker.py in convert_to_indices(self, values, return_frame_indices, return_scan_indices, return_tof_indices, side, return_type)
   1389             return np.searchsorted(self.rt_values, values, side)
   1390         elif return_type == "scan_indices":
-> 1391             return self.scan_max_index - np.searchsorted(
   1392                 self.mobility_values[::-1],
   1393                 values,

./alphatims/alphatims/bruker.py in scan_max_index(self)
    763     def scan_max_index(self):
    764         """: int : The maximum scan index."""
--> 765         return self._scan_max_index
    766 
    767     @property

AttributeError: 'TimsTOF' object has no attribute '_scan_max_index'

To Reproduce
Steps to reproduce the behavior:

  1. Created clean conda env conda env create -n alphatims -f misc/conda_development_environment.yaml
  2. Opened the tutorial notebook file in the new env, using nb_conda_kernels
  3. Tried to point the notebook to my data

Version (please complete the following information):
2021-10-25 09:21:27> Platform information:
2021-10-25 09:21:27> system - Linux
2021-10-25 09:21:27> release - 5.10.16.3-microsoft-standard-WSL2
2021-10-25 09:21:27> version - #1 SMP Fri Apr 2 22:23:49 UTC 2021
2021-10-25 09:21:27> machine - x86_64
2021-10-25 09:21:27> processor - x86_64
2021-10-25 09:21:27> cpu count - 8
2021-10-25 09:21:27> cpu frequency - 2803.21 Mhz
2021-10-25 09:21:27> ram - 24.2/24.8 Gb (available/total)
2021-10-25 09:21:27>
2021-10-25 09:21:27> Python information:
2021-10-25 09:21:27> alphatims - 0.3.0
2021-10-25 09:21:27> bokeh - 2.2.3
2021-10-25 09:21:27> click - 8.0.3
2021-10-25 09:21:27> datashader - 0.12.1
2021-10-25 09:21:27> h5py - 3.5.0
2021-10-25 09:21:27> hvplot - 0.7.1
2021-10-25 09:21:27> numba - 0.54.1
2021-10-25 09:21:27> pandas - 1.3.4
2021-10-25 09:21:27> psutil - 5.8.0
2021-10-25 09:21:27> python - 3.8.12
2021-10-25 09:21:27> python-lzf - 0.2.4
2021-10-25 09:21:27> pyzstd - 0.15.0
2021-10-25 09:21:27> selenium - 3.141.0
2021-10-25 09:21:27> tqdm - 4.62.3
2021-10-25 09:21:27>

Allow selection of columns returned when slicing with the dictionary method

Is your feature request related to a problem? Please describe.
I like the semantics of using a dictionary to slice the data:

    wide_ms1_points_df = raw_data[
        {
            "rt_values": slice(float(precursor_cuboid_d['wide_ms1_rt_lower']), float(precursor_cuboid_d['wide_ms1_rt_upper'])),
            "mz_values": slice(float(precursor_cuboid_d['wide_mz_lower']), float(precursor_cuboid_d['wide_mz_upper'])),
            "scan_indices": slice(int(precursor_cuboid_d['wide_scan_lower']), int(precursor_cuboid_d['wide_scan_upper'])),
            "precursor_indices": 0,
        }
    ]

I might be missing it but I haven't seen a way to also choose the columns returned in the dataframe with this method, so the dataframe is:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4564742 entries, 0 to 4564741
Data columns (total 13 columns):
 #   Column               Dtype  
---  ------               -----  
 0   raw_indices          int64  
 1   frame_indices        int64  
 2   scan_indices         int64  
 3   precursor_indices    int64  
 4   push_indices         int64  
 5   tof_indices          uint32 
 6   rt_values            float64
 7   rt_values_min        float64
 8   mobility_values      float64
 9   quad_low_mz_values   float64
 10  quad_high_mz_values  float64
 11  mz_values            float64
 12  intensity_values     uint16 
dtypes: float64(6), int64(5), uint16(1), uint32(1)
memory usage: 409.2 MB

Describe the solution you would like
Something like this could be considered:

    wide_ms1_points_df = raw_data[
        {
            "rt_values": slice(float(precursor_cuboid_d['wide_ms1_rt_lower']), float(precursor_cuboid_d['wide_ms1_rt_upper'])),
            "mz_values": slice(float(precursor_cuboid_d['wide_mz_lower']), float(precursor_cuboid_d['wide_mz_upper'])),
            "scan_indices": slice(int(precursor_cuboid_d['wide_scan_lower']), int(precursor_cuboid_d['wide_scan_upper'])),
            "precursor_indices": 0,
            "columns": ['frame_indices','scan_indices','rt_values','mz_values','intensity_values']
        }
    ]

Allowing the choice of column type would be useful as well:

"dtypes": [np.uint16, np.uint16, np.float32, np.float64, np.uint16]

Describe alternatives you've considered
Dropping unwanted columns and downcasting the column types works fine. I think this idea would reduce the compute effort though.
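The alternative described above can be condensed into one post-slicing step; a sketch (hypothetical `shrink` helper; column names taken from the dataframe listing above):

```python
import numpy as np
import pandas as pd


def shrink(df: pd.DataFrame, columns, dtypes) -> pd.DataFrame:
    """Keep only the requested columns and downcast them in one pass."""
    return df[list(columns)].astype(dict(zip(columns, dtypes)))
```

For example, `shrink(wide_ms1_points_df, ["frame_indices", "scan_indices", "rt_values", "mz_values", "intensity_values"], [np.uint16, np.uint16, np.float32, np.float64, np.uint16])` reproduces the requested behavior after the fact, though a built-in option could avoid materializing the full dataframe first.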

Additional context
Add any other context or screenshots about the feature request here.

INTERFACE_PARAMETERS definition is missing in alphatims.utils (standalone installer 0.2.4)

Describe the bug
Alphatims fails to start with the error AttributeError: module 'alphatims.utils' has no attribute 'INTERFACE_PARAMETERS'

To Reproduce
Steps to reproduce the behavior:

  1. Install v 0.2.4 through GUI installer on windows
  2. Start the software

Logs

Traceback (most recent call last):
File "alphatims_pyinstaller.py", line 4, in
AttributeError: module 'alphatims.utils' has no attribute 'INTERFACE_PARAMETERS'
[15952] Failed to execute script alphatims_pyinstaller

Version (please complete the following information):

  • Installation Type One-Click Installer
  • If no log is available, provide the following:
    • Platform information
      • system Windows
      • release 20H2
      • version 10.0.19042.928
      • machine x86_64
    • Python information:
      • alphatims version 0.2.4

Plotting

Plotting should be extended to:

  • Variable axis (currently only mz vs. mobility)
  • 1D plotting: XICs, mobilograms and spectra
  • Perhaps even multiple plots?
  • Datashader option to disable when zoomed in sufficiently?
  • Tagging/selection of datapoints?
  • Include rt/mobility/mz values for rangesliders

[Question] Centroiding of TimsTOF class objects

Very useful software you have created! For the most part I have become proficient at using alphatims to accomplish various tasks, but I have been struggling to figure out how to centroid my data.

There is the function alphatims.bruker.centroid_spectra(), but from what I can tell its purpose is to centroid fragment ions. I could be wrong about this; it isn't very clear how to use it properly. A required argument is a spectrum_indptr array (which can be obtained using alphatims.bruker.TimsTOF.index_precursors()), but it is unclear exactly what this array describes. It is also not very clear what the difference between an index and an indptr is. A spectrum_counts array is required as well, but I do not know how to obtain it.

My goal is to perform centroiding on all MS1 and MS2 level data before doing any additional slicing/processing. It would be very useful if the TimsTOF object itself could be centroided upon import so that the data contained in all the TimsTOF class attributes/methods reflect centroided values.

Any suggestions or advice would be greatly appreciated.
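As a generic illustration of what centroiding does (not alphatims' own algorithm, and not a substitute for it), adjacent peaks within an m/z tolerance can be merged into intensity-weighted centroids:

```python
def centroid(mz, intensity, tol=0.01):
    """Merge peaks whose m/z gap is at most `tol` into intensity-weighted
    centroids. Inputs must be sorted by m/z and non-empty."""
    out_mz, out_int = [], []
    group_mz, group_int = [mz[0]], [intensity[0]]
    for m, i in zip(mz[1:], intensity[1:]):
        if m - group_mz[-1] <= tol:
            group_mz.append(m)
            group_int.append(i)
        else:
            total = sum(group_int)
            out_mz.append(
                sum(m_ * i_ for m_, i_ in zip(group_mz, group_int)) / total
            )
            out_int.append(total)
            group_mz, group_int = [m], [i]
    total = sum(group_int)
    out_mz.append(sum(m_ * i_ for m_, i_ in zip(group_mz, group_int)) / total)
    out_int.append(total)
    return out_mz, out_int
```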

Dependabot couldn't find a Pipfile for this project

Dependabot couldn't find a Pipfile for this project.

Dependabot requires a Pipfile to evaluate your project's current Python dependencies. It had expected to find one at the path: /requirements/requirements.txt/Pipfile.

If this isn't a Python project, or if it is a library, you may wish to disable updates for it from within Dependabot.

View the update logs.

Unable to read data folder on some systems

I'm getting a strange error on a new system where I'm not completely in control of the OS (it's some flavor of containerized Debian). This is happening on alphatims 1.0.6 and 1.0.5.

Any ideas on troubleshooting this?

Alphatims installs without problems via pip, including the Bruker DLL:

from alphatims.bruker import BRUKER_DLL_FILE_NAME
!ls -l $BRUKER_DLL_FILE_NAME

-rw-r--r-- 1 root root 18668200 Jan  4 05:07 /usr/local/lib/python3.10/site-packages/alphatims/ext/timsdata.so

Then it starts reading the files and tqdm recognizes the correct number of frames, but then crashes:

from alphatims.bruker import TimsTOF
data = TimsTOF("/lipidomics.d")

100%|##########| 11106/11106 [00:13<00:00, 823.93it/s]
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
File /usr/local/lib/python3.10/site-packages/alphatims/bruker.py:131, in open_bruker_d_folder(bruker_d_folder_name, bruker_dll_file_name)
    130 if isinstance(bruker_dll_file_name, str):
--> 131     bruker_dll = init_bruker_dll(bruker_dll_file_name)
    132 logging.info(f"Opening handle for {bruker_d_folder_name}")
File /usr/local/lib/python3.10/site-packages/alphatims/bruker.py:67, in init_bruker_dll(bruker_dll_file_name)
     66 import ctypes
---> 67 bruker_dll = ctypes.cdll.LoadLibrary(
     68     os.path.realpath(bruker_dll_file_name)
     69 )
     70 bruker_dll.tims_open.argtypes = [ctypes.c_char_p, ctypes.c_uint32]
File /usr/local/lib/python3.10/ctypes/__init__.py:452, in LibraryLoader.LoadLibrary(self, name)
    451 def LoadLibrary(self, name):
--> 452     return self._dlltype(name)
File /usr/local/lib/python3.10/ctypes/__init__.py:374, in CDLL.__init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    373 if handle is None:
--> 374     self._handle = _dlopen(self._name, mode)
    375 else:
OSError: libgomp.so.1: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
UnboundLocalError                         Traceback (most recent call last)
Cell In[6], line 2
      1 from alphatims.bruker import TimsTOF
----> 2 data = TimsTOF("/lipidomics.d")
File /usr/local/lib/python3.10/site-packages/alphatims/bruker.py:1016, in TimsTOF.__init__(self, bruker_d_folder_name, mz_estimation_from_frame, mobility_estimation_from_frame, slice_as_dataframe, use_calibrated_mz_values_as_default, use_hdf_if_available, mmap_detector_events, drop_polarity, convert_polarity_to_int)
   1012     else:
   1013         self.bruker_d_folder_name = os.path.abspath(
   1014             bruker_d_folder_name
   1015         )
-> 1016         self._import_data_from_d_folder(
   1017             bruker_d_folder_name,
   1018             mz_estimation_from_frame,
   1019             mobility_estimation_from_frame,
   1020             drop_polarity,
   1021             convert_polarity_to_int,
   1022             mmap_detector_events,
   1023         )
   1024 elif bruker_d_folder_name.endswith(".hdf"):
   1025     self._import_data_from_hdf_file(
   1026         bruker_d_folder_name,
   1027         mmap_detector_events,
   1028     )
File /usr/local/lib/python3.10/site-packages/alphatims/bruker.py:1114, in TimsTOF._import_data_from_d_folder(self, bruker_d_folder_name, mz_estimation_from_frame, mobility_estimation_from_frame, drop_polarity, convert_polarity_to_int, mmap_detector_events)
   1112 if (mobility_estimation_from_frame != 0) and bruker_dll_available:
   1113     import ctypes
-> 1114     with alphatims.bruker.open_bruker_d_folder(
   1115         bruker_d_folder_name
   1116     ) as (bruker_dll, bruker_d_folder_handle):
   1117         logging.info(
   1118             f"Fetching mobility values from {bruker_d_folder_name}"
   1119         )
   1120         indices = np.arange(self.scan_max_index).astype(np.float64)
File /usr/local/lib/python3.10/contextlib.py:135, in _GeneratorContextManager.__enter__(self)
    133 del self.args, self.kwds, self.func
    134 try:
--> 135     return next(self.gen)
    136 except StopIteration:
    137     raise RuntimeError("generator didn't yield") from None
File /usr/local/lib/python3.10/site-packages/alphatims/bruker.py:140, in open_bruker_d_folder(bruker_d_folder_name, bruker_dll_file_name)
    138 finally:
    139     logging.info(f"Closing handle for {bruker_d_folder_name}")
--> 140     bruker_dll.tims_close(bruker_d_folder_handle)
UnboundLocalError: local variable 'bruker_dll' referenced before assignment

The data files were retrieved like this:

import os
from ftplib import FTP

def start_ftp():
    sample_path = "MSV000084402/raw/SRM1950_20min_88_01_6950.d"
    ftp = FTP("massive.ucsd.edu")
    ftp.login()
    ftp.cwd(sample_path)
    return ftp

ftp = start_ftp()
if not os.path.exists("lipidomics.d"):
    os.mkdir("lipidomics.d")
with open("lipidomics.d/analysis.tdf_bin", "wb") as f:
    ftp.retrbinary("RETR " + "analysis.tdf_bin", f.write)
with open("lipidomics.d/analysis.tdf", "wb") as f:
    ftp.retrbinary("RETR " + "analysis.tdf", f.write)
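The OSError above points at a missing OpenMP runtime (libgomp.so.1) in the container, which the Bruker timsdata.so depends on at load time; on Debian-based images it is typically provided by the libgomp1 package. A Python-side diagnostic (a sketch, not part of alphatims; `has_openmp_runtime` is a hypothetical helper) could look like:

```python
import ctypes.util


def has_openmp_runtime() -> bool:
    """Return True if the GNU OpenMP runtime (libgomp) can be located.

    The Bruker shared library (timsdata.so) needs it at dlopen time, so a
    False result here would explain the OSError seen above.
    """
    return ctypes.util.find_library("gomp") is not None
```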

[Maintenance] CI/CD

While working on the CI/CD of the project to publish the Docker image, I noticed a couple of things that you/we might want to consider addressing:

  1. On the Version_Bumped stage of the CI/CD it throws this warning:
Warning: The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/

This could be updated.

  2. Version bumping is handled in a very bespoke way using sed; a tool like bumpver could be used to make this automatic.
  3. A lot of the packaging and versioning could be simplified by migrating to the pyproject.toml structure (which would at the same time remove many redundant places where configurations and dependencies are established).
  4. There are many processes spread across multiple scripts in the repo. It would be a good idea to centralize those in a Makefile (even if the Makefile just calls those scripts).

lmk what you think
Best
Sebastian

[Question] precursor_indices definition

Hello!

I have been exploring the package and had a question on how the 'precursor_indices' is defined.
When I filter based on the precursor index (1 in this case), I get two distinct quad isolation windows.

This is contrary to what I thought would be the case (I was expecting each precursor to be only one iso window), so I wanted to know whether this is an expected behavior or how the precursor index is being defined.

Is there any indexing to access unique quad isolation windows in a sequential manner?

Appreciating your help!
-Sebastian

Snippet showing how I am getting the results

# This file comes from the PRIDE repo for the alphatims paper
# curl ftp.pride.ebi.ac.uk/pride/data/archive/2022/02/PXD028735/LFQ_timsTOFPro_diaPASEF_Ecoli_01.d.zip --output ../data/ecoli_timsTOFPro_diaPASEF.d.zip
dia_data = TimsTOF("../data/LFQ_timsTOFPro_diaPASEF_Ecoli_01.d", mmap_detector_events=False)
inds = dia_data[{"precursor_indices": 1}, "raw"]
out = dia_data.convert_from_indices(inds,
        raw_indices_sorted=True,
        return_quad_mz_values=True,
        )
np.unique(out["quad_low_mz_values"], return_counts=True)
# (array([400., 800.]), array([ 20845610, 119548979]))

Loading data that were not collected using ddaPASEF or diaPASEF

Is your feature request related to a problem? Please describe.
Data acquired using a TIMS method but without PASEF enabled cannot be loaded. The GUI gives the error "Scan mode is not ddaPASEF or diaPASEF" and says the file is corrupted. I am running on Windows.

I also tried on command line by running the following:
alphatims.bruker.TimsTOF(r'C:\Users\Mast\Desktop\timstof_python\analysisname.d')

The following error occurred:
Traceback (most recent call last):

File "", line 1, in
atb.TimsTOF(r'C:\Users\Mast\Desktop\timstof_python\analysisname.d')

File "C:\Users\Mast\AppData\Roaming\Python\Python38\site-packages\alphatims\bruker.py", line 777, in init
self._import_data_from_d_folder(

File "C:\Users\Mast\AppData\Roaming\Python\Python38\site-packages\alphatims\bruker.py", line 817, in _import_data_from_d_folder
) = read_bruker_sql(bruker_d_folder_name)

File "C:\Users\Mast\AppData\Roaming\Python\Python38\site-packages\alphatims\bruker.py", line 185, in read_bruker_sql
with sqlite3.connect(

OperationalError: unable to open database file

Describe the solution you would like
I would like to be able to load a file that has TIMS data but no MS/MS data, if this is possible.
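Since alphatims reads the frame metadata from analysis.tdf with plain SQLite (as the read_bruker_sql traceback in an earlier issue shows), the MS1-only frame metadata can at least be inspected directly; a sketch assuming the .d folder contains an analysis.tdf database:

```python
import os
import sqlite3

import pandas as pd


def read_frames_table(bruker_d_folder_name: str) -> pd.DataFrame:
    """Read the Frames table from a .d folder's analysis.tdf database."""
    with sqlite3.connect(
        os.path.join(bruker_d_folder_name, "analysis.tdf")
    ) as connection:
        return pd.read_sql_query("SELECT * FROM Frames", connection)
```

This only covers the metadata, not the binary detector events in analysis.tdf_bin, but it shows whether a non-PASEF file is readable at all.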

Precursor filter

Retrieving scans by precursor index should be possible. Can be implemented in the quad slice perhaps?

No plots are shown when using the GUI in Google Chrome or Firefox

Describe the bug
I have created an alphatims Docker image (see attachment) for alphatims 0.3.2 that I run in WSL2, and I can port the GUI to my browser (localhost:4500). However, once I have uploaded a *.d file, no plots are displayed. I can't see any error messages either (see attached log).

Dockerfile to create the alphatims image (zipped)

Dockerfile.zip

Setup file run by CMD in dockerfile
setup.zip

To Reproduce
Steps to reproduce the behavior (more detailed in the Dockerfile):

  1. 'conda create -n alphatims python=3.8'
  2. 'conda install -n alphatims pip'
  3. 'conda activate alphatims'
  4. pip install 'alphatims[plotting]'
  5. Run alphatims gui --port:4500
  6. Open browser and load http://localhost:4500
  7. Load either *.d folder or *.hdf file

Expected behavior
Expected plots to appear once the file is loaded in the GUI. Instead, the plots are compressed and do not show anything.

Logs
Please provide the log (see the AlphaTims terminal on where to find it).
log_20220607125907.txt

Screenshots
If applicable, add screenshots to help explain your problem.
error_alphatims_gui_0_3_2

Version (please complete the following information):

  • Installation Type: pip
  • If no log is available, provide the following:
    • Platform information
      • system Linux
      • release 5.10.102.1-microsoft-standard-WSL2
      • version #1 SMP Wed Mar 2 00:30:59 UTC 2022
      • machine x86_64
      • processor [e.g. i386]
      • cpu count 20
    • Python information:
      • alphatims version 0.3.2
      • alphatims - 0.3.2
      • bokeh - 2.4.3
      • click - 8.1.3
      • datashader - 0.14.0
      • h5py - 3.7.0
      • hvplot - 0.8.0
      • numba - 0.55.2
      • pandas - 1.4.2
      • psutil - 5.9.1
      • python - 3.8.13
      • python-lzf -
      • pyzstd - 0.15.2
      • selenium - 4.2.0
      • tqdm - 4.64.0


convert_from_indices

Greetings!

I'd like to access the first X frames of the data. I don't want to go with standard indexing because I only need a couple of fields, so I'm attempting to use .convert_from_indices (passing frame_indices instead of raw_indices), and it produces strange results.

# accessing the first two frames
data[[1,2]].shape
# returns (6879, 14)

result = data.convert_from_indices(
    raw_indices=None,
    frame_indices=[1, 2],
    return_tof_indices=True,
)

result['tof_indices'].shape

# returns (1, 8838754)
# an array of arrays!
# the length of inner array is the length of whole run

I've read the function 5 times but don't understand how this happens... :)

recode polarity to int

Hello,

You provide the drop_polarity argument to read_bruker_sql but do not use it. The code says:

        frames = pd.DataFrame(
            {
                col: pd.to_numeric(
                    frames[col]
                ) for col in frames if col != "Polarity"
            }
        )

There is probably an if drop_polarity: missing.
https://github.com/MannLabs/alphatims/blob/master/alphatims/bruker.py#L249

I understand the motivation for keeping only numeric types. Personally, I would prefer the polarity was never dropped.

  • there is no place to infer polarity from. I wish it was in the global metadata but it's not.
  • how about keeping polarity but recoding it into -1 and +1? I tried opening a pull request but had trouble with pushing my commit here.

frames['Polarity'] = frames['Polarity'].apply(lambda x: int(x+'1')) then all your dtypes are numeric.
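The suggested recoding can be sketched on a toy frames table (a stand-in for the table read from analysis.tdf; the column names follow the issue):

```python
import pandas as pd

# Toy stand-in for the frames table read from analysis.tdf.
frames = pd.DataFrame({"Id": [1, 2, 3], "Polarity": ["+", "+", "-"]})

# Recode '+'/'-' into +1/-1 so that every column has a numeric dtype.
frames["Polarity"] = frames["Polarity"].apply(lambda x: int(x + "1"))
print(frames["Polarity"].tolist())  # [1, 1, -1]
```

The trick works because int("+1") and int("-1") parse the sign character directly.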

Only increasing indices are supported: Can combinations of recurring elements be retrieved at once?

Describe the Problem
Indexing does not work as expected with an iterable. Wrong elements are returned and correct elements are missing. This is due to recurring and unsorted indices not being supported.

What I want to do
Given a combination of frames and precursors, I want to only get the fragments matching to the provided pairs (frame, precursor)

To Reproduce
There are plenty of ways to break it but this is what's working consistently for me:

data = TimsTOF(<path_to_.d>)
matching_df = data[[2,2,3], :, [1,3,1]]
matching_df

I.e., create an object and select frame 2 and 3 where precursors 1 and 3 should be returned for frame 2 and precursor 1 for frame 3.

Expected behavior

A dataframe should be returned where the frame_indices and precursor_indices in individual rows should be matching pairs like they were indexed. In this case:

frame_indices, precursor_indices
2, 1
2, 3
3, 1

Actual behaviour
Every precursor except 1 is ignored. In case of more indices, it also happens that random precursors are selected that aren't even in the list of indices.

Screenshots
image

Here you can see how the pair (2,3) is missing in the dataframe.

Version (please complete the following information):

  • latest stable alphatims within conda environment, installed with pip
  • If no log is available, provide the following:
    • Platform information
      • system Ubuntu
      • release 22.04.1
      • machine x86_64
      • processor Intel(R) Xeon(R) Gold 6148
      • cpu count 160
    • Python information:
      • alphatims version 1.0.8
      • python version 3.9

Additional context
This is part of timsTOF integration into the oktoberfest rescoring pipeline. I am reading the pasefMsmsScans.txt table from a MaxQuant run and try to filter the alphatims object for those fragments relevant for the MQ search. The idea is to create master MS2 spectra by summing the fragment intensities of individual precursors so that they can be compared against fragment intensity predictions to calculate spectral angles. Maybe I am doing something wrong, or maybe you have an idea for a better approach that circumvents this indexing problem.
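One possible workaround until recurring/unsorted indices are supported: slice a superset using sorted unique frame indices, then keep only the wanted (frame, precursor) pairs with a pandas merge. The DataFrame below is a hypothetical stand-in for what data[[2, 3]] would return (column names follow the issue), so the sketch stays self-contained:

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by data[[2, 3]]
# (a superset containing all precursors of frames 2 and 3).
superset = pd.DataFrame({
    "frame_indices": [2, 2, 2, 3, 3],
    "precursor_indices": [1, 2, 3, 1, 3],
    "intensity_values": [10, 20, 30, 40, 50],
})

# The exact (frame, precursor) pairs requested in the issue.
wanted = pd.DataFrame({
    "frame_indices": [2, 2, 3],
    "precursor_indices": [1, 3, 1],
})

# Inner merge keeps only rows whose pair appears in `wanted`.
matching_df = superset.merge(wanted, on=["frame_indices", "precursor_indices"])
print(matching_df[["frame_indices", "precursor_indices"]].values.tolist())
# [[2, 1], [2, 3], [3, 1]]
```

This trades one slicing call per pair for a single superset slice plus an in-memory join, which is usually cheaper when many pairs share frames.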

Discrepancy of m/z values between AlphaTims, OpenTIMS, and Bruker SDK

Thank you for sharing this library. Its performance and slicing is really nice work.

Describe the bug

It seems too soon to call it a bug; it may well be something I'm doing. I've noticed that m/z values are different for the same raw points between AlphaTims, OpenTIMS, and Bruker SDK. The m/z values from OpenTIMS and the Bruker SDK are the same, but I've seen 5-8 ppm difference in m/z values from AlphaTims.

To Reproduce

Using AlphaTims:

file_name = './P3856_YHE211_1_Slot1-1_1_5104.d'
data = alphatims.bruker.TimsTOF(file_name)
df = data[
    {
        "frame_indices": 14612,
        "scan_indices": 33,
    }
]
df[['frame_indices','rt_values','scan_indices','mz_values','intensity_values']]

Using OpenTIMS:

import pathlib
import numpy as np
import pandas as pd
from opentimspy.opentims import OpenTIMS

# rt_lower and rt_upper are retention time bounds (in seconds) defined earlier
path = pathlib.Path(file_name)
D = OpenTIMS(path)
df_l = []
for idx, d in enumerate(D.query_iter(D.ms1_frames, columns=('frame', 'mz', 'scan', 'intensity', 'retention_time'))):
    if (d['retention_time'][0] >= rt_lower) and (d['retention_time'][0] <= rt_upper):
        d['frame'] = d['frame'].astype(np.uint16, copy=False)
        d['mz'] = d['mz'].astype(np.float32, copy=False)
        d['scan'] = d['scan'].astype(np.uint16, copy=False)
        d['intensity'] = d['intensity'].astype(np.uint16, copy=False)
        d['retention_time'] = d['retention_time'].astype(np.float32, copy=False)
        df_l.append(pd.DataFrame(d))
df = pd.concat(df_l, axis=0, sort=False, ignore_index=True)
df[(df.scan == 33) & (df.frame == 14612)][['frame', 'retention_time', 'scan', 'mz', 'intensity']]

Using Bruker SDK:

td = timsdata.TimsData(file_name)
frame_points = []
frame_id = 14612
# number_of_scans and retention_time_secs for this frame are defined elsewhere
for scan_idx, scan in enumerate(td.readScans(frame_id=frame_id, scan_begin=0, scan_end=number_of_scans)):
    index = np.array(scan[0], dtype=np.float64)
    mz_values = td.indexToMz(frame_id, index)
    intensity_values = scan[1]
    scan_number = scan_idx
    number_of_points_on_scan = len(mz_values)
    for i in range(0, number_of_points_on_scan):
        mz_value = float(mz_values[i])
        intensity = int(intensity_values[i])
        d = {'frame_id': frame_id, 'mz': mz_value, 'scan': scan_number, 'intensity': intensity, 'retention_time_secs': retention_time_secs}
        frame_points.append(d)
df = pd.DataFrame(frame_points)
df[(df.scan == 33) & (df.frame_id == 14612)][['frame_id', 'retention_time_secs', 'scan', 'mz', 'intensity']]

Expected behavior

I don't know which framework is giving the correct m/z readings, but it seems odd they are different.

Logs

2021-08-02 15:07:21> Importing data from /media/big-ssd/experiments/P3856/raw-databases/P3856_YHE211_1_Slot1-1_1_5104.d
2021-08-02 15:07:21> Reading frame metadata for /media/big-ssd/experiments/P3856/raw-databases/P3856_YHE211_1_Slot1-1_1_5104.d
2021-08-02 15:07:22> Reading 25,701 frames with 1,292,274,220 detector strikes for /media/big-ssd/experiments/P3856/raw-databases/P3856_YHE211_1_Slot1-1_1_5104.d
2021-08-02 15:07:44> Indexing /media/big-ssd/experiments/P3856/raw-databases/P3856_YHE211_1_Slot1-1_1_5104.d...
2021-08-02 15:07:44> Opening handle for /media/big-ssd/experiments/P3856/raw-databases/P3856_YHE211_1_Slot1-1_1_5104.d
2021-08-02 15:07:44> Fetching mobility values from /media/big-ssd/experiments/P3856/raw-databases/P3856_YHE211_1_Slot1-1_1_5104.d
2021-08-02 15:07:44> Closing handle for /media/big-ssd/experiments/P3856/raw-databases/P3856_YHE211_1_Slot1-1_1_5104.d
2021-08-02 15:07:44> Opening handle for /media/big-ssd/experiments/P3856/raw-databases/P3856_YHE211_1_Slot1-1_1_5104.d
2021-08-02 15:07:44> Fetching mz values from /media/big-ssd/experiments/P3856/raw-databases/P3856_YHE211_1_Slot1-1_1_5104.d
2021-08-02 15:07:44> Closing handle for /media/big-ssd/experiments/P3856/raw-databases/P3856_YHE211_1_Slot1-1_1_5104.d
2021-08-02 15:07:46> Succesfully imported data from /media/big-ssd/experiments/P3856/raw-databases/P3856_YHE211_1_Slot1-1_1_5104.d

Screenshots

image

Version (please complete the following information):

I installed with pip in a fresh conda environment:

conda create --name alphatims python=3.8
conda activate alphatims
pip install "alphatims[stable]"

2021-08-02 15:07:21> Platform information:
2021-08-02 15:07:21> system        - Linux
2021-08-02 15:07:21> release       - 4.15.0-147-generic
2021-08-02 15:07:21> version       - #151-Ubuntu SMP Fri Jun 18 19:21:19 UTC 2021
2021-08-02 15:07:21> machine       - x86_64
2021-08-02 15:07:21> processor     - x86_64
2021-08-02 15:07:21> cpu count     - 12
2021-08-02 15:07:21> cpu frequency - 3331.27 Mhz
2021-08-02 15:07:21> ram           - 51.4/62.8 Gb (available/total)

2021-08-02 15:07:21> Python information:
2021-08-02 15:07:21> alphatims  - 0.2.8
2021-08-02 15:07:21> bokeh      - 
2021-08-02 15:07:21> click      - 8.0.1
2021-08-02 15:07:21> datashader - 
2021-08-02 15:07:21> h5py       - 3.1.0
2021-08-02 15:07:21> hvplot     - 
2021-08-02 15:07:21> numba      - 0.53.1
2021-08-02 15:07:21> pandas     - 1.3.1
2021-08-02 15:07:21> psutil     - 5.8.0
2021-08-02 15:07:21> python     - 3.8.5
2021-08-02 15:07:21> python-lzf - 
2021-08-02 15:07:21> pyzstd     - 0.14.4
2021-08-02 15:07:21> selenium   - 
2021-08-02 15:07:21> tqdm       - 4.61.1

Additional context

All three libraries are using a float64 for the m/z value. I was thinking perhaps there's an intermediate step in AlphaTims where the m/z values are downcast to 32 bits? Just a thought.
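As a quick sanity check of that thought: a float64 → float32 → float64 round trip changes an m/z value by at most about 0.06 ppm (float32 keeps ~24 significand bits), so a 5-8 ppm discrepancy would have to come from a different TOF-to-m/z conversion or calibration, not from 32-bit storage alone. A minimal sketch:

```python
import numpy as np

mz = np.float64(922.0098)                 # a typical m/z value
round_trip = np.float64(np.float32(mz))   # downcast to 32 bits and back
ppm_error = abs(round_trip - mz) / mz * 1e6
print(f"{ppm_error:.4f} ppm")             # far below 1 ppm
```

The relative rounding error of float32 is bounded by 2**-24 ≈ 6e-8, i.e. roughly 0.06 ppm, two orders of magnitude smaller than the observed difference.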

Linux one-click install

Currently, the Linux one-click installer refers to a single runnable app. This should be a proper .deb or .rpm package installer.

Pyinstaller

When pyinstaller is used, the GUI is unable to use datashader properly.

[Pitch] Docker Image

Hello there! I was wondering if you would be interested in adding a docker image to the release cycle of the package.

I would gladly make PR for it if you want!

Thanks for the good work on the package :)
-Sebastian

python 3.10

Hello! I cloned and edited alphatims/__init__.py to allow for python 3.10 and built a wheel with it. It appears to install and work fine. Would you mind officially allowing python 3.10?

Some key dependencies:

  • Installing llvmlite (0.39.0)
  • Installing numpy (1.22.4)
  • Installing numba (0.56.0)
  • Installing alphatims (1.0.0 alphatims/alphatims-1.0.0-py3-none-any.whl)

Reducing the memory footprint of the TimsTOF object

Is your feature request related to a problem? Please describe.
In my application I use Ray to distribute cuboids of the raw data for multiple workers on a node. The TimsTOF object seems to occupy about 8GB in memory for one of my raw databases once it's instantiated. Shared objects in Ray are serialised in Plasma but I was wondering whether this object could be smaller.

Describe the solution you would like
Perhaps consider making RT values a numpy array of float32 rather than float64. mz values could be float32 and only float64 if high precision is required? Scan and frame indices could be uint16 and uint32 respectively rather than int64.

Describe alternatives you've considered
The current solution works fine but I'm downcasting the types once I slice into a dataframe.
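The kind of saving such downcasting could give can be estimated on a toy table (hypothetical column names mirroring the sliced dataframe; the exact savings depend on the real value ranges):

```python
import numpy as np
import pandas as pd

n = 1_000_000
df = pd.DataFrame({
    "rt_values": np.linspace(0.0, 3600.0, n),                # float64
    "scan_indices": (np.arange(n) % 1000).astype(np.int64),  # int64
    "frame_indices": np.arange(n, dtype=np.int64),           # int64
})
before = df.memory_usage(index=False).sum()

df = df.astype({
    "rt_values": np.float32,
    "scan_indices": np.uint16,    # scan numbers stay well below 65_536
    "frame_indices": np.uint32,
})
after = df.memory_usage(index=False).sum()
print(before / after)  # 2.4
```

Going from three 8-byte columns to 4+2+4 bytes per row shrinks the table 2.4x; the same logic applied to the raw arrays inside the TimsTOF object would scale accordingly.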


1.0.5 GUI wont start

Hello,

I just stumbled across the following error: I installed version 1.0.5 of AlphaTims via the GUI installer. When I try to open AlphaTims with the GUI, it won't open (I tried several browsers set as default). I put the cmd log of the error message below. As this error does not occur with version 1.0.4 of AlphaTims, I don't think something is wrong with my setup.

Thank you for the cool software.
All the best,
Tobias

Logs
Traceback (most recent call last):
  File "alphatims_pyinstaller.py", line 14, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "PyInstaller\loader\pyimod03_importers.py", line 495, in exec_module
  File "alphatims\gui.py", line 15, in <module>
    import alphatims.bruker
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "PyInstaller\loader\pyimod03_importers.py", line 495, in exec_module
  File "alphatims\bruker.py", line 277, in <module>
    @alphatims.utils.njit(nogil=True)
  File "alphatims\utils.py", line 444, in njit
    import numba
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "PyInstaller\loader\pyimod03_importers.py", line 495, in exec_module
  File "numba\__init__.py", line 190, in <module>
    _ensure_critical_deps()
  File "numba\__init__.py", line 133, in _ensure_critical_deps
    import scipy
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "PyInstaller\loader\pyimod03_importers.py", line 495, in exec_module
  File "scipy\__init__.py", line 76, in <module>
    _delvewheel_init_patch_1_0_1()
  File "scipy\__init__.py", line 67, in _delvewheel_init_patch_1_0_1
    os.add_dll_directory(libs_dir)
  File "os.py", line 1109, in add_dll_directory
FileNotFoundError: [WinError 2] Das System kann die angegebene Datei nicht finden ["The system cannot find the file specified"]: 'C:\Users\XXX.XXX\AppData\Local\Programs\AlphaTims\scipy.libs'
[20160] Failed to execute script 'alphatims_pyinstaller' due to unhandled exception!

Exporting diaPASEF as mgf

Describe the bug
Exporting diaPASEF data to mgf fails - but hdf5 succeeds. I can successfully export ddaPASEF files to mgf.

To Reproduce

$ alphatims export mgf HT_20230705_Belharra_200SPD_Exp1_Yeast_300ng_blank_A1_S1-A1_1_2636.d
2023-08-02 12:10:31> File HT_20230705_Belharra_200SPD_Exp1_Yeast_300ng_blank_A1_S1-A1_1_2636.d is not a ddaPASEF file, nothing to do.
$ alphatims export hdf HT_20230705_Belharra_200SPD_Exp1_Yeast_300ng_blank_A1_S1-A1_1_2636.d
2023-08-02 12:09:36> Successfully wrote TimsTOF data to HT_20230705_Belharra_200SPD_Exp1_Yeast_300ng_blank_A1_S1-A1_1_2636.hdf.

Expected behavior
Export to MGF (I'm trying to create mzMLs)

Logs

*******************
* AlphaTims 1.0.8 *
*******************
2023-08-02 12:10:28> Platform information:
2023-08-02 12:10:28> system        - Windows
2023-08-02 12:10:28> release       - 10
2023-08-02 12:10:28> version       - 10.0.22621
2023-08-02 12:10:28> machine       - AMD64
2023-08-02 12:10:28> processor     - Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
2023-08-02 12:10:28> cpu count     - 20
2023-08-02 12:10:28> cpu frequency - 3600.00 Mhz
2023-08-02 12:10:28> ram           - 16.2/31.9 Gb (available/total)
2023-08-02 12:10:28>
2023-08-02 12:10:28> Python information:
2023-08-02 12:10:28> alphatims  - 1.0.8
2023-08-02 12:10:28> bokeh      -
2023-08-02 12:10:28> click      - 8.1.6
2023-08-02 12:10:28> datashader -
2023-08-02 12:10:28> h5py       - 3.9.0
2023-08-02 12:10:28> hvplot     -
2023-08-02 12:10:28> jinja2     -
2023-08-02 12:10:28> numba      - 0.57.1
2023-08-02 12:10:28> pandas     - 2.0.3
2023-08-02 12:10:28> psutil     - 5.9.5
2023-08-02 12:10:28> python     - 3.10.8
2023-08-02 12:10:28> python-lzf -
2023-08-02 12:10:28> pyzstd     - 0.15.9
2023-08-02 12:10:28> selenium   -
2023-08-02 12:10:28> tqdm       - 4.65.0
2023-08-02 12:10:28>
2023-08-02 12:10:28> Current AlphaTims version is up-to-date with GitHub.
2023-08-02 12:10:28>
2023-08-02 12:10:28> Running CLI command `alphatims export mgf` with parameters:
2023-08-02 12:10:28> bruker_raw_data             - HT_20230705_Belharra_200SPD_Exp1_Yeast_300ng_blank_A1_S1-A1_1_2636.d
2023-08-02 12:10:28> centroiding_window          - 5
2023-08-02 12:10:28> disable_log_stream          - False
2023-08-02 12:10:28> disable_overwrite           - False
2023-08-02 12:10:28> keep_n_most_abundant_peaks  - -1
2023-08-02 12:10:28> log_file                    - D:\Github\scratch\bruker\.env\lib\site-packages\alphatims\logs\log_20230802121028.txt
2023-08-02 12:10:28> threads                     - 19
2023-08-02 12:10:28>
2023-08-02 12:10:29> Importing data from HT_20230705_Belharra_200SPD_Exp1_Yeast_300ng_blank_A1_S1-A1_1_2636.d
2023-08-02 12:10:29> Using HDF import for HT_20230705_Belharra_200SPD_Exp1_Yeast_300ng_blank_A1_S1-A1_1_2636.hdf
2023-08-02 12:10:31> Successfully imported data from HT_20230705_Belharra_200SPD_Exp1_Yeast_300ng_blank_A1_S1-A1_1_2636.d
2023-08-02 12:10:31> File HT_20230705_Belharra_200SPD_Exp1_Yeast_300ng_blank_A1_S1-A1_1_2636.d is not a ddaPASEF file, nothing to do.
2023-08-02 12:10:31> Analysis done in 2.83 seconds.
2023-08-02 12:10:31> WARNING: Temp mmap arrays were written to C:\Users\mlazear\AppData\Local\Temp\temp_mmap_zk1n8ojb. Cleanup of this folder is OS dependant, and might need to be triggered manually! Current space: 174,547,656,704

Screenshots
If applicable, add screenshots to help explain your problem.

Version (please complete the following information):

  • Installation Type [e.g. One-Click Installer / Pip / Developer]: pip install alphatims in a fresh python venv

Additional context

Happy to transfer the file if needed

np.uint16 sufficient for intensities?

Hello, do I understand correctly that you are putting intensities into np.uint16, which only goes up to 65535? I believe it should be np.uint32...

intensities = np.empty(frame_indptr[-1], dtype=np.uint16)

Consider this dataset that you can get from UCSD MASSIVE:

import os
from ftplib import FTP

# dataset location: ftp://massive.ucsd.edu/MSV000084402/
def start_ftp():
    sample_path = 'MSV000084402/raw/SRM1950_20min_88_01_6950.d'
    ftp = FTP('massive.ucsd.edu')
    ftp.login()
    ftp.cwd(sample_path)
    return ftp

ftp = start_ftp()

os.mkdir('lipidomics.d')
with open('lipidomics.d/analysis.tdf_bin', "wb") as f:
    ftp.retrbinary("RETR analysis.tdf_bin", f.write)

with open('lipidomics.d/analysis.tdf', "wb") as f:
    ftp.retrbinary("RETR analysis.tdf", f.write)

# pip install opentimspy opentims_bruker_bridge
import pandas as pd
import opentims_bruker_bridge
from opentimspy.opentims import OpenTIMS

TIMS_FOLDER = 'lipidomics.d'
D = OpenTIMS(TIMS_FOLDER)

frames = pd.DataFrame(D.frames)
ms1_frame_numbers = frames.query("(MsMsType == 0) & (ScanMode != 0)").Id.to_numpy()
print(f'{D}, {len(frames)} frames, {len(ms1_frame_numbers)} MS1 frames')
print(f'{D}, {len(frames)} frames, {len(ms1_frame_numbers)} MS1 frames')

MIN_INTENSITY = 500
ms1 = (
    pd.DataFrame(D.query(frames=ms1_frame_numbers[:500]))
    .query(f'intensity > {MIN_INTENSITY}')
    .reset_index(drop=True)
)
ms1[ms1.intensity > 65535].shape

Returns 1470 rows.
There are definitely intensities in the hundreds of thousands, well outside the np.uint16 range.
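The silent wrap-around that motivates this report is easy to demonstrate: casting values above 65535 to np.uint16 reduces them modulo 65536 without any warning, while np.uint32 preserves them:

```python
import numpy as np

raw = np.array([500, 65535, 70000, 300000])
as_u16 = raw.astype(np.uint16)   # silently wraps modulo 65536
as_u32 = raw.astype(np.uint32)   # preserves every value
print(as_u16.tolist())  # [500, 65535, 4464, 37856]
print(as_u32.tolist())  # [500, 65535, 70000, 300000]
```

So any detector strike above 65535 counts would be stored as a much smaller, wrong intensity.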

Numpy Dependency issue "Attribute not found"

Describe the bug
Hey all, I am starting to explore TIMS data, already knew about this tool, and finally had some time to try it out. During indexing of the spectra (in MGF conversion) I got the following error:

AttributeError: module 'numpy' has no attribute 'int'

To Reproduce

  1. To reproduce, just execute alphatims export mgf (with numpy >= 1.24.0)

Expected behavior
No error, and I get an MGF (for testing).

Logs
Just for completeness, here the full error:

    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

Version (please complete the following information):
I tried this with a new environment and just executed pip install alphatims

Package         Version
--------------- ---------
alphatims       1.0.7
click           8.1.3
h5py            3.9.0
llvmlite        0.40.1rc1
numba           0.57.0
numpy           1.24.3
pandas          2.0.2
pip             23.1.2
psutil          5.9.5
python-dateutil 2.8.2
pytz            2023.3
pyzstd          0.15.7
setuptools      67.8.0
six             1.16.0
tqdm            4.65.0
tzdata          2023.3
wheel           0.40.0

Additional context
Also in issue #25 this was mentioned!

To fix this issue, it is enough to run pip install numpy==1.23.5 (the last version before the np.int alias was removed).

It would be great if you could update the requirements.txt and include it there. Alternatively you could also update this line to np.int64.

File extension not understood

Hi all,

I'm trying to pack alphatims in a docker container and use the CLI version of the tool. Here I use AlphaTims 0.3.2, but I run into an error where it does not recognize the file extension. I run the line below and get the following output. Passing the Bruker directory without quotations leads to the same error.

root@70689f3fe955:/var/opt/project# alphatims export hdf "/var/opt/project/K562_post_1_S1-A1_1_12197.d/"


*******************
* AlphaTims 0.3.2 *
*******************

2022-04-09 21:09:09> Platform information:
2022-04-09 21:09:09> system - Linux
2022-04-09 21:09:09> release - 5.13.0-30-generic
2022-04-09 21:09:09> version - #33~20.04.1-Ubuntu SMP Mon Feb 7 14:25:10 UTC 2022
2022-04-09 21:09:09> machine - x86_64
2022-04-09 21:09:09> processor -
2022-04-09 21:09:09> cpu count - 24
2022-04-09 21:09:09> cpu frequency - 1.66 Mhz
2022-04-09 21:09:09> ram - 6.3/11.7 Gb (available/total)
2022-04-09 21:09:09>
2022-04-09 21:09:09> Python information:
2022-04-09 21:09:09> alphatims - 0.3.2
2022-04-09 21:09:09> bokeh -
2022-04-09 21:09:09> click - 8.1.2
2022-04-09 21:09:09> datashader -
2022-04-09 21:09:09> h5py - 3.6.0
2022-04-09 21:09:09> hvplot -
2022-04-09 21:09:09> numba - 0.55.1
2022-04-09 21:09:09> pandas - 1.4.2
2022-04-09 21:09:09> psutil - 5.9.0
2022-04-09 21:09:09> python - 3.9.12
2022-04-09 21:09:09> python-lzf -
2022-04-09 21:09:09> pyzstd - 0.15.2
2022-04-09 21:09:09> selenium -
2022-04-09 21:09:09> tqdm - 4.64.0
2022-04-09 21:09:09>
2022-04-09 21:09:09> Current AlphaTims version is up-to-date with GitHub.
2022-04-09 21:09:09>
2022-04-09 21:09:09> Running CLI command alphatims export hdf with parameters:
2022-04-09 21:09:09> bruker_raw_data - /var/opt/project/K562_post_1_S1-A1_1_12197.d/
2022-04-09 21:09:09> disable_log_stream - False
2022-04-09 21:09:09> disable_overwrite - False
2022-04-09 21:09:09> enable_compression - False
2022-04-09 21:09:09> log_file - /usr/local/lib/python3.9/site-packages/alphatims/logs/log_20220409210909.txt
2022-04-09 21:09:09> threads - 23
2022-04-09 21:09:09>
2022-04-09 21:09:10> Importing data from /var/opt/project/K562_post_1_S1-A1_1_12197.d/
2022-04-09 21:09:10> Something went wrong, execution incomplete!
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/alphatims/cli.py", line 114, in parse_cli_settings
yield kwargs
File "/usr/local/lib/python3.9/site-packages/alphatims/cli.py", line 272, in export_hdf
data = alphatims.bruker.TimsTOF(parameters["bruker_raw_data"])
File "/usr/local/lib/python3.9/site-packages/alphatims/bruker.py", line 1022, in __init__
raise NotImplementedError(
NotImplementedError: WARNING: file extension not understood

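A likely culprit (an assumption on my part: that the extension is derived from the path string) is the trailing slash in the command. os.path.splitext finds no extension when a path ends in a separator, so stripping the slash first makes the .d suffix visible again:

```python
import os

path = "/var/opt/project/K562_post_1_S1-A1_1_12197.d/"
print(os.path.splitext(path)[1])              # '' -> extension not understood
print(os.path.splitext(path.rstrip("/"))[1])  # '.d'
```

In practice, passing the .d folder without the trailing slash should avoid the NotImplementedError.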

[Question] Precursor masses from XML settings are not present as indices in raw data

Dear alphatims developers,

I cannot open the file “sample_414.d” in the alphatimsGUI it gives the error “ValueError: slice step cannot be zero”.
Log file attached:
log_20230911152500.txt

When I open the file in python, it works.

timstof_data = alphatims.bruker.TimsTOF(tmp_file.as_posix())
tab_raw = timstof_data[:,:,:,:]

However, looking at the precursor_indices column leaves me puzzled. It contains only the value “0”, but I specified 4 values in my settings.

Number of unique values and default print:

image

First 10 rows of all columns:

image

Furthermore the “timstof_data.precursors” data frame is None. The MS1 filter values are also shown in the Bruker analyzer software. Screenshot from “sample_414.d\414.m\microTOFQImpacTemAcquisition.method” (and I measured in positive mode).

image

Question

Do I look in the wrong column? Are the MS1 filter masses somehow encoded in another index column?

I am looking forward to your suggestions :-)

Synchro/slice data import

Hi,

Thanks a lot for developing alphatims.
I am currently facing an issue when importing a segmented (metabolomics) synchro/slicePASEF analysis, yielding in the following error:

File /usr/miniconda3/envs/py3.8/lib/python3.8/site-packages/alphatims/bruker.py:2306, in TimsTOF.set_cycle(self)
   2304     low_mz = row.IsolationMz - row.IsolationWidth / 2
   2305     high_mz = row.IsolationMz + row.IsolationWidth / 2
-> 2306     cycle[
   2307         frame,
   2308         scan_begin: scan_end,
   2309     ] = (low_mz, high_mz)
   2310     precursor_frames[frame] = False
   2312 cycle[precursor_frames] = (-1, -1)

IndexError: index 166 is out of bounds for axis 0 with size 4

Let me know if any further information is needed, or if a minimal data example via email would help.
Best regards,
Vivian
