pnnl-m-q / mzapy

A Python package that provides an interface to raw MS data in the MZA format.

License: BSD 2-Clause "Simplified" License

Python 100.00%

mzapy's People

Forkers

jianfelixguo

mzapy's Issues

working with remote MZA files

The MZA interface needs some modification to handle cases where the MZA file being read is not on the local filesystem (and may be on a read-only one). At first glance, this primarily affects caching of scan data, which should only happen locally regardless of whether the MZA file being read is local or remote. It would also be good to examine all interactions between the MZA interface and the underlying file to see what else could be affected by the file being remote rather than local, and accommodate those cases as well.
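One way the "cache locally no matter where the MZA file lives" idea might look is sketched below. This is only an illustration under assumptions: the function name local_cache_path, the default cache directory, and the ".cache" suffix are all hypothetical and are not part of mzapy. The sketch derives a cache file name from the location string alone, so the MZA file (or its possibly read-only directory) is never written to.

```python
import hashlib
import tempfile
from pathlib import Path


def local_cache_path(mza_location, cache_dir=None):
    """Return a local, writable path for the scan cache of an MZA file.

    mza_location may be a local path or a remote URI. Only its string
    form is used, so the MZA file itself is never touched.
    (Hypothetical helper, not an mzapy function.)
    """
    if cache_dir is None:
        # assumed default location, chosen only for this sketch
        cache_dir = Path(tempfile.gettempdir()) / "mzapy_scan_cache"
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # hash the full location so that files sharing a name but coming
    # from different places get different cache files
    digest = hashlib.sha256(str(mza_location).encode("utf-8")).hexdigest()[:16]
    stem = Path(str(mza_location)).stem or "mza"
    return cache_dir / f"{stem}_{digest}.cache"
```

Because the cache name depends only on the location string, the same remote file maps to the same local cache across sessions, while a read-only local file and a remote file with the same name stay separate.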

option to plot mass spectra in centroid mode

Add an option to the mzapy.view.plot_spectrum function to plot spectra in centroid mode (with bars for peaks) instead of only as profile data.
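As a rough sketch of what "bars for peaks" could mean in practice, the helper below draws one vertical line per centroided peak on an existing Matplotlib Axes using Axes.vlines. The function name plot_centroid and its signature are assumptions made for illustration; this is not the current mzapy.view.plot_spectrum interface.

```python
def plot_centroid(ax, mz, intensity, **kwargs):
    """Draw centroided peaks as vertical bars on a Matplotlib Axes.

    ax        : a matplotlib.axes.Axes (or any object with the same methods)
    mz        : sequence of peak m/z values
    intensity : sequence of peak intensities, same length as mz
    kwargs    : passed through to Axes.vlines (e.g. colors, linewidth)
    (Hypothetical helper, not an mzapy function.)
    """
    # one vertical line per peak, from the baseline up to the peak intensity
    ax.vlines(mz, 0.0, intensity, **kwargs)
    ax.set_xlabel("m/z")
    ax.set_ylabel("intensity")
    return ax
```

With Matplotlib installed, this could be called as plot_centroid(plt.subplots()[1], mz, intensity, colors="k"). Inside plot_spectrum, a keyword argument could presumably switch between this style and the existing profile line plot.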

Expand unit test coverage

The package needs a module for doing unit tests. As the code base grows, there needs to be a set of standard tests that can assess functionality across the package, making sure that new additions do not unknowingly break other parts of the code base. This is especially important before pulling changes from the dev branch to main and updating distribution packages/online docs. I am not settled on a particular format for these tests, but once a format is established, new additions to the code base need to stay consistent with it.
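Since the format is still open, the following is only one possible shape for such tests, using the standard-library unittest module. The function under test, mz_window, is a made-up stand-in and not part of mzapy; the point is the layout (one TestCase per unit, plus a single entry point that runs everything).

```python
import unittest


def mz_window(mz, ppm):
    """Stand-in function under test (hypothetical, not part of mzapy)."""
    delta = mz * ppm * 1e-6
    return mz - delta, mz + delta


class TestMzWindow(unittest.TestCase):
    """Tests for the stand-in mz_window function."""

    def test_window_is_symmetric(self):
        lo, hi = mz_window(500.0, 20.0)
        self.assertAlmostEqual(500.0 - lo, hi - 500.0)

    def test_window_width(self):
        lo, hi = mz_window(500.0, 20.0)
        # 20 ppm of 500 is 0.01 on each side
        self.assertAlmostEqual(hi - lo, 0.02)


def run_all_tests():
    """Single entry point that collects and runs the test cases."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMzWindow)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

A single entry point like run_all_tests makes it straightforward to run the whole suite before merging dev into main.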

Initializing calibration objects with already optimized parameters

The calibration objects in the calibration module would be more useful if there were a non-awkward way of initializing an instance from already known optimized parameters, without needing to fit data. At present, this can be achieved by initializing the object with the fit=False flag, then manually setting the object's opt_params attribute to the correct fitted parameters. This is too awkward, and it would be nice to have some sort of factory function that does all of that cleanly behind the scenes. In implementing this, it is also worth considering the distinction between creating a calibration from a known calibration function/parameters (without necessarily having access to the individual calibrant information) and loading a previously made calibration (which would include the calibration function/parameters as well as the individual calibrant info). The latter (i.e. a mechanism for saving/loading calibrations) is also worth implementing.
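The two cases described above could be sketched roughly as follows. ExampleCalibration is a hypothetical stand-in with a simple linear fit, not one of mzapy's calibration classes; only the fit=False flag and the opt_params attribute come from the issue text. The from_params factory covers "known parameters, no calibrant data", and to_json/from_json cover "save and reload a full calibration". JSON is just an assumed storage format for the sketch.

```python
import json


class ExampleCalibration:
    """Stand-in for a calibration class (hypothetical, not mzapy's)."""

    def __init__(self, x=None, y=None, fit=True):
        self.x, self.y = x, y
        self.opt_params = None
        if fit:
            self.opt_params = self._fit()

    def _fit(self):
        # ordinary least-squares line y = a * x + b
        n = len(self.x)
        mx, my = sum(self.x) / n, sum(self.y) / n
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(self.x, self.y))
        sxx = sum((xi - mx) ** 2 for xi in self.x)
        a = sxy / sxx
        return a, my - a * mx

    @classmethod
    def from_params(cls, opt_params):
        """Factory: build from known parameters, no calibrant data."""
        cal = cls(fit=False)
        cal.opt_params = tuple(opt_params)
        return cal

    def calibrated(self, x):
        a, b = self.opt_params
        return a * x + b

    def to_json(self):
        """Serialize parameters together with the calibrant data."""
        return json.dumps(
            {"opt_params": list(self.opt_params), "x": self.x, "y": self.y}
        )

    @classmethod
    def from_json(cls, text):
        """Reload a saved calibration without refitting."""
        d = json.loads(text)
        cal = cls(x=d["x"], y=d["y"], fit=False)
        cal.opt_params = tuple(d["opt_params"])
        return cal
```

Keeping the two constructors separate makes the distinction explicit: from_params yields an object with no calibrant information, while from_json restores everything that was saved.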

All Module reference sections in API reference part of docs should have subsections

Some of the modules have subheadings under the "Module Reference" heading, which looks nice on the main index page. For those that do not (e.g. the one with the MZA object), the index just lists all the functions directly, which doesn't look very nice. The best fix is to add subheadings under all of the "Module Reference" sections so that the main index page looks nice for all the modules. For the MZA object, it can be pretty easily divided into subheadings for things like initialization, functions for extracting data as DFs, functions for extracting data as arrays, and so on. For other modules, there is probably some way of categorizing the functions appropriately as well.

peak fitting functions ought to return peak parameters as tuples for easier unpacking

I want to be able to do something like

peaks = find_peaks_1d_gauss(x_data, y_data, *other_params)
for peak_x, peak_ht, peak_wt in peaks:
    # iterate one peak at a time
    # and do stuff with the peak parameters
    ...

but instead it just returns separate arrays for each of the fitted parameters (x, ht, wt) so that you have to use zip to iterate peak by peak. I feel that iterating peak by peak (without zip, as in the example above) is a much more rational use case so these functions should return a list split by peaks rather than lists split by peak parameters.

Another possible improvement would be to make them generators that yield one peak at a time.
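Until the return format changes, a thin adapter over the current per-parameter arrays could provide both behaviours. The sketch below is an assumption-laden illustration: Peak, peaks_as_list, and iter_peaks are hypothetical names, and the inputs are simply the three parallel sequences (x, ht, wt) described above.

```python
from collections import namedtuple

# hypothetical container, field names follow the (x, ht, wt) order above
Peak = namedtuple("Peak", ["x", "ht", "wt"])


def peaks_as_list(peak_x, peak_ht, peak_wt):
    """Turn parallel parameter arrays into a list with one Peak per peak."""
    return [Peak(x, ht, wt) for x, ht, wt in zip(peak_x, peak_ht, peak_wt)]


def iter_peaks(peak_x, peak_ht, peak_wt):
    """Generator variant: yield one Peak at a time."""
    for x, ht, wt in zip(peak_x, peak_ht, peak_wt):
        yield Peak(x, ht, wt)
```

A namedtuple still unpacks as a plain tuple, so "for peak_x, peak_ht, peak_wt in peaks" works unchanged, while peak.x style access is also available.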

store stats on scan cache hits and misses

When scan caching is turned on for the MZA object, each scan access should increment one of two counters (stored as instance variables), depending on whether it was a cache hit or a miss. These values could even be stored in the scan cache file itself to keep track of cache hits/misses over the long term. Even as plain instance variables, they could be pretty useful for characterizing performance.
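The counting logic itself is small. The sketch below shows it on a stand-in cache class, not on the MZA object: CountingScanCache, load_scan, cache_hits, cache_misses, and hit_rate are all hypothetical names, and an in-memory dict stands in for the real scan cache.

```python
class CountingScanCache:
    """Stand-in scan cache that counts hits and misses (hypothetical)."""

    def __init__(self, load_scan):
        # load_scan: callable that reads one scan from the underlying file
        self._load_scan = load_scan
        self._cache = {}
        self.cache_hits = 0
        self.cache_misses = 0

    def get(self, scan_index):
        """Return a scan, counting whether it came from the cache."""
        if scan_index in self._cache:
            self.cache_hits += 1
        else:
            self.cache_misses += 1
            self._cache[scan_index] = self._load_scan(scan_index)
        return self._cache[scan_index]

    @property
    def hit_rate(self):
        """Fraction of accesses served from the cache (0.0 if none yet)."""
        total = self.cache_hits + self.cache_misses
        return self.cache_hits / total if total else 0.0
```

Persisting the two counters alongside the cached scans would then only require writing them out whenever the cache file is saved.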
