
Comments (5)

telegraphic commented on August 16, 2024

Hey Pravir, finally found some time to respond to this!

I am pleased you are thinking about this. The FRB community needs a really solid software package that is well equipped for the future. So I think a refactor is well motivated, but I'm actually wondering whether it's better to start a completely new package, and port the useful code over to it?

Data arrays

I love the idea of having profile, block, and cube classes. Can these support multiple beams and polarization too? I recently played around with xarray, which I like conceptually as a base data structure from which you could define a profile, block, and cube.

However, there are a few issues I've found for high-resolution data:

  1. The 'coords' require full numpy arrays, not just a start value and step size. Here's a GitHub discussion I started that nobody could answer.
  2. I really think units should be built in, similar to astropy's unit arrays (although pint-xarray tries to add this).
  3. No support for 1-, 2-, or 4-bit data. I really like how h5py and np.memmap() provide access to data, but the need to support low bit widths means you can't use straightforward memory mapping (see the unpacking sketch just below this list).
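
To illustrate point 3, here is a minimal sketch (not from any existing package) of unpacking 2-bit samples from a memory-mapped byte array; the bit ordering and file layout are assumptions, and real formats may pack samples differently.

import numpy as np

def unpack_2bit(raw: np.ndarray) -> np.ndarray:
    """Unpack 2-bit samples packed into uint8 bytes (4 samples per byte).

    Assumes the first sample sits in the least-significant bits.
    """
    out = np.empty(raw.size * 4, dtype=np.uint8)
    out[0::4] = raw & 0x03
    out[1::4] = (raw >> 2) & 0x03
    out[2::4] = (raw >> 4) & 0x03
    out[3::4] = (raw >> 6) & 0x03
    return out

# Memory-map the packed bytes, then unpack only the slice you need, e.g.:
# raw = np.memmap("data.fil", dtype=np.uint8, mode="r", offset=header_nbytes)
# block = unpack_2bit(np.asarray(raw[start:stop]))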

I got a bit carried away and tried to roll my own xarray-like DataArray class in a pet project called hyperseti; the method to read from a filterbank is here. This got much messier than I anticipated, so I'm not sure it's the best solution! I do, however, think the overall xarray concept is very attractive: your data should have labels that describe it well (I would add units too!).
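
As a concrete illustration of the labelled-array idea, a dynamic-spectrum block could look roughly like this (the shapes, values, and the attrs-based unit bookkeeping are just assumptions for the sketch; xarray does not carry units natively):

import numpy as np
import xarray as xr

nsamp, nchan = 4096, 1024
data = np.random.normal(size=(nsamp, nchan)).astype(np.float32)

# Full coordinate arrays are required -- this is the start/step issue in point 1.
times = 64e-6 * np.arange(nsamp)            # seconds
freqs = 1500.0 - 0.5 * np.arange(nchan)     # MHz, descending band

block = xr.DataArray(
    data,
    dims=("time", "freq"),
    coords={"time": times, "freq": freqs},
    attrs={"time_unit": "s", "freq_unit": "MHz"},
)

# Label-based selection, e.g. a sub-band between 1400 and 1200 MHz
# (the slice follows the descending coordinate order):
subband = block.sel(freq=slice(1400.0, 1200.0))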

As well as profile/block/cube, should it support voltage-level products? Perhaps we could learn from baseband, which looks to have a task framework including dedispersion. (I'm not very familiar with these)

Other Dependencies

In terms of unit/coordinate/time handling, I think astropy does a decent job, so it would be a reasonable dependency. I like the idea of ALL arrays having units attached (i.e. as an astropy.Quantity array). I'm not as big a fan of numba, but it is very reasonable to use. I quite like pybind11 (in preference to Cython), but perhaps only used sparingly (e.g., do we really need FFTW3?). cupy is easy for GPU acceleration, but not everyone has an NVIDIA GPU, so it should be an optional dependency if possible.
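
As a small sketch of what unit-aware arrays buy you (the numbers here are made up; only the dispersion constant is the standard value):

import numpy as np
import astropy.units as u

# Frequencies and sampling interval carry their units explicitly.
freqs = (1500.0 - 0.5 * np.arange(1024)) * u.MHz
tsamp = 64 * u.us

# Unit-aware arithmetic: dispersive delay across the band for a given DM,
# using the usual dispersion constant 4.148808e3 s MHz^2 pc^-1 cm^3.
dm = 500 * u.pc / u.cm**3
k_dm = 4.148808e3 * u.s * u.MHz**2 * u.cm**3 / u.pc
delays = k_dm * dm * (freqs**-2 - freqs.max()**-2)
print(delays.to(u.ms).max(), tsamp.to(u.ms))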

Internal data structure

My opinion is that the sigproc header is too dated, and it doesn't support polarization, so I would suggest a new and improved internal data structure. (Also, storing angles as a float in DDMMSS.ss is quite frankly insane.) With some thought we could come up with a more suitable list of keywords: e.g., I would prefer (f and df) to (fch1 and foff), and (t and dt) to (tstart and tsamp); or one could use the more general [var]_start and [var]_step, e.g. freq_start and time_start, for consistency.
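
For instance, a hypothetical internal header using the consistent *_start / *_step naming might look like this (field names, units, and defaults are all assumptions, not an agreed design):

from dataclasses import dataclass
from typing import Optional

@dataclass
class ObsHeader:
    """Hypothetical internal header with consistent *_start / *_step names."""
    freq_start: float             # centre frequency of the first channel, MHz
    freq_step: float              # channel width (negative for descending bands), MHz
    time_start: float             # observation start, MJD
    time_step: float              # sampling interval, s
    nchans: int
    nsamples: int
    npol: int = 1
    ra_deg: Optional[float] = None    # angles in degrees, not DDMMSS.ss floats
    dec_deg: Optional[float] = None

# Translating the old sigproc keywords would then be a simple mapping, e.g.:
# hdr = ObsHeader(freq_start=fch1, freq_step=foff, time_start=tstart,
#                 time_step=tsamp, nchans=nchans, nsamples=nsamps, npol=nifs)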

Next-gen use cases and questions

I guess the most important thing for adoption is that the package is easy to use for current use cases, and that it can support next-generation use cases too. The UWL and narrow-bandwidth FRBs make a good case study: sub-band searches will probably become more common. How easy would it be to pipe data into TensorFlow? How about multiple beams? How about parallelization across nodes (e.g. dask), parallelization on CPUs (OpenMP), or GPU acceleration (cupy)?

It is easy to have 'feature creep' and make the task huge, so it will be important to set a scope and stick to it. What exactly is a 'single pulse toolbox'? And does this need to be high performance for FRB searches, or is usability more important?


pravirkr commented on August 16, 2024

Hi @telegraphic,

xarray seems to be the perfect replacement for subclassing ndarray. However, one of the issues I found is in the implementation. I like the existing structure of having methods added to the array itself. We can also chain methods together, e.g.

tim = TimeSeries(data)
tim.downsample().pad().toDat()

I don't see a safe way to subclass xarray (the docs discourage subclassing). Instead, they suggest using accessors to add new methods, but I am not able to figure out a clean API design using accessors. Another approach would be to wrap the xarray inside the object; the array can then be accessed as tim.data.
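
A minimal sketch of that wrapping approach, assuming hypothetical class and method names (this is not the existing sigpyproc API), where chaining still works and the underlying array is reachable as tim.data:

import numpy as np
import xarray as xr

class TimeSeries:
    """Hypothetical wrapper: the methods live on the class, the array on .data."""

    def __init__(self, data: xr.DataArray):
        self.data = data

    def downsample(self, factor: int = 2) -> "TimeSeries":
        # coarsen() averages 'factor' consecutive time samples together
        binned = self.data.coarsen(time=factor, boundary="trim").mean()
        return TimeSeries(binned)

    def pad(self, npad: int = 0) -> "TimeSeries":
        # zero-pad at the end of the time axis
        padded = self.data.pad(time=(0, npad), constant_values=0.0)
        return TimeSeries(padded)

tim = TimeSeries(xr.DataArray(np.random.normal(size=4096), dims=("time",)))
tim2 = tim.downsample(4).pad(16)   # chaining, as in the snippet above
print(tim2.data.sizes)             # the wrapped xarray is always at .data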

For FFT, I think the better option would be to use the numpy FFT and switch to pyFFTW if available, removing FFTW3 as a dependency of this package.
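
Something like the following would make pyFFTW a purely optional speed-up (a sketch, assuming its numpy.fft-compatible interface is acceptable as a drop-in backend):

try:
    # pyFFTW exposes a numpy.fft-compatible interface backed by FFTW
    import pyfftw
    from pyfftw.interfaces import numpy_fft as fft_backend
    pyfftw.interfaces.cache.enable()   # reuse FFTW plans between calls
except ImportError:
    from numpy import fft as fft_backend

def rfft(timeseries):
    """Real-to-complex FFT using whichever backend is available."""
    return fft_backend.rfft(timeseries)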

Re sigproc headers, yes, I agree. So, one idea could be to move all sigproc- and psrfits-related functions to the io module and have a common Header class with classmethods that can be used as:

hdr = Header.from_sigproc()
hdr = Header.from_psrfits()
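
A minimal sketch of what that Header could look like; the field names and the keyword mapping are assumptions (e.g. OBSFREQ is the band centre in PSRFITS, so real code would offset to the first channel), and the decoding of raw files into keyword dicts is left to the io module:

from dataclasses import dataclass

@dataclass
class Header:
    """Format-agnostic header populated via per-format classmethods."""
    freq_start: float   # MHz
    freq_step: float    # MHz
    time_start: float   # MJD
    time_step: float    # s
    nchans: int
    npol: int = 1

    @classmethod
    def from_sigproc(cls, keys: dict) -> "Header":
        # 'keys' is a dict of decoded sigproc header keywords
        return cls(freq_start=keys["fch1"], freq_step=keys["foff"],
                   time_start=keys["tstart"], time_step=keys["tsamp"],
                   nchans=keys["nchans"], npol=keys.get("nifs", 1))

    @classmethod
    def from_psrfits(cls, primary: dict, subint: dict) -> "Header":
        # 'primary' and 'subint' hold FITS header cards from the two HDUs;
        # the mapping here is schematic rather than exact
        return cls(freq_start=primary["OBSFREQ"], freq_step=subint["CHAN_BW"],
                   time_start=primary["STT_IMJD"], time_step=subint["TBIN"],
                   nchans=subint["NCHAN"], npol=subint["NPOL"])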

We still use some of the sigproc command-line tools, e.g., header, seek, dedisperse, etc. Should we aim to provide these tools in this package? I got interested in the idea of updating sigproc to modern standards in sigproc2.

Single-pulse toolbox

So, my intention behind a single-pulse toolbox is to have a reference place for robust methods to simulate and analyze single pulses/FRBs. For example:

  • What is the S/N of a burst? There are different implementations in search packages, and to understand the S/N reported in the literature we always have to refer to the package code. But there should be one robust way to calculate S/N in this toolbox (e.g., a matched-filter approach, @vivgastro); a sketch follows below.
  • What is the optimal DM of the burst? Same argument again: we can have pdmp- and DM_phase-style methods to get a robust estimate.
  • How do we simulate a burst the right way?

So, basically, I am looking for a Python-based implementation of psrchive targeted towards FRBs, using the profile and block classes. We can have support for polarization and robust TOA estimation. These numbers can then be reported in the literature and to TNS with a uniform definition.
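
For the S/N question, here is a hedged sketch of the common boxcar matched-filter convention (one choice among several; not necessarily what any particular search package implements):

import numpy as np

def boxcar_snr(profile: np.ndarray, max_width: int = 32):
    """Maximise S/N over boxcar widths: S/N = sum(x) / (sigma * sqrt(width)).

    Assumes the baseline has been subtracted; sigma is estimated robustly
    from the profile with the median absolute deviation.
    """
    sigma = 1.4826 * np.median(np.abs(profile - np.median(profile)))
    best_snr, best_width = 0.0, 1
    for width in range(1, max_width + 1):
        # running sum over 'width' samples, normalised by sigma * sqrt(width)
        snr = np.convolve(profile, np.ones(width), mode="valid") / (sigma * np.sqrt(width))
        if snr.max() > best_snr:
            best_snr, best_width = float(snr.max()), width
    return best_snr, best_width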

I now think this demands a separate package, which should have the block and profile classes and most of the psrchive methods. We can then also have a standard format to store these data (similar to .ar files).


telegraphic commented on August 16, 2024

It seems you came to the same conclusion about xarray! tim.data seems reasonable to me.

Using from_sigproc and from_psrfits for the header sounds good to me.

PyFFTW (or the cupy FFT) would be an easy replacement.

I love that sigproc2 is PravirK's updates to Evan Keane's fork of Michael Keith's release of Duncan Lorimer's original SIGPROC. Maybe provide similar command-line tools but with different names to avoid name collisions, e.g. spt_seek and spt_header?

Adding 'publication quality plots' to the list as you didn't explicitly mention it 📈


pravirkr commented on August 16, 2024

OK, I have moved the single-pulse toolbox idea (block/profile, psrchive-style analysis, simulation, plots) to a separate package, burstpy, for now.

As we expand the sigpyproc codebase with other formats, a restructuring is required (to v0.6 with #16).
Based on this discussion, this is my roadmap for sigpyproc v1.0.

  • PSRFITS search-mode (using astropy)
    header in sigpyproc.io.psrfits and FitsReader in sigpyproc.readers
  • polarization support (I think it will be IQUV for filterbanks with nifs=4 and several other combinations for PSRFITS)
  • Implement most of the sigproc command-line utilities (e.g., seek) and visualization tools.
  • Publish to PyPI (using cibuildwheel)
    need to fix OpenMP dependency on different systems for that.
  • [Optional] Move the backend code from pybind11 to numba. Need to compare the performance first. It would also fix the OpenMP dependency (using numba threading layers); a sketch of such a kernel follows below.
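
On that last point, here is a hedged sketch of what a numba backend kernel could look like: a brute-force dedispersion sum with precomputed integer sample delays, parallelised with prange so the threading layer replaces hand-written OpenMP (the names and algorithm are illustrative, not the current sigpyproc backend).

import numpy as np
from numba import njit, prange

@njit(parallel=True, cache=True)
def dedisperse(block, delays):
    """Sum a (nchans, nsamps) block over frequency after shifting each
    channel by its integer sample delay; parallel over output samples."""
    nchans, nsamps = block.shape
    nout = nsamps - delays.max()
    out = np.zeros(nout, dtype=np.float64)
    for isamp in prange(nout):
        acc = 0.0
        for ichan in range(nchans):
            acc += block[ichan, isamp + delays[ichan]]
        out[isamp] = acc
    return out

# block = np.random.normal(size=(1024, 1 << 15)).astype(np.float32)
# delays = np.linspace(0, 200, 1024).astype(np.int64)   # samples, per channel
# timeseries = dedisperse(block, delays)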


pravirkr commented on August 16, 2024

@telegraphic Re PSRFITS support, there are already several implementations (yours, PulsarDataToolbox, etc.). Why rewrite things again if the definitions are fixed? Instead, we can add one of the implementations as a dependency.

I like the baseband_tasks.io.psrfits framework (also baseband_tasks.io.hdf5), and it will be easy to adapt here; however, it supports only fold-mode. I think the search-mode addition will be easy to implement, so I will try for a PR there. All we need are robust wrappers around astropy.io.fits to read the header keywords and data.
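
As a hedged sketch of the kind of wrapper meant here (keyword names and the DATA layout follow the PSRFITS search-mode definition, but bit-unpacking and the DAT_SCL/DAT_OFFS scaling are deliberately skipped):

import numpy as np
from astropy.io import fits

def read_subint_block(filename, isub=0):
    """Read one SUBINT row of a search-mode PSRFITS file as (nsblk, npol, nchan)."""
    with fits.open(filename, memmap=True) as hdul:
        subint = hdul["SUBINT"]
        hdr = subint.header
        nchan, npol, nsblk = hdr["NCHAN"], hdr["NPOL"], hdr["NSBLK"]
        raw = np.asarray(subint.data["DATA"][isub])
        # search-mode DATA is stored per row with shape (NSBLK, NPOL, NCHAN)
        return raw.reshape(nsblk, npol, nchan)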

