Comments (5)
Hey Pravir, finally found some time to respond to this!
I am pleased you are thinking about this. The FRB community needs a really solid software package that is well equipped for the future. So I think a refactor is well motivated, but I'm actually wondering whether it's better to start a completely new package, and port the useful code over to it?
Data arrays
I love the idea of having profile, block and cube classes. Can these support multiple beams and polarization too? I recently played around with xarray, which I conceptually like as a base data structure -- from which you could define a profile, cube and block.
However, there are a few issues I've found for high-resolution data:
- The 'coords' require full numpy arrays, not just a start value and step size. Here's a github discussion I started which nobody could answer.
- I really think units should be built in, similar to astropy's unit arrays (although pint-xarray tries to add this).
- No support for 1, 2 or 4-bit data. I really like how `h5py` and `np.memmap()` provide access to data, but the need to support low bitwidths means you can't use straightforward memory mapping.
I got a bit carried away and tried to roll my own xarray-like `DataArray` class in a pet project called hyperseti; the method to read from a filterbank is here. This got much messier than I anticipated, so I'm not sure it's the best solution! I do, however, think the overall xarray concept is very attractive -- your data should have labels that describe it well (I would add units too!).
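To make the xarray idea concrete, here is a rough sketch of how a dynamic-spectrum "block" could be labelled (dimension names, coordinate values and attributes are all illustrative, not a proposed API). Note how the coordinates have to be materialised as full arrays -- the start/step limitation mentioned above:

```python
# Sketch: labelling a small dynamic spectrum with xarray (illustrative only).
import numpy as np
import xarray as xr

nchan, nsamp = 4, 8
data = np.zeros((nchan, nsamp), dtype="float32")

block = xr.DataArray(
    data,
    dims=("freq", "time"),
    coords={
        # start + step must be expanded into explicit coordinate arrays
        "freq": 1500.0 - 0.5 * np.arange(nchan),  # MHz (hypothetical band)
        "time": 64e-6 * np.arange(nsamp),         # seconds
    },
    attrs={"units": "counts"},  # just a label -- no real unit arithmetic
)

profile = block.sum(dim="freq")  # collapse to a time profile; labels survive
print(profile.dims)              # ('time',)
```

Reductions keep the coordinate labels attached, which is the attraction; the unit handling, as noted, is only a string attribute.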
As well as profile/block/cube, should it support voltage-level products? Perhaps we could learn from baseband, which looks to have a task framework including dedispersion. (I'm not very familiar with these)
Other Dependencies
In terms of unit/coordinate/time handling, I think astropy does a decent job, so it would be a reasonable dependency. I like the idea of ALL arrays having units attached (i.e. as `astropy.Quantity` arrays). I'm not as big a fan of numba, but it is very reasonable to use it. I quite like pybind11 (in preference to cython), but perhaps only used sparingly (e.g. do we really need FFTW3?). cupy is easy for GPU acceleration, but not everyone has an NVIDIA GPU, so it should only be an optional dependency if possible.
Internal data structure
My opinion is that the sigproc header is too dated, and it doesn't support polarization, so I would suggest using a new and improved internal data structure. (Also, storing angles as a float in `DDMMSS.ss` format is quite frankly insane.) With some thought we could come up with a more suitable list of keywords: e.g. I would prefer (`f` and `df`) to (`fch1` and `foff`), and (`t` and `dt`) to (`tstart` and `tsamp`); or one could use the more general `[var]_start` and `[var]_step`, e.g. `freq_start` and `time_start`, for consistency.
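A hedged sketch of what the consistent `[var]_start`/`[var]_step` naming could look like (field names are illustrative, not a fixed spec):

```python
# Hypothetical header layout using [var]_start / [var]_step keywords.
from dataclasses import dataclass
import numpy as np

@dataclass
class ObsHeader:
    freq_start: float  # frequency of first channel (MHz)
    freq_step: float   # channel width (MHz, negative for descending bands)
    time_start: float  # observation start (MJD)
    time_step: float   # sampling interval (s)
    nchan: int

    @property
    def freqs(self) -> np.ndarray:
        """Per-channel frequencies derived from start + step."""
        return self.freq_start + self.freq_step * np.arange(self.nchan)

hdr = ObsHeader(freq_start=1500.0, freq_step=-0.5,
                time_start=59000.0, time_step=64e-6, nchan=4)
print(hdr.freqs)  # [1500.  1499.5 1499.  1498.5]
```

The start/step pair is enough to reconstruct any axis on demand, which is also why xarray's requirement for fully materialised coordinate arrays feels wasteful.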
Next-gen use cases and questions
I guess the most important thing for adoption is that the package is easy to use for current use cases, and that it can support next-generation use cases too. The UWL and narrow-bandwidth FRBs make a good case study: sub-band searches will probably become more common. How easy would it be to pipe data into tensorflow? How about multiple beams? How about parallelization across nodes (e.g. dask), parallelization on CPUs (openmp) or GPU acceleration (cupy)?
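As a toy illustration of the sub-band idea: split the band into groups of channels and search each independently, which is what recovers bursts confined to a fraction of the band. Plain numpy here (shapes and values invented), but a dask or cupy array could slot in for the heavy lifting:

```python
# Toy sub-band search: a narrow-band burst lost in the full-band sum is
# obvious in its own sub-band.
import numpy as np

nchan, nsamp, nsub = 16, 32, 4
dynspec = np.zeros((nchan, nsamp))
dynspec[4:8, 10] = 5.0  # burst confined to channels 4-7, sample 10

# collapse channels within each of the 4 sub-bands
subbands = dynspec.reshape(nsub, nchan // nsub, nsamp).sum(axis=1)
best = np.unravel_index(np.argmax(subbands), subbands.shape)
print(best)  # (1, 10): sub-band 1, sample 10
```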
It is easy to have 'feature creep' and make the task huge, so it will be important to set a scope and stick to it. What exactly is a 'single pulse toolbox'? And does this need to be high performance for FRB searches, or is usability more important?
from sigpyproc3.
Hi @telegraphic,
xarray seems to be the perfect replacement for subclassing ndarray. However, one of the issues I found is in the implementation. I like the existing structure of having methods added to the array itself. We can also pipe methods together, e.g.

```python
tim = TimeSeries(data)
tim.downsample().pad().toDat()
```

I don't see a safe way to subclass xarray (the docs discourage subclassing). Instead, they suggest using accessors to extend it with new methods. I am not able to figure out a clean API design using accessors. Another approach would be to wrap the xarray inside the object, so the array can then be accessed via `tim.data`.
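A minimal sketch of that wrapper approach: methods live on the wrapper and return new wrappers, so calls still chain, while the raw array stays reachable as `.data` (the downsample/pad behaviour here is invented for illustration):

```python
# Sketch: wrapping the array instead of subclassing it; chaining still works.
import numpy as np

class TimeSeries:
    def __init__(self, data):
        self.data = np.asarray(data, dtype=np.float32)

    def downsample(self, factor: int = 2) -> "TimeSeries":
        n = (len(self.data) // factor) * factor
        return TimeSeries(self.data[:n].reshape(-1, factor).mean(axis=1))

    def pad(self, npad: int = 2) -> "TimeSeries":
        return TimeSeries(np.pad(self.data, (0, npad)))

tim = TimeSeries(np.arange(8))
out = tim.downsample().pad()
print(out.data)  # [0.5 2.5 4.5 6.5 0.  0. ]
```

Each method returns a fresh wrapper, so the chained style of the existing API survives without touching xarray's (discouraged) subclassing machinery.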
For FFT, I think the better option will be to use numpy FFT and switch to pyFFTW (if available), thus removing FFTW3 as a dependency in this package.
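The proposed fallback could look something like this, using pyFFTW's numpy-compatible interface when it happens to be installed:

```python
# Sketch: optional pyFFTW backend, falling back to numpy's FFT.
import numpy as np

try:
    from pyfftw.interfaces import numpy_fft as fft_backend  # optional speed-up
except ImportError:
    from numpy import fft as fft_backend

x = np.ones(8)
spec = fft_backend.rfft(x)
print(spec[0])  # (8+0j): the DC term equals the sum of the input
```

Because pyFFTW mirrors the `numpy.fft` call signatures, the rest of the code never needs to know which backend it got.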
Re sigproc headers, yes, I agree. So one idea could be to move all sigproc- and psrfits-related functions to the `io` module and have a common `Header` class with classmethods, which can be used as:

```python
hdr = Header.from_sigproc()
hdr = Header.from_psrfits()
```
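A hedged sketch of what those classmethods might look like internally (the field names and keyword mappings are placeholders, not the final design):

```python
# Sketch: a common Header with per-format constructors.
from dataclasses import dataclass

@dataclass
class Header:
    telescope: str
    nchan: int
    tsamp: float

    @classmethod
    def from_sigproc(cls, hdr: dict) -> "Header":
        # map sigproc keywords onto the common schema
        return cls(telescope=str(hdr.get("telescope_id", "unknown")),
                   nchan=hdr["nchans"], tsamp=hdr["tsamp"])

    @classmethod
    def from_psrfits(cls, primary: dict, subint: dict) -> "Header":
        # PSRFITS keeps the sampling time (TBIN) in the SUBINT table header
        return cls(telescope=primary["TELESCOP"],
                   nchan=subint["NCHAN"], tsamp=subint["TBIN"])

hdr = Header.from_sigproc({"telescope_id": 4, "nchans": 1024, "tsamp": 64e-6})
print(hdr.nchan)  # 1024
```

The format-specific quirks stay inside the constructors, and everything downstream sees one schema.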
We still use some of the sigproc cmd tools, e.g., `header`, `seek`, `dedisperse`, etc. Should we aim to provide these tools in this package? I got interested in the idea of updating `sigproc` to modern standards in sigproc2.
Single-pulse toolbox
So, my intention behind a single pulse toolbox is to have a reference place of robust methods to simulate and analyze single pulses/FRBs. For example,
What is the S/N of a burst? There are different implementations in search packages, and to understand the S/N reported in the literature, we always have to refer to the package code. But there should be one robust way to calculate S/N in this toolbox (e.g., a matched-filter approach @vivgastro).
What is the optimal DM of the burst? Same argument again: we can have pdmp and DM_phase methods to get a robust estimate.
How to simulate a burst the right way?
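As an illustration of the first point, a minimal matched-filter S/N using normalised boxcar templates might look like this (a sketch of the idea, not a reference implementation; the baseline and noise estimators are deliberately simple):

```python
# Sketch: boxcar matched-filter S/N with a robust (MAD-based) noise estimate.
import numpy as np

def boxcar_snr(profile, widths=(1, 2, 4, 8)) -> float:
    """Best matched-filter S/N over a set of boxcar widths."""
    prof = profile - np.median(profile)          # crude baseline removal
    noise = 1.4826 * np.median(np.abs(prof))     # robust std estimate (MAD)
    best = 0.0
    for w in widths:
        kernel = np.ones(w) / np.sqrt(w)         # unit-noise-gain boxcar
        best = max(best, np.convolve(prof, kernel, mode="same").max() / noise)
    return best

rng = np.random.default_rng(1)
prof = rng.normal(size=256)
prof[100:104] += 3.0                             # injected 4-sample burst
print(round(boxcar_snr(prof), 1))
```

Pinning down exactly these choices (baseline, noise estimator, template bank) in one place is what would make S/N values comparable across papers.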
So, basically, I am looking for a python-based implementation of psrchive targeted towards FRBs, using the profile and block classes. We can have support for polarizations and robust TOA estimation. These numbers can then be reported in the literature and to TNS to have a uniform definition.
I now think this demands a separate package, which should have the block and profile classes and most of the psrchive methods. We can then also have a standard format to store these data (similar to `.ar` files).
It seems you came to the same conclusion about xarray! `tim.data` seems reasonable to me.
Using `from_sigproc` and `from_psrfits` for the header sounds good to me.
PyFFTW (or the cupy FFT) are easy replacements.
I love that sigproc2 is PravirK's updates to Evan Keane's fork of Michael Keith's release of Duncan Lorimer's original SIGPROC. Maybe some similar command line tools but with different names to avoid name collisions? e.g. `spt_seek` and `spt_header`.
Adding 'publication quality plots' to the list as you didn't explicitly mention it 📈
OK, I have moved the single-pulse toolbox idea (block/profile, psrchive, simulate, plots) to a different package, burstpy, for now.
As we expand the sigpyproc codebase with other formats, a restructuring is required (to v0.6 with #16).
Based on this discussion, this is my roadmap for sigpyproc v1.0.
- PSRFITS search-mode support (using astropy): header in `sigpyproc.io.psrfits` and `FitsReader` in `sigpyproc.readers`.
- Polarization support (I think it will be IQUV for filterbanks with nifs=4 and several other combinations for PSRFITS).
- Implement most of the `sigproc` command-line utilities (e.g., `seek`) and visualization tools.
- Publish to PyPI (using cibuildwheel); need to fix the OpenMP dependency on different systems for that.
- [Optional] Move the backend code from pybind11 to numba. Need to compare the performance first. It would also fix the OpenMP dependency (using numba threading layers).
@telegraphic Re PSRFITS support, there are already several implementations (yours, PulsarDataToolbox, etc.). Why re-write things again if the definitions are fixed? Instead, we can add one of the implementations as a dependency.
I like the baseband_tasks.io.psrfits framework (also baseband_tasks.io.hdf5) and it will be easy to adapt here; however, it supports only fold-mode. I think the search-mode addition will be easy to implement, so I will try for a PR there. All we need are robust wrappers around astropy.io.fits to read header keywords and data.
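The keyword-access pattern with astropy.io.fits is straightforward; here is a toy version that builds an HDU in memory rather than opening a real PSRFITS file (real files keep most of this metadata in the SUBINT binary table, and the keyword values below are invented):

```python
# Sketch: tolerant keyword reading with astropy.io.fits.
from astropy.io import fits

hdu = fits.PrimaryHDU()
hdu.header["TELESCOP"] = "Parkes"
hdu.header["OBSFREQ"] = 1382.0  # MHz, hypothetical

def read_keys(header, keys):
    """Pull a dict of keywords, tolerating missing ones."""
    return {k: header.get(k) for k in keys}

meta = read_keys(hdu.header, ["TELESCOP", "OBSFREQ", "OBSBW"])
print(meta)  # {'TELESCOP': 'Parkes', 'OBSFREQ': 1382.0, 'OBSBW': None}
```

The "robust" part is mostly about exactly this: never assuming a keyword exists, and validating what comes back.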