frbs / sigpyproc3

Python3 version of Ewan Barr's sigpyproc library

Home Page: https://sigpyproc3.readthedocs.io

License: MIT License

Python 100.00%

fast-radio-bursts filterbank pulsars radio-astronomy

sigpyproc3's Issues

read_dedisp_block gives only the lowest frequency channel copied into all channels

I realised that I can just read a block and dedisperse it afterwards, but I thought I would report it anyway.

fil = FilReader(file)
data = fil.read_block(0, fil.header.nsamples)
data_dd = fil.read_dedisp_block(0, fil.header.nsamples, 0)
print(data_dd == data_dd[0])  # all rows are the same as the 0th
print(data_dd[0] == data)  # The 0th row is the same as the last row in the data.

Output

FilterbankBlock([[ True,  True,  True, ...,  True,  True,  True],
                 [ True,  True,  True, ...,  True,  True,  True],
                 [ True,  True,  True, ...,  True,  True,  True],
                 ...,
                 [ True,  True,  True, ...,  True,  True,  True],
                 [ True,  True,  True, ...,  True,  True,  True],
                 [ True,  True,  True, ...,  True,  True,  True]])
FilterbankBlock([[False, False, False, ..., False, False, False],
                 [False, False, False, ..., False, False, False],
                 [False, False, False, ..., False, False, False],
                 ...,
                 [False, False, False, ..., False, False, False],
                 [False, False, False, ..., False, False, False],
                 [ True,  True,  True, ...,  True,  True,  True]])

unpack() function handles 1-bit data differently from ewanbarr/sigpyproc

Hi,
I found that this repository unpacks 1-bit data in big-endian order, whereas ewanbarr/sigpyproc behaves differently in this case:
https://github.com/ewanbarr/sigpyproc/blob/54a804200723d30601026be5bfa37ec90c8266c1/c_src/libSigPyProc.c#L23-L27
Is this intentional?
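
To make the two conventions concrete, here is a minimal sketch using NumPy's unpackbits, whose bitorder argument selects which bit of each byte becomes the first sample. It only illustrates the difference between the two bit orders and is not the unpacking code of either library.

```python
import numpy as np

# One packed byte holding eight 1-bit samples: 0b00000001.
packed = np.array([0b00000001], dtype=np.uint8)

# Big-endian bit order: the most significant bit is the first sample.
big = np.unpackbits(packed, bitorder="big")

# Little-endian bit order: the least significant bit is the first sample.
little = np.unpackbits(packed, bitorder="little")

print(big)     # [0 0 0 0 0 0 0 1]
print(little)  # [1 0 0 0 0 0 0 0]
```

The same packed file therefore yields time-reversed groups of eight samples depending on which order the reader assumes.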

[Tracker] Implement sigproc CLI tools

Implement Python versions of the sigproc CLI tools.

This is a tracker issue that lists the remaining apps to be added.

List of APIs

IO

FRB/Pulsar

Pulse Injection

fake

PFITS multi-file

Add support to read multiple contiguous SEARCH-mode PSRFITS files.

pybind11 with openmp

Test properly whether OpenMP works within the pybind11 framework, or whether the GIL needs to be acquired/released when calling C++ code:
py::gil_scoped_acquire and py::gil_scoped_release

Built OK but cannot import on Mac

Hi,

I have seen past issues on compiling sigpyproc3 on Mac. Here I share my experience of a successful build, but my problem is that I cannot import the installed sigpyproc (sadly). Please give some instructions.

Install OpenMP

brew install libomp

Install clang-omp using homebrew:

brew install llvm

Add llvm binaries to your path using :

echo 'export PATH="/usr/local/opt/llvm/bin:$PATH"' >> ~/.bash_profile

echo 'export PATH="/usr/local/include:$PATH"' >> ~/.bash_profile

echo 'export PATH="/usr/local/lib:$PATH"' >> ~/.bash_profile

Test clang usage:

clang -fopenmp hello.c -o hello -L /usr/local/lib/

./hello

You can create any simple hello.c file here to test -fopenmp and clang

linking

ln -s /usr/local/Cellar/gcc/10.2.0/lib/gcc/10/libgomp.spec /usr/local/lib/libgomp.spec
ln -s /usr/local/Cellar/gcc/10.2.0/lib/gcc/10/libgomp.1.dylib /usr/local/lib/libgomp.1.dylib
ln -s /usr/local/Cellar/gcc/10.2.0/lib/gcc/10/libgomp.dylib /usr/local/lib/libgomp.dylib
ln -s /usr/local/Cellar/gcc/10.2.0/lib/gcc/10/libgomp.a /usr/local/lib/libgomp.a

Installation

> pip3 install git+https://github.com/FRBs/sigpyproc3

Collecting git+https://github.com/FRBs/sigpyproc3
  Cloning https://github.com/FRBs/sigpyproc3 to /private/var/folders/62/chn0plln2b37czw0t47n9kd80000gn/T/pip-req-build-83fgta5g
  Running command git clone -q https://github.com/FRBs/sigpyproc3 /private/var/folders/62/chn0plln2b37czw0t47n9kd80000gn/T/pip-req-build-83fgta5g
Requirement already satisfied (use --upgrade to upgrade): sigpyproc==0.5.1 from git+https://github.com/FRBs/sigpyproc3 in /usr/local/lib/python3.8/site-packages
Requirement already satisfied: pybind11>=2.6.0 in /usr/local/lib/python3.8/site-packages (from sigpyproc==0.5.1) (2.6.1)
Requirement already satisfied: numpy in /usr/local/lib/python3.8/site-packages (from sigpyproc==0.5.1) (1.19.4)
Requirement already satisfied: tqdm in /usr/local/lib/python3.8/site-packages (from sigpyproc==0.5.1) (4.53.0)
Building wheels for collected packages: sigpyproc
  Building wheel for sigpyproc (setup.py) ... done
  Created wheel for sigpyproc: filename=sigpyproc-0.5.1-cp38-cp38-macosx_10_15_x86_64.whl size=138411 sha256=6eaf5ceb8639cf1b00d76989f1a6755f6289257663a9f978dbf02329df934f96
  Stored in directory: /private/var/folders/62/chn0plln2b37czw0t47n9kd80000gn/T/pip-ephem-wheel-cache-f8hrvu4u/wheels/16/24/22/1cf298bc509480534c02d09f5529f91c47cb10053eba7b6a12
Successfully built sigpyproc

Problem:

I don't know whether it is installed properly, so I checked available python3 libraries:

>help("modules")

I can find that sigpyproc is inside the list.

However, I cannot import sigpyproc:

> python3
Python 3.8.5 (default, Jul 21 2020, 10:48:26) 
[Clang 11.0.3 (clang-1103.0.32.62)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from sigpyproc.Readers import FilReader
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/sigpyproc/__init__.py", line 1, in <module>
    from sigpyproc.Readers import FilReader
  File "/usr/local/lib/python3.8/site-packages/sigpyproc/Readers.py", line 6, in <module>
    from sigpyproc.Utils import File
  File "/usr/local/lib/python3.8/site-packages/sigpyproc/Utils.py", line 5, in <module>
    import sigpyproc.libSigPyProc as lib
ImportError: dlopen(/usr/local/lib/python3.8/site-packages/sigpyproc/libSigPyProc.cpython-38-darwin.so, 2): Symbol not found: ___kmpc_for_static_fini
  Referenced from: /usr/local/lib/python3.8/site-packages/sigpyproc/libSigPyProc.cpython-38-darwin.so
  Expected in: flat namespace
 in /usr/local/lib/python3.8/site-packages/sigpyproc/libSigPyProc.cpython-38-darwin.so
>>> 

I am stuck here. Any advice? Thank you very much!



Best,
Zoe

Add comments for all C++ functions

Document and explain the intent of all core C++ functions and methods.

Compilation issues on Mac OSX

python setup.py install fails on Mac OSX (10.15.6, but it should be the same in all recent versions) with

clang: error: unsupported option '-fopenmp'
error: command 'gcc' failed with exit status 1

It is easy to work around (the system's compiler is not OpenMP-enabled and one needs to point to another gcc, e.g. installed with Homebrew), but it might be good to have a recommended Mac OSX installation somewhere in the repo or the docs. I myself don't know what is the cleanest and easiest solution.

I installed fftw and a new gcc with Homebrew, and then ran

CC="/usr/local/bin/gcc-10 -I/usr/local/include -L/usr/local/lib" python setup.py install

Header backend default incompatible with machine_ids

A friend of mine ran into a Header issue:

The backend default

sigpyproc3/sigpyproc/header.py

Line 99 in d612e8e

backend: str = "Fake"

is incompatible with the machine_ids dict

sigpyproc3/sigpyproc/io/sigproc.py

Line 55 in d612e8e

"FAKE": 0,

(I'd have made a PR but I was unsure whether it's better to change the default or the dictionary key)
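
The mismatch is a plain case-sensitivity problem, which the sketch below reproduces. The dictionary is a hypothetical stand-in containing only the "FAKE" entry quoted above, and lookup_machine_id is an illustrative helper, not a function in sigpyproc. Normalising the case at lookup time is a third option besides changing the default or the key.

```python
# Hypothetical stand-in for the machine_ids mapping; only the "FAKE" entry
# is taken from the issue.
machine_ids = {"FAKE": 0}


def lookup_machine_id(backend: str, default: int = 0) -> int:
    """Look up a backend name without depending on its letter case."""
    return machine_ids.get(backend.upper(), default)


print("Fake" in machine_ids)      # False: the default does not match the key
print(lookup_machine_id("Fake"))  # 0
```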

downsample is broken

the call here

sigpyproc3/sigpyproc/base.py

Lines 499 to 501 in b1f87c0

kernels.downsample_2d(
    data, write_ar, tfactor, ffactor, self.header.nchans, nsamps
)

has an extra argument, write_ar, that is not in the function signature

sigpyproc3/sigpyproc/core/kernels.py

Line 110 in b1f87c0

def downsample_2d(array, tfactor, ffactor, nchans, nsamps):

so doing something like

fil = FilReader(filfname)
fil.downsample(tfactor=2)

gives

downsample :  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sigpyproc3/sigpyproc/base.py", line 499, in downsample
    kernels.downsample_2d(
TypeError: too many arguments: expected 5, got 6
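
For reference, a five-argument function with the signature quoted above could look like the NumPy sketch below. It assumes the input is a flat, time-major array (all channels of sample 0 first) and that bins are averaged rather than summed; neither assumption is confirmed by the issue, and the name downsample_2d_sketch is hypothetical.

```python
import numpy as np


def downsample_2d_sketch(array, tfactor, ffactor, nchans, nsamps):
    """Average a flat, time-major block over tfactor samples and ffactor channels."""
    new_nsamps = nsamps // tfactor
    new_nchans = nchans // ffactor
    block = np.asarray(array, dtype=np.float32).reshape(nsamps, nchans)
    # Drop trailing samples/channels that do not fill a complete bin.
    block = block[: new_nsamps * tfactor, : new_nchans * ffactor]
    binned = block.reshape(new_nsamps, tfactor, new_nchans, ffactor)
    return binned.mean(axis=(1, 3)).ravel()


data = np.arange(16, dtype=np.float32)  # 4 samples x 4 channels
result = downsample_2d_sketch(data, tfactor=2, ffactor=2, nchans=4, nsamps=4)
print(result)  # [ 2.5  4.5 10.5 12.5]
```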

UnboundLocalError: local variable 'data' referenced before assignment

When I use "PFITSReader" to read a FITS file and then call "read_block" to get the data, I get the following error:

Traceback (most recent call last):
  File "/home/mbs/Desktop/mycode/ASTROSOFT/sigpyproc3/read_fil.py", line 13, in <module>
    c = fil.read_block(start=1000, nsamps=2000)
  File "/home/mbs/Desktop/mycode/ASTROSOFT/sigpyproc3/sigpyproc/readers.py", line 208, in read_block
    data = self._fitsfile.read_subints(startsub, nsubs)
  File "/home/mbs/Desktop/mycode/ASTROSOFT/sigpyproc3/sigpyproc/io/pfits.py", line 412, in read_subints
    sdata = self.read_subint_pol(
  File "/home/mbs/Desktop/mycode/ASTROSOFT/sigpyproc3/sigpyproc/io/pfits.py", line 464, in read_subint_pol
    return data
UnboundLocalError: local variable 'data' referenced before assignment

How can I fix it?
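
This error usually means a variable is assigned only inside some branches of a function and the input took a path that none of them covers. The sketch below shows the pattern and one way to make the failure explicit. The function names and the branch conditions are hypothetical placeholders; the actual code in pfits.py is not shown in the issue.

```python
def read_subint_pol_sketch(poln_order: str) -> list[int]:
    # `data` is assigned only in the branches below; any other value of
    # `poln_order` falls through and the return statement fails.
    if poln_order == "IQUV":
        data = [1, 2, 3]
    elif poln_order == "AABB":
        data = [4, 5, 6]
    return data


def read_subint_pol_checked(poln_order: str) -> list[int]:
    # Same logic, but unsupported input raises a descriptive error.
    if poln_order == "IQUV":
        data = [1, 2, 3]
    elif poln_order == "AABB":
        data = [4, 5, 6]
    else:
        raise ValueError(f"Unsupported polarisation order: {poln_order!r}")
    return data


try:
    read_subint_pol_sketch("OTHER")
except UnboundLocalError as exc:
    print(f"UnboundLocalError: {exc}")
```

With the explicit else branch, the caller learns which input was unsupported instead of seeing an unrelated-looking error at the return statement.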

Feature: buffered file reads

Issue

Currently the file IO infrastructure in sigpyproc is limited by the fact that each read of the file creates a new buffer. Calls to the sigpyproc.io.fileio.FileReader.cread(...) function can result in two allocations, one for the data read from file and another for the unpacking buffer used in the case of 1, 2 and 4 bit data. While Python may elide some of the performance cost of these buffer allocations via caching, the behaviour is unpredictable.

To reduce the memory performance issues from these re-allocations and to open up interoperability between sigpyproc and tools like pycuda and torch, it would be useful to have more fine-grained control over some of the buffer allocations.

Say we wish to build a sigpyproc pipeline that uses torch. To enable asynchronous memcopies between the host and GPU it is necessary to page-align, lock and register each memory buffer ("pinning" in CUDA parlance). Currently, to do this with a cread() we need to pin a new buffer on each loop. Pinning has a very heavy overhead and so should be avoided at all costs. The general strategy is to pin a buffer once at the beginning of a program and reuse that buffer.

Feature request

I suggest that the read_plan interface (at least on FilReader but maybe elsewhere) be updated to take an allocator method. The allocator method should take a number of bytes as an argument and return an object that exports the Python Buffer Protocol interface (PEP 3118), e.g.

# Simple bytearray allocator (probably the default allocator)
def bytearray_allocator(nbytes) -> Buffer:
    return bytearray(nbytes)
    
# A torch pinned memory allocator
def pinned_allocator(nbytes) -> Buffer:
    buffer = bytearray(nbytes)
    cudart = torch.cuda.cudart()
    tensor = torch.frombuffer(buffer, dtype=torch.int8)
    r = cudart.cudaHostRegister(tensor.data_ptr(), tensor.numel() * tensor.element_size(), 0)
    if not r.success:
        raise RuntimeError(f"Unable to pin memory buffer: {r}")
    return buffer

The new call signature for read_plan would look like:

    def read_plan_buffered(
        self,
        gulp: int = 16384,
        start: int = 0,
        nsamps: int | None = None,
        skipback: int = 0,
        description: str | None = None,
        quiet: bool = False,
        allocator: Callable[[int], Buffer] | None = None,
    ) -> Iterator[tuple[int, int, np.ndarray]]:

The semantics of the call would remain mostly the same. The main difference is that the ndarray returned on each iteration is the same ndarray, just containing different data. This could have some side effects if the behaviour is not understood, e.g. if I push the array from each loop to a list then I end up with a list containing only references to the same object, where updating one updates all.

Other parts of the codebase that would need to change would be:

The FileReader class would need a new method creadinto that wraps the existing readinto function of Python's io.FileIO class.
The unpack functions would need to take their unpacked buffers as arguments instead of creating new buffers on every call.
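
The proposal can be sketched with the standard library alone. The generator below is a hypothetical simplification, not the proposed sigpyproc API: it allocates one buffer through the allocator, refills it with readinto on every iteration, and yields a NumPy view of it. An in-memory io.BytesIO stands in for a filterbank file, and the example also reproduces the aliasing side effect described above.

```python
import io

import numpy as np


def bytearray_allocator(nbytes: int) -> bytearray:
    return bytearray(nbytes)


def read_plan_buffered_sketch(fileobj, gulp_bytes, allocator=bytearray_allocator):
    """Yield (nread, array) pairs; `array` is a view of one reused buffer."""
    buffer = allocator(gulp_bytes)
    array = np.frombuffer(buffer, dtype=np.uint8)  # shares memory with buffer
    while True:
        nread = fileobj.readinto(buffer)  # overwrites the same memory each time
        if not nread:
            break
        yield nread, array[:nread]


fileobj = io.BytesIO(bytes(range(8)))

# Keeping the yielded arrays stores views of the same buffer.
kept = [arr for _, arr in read_plan_buffered_sketch(fileobj, gulp_bytes=4)]
print([a.tolist() for a in kept])    # [[4, 5, 6, 7], [4, 5, 6, 7]]

# Copying inside the loop preserves each gulp.
fileobj.seek(0)
copied = [arr.copy() for _, arr in read_plan_buffered_sketch(fileobj, gulp_bytes=4)]
print([a.tolist() for a in copied])  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Callers that need to keep a gulp beyond the current iteration must copy it, which is the trade-off for avoiding a fresh allocation on every read.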

disable progress bar if not needed

Use the verbose argument in the read_plan.

Update API for cpp functions

Create and return a NumPy array for each function call. This will be easier to use.

Add license

It would be good to add a license! @ewanbarr and @pravirkr any opinion on this? I would suggest MIT/BSD license.

Related: telegraphic/numbits#3

Module load time

Loading sigpyproc modules is really slow, even for console scripts that do not need the numba kernels.

This is because of numba compilation times. Maybe moving to numba was a bad idea.

Incorrect key name in the header class

Hi,

I think the standard key for the source name is source_name, but here the class uses source. This causes a conflict when writing a processed file based on an old file. I can make the change and submit a pull request. Let me know.

sigpyproc3/sigpyproc/header.py

Line 506 in 8c2103c

"source": header.get("source_name", "Fake"),

telescope id not defined

When a new telescope id is encountered, it should fall back to some default. Currently, it results in a KeyError.
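
A minimal sketch of the requested fallback, using dict.get with a default instead of direct indexing. The table entries and the telescope_name helper are hypothetical placeholders, not the mapping used by sigpyproc.

```python
# Hypothetical ID-to-name table; the entries are placeholders.
telescope_names = {0: "Fake", 1: "ExampleScope"}


def telescope_name(telescope_id: int, default: str = "Unknown") -> str:
    """Return the telescope name, or a default for unrecognised IDs."""
    return telescope_names.get(telescope_id, default)


print(telescope_name(0))    # Fake
print(telescope_name(999))  # Unknown
```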

Roadmap discussion

Hi, I am starting this thread to discuss plans for sigpyproc.

Current work

I am refactoring the code in the packaging branch to follow PEP8 and use the new type hints, adding more abstraction and moving from the dynamic header class to a more strictly structured one. This is going to break the existing API (function names changed to lower case, etc.). Another addition would be to refactor some of the existing code into 3 classes: profile (for 1D pulse profile), block (for 2D freq-time spectrum) and cube (for folded data), similar to psrchive. I will also be adding robust S/N estimation (using the pdmp approach).

Future work

FRB simulator

I have plans to integrate @vivgastro's Furby as a class inside sigpyproc (with some additional features and support for UWL-like frequency bands). This will complete sigpyproc as a single-pulse toolbox in the sense that it can generate data/pulses as well as search, visualize and measure properties of those pulses.

PSRFITS support

As @telegraphic suggested, it would be nice to support other formats (e.g., PSRFITS, HDF5). I think we can add support to read those formats into the existing sigpyproc framework. I am not sure if we should also have a unified header (e.g., all PSRFITS keywords) or a writer class, as all these formats (at least sigproc and PSRFITS) are completely different. Also, existing packages such as your are already working towards this. IMO we should keep the header keywords (~25) defined in the sigproc docs as the base of this package and read other format files into this framework.

For example, we can have from sigpyproc.Readers import FitsReader with all the functionalities of FilReader.

Roadmap

Should we move towards an entirely Python-based package? Most of the C++ functionality (running mean/median, FFT) is readily available through NumPy and SciPy. One issue might be speed and multi-threading, but that can be compensated for using Numba. @telegraphic
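
As a rough illustration of that point, the sketch below computes a running mean, a running median and an FFT with NumPy only. The running_filter helper is hypothetical, assumes an odd window, pads the edges by repeating the end values, and needs NumPy 1.20 or later for sliding_window_view. It says nothing about how the speed compares with the existing C++ code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def running_filter(data, window, reducer):
    """Apply `reducer` over a centred sliding window with edge padding."""
    half = window // 2
    padded = np.pad(np.asarray(data, dtype=np.float64), half, mode="edge")
    windows = sliding_window_view(padded, window)
    return reducer(windows, axis=-1)


series = np.array([1.0, 1.0, 9.0, 1.0, 1.0])

mean = running_filter(series, 3, np.mean)      # smears the spike
median = running_filter(series, 3, np.median)  # removes the spike
spectrum = np.fft.rfft(series)                 # real-input FFT

print(mean)
print(median)
print(spectrum.shape)
```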

We can revive the FRBs/sigproc project to have a fully modern C++ and sigpyproc-like object-oriented framework with proper documentation. The codebase there is very old and can be easily condensed using modern third-party libraries. @evanocathain
