bd-j / forcepho Goto Github PK

View Code? Open in Web Editor NEW

19.0 19.0 4.0 37.14 MB

Generative modeling galaxy photometry for JWST

Home Page: https://forcepho.readthedocs.io

Python 90.87% C++ 5.17% Cuda 2.15% C 0.24% Shell 1.58%

forcepho's People

Contributors

Stargazers

Watchers

Forkers

jerryinfinity lgarrison jaba-git wrensuess

forcepho's Issues

Distortion corrections and Astrometry

Add a layer that applies image specific mapping from sky position to image coordinates, including the local linear transformation matrix.

Add oversampling option

This should be added to compute_gaussian to allow more accurate pixel computations

CUDA supports efficient computation with half-precision (16-bit) floats. This is probably enough precision for the pixel data in our problem. We might do this with CUDA's half2 type, but this would require using SIMD instrinsics rather than plain math operators.

This will require care in two ways: making sure that our values stay within 6e-5 to 6e4, and making sure that our operation count is small enough that the build-up of round-off/truncation error is acceptable. The signficand precision is 1/2048, or 5e-4.

Source grouping

Group very nearby/overlapping sources in the SuperScene so that they always appear together as active sources when checking out regions.

Demo multiple images

Make a demo for fitting the same source(s) with multiple images, in the case of both different bands or different dithers in the same band.

Low-res PSF gaussian fits

It might be useful to generate multi-gaussian fits to versions of the PSF smoothed by different gaussians. These fits might require fewer components, and can be used when convolving the galaxy gaussian of the same radius, thus cutting down on the total number of gaussians required.

The relevant code is in forcepho.mixtures.fit_f090w though additional code for smoothing the PSF images is necessary. The current set of gaussian radii being used for the galaxies is

minrad, maxrad, dlnr = 0.20, 28.0, np.log(2)
lnradii = np.arange(np.log(minrad), np.log(maxrad), dlnr)

Question regarding plotting the residual and model images

Hello,
I am trying to perform Bulge-Disk Decomposition on generated galaxy images and am using color_plot_together.py (from demo_color) on them.

I keep getting the following plots:

When I print the data,ierr and delta values from the function plot_residual in the color_plot_together.py file I only get arrays of nan values, but the plots for the data seem to be coming out fine unlike the plots for residual and model plots.

Can someone help me with finding out why this is happening ?

PSFs as Gaussian Mixtures

Represent PSFs as mixtures of gaussians. This should be done for several bands, where the rasterized PSF can come from webbpsf (with significant oversampling)

GPU code gives the wrong derivatives

The GPU code at the end of the MIT Hackathon runs fast but gives the wrong chi2 derivatives compared to the CPU (the chi value itself is quite close). The CPU derivatives agree with the numerical derivatives.

The GPU code is in the mithack branch and a simple script to run a proposal and print the derivatives is here: https://github.com/bd-j/forcepho/blob/mithack/tests/run_patch_gpu_test_simple.py

Source specific crpix/crval

Right now we use a single exposure dependent CRPIX/CRVAL pair for the astrometry of all sources, even though we use source and exposure dependent scale matrices. In the presence of distortions this will cause errors in pixel positions. As long as the distortions are small across a patch this can be mitigated by using a locally calculated CRPIX/CRVAl pair for each exposure; but in this case we could also use source independent scale matrices.

We should either use source and exposure dependent CRPIX/CRVAL pairs, or make the scale matrices source independent.

If we need extra room in memory for this info, we could probably save space by using a single scale matrix, since the scale matrices are simply related by a cos(dec) multiplication

CPU kernel data/residual swap

The swap of data and residual is not working for the CPU kernel. This swap is used for subtracting fixed sources and for building the design matrix to do linear least squares for the flux optimization conditional on shape.

Port to other GPUs

It will be useful to have the GPU code able to run on non-summit GPUs to which we have access.

One Volta specific feature that may not be easy to replicate is the MPS server that partitions the GPU in space instead of time. If it is true that pre-Volta MPS cannot efficiently use the GPU from multiple CPU processes, we can try to emulate MPS using a single CPU MPI rank that mediates between the sampler processes and the GPU. We can also simply live with some inefficiency until a time closer to launch when it makes sense to obtain access to a Volta GPU.

resources

use pkg_resources to store sersic (and default PSF) mixtures within the package.

Identifying Active sources for a Bulge+Disk galaxy

I am trying to perform Bulge-Disk Decomposition on generated galaxy images and was trying to use the color_fit.py and color_plot_together.py (from demo_color) on them.

I assumed that the galaxy only has 2 components, the bulge and the disk, and both the components have the same position while generating the image. I ran into a problem with forcepho when instead of detecting 2 active sources, only one active source was being detected (either the disk or the bulge).

I traced this issue back to the grow_scene and find_overlaps functions in the superscene.py file. I made a temporary fix in the the find_overlaps function which I have mentioned below,

    def find_overlaps(self, center, radius,seed_index=0, sort=False):
        """Find overlapping sources *not including at the provided center*
        Uses a metric based on |x_i - x_j|/|R_i + R_j|

        Parameters
        ----------
        center : ndarray of shape (2,)
            Floats representing the ra and dec of the center
            *in scene coordinates*

        radius : float
            Radius of influence of the `center`

        sort : bool, optional
            Whether to sort the output in increasing order of the metric

        Returns
        -------
        inds : ndarray of ints
            The indices of the sources that overlap, optionally sorted
            in order of distance from the center
        """
        kinds = self.kdt.query_ball_point(center, radius + self.boundary_radius)
        d = self.scene_coordinates[kinds] - center
        metric = np.hypot(*d.T) / (radius + self.roi[kinds])
        overlaps = (metric < 1) & (metric > 0)

#      LET US CONSIDER ONLY 2 (MAXIMUM) SOURCES
        if not np.any(overlaps)== True:		#IF BOTH VALUES IN OVERLAP ARE FALSE THEN THIS IS FOLLOWED
            if len(kinds)<=1:
                pass				#IF THERE IS ONLY ONE SOURCE THIS PART IS PASSED
            else:
                if seed_index == 0:
                    overlaps = [False,True]		#only defining for 2 sources
                else:
                    overlaps = [True,False]    		#only defining for 2 sources
        
        if sort:
            order = np.argsort(metric[overlaps])
        else:
            order = slice(None)
        return np.array(kinds)[overlaps][order]

The "if- code" I wrote works only for a maximum of 2 sources. I believe it will fail if there are 2 sources at the centre and any other source at another position.

Is there another way of accepting active sources at the centre?

pymc3 global lock on cluster

Figure out how to bypass the theano model compilation pattern that blocks on the cluster due to usage of a particular directory on disk

Pinned memory

My understanding is that using pinned memory can make repeated transfers to the GPU faster. I don't think we are at all memory bound, but it still might be useful to use pinned memory for the proposals and responses? I'm not sure it makes sense for the patch data since that is a single transfer per CPU process in the current architecture.

CPU version of the kernel

It would be useful to have a CPU version of the current CUDA evaluation kernel. Much of the code in compute_gaussians_kernel.cu should be reusable. The CPU code will have to access the pixel and meta data arrays held by the Patch class, and will need to accept proposals sent from python and return chi^2, gradients, and optionally residuals.

demo_basic crashes when writing fits image

Running the demo using CPUs:(force) [cnaw@koa demo_basic]$ python basic.py --sersic 1.6 --rhalf 0.08 --snr 100 --add_noise 0
Writing to ./tests/sersic1.6_rhalf0.080_snr100_noise0/sersic1.6_rhalf0.080_snr100_noise0_data.fits
Traceback (most recent call last):
File "/home/cnaw/python/forcepho/demo/demo_basic/basic.py", line 112, in
patcher.build_patch(region, None, allbands=bands)
File "/home/cnaw/anaconda3/envs/force/lib/python3.10/site-packages/forcepho/patches/pixel_patch.py", line 92, in build_patch
meta = self.find_exposures(region, allbands)
File "/home/cnaw/anaconda3/envs/force/lib/python3.10/site-packages/forcepho/patches/pixel_patch.py", line 405, in find_exposures
imbands = np.array([fits.getheader(f, self.sci_ext)["FILTER"]
File "/home/cnaw/anaconda3/envs/force/lib/python3.10/site-packages/forcepho/patches/pixel_patch.py", line 405, in
imbands = np.array([fits.getheader(f, self.sci_ext)["FILTER"]
File "/home/cnaw/anaconda3/envs/force/lib/python3.10/site-packages/astropy/io/fits/header.py", line 169, in getitem
card = self._cards[self._cardindex(key)]
File "/home/cnaw/anaconda3/envs/force/lib/python3.10/site-packages/astropy/io/fits/header.py", line 1819, in _cardindex
raise KeyError(f"Keyword {keyword!r} not found.")
KeyError: "Keyword 'FILTER' not found."

However, the keyword is there:

(force) [cnaw@koa demo_basic]$ listhead ./tests/sersic1.6_rhalf0.080_snr100_noise0/sersic1.6_rhalf0.080_snr100_noise0_data.fits | grep FILTER
FILTER = 'CLEAR '
FILTER = 'CLEAR '
FILTER = 'CLEAR '
FILTERS = 'CLEAR '

A skeleton file is created but without any imaging data.

HMC sampler

Add an HMC and/or nested sampling backend/demo.

Shape Reparameterization

reparameterize from position angle and b/a to a 2-d distortion or shear.

Priors

Right now priors are mostly implemented as top-hats. It would be good to add a more flexible priors system that can be used with multiple samplers.

Demo Gaussian Mixture Images

Make a demo for images (and gradients) using Gaussian Mixtures for the galaxies and PRFs.

Models with Gaussian mixture plus phonions

Develop hybrid galaxy models that use gaussians in the outer portions and phonions in the center.

Demo multiple galaxies

Make a demo for multiple galaxy fitting in a single image, either using phonions or GMs.

Test image accuracy

Develop a framework to quantify the image accuracy. For GMs this might be the convergence as a function of image subsampling, while for phonions it might be convergence as a function of number of phonions.

spline smoothing parameter

It would be great to have a high-level control of the spline smoothing parameter "s" in https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.UnivariateSpline.html.

Tests and CI

Need to add some continuous integration and tests.

Flux optimization

It's possible to analytically optimize the flux values, conditional on the shapes and positions, via simple linear regression. This can/should be done for scenes before any sampling.

PSF radial profiles

Need to build code to plot radial profiles of both the input PSFs and the multi-gaussian approximation.

Galaxy Gaussian Mixtures

Approximate Sersic profiles (optionally smoothed by a very narrow gaussian) with a mixture of gaussians at fixed radii.

PSF Gaussian Mixtures

Overhaul the PSF gaussian mixture code (currently using EM) to allow negative gaussians. This would allow for Airy rings.

Expand Sersic parameter ranges for gaussian mixtures

Current ranges of half-light radius and sersic index are too restrictive.

Reduced Shear

Implement reduced shear galaxy shape parameterization in the python/numba code and CUDA kernel.

MPI patch server

We have many patches that we wish to process in parallel. To do so, we will run many independent compute processes via MPI. We thus need a way to tell the compute processes (clients) which patch they are supposed to process. We probably want an MPI server to dynamically dispatch this work to allow for load balancing or updating of fixed galaxies based on results from nearby patches. mpi4py should let us do this dispatch pretty easily.

We'll may also want some pre-processor pipeline that breaks up the data and scene into patches and mini-scenes on disk (Sandro's code already does most of this, I think). That way, a compute node only needs to touch the data on disk it actually needs.

Speed up Gaussian conversion

Speed up conversions from on-sky gaussians to on-image gaussians. This is currently a bottleneck due to inefficient python loops and linear algebra calls that could probably be made faster for the 2 x 2 matrices used here.

Patch masks

Our patches may not always be rectangular; they may have ragged edges/irregular borders. We will indicate masked regions with a pixel bitmask. The patch in forcepho/patch.py is set up to take a pixel mask but currently does nothing with it. We probably will still send the masked regions to the GPU but indicate them with ierr = 0.

Patch sizes and non-compete boundaries

We need to set the size of the patch to be sufficiently large to include the covariant sources, while understanding how separated they need to be. A patch will have bordering objects called fixed galaxies. I think we want that a fixed galaxy in one patch not be an active galaxy in another patch running at the same time.

Clearly this depends on the size of objects. E.g., we're fitting out to 5-6 sigma in the outermost Gaussian mixture radius.

We are limited by the GPU shared memory to a particular number of active sources, likely fewer than 30.

We also need to be watchful that it presently appears that "non-interacting" pixels still take a noticeable amount of time to compute, like 25% of the interacting pixels. So we can't just set the patch size to be huge in the hopes that any single object interacts with only a small portion of the pixels.

So I think that we want the patches to be modest in size. But they have to handle the covariances, so probably that means that we want to get all of the active sources in a ~5" region and then extend that out to the radius of those sources. The result may be a 10" patch!

My intuition is that if two such patches abut, then the fixed galaxies in one are not the active galaxies in another. So this might be a simple way to dice up the dispatch problem, i.e., define a grid of 10" patches (which means that (5/10)^2 = 25% of objects are active), run them all for a few hundred iterations, then pick a new registration of the 10" grid. At least for the UDF application, where the data is already on a fixed pixel grid, that could be simple.

Spline Galaxy Gaussian Mixtures

Create simple functions to interpolate the gaussian amplitudes as a function of Sersic index and hal-light radius. Ideally these will be amenable to autodiff.

Include fixed galaxies in the fitting

I think it will be simple to incorporate galaxies from the border into the current GPU work flow:

Prepare the patch. Then call the kernel once including only the fixed Gaussians; ignore the returned likelihood and derivatives (i.e., don't even need to copy them back). The resulting residual image will then be what's left. Then swap the pointers between the data and residual array in the patch, e.g., by re-copying the patch structure (just the structure, not all of the arrays that it points to). Then one can proceed with the real proposals.

Super pixels

To support warp-level short-circuiting of evaluation of gaussians on distant pixels, we want spatially compact pixel regions to be contiguous in memory. We might do this by ordering the flattened pixel arrays in "super-pixel" order, where a super-pixel is a block of 4x8 or 8x8 (some multiple of the warp size).

This will introduce new padding elements along each dimension, which can be indicated with ierr = 0. Some care will be needed in unpacking the flattened, padded residual images back into their original sizes and shapes.

Speed up Sersic splines

Right now generating a proposal for a single source (even for the GPU) involves evaluating several bicubic splines and their derivatives using a loop over scipy.SmoothBivariateSpline objects in the CPU code. This is expensive (~150 microseconds for ten Sersic radii) and should be sped up to make sure proposals can be generated fast enough for the CPU.