empiriciSN's People

Contributors

drphilmarshall, jbkalmbach, tholoien


empiriciSN's Issues

Validating empiriciSN

@tholoien @drphilmarshall
As we have discussed before, we are very interested in this methodology for both catalog and image simulations, so validating the method, and perhaps retraining it (while keeping most of the methodology in place), is important. Since you said there were no plans for a validation paper, some of us in DESC would like to validate and retrain the model. @kponder produced a first plot of something we wanted to check (and have discussed before): since the SN properties are trained on galaxy properties that are a mixture of intrinsic and observed properties, is the all-important expected redshift-distance relationship preserved? The current result is extremely preliminary and needs more work, but the residuals from this relationship appear larger than we would like.

This and several other quantitative questions require further investigation. @kponder, I, and perhaps other collaborators intend to make lists of such issues, investigate them, and check whether we can improve the model. We would also like to record the results as they progress in an ongoing LSST DESC Note. We may, of course, have questions for you as this moves along.

What is a good place for us to do this in terms of a GitHub repository? Would you prefer that we use this repository (and merge with master as things move along), or would you prefer that we set up our own fork altogether?

Add demo notebook

Add a demo notebook to demonstrate the capabilities of the class.

DES Supernovae

We need access to photometry of supernovae and their hosts from DES, if available. Do we have any updates about the state of a DES SN catalog?

K-correction and extinction correction

This might be obvious, but I just want to make sure we're on the same page: we want to apply K-corrections and Galactic extinction corrections to the host galaxy magnitudes, right? I typically work with low-redshift objects, so I haven't had to worry much about K-corrections, but it seems like we should put all the host mags into their rest-frame values if we want to compare them.
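For concreteness, here is a minimal sketch of the Galactic extinction part, assuming SDSS bands, approximate Schlafly & Finkbeiner (2011) R coefficients, and a hypothetical E(B-V); the K-correction itself needs an SED fit (e.g. with kcorrect), so it is not shown:

```python
# Sketch: correct observed host magnitudes for Galactic extinction,
# A_band = R_band * E(B-V). The R values below are approximate SDSS
# coefficients from Schlafly & Finkbeiner (2011); E(B-V) would come from
# a dust map at the host position (the value used here is hypothetical).
R_SDSS = {"u": 4.239, "g": 3.303, "r": 2.285, "i": 2.088, "z": 1.263}

def deredden(mags, ebv):
    """Subtract A_band from each observed magnitude."""
    return {band: m - R_SDSS[band] * ebv for band, m in mags.items()}

host_mags = {"g": 19.42, "r": 18.87, "i": 18.60}  # hypothetical observed mags
corrected = deredden(host_mags, ebv=0.05)
```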

Units for master data sets

Hi,

I'm using empiriciSN to generate light curves for the Buzzard simulated galaxies, in an attempt to make a large simulated dataset of SNe Ia + hosts in preparation for LSST.

I want to make sure all the host galaxy parameters I'm feeding in are in the same units and defined the same way as the training set. Where can I find a list of the units and definitions for all of the parameters in the sdss_master and snls_master CSV files?

Thank you!

Perform XD Modeling

Once the data is all gathered and all necessary quantities have been calculated and stored in master data files, we want to model the distributions of properties using Extreme Deconvolution (XD) methods.

The current plan is to use astroML to do this. Any changes to this plan will be noted here.

Requirements for Twinkles

We are still planning the simulation inputs for SNe in Twinkles 3. We want to keep things somewhat realistic, but the main goal is to be able to evaluate the questions we care about in situations that are expected to arise (much as you would want support in the important regions of parameter space if you planned to correct things later using importance sampling).
We will have a set of galaxies in CatSim (with stellar and halo masses, and morphology in terms of Sersic indices of 1 and 4, i.e. disk or bulge), along with measures of size (half-light radius and position angle).

We want to populate some of these galaxies with SN with known input parameters. These input parameters are:

  • matching of galactic hosts, i.e. given a large set of galaxies at different redshifts, positions, and other parameters, which of them are likely to host more SNe? I don't know if this comes out of empiriciSN; if not, we can use rates (or our own ideas of how many we want). Then, given a set of galaxies and SNe, can we match them up?
  • positional parameters (redshift must match the host; ra and dec must be relative to the host and match its orientation)
  • SALT parameters t0 (uniform), x0, x1, c, z

There are different levels at which we can use empiriciSN: it could supply only the positional parameters (which are relatively more important here, with the other parameters drawn from external distributions), or it could supply both.

In order to make progress, here are the steps I think we need:

  • Decide on the smallest set of galaxy parameters for which empiriciSN can make good predictions for SNe. Which SN parameters? Only positional, only SALT, or both?
  • Let us say we do only positional parameters: what we need is to take the galaxy parameters as input (here is the relevant schema) and return a distribution (or rather samples) of SN positions. Notice that this has a bulge (disk) ra and dec and a bulge (disk) semi-major axis, along with the Sersic indices of both components and the position angle of the disk. Given these parameters, we get a surface brightness for the galaxy. Can you output a set of sample SN positions for each of these? Can you output the probability of a SN occupying each of these hosts during a fixed time interval? This will mean outputting a radius (which I think you already have) and an angle.
  • Plot a comparison of the positions obtained this way to positions obtained by sampling the ellipse. Again, I think this comparison is important for demonstrating what empiriciSN can do, and I would expect to see such a plot (or some variation of it) in a paper trying to do what empiriciSN is doing.

It might be OK even if the distribution is not terribly realistic (I don't know what the most realistic choice here is); what I want to know is what the distribution looks like, and whether we have to add SNe in some regions to provide samples in those kinds of areas.
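As a point of comparison for "sampling the ellipse", here is a hedged sketch of drawing SN offsets from an n=1 (exponential) Sersic disk. It assumes the SN rate traces the disk light; an n=4 bulge would need numerical inversion of its profile instead of the Gamma-distribution shortcut used here:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_disk_positions(n, scale_radius, axis_ratio, position_angle_deg):
    """Draw SN offsets from an n=1 (exponential) Sersic disk.

    For a surface density Sigma(r) ~ exp(-r/h), the radial pdf in 2D is
    p(r) ~ r * exp(-r/h), i.e. a Gamma(shape=2, scale=h) distribution, so
    radii can be drawn directly. Angles are uniform in the disk plane; the
    projected ellipse comes from squashing one axis by the axis ratio and
    rotating by the position angle.
    """
    r = rng.gamma(shape=2.0, scale=scale_radius, size=n)
    phi = rng.uniform(0.0, 2.0 * np.pi, size=n)
    x, y = r * np.cos(phi), r * np.sin(phi) * axis_ratio
    pa = np.deg2rad(position_angle_deg)
    dx = x * np.cos(pa) - y * np.sin(pa)
    dy = x * np.sin(pa) + y * np.cos(pa)
    return dx, dy  # offsets from the host centre, in the units of scale_radius

dx, dy = sample_disk_positions(5000, scale_radius=1.0, axis_ratio=0.6,
                               position_angle_deg=30.0)
```

For a face-on disk (axis ratio 1) the mean offset is 2h, which makes a quick sanity check easy.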

Train Model

Split the SN sample into training and validation sets and use these to train the model. Use cross-validation to determine the right number of parameters to incorporate and to measure the accuracy of the model.
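A sketch of how the cross-validation step could look, using scikit-learn's GaussianMixture as a stand-in for the planned XDGMM wrapper (its score() returns a mean log-likelihood, which GridSearchCV can maximize over the number of components; the toy data here are hypothetical):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
# Hypothetical toy data: two clusters standing in for the SN/host table.
X = np.vstack([rng.normal(-2.0, 0.5, size=(150, 2)),
               rng.normal(2.0, 0.5, size=(150, 2))])

# GaussianMixture.score() returns the mean held-out log-likelihood, so
# GridSearchCV can pick the number of components by cross-validation.
# The project's XDGMM wrapper would slot in here the same way once it
# follows the scikit-learn Estimator API.
search = GridSearchCV(GaussianMixture(random_state=0),
                      {"n_components": [1, 2, 3, 4, 5]}, cv=5)
search.fit(X)
best_n = search.best_params_["n_components"]
```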

Improve README

I've put a basic explanation of what we're trying to do in the README, but it could probably use some improvement. In particular, we should add a copyright/disclaimer to it like the one Phil showed me.

TravisCI demo failure

Hi Phil,

It looks like TravisCI is failing when it gets to the step of running the demo notebook, possibly because of a problem with Jupyter on Travis? Could you take a look?

Thanks.

XDGMM Class

Create an XDGMM class that incorporates the astroML and Bovy et al. fitting methods and wraps the scikit-learn Estimator API, so that it can be used with scikit-learn's machine learning (cross-validation) methods.

empiriciSN methodology and datasets

x0 values used.

I am using the SDSS supernovae classified as "SNIa" (spectroscopically confirmed) or "zSNIa" (photometrically identified with a host redshift). I am hesitant to change the SALT parameters if it means drastically reducing the size of the dataset, since it is already a little on the small side. I also want the SALT parameters all coming from the same source for consistency, so my inclination is not to change them if I can't find them for the whole sample.

For the x0 parameter, the SDSS ones come from the Sako et al. dataset, while I actually calculated the SNLS ones myself. The SNLS source I used provided x1, c, a redshift, and a peak rest-frame B-band magnitude, so I used SNCosmo to calculate the x0 parameter from those.

OK there are a few issues tangled up here:

Consistency: the methodology you describe actually obtains parameters from multiple sources, so it is not consistent in the way you desire. In fact, it is probably worse than you imagine (though I would not blame you for that). Here are some of the issues involved:

  • Sullivan et al. used an older version of the SALT2 model (Guy10) than the default model in SNCosmo (Betoule et al. 2014), which was also used in JLA and in the SDSS data release (Sako et al. 2014). So, while the parameters x1 and c have the same names, they do not mean the same things (it is a bit like measuring the same quantity in two different but similar units).
    • The JLA values are better than the Sullivan et al. values; people had simply learned more by then. In particular, there was a slightly different model but, more importantly, improvements in calibration that shifted the parameters in correlated ways, leading to important differences in cosmology. I do not necessarily believe this particular improvement will be important for you as well.
    • As far as the JLA sample of SNLS supernovae goes, it is almost the same as the Sullivan sample. There were a handful (~4) of SNLS + low-z SNe that did not make it into the JLA sample.
    • x0 as defined in SDSS (Sako et al. 2014) is about a factor of 1.3 different from x0 as defined in JLA/SNCosmo. This is due to conventions, not to differences in software. So you have to be careful in collating x0 values from SDSS, SNCosmo, and SNLS/JLA sources. A possible way out is to use that factor to rescale the SDSS SN parameters so that they are equivalent to JLA/SNCosmo.
    • The SDSS SNe that are photometrically identified tend to be of much lower quality than the other SNe Ia; the sample certainly includes several core-collapse SNe, as is apparent from its Hubble diagram. Selection biases are a tricky question, particularly in ML enterprises, and I think they should be addressed separately. Aside from justifying your answers on selection bias, I assume you will be asked why you did not take photometrically identified SNLS SNe Ia (I think that is Bazin et al. 2009) when you did take SDSS SNe Ia of that kind. (I don't know whether Bazin had host redshifts; that may be a useful distinction.)
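If one does adopt that rescaling, it is a one-line transformation. The sketch below assumes the factor of ~1.3 quoted above and a particular direction for it, both of which should be verified against SNe that appear in more than one of the samples:

```python
import math

# Assumption: the ~1.3 factor quoted above, applied in the
# SDSS -> JLA/SNCosmo direction. Both the value and the direction
# should be checked against overlapping SNe before use.
X0_CONVENTION_FACTOR = 1.3

def sdss_x0_to_jla(x0_sdss):
    """Rescale an SDSS (Sako et al. 2014) x0 to the JLA/SNCosmo convention."""
    return x0_sdss / X0_CONVENTION_FACTOR

# Because mB = -2.5*log10(x0) + const in SALT2, a pure convention factor
# cancels in colours and shapes but shifts x0-derived magnitudes by:
delta_mag = 2.5 * math.log10(X0_CONVENTION_FACTOR)  # ~0.28 mag
```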

I believe everyone involved will agree with the above statements; they are not very controversial. Still, the information about these changes is hardly available anywhere in a form anyone can readily obtain, and that is one of the problems we have.

x0 or Mabs ?

To get a feel for this question, I would like to see a plot of z vs. {x0, x1, c} for a set of SNe with parameters and abundances drawn from a galaxy distribution. I can help with this if you want.

Remake Plots

Since we are updating the data files, the plots will need to be remade to show SALT2 parameters for the supernovae and show surface brightnesses in other bands.

Including Error in sampling from conditionals

While adapting the empiriciSN demo notebook for use in Twinkles (see issue #310), I noticed that including errors when sampling from the conditional distribution was not working. I have found the bug and will submit a pull request.

SDSS Supernovae

It looks like SDSS should have a fairly large catalog of supernovae and host galaxies, based on Sako et al. (2014): http://arxiv.org/abs/1401.3317

However, the catalog website linked off the arXiv page (http://sdssdp62.fnal.gov/sdsssn/DataRelease/index.html) appears to be down, so I'm not sure how to access the full catalog information.

I've sent Sako an email asking him about it, but have yet to hear back. Does anyone have a suggestion for someone else to talk to, or another way to access the data?

Build a Fitting Tool

Build a simple tool to use the trained model to produce realistic SN parameters based on input host parameters (from CatSim).

We could also have this tool produce light curves; once you have the SALT2 parameters it is simple to generate light curves using SNCosmo.

Master Plan

This is a basic roadmap for the tasks that need to be completed. It will be updated to link to sub-issues and to mark progress.

  • Gather SN and host data for the SDSS and SNLS samples
    • Change SNLS parameters from SALT to SALT2 (#7)
    • Change K-correction to use kcorrect (#14)
  • Sort data and calculate quantities to be used in the modeling, remake master data files (#4, #9)
  • Make plots comparing the various SN and host parameters (#10)
  • Perform XD modeling (#11)
  • Build XDGMM wrapper class that can be used with scikit-learn machine learning methods (#15)
  • Train the model and use cross-validation to find best number of parameters and estimate the model's reliability (#12)
  • Build a simple tool to be used to generate supernova light curves based on the model given CatSim parameters (#13)

Storing the datasets

Phil mentioned it's not good practice to store the actual datasets on GitHub (though I think the file sizes will be fairly manageable in our case). What is the best way to store them for easy sharing? Dropbox?

SALT2 Parameters for SNLS SNe

I found a more recent set of light-curve parameters for the SNLS supernovae that fit the light curves with SALT2 instead of SALT, which is what I had before. I was previously converting the SDSS SALT2 parameters into SALT ones to match the SNLS data, but I think it will be better to just grab these SALT2 fits for the SNLS supernovae and have everything use those parameters. Simulation code like sncosmo should be able to produce light curves using only the SALT2 stretch and color parameters and the redshift, so no peak magnitude will be needed.

Creating this to track the gathering of the SALT2 data.
