lsstdesc / descqa
A validation framework and tests for mock galaxy catalogs and beyond
Home Page: https://portal.nersc.gov/cfs/lsst/descqa/v2/
License: BSD 3-Clause "New" or "Revised" License
We propose to create a new subdirectory, named descqagen, which will host code that generates validation datasets, for example, code to query the HSC database. descqagen would hence be part of, but mostly independent from, other components of the DESCQA framework.
@duncandc agrees to be the first volunteer to push his code in.
See LSSTDESC/DC2-production#21 for detail and validation data from HSC.
Tests should refrain from customizing plotting styles (colors, figure size, font size, etc.). This will help plot styles look more consistent across different tests. If we later want to adjust plotting styles, we can do it for all tests in descqa/plotting.py.
Each test can still make minor adjustments (e.g., marker styles and sizes, line styles) to fit its needs.
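As a rough illustration of what such a shared-defaults module could look like, here is a minimal sketch. The dictionary contents, the helper name `set_default_style`, and the specific rcParams chosen are all assumptions for illustration, not the actual contents of descqa/plotting.py:

```python
# Hypothetical sketch of centralized plot defaults for descqa/plotting.py.
# Tests would call set_default_style() instead of setting styles themselves;
# the specific values below are illustrative assumptions.
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for batch runs
import matplotlib.pyplot as plt

SHARED_STYLE = {
    "figure.figsize": (6.4, 4.8),
    "font.size": 11,
    "axes.prop_cycle": plt.cycler(color=plt.cm.tab10.colors),
}

def set_default_style():
    """Apply shared DESCQA plotting defaults; tests may still tweak markers/lines."""
    plt.rcParams.update(SHARED_STYLE)
```

Individual tests would then only touch per-plot options (marker style, line style) on their own axes.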
Moving the conversation started in #10 here. The idea is that we would like to make sure that the CL WG goal of investigating miscentering in DC2 is possible by validating the radial profiles of cluster members.
One way to do this is just to directly measure the radial profiles in cluster-mass halos. This should be easy to implement, but is going to be difficult to compare to data, since any profiles measured there will have significant projection effects.
The other thing to do is to measure color dependent clustering, which will be easier to find validation data for, but less directly tests what we care about.
The issue that @evevkovacs encountered was due to the fact that the color cut implementation in the CLF test assumes the catalog does not iterate over redshift blocks. When a catalog iterates over redshift blocks, some blocks may not cover the redshift ranges needed to determine the color cut, which results in an error.
This can be fixed if the test first determines the color cut using all blocks.
Since these are related, we should check that the shears, convergence, and magnification are self-consistent (within machine precision). The test is that the magnification should satisfy
`1/magnification = (1 - kappa)**2 - shear1**2 - shear2**2`
(kappa = convergence).
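A minimal sketch of this self-consistency check, assuming the quantities have already been read from the catalog as arrays (the function name and tolerance are illustrative):

```python
# Sketch of the proposed lensing self-consistency check:
# 1/mu should equal (1 - kappa)**2 - shear1**2 - shear2**2 to machine precision.
import numpy as np

def check_lensing_consistency(kappa, shear1, shear2, magnification, rtol=1e-10):
    """Return True if 1/mu matches (1-kappa)^2 - g1^2 - g2^2 within rtol."""
    inv_mu = (1.0 - kappa) ** 2 - shear1 ** 2 - shear2 ** 2
    return bool(np.allclose(1.0 / magnification, inv_mu, rtol=rtol))

# Example with synthetic, self-consistent values:
kappa = np.array([0.01, 0.05])
g1 = np.array([0.02, -0.01])
g2 = np.array([0.005, 0.03])
mu = 1.0 / ((1.0 - kappa) ** 2 - g1 ** 2 - g2 ** 2)
```

A catalog that stores an independently computed magnification would fail this check if any of the sign conventions or definitions were inconsistent.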
@yymao Please take a look at https://github.com/evevkovacs/descqa/blob/ready_SEDS/SED_test.ipynb
The distributions are not as unreasonable looking as I expected, if we convert the SEDs to magnitudes. I am thinking that we can add a keyword to the yaml file such as function: -2.5*np.log10, which tells the readiness test to first evaluate the function on the catalog data before plotting it or computing any other statistics. What do you think?
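One safe way to support such a keyword is to map a small whitelist of allowed function strings to callables, rather than evaluating arbitrary expressions. This is an illustrative sketch, not the actual readiness-test implementation; the keyword name and whitelist contents are assumptions:

```python
# Hedged sketch of the proposed 'function' keyword: map a whitelisted string
# from the YAML config to a transformation applied before computing statistics.
import numpy as np

TRANSFORMS = {
    "-2.5*np.log10": lambda x: -2.5 * np.log10(x),
    "np.log10": np.log10,
}

def load_quantity(data, config):
    """Apply the configured transformation (if any) before plotting/statistics."""
    func_name = config.get("function")
    if func_name is not None:
        data = TRANSFORMS[func_name](data)  # whitelist avoids arbitrary eval
    return data

sed = np.array([1e-9, 1e-8])  # toy SED flux values
mags = load_quantity(sed, {"function": "-2.5*np.log10"})
```

The whitelist keeps the YAML config expressive without letting it execute arbitrary code.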
I'm going to create a tagged version of DESCQA for Run 1.1. (While DESCQA is not directly used in the run, it documents the validation tests that the catalog going in has passed, so I think it is good to have a tagged version too.)
Are there any issues/PRs that should be resolved before we make this tag? Maybe #60 and #64? Do we think it is reasonable to close them before Friday? Or should we leave them for the next tag (i.e., not include them in Run 1.1)?
@erykoff has made some plots of red sequence colors (mean, scatter) as a function of redshift for protoDC2 and compared them with DES data.
@erykoff, can you share your plots here, and then we can continue to make a validation test from what you have done?
As discussed in #40, we open this issue to discuss implementing a color-color diagram test in DESCQA. According to @janewman-pitt-edu this would not be a required test but it would be nice to have for visual inspection. (cc'ing @morriscb @sschmidt23)
@nsevilla, it seems to me that you have implemented this test to some extent. Will you be able to port it into DESCQA? Let me know if you need any help.
The purpose of this issue is to understand the shape of the distribution of galaxy stellar masses in the protoDC2 catalogues, as we have been discussing with @reneehlozek. In the previous version, there was a small bump at about 10^10 solar masses, and it was not obvious to me why it was there. In the most recent version, that bump is still there, and there is additional skewness at lower masses around 10^8 solar masses. This second bump may exist because the peak of the distribution was at about 10^8 solar masses in version 2, while the peak is currently at much lower masses, around 10^(6-7) solar masses. The histograms are shown below:
I checked the stellar mass distribution of SDSS galaxies from Maraston et al. 2013, which plotted the distributions for SDSS BOSS and CMASS galaxies fit with different templates. These distributions do not have any obvious bumps. The figures from the paper are shown below for the BOSS and CMASS catalogues:
I also know that the CMASS galaxies are biased toward higher-mass galaxies, but it would be good to know why protoDC2 galaxies peak at such small stellar masses.
@EiffL, @patricialarsen, and @jablazek have been working on getting 2pt correlation (e.g. ellipticity-direction) for IA model testing.
Can one of you list the items that we should test? Then we can see if we need a separate issue for each of them.
Some techniques used here are related to #10 #35 but the purpose and validation datasets would be different.
P.S. @patricialarsen I cannot assign you. Please register your GitHub account to the DESC roster.
@cwwalter requested in #127 a test to check the amplitude of tree rings in 1.2i. We have code from @karpov-sv here: https://github.com/karpov-sv/lsst-misc/blob/master/Tree_Rings_Analysis.ipynb
We'd need to adapt this code to use the e-image reader in GCR. @andrewkbradshaw, would you like to try to adapt this code to DESCQA?
Currently the redshift distribution test, N(z), only goes to z = 1. We should extend the redshift range to z = 3 to make sure things look reasonable beyond z = 1. This is particularly important for cosmoDC2.
The validation data is valid up to around z = 1.5, so we can plot the redshift range out to z = 3, but should not use the data beyond z = 1.5 to calculate the chi^2.
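Restricting the chi^2 to the trusted range while still plotting the full range can be sketched as below. Variable names are illustrative, and the real test would use a full covariance rather than independent errorbars:

```python
# Sketch: compute chi^2 only over bins where the validation data is trusted
# (z <= z_max_valid), even though plots extend to higher redshift.
import numpy as np

def chi2_valid_range(z_centers, nz_catalog, nz_data, nz_err, z_max_valid=1.5):
    """Chi^2 between catalog and validation N(z), using only z <= z_max_valid."""
    mask = z_centers <= z_max_valid
    resid = (nz_catalog[mask] - nz_data[mask]) / nz_err[mask]
    return float(np.sum(resid ** 2))
```

Bins beyond z = 1.5 then contribute nothing to the pass/fail score, however discrepant they look in the plot.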
Users should have a way to query the meta-data to find out definitions such as the ellipticity definition etc., or just to see what meta-data is available.
I know @evevkovacs is working on the SMF test so I opened this issue to track progress.
This is to test whether the catalog follows the shear sign convention.
As proposed in LSSTDESC/cosmodc2#19, we'd like to add functionality to the readiness test to check uniqueness for columns that contain unique identifiers, such as galaxy_id and halo_id.
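The check itself is simple; a minimal sketch (the helper name is illustrative, and the real readiness test would iterate over the catalog's configured ID columns):

```python
# Sketch of a uniqueness check for identifier columns such as galaxy_id.
import numpy as np

def check_unique(values):
    """Return (is_unique, n_duplicates) for an array of identifiers."""
    n_unique = np.unique(values).size
    return n_unique == values.size, values.size - n_unique
```

Reporting the duplicate count (not just pass/fail) makes the failure mode easier to diagnose in the catalog.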
@rmandelb is the idea to do a conditional luminosity function using true halos?
This is a required validation test from the WL WG @msimet. The distribution of position angle should be a uniform distribution.
This is a very simple test and is certainly a new-comer-welcome task.
This is a required validation test from the WL WG @msimet. This test is to check the distribution (mean, scatter, etc) of galaxy sizes. This test is somewhat related to the size-luminosity test #13.
Proposed validation data set is COSMOS.
I believe @evevkovacs is already working on this. This issue is to track progress.
The purpose of this test will be to check the consistency of the instance catalog creation and image generation with the input quantities in the protoDC2 catalog. This can be done by generating a small instance catalog and one exposure with ImSim and checking the measured sizes and ellipticities of the galaxies in the field. The statistics will not be very high but it will catch obvious failures.
This test will also be able to check for obvious failures in the model for complex morphologies slated to be included in the ImSim version of DC2.
See LSSTDESC/DC2-production#31 for detail.
DC2 validation test brainstorming by @rmjarvis and @fjaviersanchez:
Image level (@rmjarvis):
Check that the images contain some pixels above the 10-sigma level.
Calculate gain and read noise and compare with prediction.
Check masked (saturated) bits of the images.
Check masked (bad/dead) pixels -> PhoSim.
Catalog (visit level):
Use stars and PSF magnitudes to compute the CheckAstroPhoto test (using the standalone test check in DC2-production#259). Update 09/09/18: done in standalone code.
Star size vs. magnitude at different epochs should be flat (use HSM size/sdssShape). Use a scatter plot for every single star. Update 09/09/18: done in standalone code.
Given a calexp, select a clean stellar sample, check the PSF at each location (position of the star), and check the stacked difference (low priority).
Select a set of calexps and check that the input seeing is correlated with the size of the stars appearing in them. Update 09/10/18: done in standalone code.
DCR test: translate the shape of the star to get the shape on the zenith direction for a bunch of good stars, separate per band, and check this as a function of airmass.
DCR test: repeat that splitting the sample into redder and bluer stars.
Catalog (coadd level):
Separate stars and galaxies and use them in CheckAstroPhoto.
In CheckAstroPhoto, add the input N(m) and the output N(m), check the ratio, and see when they start to separate from each other (in progress, see here).
Check that galaxy density decreases with MW extinction (#140).
Check color-color diagram for input and output for several colors (inspect to validate) -> (#141)
Add input-true size as a function of true size.
Count the number of objects around a central galaxy in a given aperture (1 arcmin) and represent that as a function of the cluster richness (something in the input???).
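Several of the items above reduce to checking that some trend is flat. As one concrete example, the star-size-vs-magnitude item could be sketched as a simple slope test (the tolerance is an illustrative placeholder, not a WG-approved criterion):

```python
# Sketch of the 'star size vs. magnitude should be flat' check:
# fit a line to size(mag) for stars and flag a significant slope.
import numpy as np

def check_flat(mag, size, slope_tol=1e-3):
    """Return (passed, slope) from a least-squares fit of size vs. magnitude."""
    slope, _intercept = np.polyfit(mag, size, 1)
    return abs(slope) < slope_tol, float(slope)
```

A real test would also propagate per-star measurement errors into the slope uncertainty before flagging.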
To help future maintainers, we will create flowcharts to illustrate the code structure of descqarun and descqaweb. The flowcharts will provide a high-level picture of how different components of the DESCQA framework are connected.
(cc'ing people who might be interested in this issue: @tomuram @evevkovacs @ehneilsen)
During the Sprint Week, @DouglasLeeTucker and @saharallam have made progress on creating color magnitude diagrams for GCR Catalogs (protoDC2 and Buzzard).
@DouglasLeeTucker and @saharallam, can you share some plots you made with us here, and then we can continue to make a validation test from what you have done?
Now that we have a readiness test in place (you can see a demo here), we need to finalize the acceptable ranges for each quantity in protoDC2, especially those used by the image generation.
All the acceptable ranges are set in this YAML config file. @evevkovacs and I have put in many quantities, but I don't think we have exhausted them, and some of them do not have acceptable ranges specified.
Once this is done, we can sign off protoDC2 2.1.2. I assign this to @evevkovacs and @dkorytov, but I think we might also need help from @danielsf and @abensonca. Also cc'ing @rmandelb @katrinheitmann @jchiang87 for their information.
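For orientation, the range check itself could look like the sketch below. The config schema here (quantity name mapped to a [min, max] pair) is an assumption for illustration, not the actual YAML structure:

```python
# Illustrative sketch of checking catalog quantities against configured
# acceptable [min, max] ranges; the config schema is assumed, not actual.
import numpy as np

def check_ranges(catalog_data, ranges):
    """Return a list of (quantity, reason) failures against configured bounds."""
    failures = []
    for name, (lo, hi) in ranges.items():
        values = catalog_data[name]
        if np.nanmin(values) < lo:
            failures.append((name, "below minimum"))
        if np.nanmax(values) > hi:
            failures.append((name, "above maximum"))
    return failures
```

An empty failure list for all image-generation quantities would be the sign-off condition.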
As discussed in #10 and #63, we'd like to make sure the catalog has reasonable color-dependent galaxy-galaxy clustering signals.
This should be relatively easy to implement: we can use the current galaxy-galaxy clustering and just implement color-selection. We can use SDSS data as validation data. Validation criteria TBD.
This issue is to track progress on the implementation of the test that checks the galaxy density as a function of the MW extinction. The test will be most relevant for the DC2 object catalogs, although in principle it can be applied to any catalog.
This issue is for some bugs (at least two) in the dN/dz test. @yymao and I have been discussing this on slack, but it needs a bit more digging, so we agreed to open an issue (currently assigned to the two of us).
At the DC2 telecon today it was clear that the errorbars on the dN/dz test were wrong (an order of magnitude larger than the scatter in the data points). Yao and I identified a few things going on:
This is not a bug, but simply misuse of the test: the latest runs were trying to set the number of z bins using the wrong parameter, so it wasn't getting set (it was setting Nbins when it should have been N_zbins). The configuration being used was also such that the errorbar determination could not be particularly stable: the number of jackknife regions should be greater than the number of data points, which was not the case with the settings being used. He and I have since been exploring the use of 15 regions and 8 redshift bins, which should be more stable.
One bug is that when normed=True, the errorbars have some problems (see run here with the above configuration issues fixed). I suspect the histogram normalization scheme may be inconsistent between the jackknife calculation and the overall histogram, which would cause the covariance to be misestimated.
Another bug is that when normed=False, the data and sims have inconsistent histogram normalization schemes, so they cannot be compared. You can see this in another run here. However, note that in this case the errorbars on the simulation histogram actually seem a lot more reasonable! This supports the conclusion above that the normed=True plots have a bug in the errorbar calculation.
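One way to rule out the suspected inconsistency is to normalize every jackknife realization exactly the same way as the full histogram before taking the scatter. This is a sketch of that idea, not the actual dN/dz test code; the helper names are illustrative:

```python
# Sketch of a fix for the suspected normed=True inconsistency: apply the same
# density normalization to each leave-one-region-out realization, then take
# the jackknife scatter of the normalized histograms.
import numpy as np

def normalized_hist(z, bins):
    """Histogram normalized to unit integral (density convention)."""
    counts, edges = np.histogram(z, bins=bins)
    return counts / counts.sum() / np.diff(edges)

def jackknife_errors(z, region_labels, bins):
    """Leave-one-region-out jackknife errors on the normalized histogram."""
    regions = np.unique(region_labels)
    n = regions.size
    reals = np.array([normalized_hist(z[region_labels != r], bins) for r in regions])
    return np.sqrt((n - 1) / n * np.sum((reals - reals.mean(axis=0)) ** 2, axis=0))
```

Because the realizations and the full histogram share one normalization routine, the covariance and the central values are guaranteed to be on the same scale.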
For testing IA models, it would be useful to test the 2-pt functions. We plan to use treecorr for the calculation. We plan to start with using analytical calculation as the validation dataset (@jablazek @patricialarsen). (Edited by Rachel to reflect the focus on cosmological shear-shear, not intrinsic alignments.)
Create a new test result category, "for reference" (needs a fancier name), for tests that may generate plots and outputs but do not intend to implement specific validation criteria, and for which a pass/fail label does not make sense.
See more details in LSSTDESC/DC2-production#20.
Note that the wp(rp) code in v1 does not work on light cones. When we have proto-dc2 snapshots we can use the old code. In the meantime we should find new correlation code for light cones.
There have been some conversations about potentially using the CosmoDC2 input catalogs to also seed WFIRST image simulations, in order to enable tests of a joint analysis of images from both surveys. Along the way, it has become evident that the currently selected population is overall around 1/4 to 1/2 magnitude too bright in the J and H bands.
In the future, we should include NIR and maybe other colors in the selection criteria (cf. LSSTDESC/cosmodc2#36) to make sure we aren't producing a subtly unrealistic galaxy population. This should be coupled with corresponding tests in descqa that things like N(m) and colors are reasonable for non-optical bandpasses.
The DC2 sprinkler will need a strongly lensed AGN catalog and a strongly lensed SNe catalog. We need to write a reader and some tests for verification of these catalogs.
See this slide for context. We want to make sure the truth catalog faithfully reproduces the galaxies in the extragalactic catalog. This verification test will simply compare the galaxy properties in the truth catalog against those in the extragalactic catalog.
Since both the extragalactic and truth catalogs are now available in GCRCatalogs, this should be pretty straightforward.
(cc @fjaviersanchez @danielsf @jbkalmbach @katrinheitmann @evevkovacs)
As discussed in #10, the PZ WG wants to check if the galaxy bias has a reasonable redshift dependence. @morriscb @sschmidt23 can provide validation data and criteria. The test itself still needs to be implemented.
This issue is for developing tests to understand how well the gg lensing signal changes as a function of lens properties (stellar mass, luminosity, colors etc.).
A few references that are worth exploring:
This would be similar to this earlier test: https://github.com/LSSTDESC/descqa/blob/master/descqa/DeltaSigmaTest.py#L51, but we'd like it to be more general.
Now that we are preparing for Run 1.1, we should make sure there is no obvious bug (like incorrect units, labels, etc) in the catalog before the catalog goes into the image pipeline. @evevkovacs @katrinheitmann and I think it would be a good idea to have a "readiness test" that checks if the quantities that are used by CatSim have reasonable distributions (for example, max, min, mean, std, and histogram for visual inspection).
This test needs to be done by 1/19 (hopefully well before that).
Incorporate into DESCQA the work done in PZ WG to plot BPT diagram for emission lines.
Make DESCQA an importable Python package to make developing validation tests easier and more convenient. Here's the proposed module structure:
descqa/
__init__.py
master.py
archiver.py
tests/
__init__.py
base.py
utils.py
...
configs/
@patricialarsen (cc @EiffL) following the update to the shear sign convention in the GCRCatalogs (see
LSSTDESC/gcr-catalogs#111) we should make sure the shear-related tests also follow the same convention (that is, the treecorr/GalSim convention).
We need to, for example, remove the minus sign in
https://github.com/LSSTDESC/descqa/blob/master/descqa/shear_test.py#L172
https://github.com/LSSTDESC/descqa/blob/master/descqa/shear_test.py#L273
(and maybe other places; I haven't done an extensive check).
I was looking at one of the DESCQA runs here and was wondering if it is possible to have the actual COSMOS catalog to test the scaling of the ellipticity distribution with type and luminosity (using more absolute magnitude bins and different magnitude cuts).
Some AGNs are excessively bright and cause issues in image simulation; hence, we should implement a validation test to filter out the instance catalogs that contain offending AGNs (magNorm being too small).
We can apply the test to /global/projecta/projectdirs/lsst/production/DC2/Run1.2p/phosimInput/.
One example of an instance catalog that contains offending AGNs is DC2-R1-2p-WFD-u/000054/.
(cc @katrinheitmann)
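A minimal sketch of the scan is below. It assumes instance-catalog "object" lines carry magNorm as the fifth whitespace-separated field; the threshold and the function name are placeholders, and the real test would also restrict the scan to AGN entries:

```python
# Hedged sketch: flag instance-catalog object lines whose magNorm is too
# small (i.e., the object is excessively bright). Field layout is assumed.
def find_bright_objects(lines, mag_norm_min=10.0):
    """Return (line_number, magNorm) for object lines brighter than the cut."""
    offenders = []
    for i, line in enumerate(lines, start=1):
        fields = line.split()
        if fields and fields[0] == "object":
            mag_norm = float(fields[4])
            if mag_norm < mag_norm_min:
                offenders.append((i, mag_norm))
    return offenders
```

Running this over each catalog under the phosimInput directory would produce a list of files to exclude or regenerate.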
As it is now, the test checks for mag_lsst. We can do a quick update to CheckColors so it is compatible with the DM outputs.
This epic issue serves as the general discussion thread for all validation tests on the extragalactic catalogs in the DC2 era.
Note: Please feel free to edit the tables in this particular comment of mine, since we will use them to keep track of the progress of the validation tests.
➡️ Required tests that we have identified (for DC2):
Test | WGs | Implemented | Validation Data | Criteria | "Eyeball" Check by WG | Issue |
---|---|---|---|---|---|---|
p(e) | WL | ✔️ @evevkovacs | ✔️ COSMOS | ✔️ | ✔️ (WL @rmjarvis ) | #14 #81 |
p(position angle) | WL | ✔️ @msimet | ✔️ uniform | ✔️ | ✔️ (WL @msimet) | #76 #82 |
size distribution | WL | ✔️ @msimet | ✔️ COSMOS | ✔️ | ✔️ (WL @msimet) | #77 #80 |
size-luminosity | WL | ✅ @vvinuv | ✔️ vdW+14, COSMOS | ✔️ | ✔️ (WL @msimet) | #13 #56 |
shear 2pt corr. | WL | ✔️ @patricialarsen | ✔️ camb | ✔️ | ✔️ (WL @patricialarsen) | #35 #54 |
N(z) | PZ, LSS | ✔️ @evevkovacs | ✔️ DEEP2 | ✔️ | ✔️ (PZ @sschmidt23 ) | #11 #107 |
dN/dmag | WL, LSS | ✔️ @duncandc | ✔️ HSC | ✔️ | ✔️ (WL @rmandelb ) | #7 #47 |
red sequence colors | CL | ✅ @j-dr | ✔️ DES Y1 | ✔️ | ✔️ (CL @erykoff ) | #41 #101 |
CLF | CL | ✅ @chto | ✔️ SDSS | ❔ | ❔ | #9 #102 |
galaxy-galaxy corr | WL, LSS, TJP | ✔️ @vvinuv @morriscb | ✔️ SDSS, DEEP2 | ✖️ | ✔️ (LSS @slosar ) | #10 #38 |
color-dependent clustering | CL | ✔️ @yymao | ✔️ SDSS | ✖️ | ✔️ (LSS @slosar ) | #73 #100 |
galaxy bias(z) | PZ | ✅ @fjaviersanchez | ✔️ CCL | ❔ | ❔ | #75 #87 |
color distribution | PZ, CL, LSS | ✔️ @rongpu | ✔️ SDSS, DEEP2 | ❔ | ✔️ (PZ @sschmidt23 ) | #15 #89 |
shear-galaxy corr. | TJP, WL | ✔️ @EiffL | ✔️ SDSS | ❔ | ❔ | #118 |
stellar mass function | - | ✔️ @evevkovacs | ✔️ PRIMUS | ❔ | ❔ | #49 |
cluster stellar mass distribution | CL, SL | ✅ @Andromedanita | ✔️ BOSS, CMASS | ❔ | ❔ | #109 |
color-color diagram | PZ | ✔️ @nsevilla | ✔️ SDSS | ❔ | ❔ | #74 #88 |
➡️ Tests that are not currently required but good to have:
Test | WGs | Implemented | Validation Data | Validation Criteria | Issue |
---|---|---|---|---|---|
color-mag diagram | PZ, CL | ❓ @DouglasLeeTucker @saharallam | ✔️ SDSS / not required | not required | #40 |
Cluster radial profiles | CL | ❔ | ❔ | ❔ | #63 |
IA 2-pt corr. | TJP | ✖️ @EiffL | ❓ @jablazek | ❔ | #42 |
emission line galaxies | PZ, LSS | ✖️ @adam-broussard | ❓ DEEP2 | ❔ | #12 |
Analysis WGs are encouraged to join this discussion and to provide feedback on these validation tests. This epic issue is assigned to the Analysis Coordinator @rmandelb, and will be closed when the Coordinator deems that we have implemented a reasonable set of validation tests and corresponding criteria for DC2.
@yymao, @evevkovacs, and @katrinheitmann can provide support to the implementation of these validation tests in the DESCQA framework. In addition to GitHub issues, discussions can also take place on the #desc-qa channel on LSSTC Slack.
P.S. The corresponding issue in DC2_Repo is LSSTDESC/DC2-production#30