
JWST Validation Notebooks


This repository contains jupyter notebooks that are used to perform validation testing of the JWST Calibration Pipeline (hereafter referred to as the calibration pipeline). These notebooks are structured to capture documentation, code, tests, figures, and other outputs that JWST instrument teams use to validate data from the calibration pipeline.

This repository is integrated with the JWST pipeline software repository and the Jenkins automation server. To see the most recent build status, go to the STScI Jenkins server.

Current Validation Suite

To see the current suite of validation notebooks, visit our website. Please note that the website is currently only available to internal STScI staff who are on the VPN. Contact Misty Cracraft (@cracraft) or Alicia Canipe (@aliciacanipe) for questions or access issues.

Executing Validation Notebooks Locally

You must be an internal user to execute the Validation Notebooks locally, because the test data is only available internally. To execute the notebooks or test a new notebook, use the following setup:

  1. Clone the validation notebooks repository: git clone https://github.com/spacetelescope/jwst_validation_notebooks.git
  2. Open a terminal window into the newly created directory
  3. Create the Validation Notebooks conda environment: source setup_environment.sh

You should now be able to run the notebooks with jupyter (type jupyter notebook), or test the creation and running of the test suite by typing python convert.py.

Opening and Running the Notebooks

Starting the Jupyter server

Start jupyter with:

jupyter notebook

This will open a jupyter instance in your web browser, where you can access the notebooks by selecting the jwst_validation_notebooks folder.

[Screenshot: notebook home page]

From there, you can select the specific testing directory and notebook. jupyter notebooks have an .ipynb extension.

Selecting a Kernel

To change the kernel you are using, select the Kernel drop-down button in the top left corner of the notebook and hover over "Change Kernel".

[Screenshot: kernel selection menu]

From there, you can select the conda environment kernel that includes your JWST pipeline installation. Then, you should be able to execute the notebook. For more information about jupyter notebooks, see the Jupyter Notebook documentation. There is also a handy cheat sheet with shortcuts and commands.

Running Notebooks with nbpages and convert.py

If you would like to generate HTML outputs locally, make sure you are in the jwst_validation_notebooks repository and execute the following command:

python convert.py

To see the full list of command-line options, run:

python convert.py --help

There are, however, a few flags that can be especially useful, and a few notes about convert.py to keep in mind:

  • In the main notebook directory is a file named exclude_notebooks. This file is currently passed to the --exclude flag if that flag is not set at the command line. As such, it is not currently possible to use the --include flag when running from the main notebook directory.
  • To run a subset of notebooks, use some combination of the --notebook-path command-line option to only run the notebooks in a particular directory and the --exclude option to avoid running particular individual notebooks.
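
For example, to build only the notebooks in a single test directory, a run might look like the following (the directory path here is purely illustrative):

python convert.py --notebook-path jwst_validation_notebooks/jump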

Contributing

New Notebooks

Prior to contributing to jwst_validation_notebooks development, please review our style guide. Note that notebook cell outputs must be cleared prior to submission.

Make sure to follow the template outlined in our repository. More information about storing test data is included below.

Before your notebook will be accepted, you must test it in the proper environment described in the Executing Validation Notebooks Locally section above. This will help ensure a smoother delivery of new tests.

This repository operates using the standard fork and pull request GitHub workflow. The following is a bare-bones example of this workflow for contributing to the project:

  1. Create a fork off of the spacetelescope jwst_validation_notebooks repository.
  2. Make a local clone of your fork.
  3. Ensure your personal fork is pointing upstream properly.
  4. Create a branch on that personal fork.
  5. Make your notebook changes and be sure to clear the outputs from the cells.
  6. Push that branch to your personal GitHub repository (i.e. origin).
  7. On the spacetelescope jwst_validation_notebooks repository, create a pull request that merges the branch into spacetelescope:master.
  8. Ensure that the pull request passes the continuous integration check.
  9. Assign a reviewer from the team for the pull request (Misty Cracraft @cracraft or Alicia Canipe @aliciacanipe).
  10. Iterate with the reviewer over any needed changes until the reviewer accepts and merges your branch.
  11. Iterate with the reviewer over copying your test data into either Box or Artifactory.
  12. Delete your local copy of your branch.

Temporary Directory

In order to avoid conflicts between multiple notebooks in the same directory (especially when being run by an automated process), the template notebook contains a cell that sets up a temporary directory and moves the notebook execution into that directory. Even if you don't start your notebook as a copy of the template, you should copy this cell. For development purposes, you may wish to set the use_tempdir variable to False, but when you are ready to submit the notebook in a pull request, please change it to True.
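
A minimal sketch of what such a cell can look like is shown below; this is only an illustration, and the actual cell in the template notebook may differ in detail:

    # Sketch of the temporary-directory cell; the template notebook's version may differ.
    import os
    import tempfile

    use_tempdir = True  # set to False during development, True before submitting

    if use_tempdir:
        data_dir = tempfile.TemporaryDirectory()
        os.chdir(data_dir.name)  # run the rest of the notebook inside the temporary directory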

CRDS Cache Location

The Jenkins instance runs on a virtual machine inside STScI, so it works best with its CRDS cache set to "/grp/crds/cache", but this location may not work well for you, especially when working over the VPN. To use a local CRDS cache instead, set the CRDS_CACHE_TYPE environment variable to "local" (e.g. export CRDS_CACHE_TYPE=local). This will tell CRDS to cache files in the directory ${HOME}/crds/cache.
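
If you prefer to configure CRDS directly inside a notebook, a sketch of the equivalent setup is shown below. The mapping from CRDS_CACHE_TYPE to these settings is an assumption about what the repository's setup code does; CRDS_PATH and CRDS_SERVER_URL are the standard CRDS environment variables:

    # Sketch only: assumes CRDS_CACHE_TYPE=local should map to a cache under ${HOME},
    # as described above; otherwise fall back to the shared cache on central store.
    import os

    if os.environ.get("CRDS_CACHE_TYPE") == "local":
        os.environ["CRDS_PATH"] = os.path.join(os.environ["HOME"], "crds", "cache")
    else:
        os.environ["CRDS_PATH"] = "/grp/crds/cache"
    os.environ["CRDS_SERVER_URL"] = "https://jwst-crds.stsci.edu"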

New Test Data

If you have a notebook that uses updated test data or new test data, follow the steps below to request a data update.

Artifactory Workflow

Artifactory should be used for data that is for internal use only.

  1. Create a Jira "Task" Issue in the JWST Simulations Jira project requesting to have your data added to Artifactory. Assign the ticket to Misty Cracraft (@cracraft) or Alicia Canipe (@aliciacanipe), and provide more information about the data: simulation information, data location, and pipeline step(s). Once your data has been added to Artifactory, Misty Cracraft (@cracraft) or Alicia Canipe (@aliciacanipe) will resolve the issue and notify you that your data is ready to be used (the full path to the data will be provided by the person who notified you that your data was ingested successfully).

  2. Make sure you have the proper OS environment variable set to access STScI's instance of Artifactory. This can be done on the command line or in a setup file such as your .bash_profile. If you are working in the jwst_validation_notebooks environment, your environment will be set up automatically.

    export TEST_BIGDATA=https://bytesalad.stsci.edu/artifactory/
    
  3. Make sure your environment has ci_watson installed. This is done automatically by the jwst_validation_notebooks environment.

    pip install ci_watson
    
  4. In your notebook, import the needed helper from the ci_watson package.

    from ci_watson.artifactory_helpers import get_bigdata
    
  5. Read in each file stored in Artifactory (the full path should have been provided by the person who ingested the data).

    satfile = get_bigdata('jwst_validation_notebooks',
                          'validation_data',
                          'jump',
                          'jump_miri_test',
                          'miri_sat_55k.fits')
    
  6. Follow the normal workflow for contributing a notebook once you have confirmed that your notebook is running successfully.

Box Folder Workflow

Artifactory is only accessible to internal users on the STScI network. If you would like to contribute a test notebook that uses externally available data, this test data should be stored in our Box folder (jwst_validation_notebooks) instead.

  1. Create a Jira "Task" Issue in the JWST Simulations Jira project requesting to have your data added to the Box folder. Assign the ticket to Misty Cracraft (@cracraft) or Alicia Canipe (@aliciacanipe), and provide more information about the data: simulation information, data location, and pipeline step(s). Once your data has been added to Box, Misty Cracraft (@cracraft) or Alicia Canipe (@aliciacanipe) will resolve the issue and notify you that your data is ready to be used (the Box link to the data will be provided by the person who notified you that your data was ingested successfully).
  2. Then, in your validation notebook, use the following commands to download your file from Box (we are using an example file link; substitute your own):
from astropy.utils.data import download_file

your_file_box_url = "https://stsci.box.com/shared/static/tpks98b3voqg7r13jt8i6el3yfg9dqoc.fits"
file = download_file(your_file_box_url)

Box assigns a default alphanumeric string as the filename, so you may want to update the filename before processing, or verify that the format is correct. Depending on the data, you can try:

# open the file in the correct format and write it to local disk for processing
from astropy.io import fits

filename = 'my_local_file.fits'  # choose a descriptive local filename
with fits.open(file) as hdu:
    hdu.info()
    hdu.writeto(filename)

or use a jwst datamodel:

from jwst.datamodels import RampModel
model = RampModel(file)
model.save(filename)

Code of Conduct

Users and contributors to the jwst_validation_notebooks repository should adhere to the Code of Conduct. Any issues or violations pertaining to the Code of Conduct should be brought to the attention of a jwst_validation_notebooks team member or to [email protected].

Questions

For any questions about the jwst_validation_notebooks project or its software or documentation, please open an Issue.

Current Core Development Team


jwst_validation_notebooks's Issues

Update for archiving test results

Another follow-up to this. We’re going to need a way to archive the notebook results for a particular release. Maybe we can have both:

jupyter nbconvert --to html notebook.ipynb
and
jupyter nbconvert --to pdf notebook.ipynb

commands in our nbconvert function. That would allow us to store the PDFs in central store.

Adding regression test notebooks on hold

Regression test notebooks are staged here: /grp/jwst/wit/nircam/canipe/validation_notebook_staging/

There are a couple of outstanding questions to investigate, so I'm not submitting a PR yet.

1. New environment variable needed: export TEST_BIGDATA=https://bytesalad.stsci.edu/artifactory DONE
2. These access the pipeline artifactory instance. I'm not sure if this will work with our infrastructure.
3. They take a long time to run.
4. These depend on the same pytest update as the unit tests: https://jira.stsci.edu/browse/JP-1881
5. These also require the same extra packages as the unit tests (see PR #63):
   ipython
   pytest-xdist
   pytest-html
   pip install -e .[test,docs]

Permissions Set for Website html Files

Permissions are set in the Jenkins pipeline here for files that are synced over to the website directory. Currently, the file permissions that are applied don't match the file permissions in the website directory.

The permissions are set by user iraf and owned by iraf in the website directory. I am concerned that if the permissions are set incorrectly, the files could become visible to outside users and not writable by people within the institute.

Fix environment.yaml to only install directly imported packages

Looking at

  - pip:
    - asdf>=2.7.1
    - astropy>=4.1
    - crds>=10.3.1
    - drizzle>=1.13.1
    - gwcs>=0.16.1
    - jsonschema>=3.0.2
    - numpy>=1.16
    - photutils>=0.7
    - psutil>=5.7.2
    - poppy>=0.9.0
    - pyparsing>=2.2
    - requests>=2.22
    - scipy>=1.1.0
    - spherical-geometry>=1.2.18
    - stdatamodels>=0.2.0,<1.0
    - stsci.image>=2.3.3
    - tweakwcs>=0.7.0
    - matplotlib
    - jupyter
    - ci-watson
    - junit-xml
    - nbformat
    - jinja2
    - ipython
    - ipykernel
    - pytest
    - pytest-xdist
    - pytest-html
    # These repositories have no tags or releases yet, so none are chosen.
    - git+https://github.com/STScI-MIRI/miricoord
    - git+https://github.com/STScI-MIRI/miri3d
    - git+https://github.com/york-stsci/nbpages
    # This repository has tags, but selecting them doesn't seem to work.
    - git+https://github.com/spacetelescope/pysiaf
    - git+https://github.com/spacetelescope/[email protected]
    - git+https://github.com/spacetelescope/[email protected]
    - git+https://github.com/spacetelescope/[email protected]

I notice that a lot of the specified packages are not actually used in the notebooks as direct imports; they are dependencies of jwst or one of the simulators. It would be best not to have these in there. spherical_geometry, for example, gets installed as a dependency of jwst, so there is no reason to spell it out in the environment.yml if it is not directly imported anywhere in any of the notebooks. It is only needed because jwst needs it, and jwst will define exactly which version(s) it needs and install it, or not install it if it is no longer needed. The same is true for many of these packages.

It would be good to go through each package listed in environment.yml and grep the repo to see whether it is actually used in an import statement in a notebook. If it is, it should be listed here; if not, remove it.
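
As an illustration only (this script is not part of the repository), an audit along those lines could be sketched like this:

    # Illustration only: list every top-level module imported in any notebook,
    # so entries in environment.yml that never appear can be dropped.
    import json
    import re
    from pathlib import Path

    imports = set()
    for nb_path in Path(".").rglob("*.ipynb"):
        notebook = json.loads(nb_path.read_text())
        for cell in notebook.get("cells", []):
            if cell.get("cell_type") != "code":
                continue
            source = cell.get("source", [])
            lines = source if isinstance(source, list) else source.splitlines()
            for line in lines:
                match = re.match(r"\s*(?:from|import)\s+(\w+)", line)
                if match:
                    imports.add(match.group(1))

    print(sorted(imports))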

Further, jwst and pysiaf should both be installed via pip install jwst and pip install pysiaf instead of from GitHub, unless you are actually trying to get the latest dev version of pysiaf, which is what is happening now.

Also, jupyter is being installed both by conda and pip. Pick one that works.

Finally, if this repo is intended to test that these notebooks work with the jwst pipeline, it would be good not to pin the version of the pipeline here. Instead, test the latest release; pip install jwst will get the latest. Otherwise this repo is not doing what it is supposed to be doing: automated testing of the latest pipeline releases.

Define Review Process

There is currently no defined process for reviewing pull requests. This process is very involved for a reviewer and the contributor. When a user submits a notebook for review, it will most likely contain paths pointing to data that are stored on disk or somewhere on central store. However, for the notebooks to run on the Jenkins server, the data will need to be stored on artifactory under

https://bytesalad.stsci.edu/artifactory/webapp/#/artifacts/browse/tree/General/jwst_validation_notebooks

and will need to be retrieved using ci_watson.get_bigdata for single data files, or jwst.regtest.regtestdata for jwst asn files. The data can be manually added by someone who has permissions to write to the artifactory repository, whether it is the contributor, reviewer, or someone in DMD.

There are examples of how to use ci_watson and regtestdata in the resample notebook.

Once the data is stored in its correct location on artifactory and the notebooks are changed to point to the data on artifactory using ci_watson or regtestdata, the user will need to test that the notebook successfully pulls the data from artifactory and completes without errors.

Once this is done, the contributor will need to clear the cells of the notebook and restart the kernel before pushing more changes to the pull request.

Reviewers should make sure notebooks are being submitted to master with no executed cells; executed notebooks take up a lot of space in a GitHub repository!

The reviewer can then verify and merge the notebook into master; the next time the build is run, the HTML will be generated and served via https://jwst-validation-notebooks.stsci.edu/

NOTE: A future goal of this repository is to fold it into the notebooks repository to show users more in-depth analysis of how the pipeline is performing, once JWST is taking data and we can make calls to astroquery for data rather than relying on artifactory and adding the data manually.

update environments file

The NIRSpec pipeline testing tool (nirspec_pipe_testing_tool) has been updated to version 1.1.3. The environments.yaml file should reflect this change in order for the NIRSpec notebooks (once loaded) to work properly.

Investigate viewing permissions for website

Some IDT members might want to look at the outputs of the notebooks on our website. Currently, this is not available to them. Maybe we can talk with ITSD or another group to investigate ways around this. Filing a ticket here so it stays on our radar.

General formatting and organization clean-up

Since this is a low-priority issue, I'll file a ticket for minor formatting fixes to make the folder and notebook names consistent.

Folder names should either be the name of a step (e.g., source_catalog) or the name of a pipeline (e.g., calwebb_detector1, calwebb_tso3). Notebook names should be jwst_<step/pipeline>_<instrument/mode>.ipynb, where <step/pipeline> can just be the shorter suffix (detector1, tso3, image2, etc.). Does that seem reasonable @cracraft and @york-stsci?

Following those suggestions:

  1. We should see if it's possible to move the notebooks in the “regression_tests” folder into the individual step name folders, e.g., the notebook that's in the regression_tests/tweakreg folder can just go in the normal tweakreg folder; however, we should leave the jwst_instrument_regression_tests.ipynb notebooks in the regression_tests folder.
  2. The “image3” folder should be named calwebb_image3
  3. The calwebb_coron3 notebooks should be named jwst_coron3_...ipynb to be consistent with the other notebooks
  4. The caldetector1-miri-lrs-tso.ipynb notebook should be renamed to jwst_detector1_miri_lrstso.ipynb
