ncar / ncar-python-tutorial

Numerical & Scientific Computing with Python Tutorial

Home Page: https://ncar.github.io/ncar-python-tutorial

License: Creative Commons Attribution 4.0 International

Jupyter Notebook 95.09% Shell 0.90% Python 3.98% CSS 0.02%

python xarray dask numpy jupyter scipy matplotlib cartopy tutorial

ncar-python-tutorial's Introduction

NCAR Python Tutorial

NCAR Python Tutorial
- Setup
- Launch Jupyter Lab

Setup

This tutorial covers the installation and setup of a Python environment on:

- Cheyenne
- Casper
- CGD's Hobart
- Personal laptop/desktop with a UNIX-variant operating system

NOTE: The setup scripts provided in this repository do not currently work on Windows machines.

Step 1: Clone NCAR Python Tutorial Repository

Run the following command to clone this repo to your system (e.g. Cheyenne, Casper, your laptop, etc.):

```
git clone https://github.com/NCAR/ncar-python-tutorial.git
```

Step 2: Install Miniconda and Create Environments

Change directory to the cloned repository:
```
cd ncar-python-tutorial
```
Run the configure script:

NOTE: Be prepared for the script to take up to 15 minutes to complete.
```
./setup/configure
```

```
$ ./setup/configure --help
usage: configure [-h] [--clobber] [--download] [--prefix PREFIX]

Set up tutorial environment.

optional arguments:
  -h, --help            show this help message and exit
  --clobber, -c         Whether to clobber existing environment (default:
                        False)
  --download, -d        Download tutorial data without setting environment up
                        (default: False)
  --prefix PREFIX, -p PREFIX
                        Miniconda3 install location)
```

Default values for the --prefix argument are:

- Personal laptop / Hobart: $HOME/miniconda3
- Cheyenne or Casper: /glade/work/$USER/miniconda3

NOTE: If the default prefix is not appropriate for you (for example, due to limited storage), feel free to specify a different Miniconda install location with --prefix. For instance, this install location may be a project workspace on a shared filesystem like GLADE or Hobart's filesystem.
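The host-dependent default above could be expressed roughly as follows. This is a minimal sketch, not the logic in setup/configure: the function name `default_prefix` and the hostname patterns (`cheyenne*`, `casper*`) are assumptions, and Hobart is treated like a personal machine, as in the list above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: pick a default Miniconda prefix from a hostname.
# The hostname patterns are assumptions, not taken from setup/configure.
default_prefix() {
  local host="$1"
  case "$host" in
    cheyenne* | casper*)
      # Cheyenne or Casper: install under the GLADE work space
      echo "/glade/work/${USER}/miniconda3"
      ;;
    *)
      # Personal laptop/desktop or Hobart: install under $HOME
      echo "${HOME}/miniconda3"
      ;;
  esac
}

# Example: the prefix this sketch would pick on the current machine
default_prefix "${HOSTNAME:-$(uname -n)}"
```

Passing --prefix to ./setup/configure sidesteps any such default entirely.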

The configure script does the following:

- Installs the conda package manager if it cannot find an existing installation; otherwise, it updates the base environment.
- Creates or updates the python-tutorial conda environment.
- On Cheyenne, Casper, or Hobart, creates soft-links to an existing local data repository; on any other machine, downloads the tutorial data.
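The first step (reuse an existing conda if one is found, otherwise install Miniconda) might look something like this. It is a hedged sketch only: `find_existing_conda` is a hypothetical helper, and the real script may search different locations.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: locate an existing conda before deciding to install one.
# Prints the path to conda and returns 0 if found; returns 1 otherwise.
find_existing_conda() {
  local prefix="$1"
  if command -v conda >/dev/null 2>&1; then
    # conda is already on PATH
    command -v conda
  elif [ -x "${prefix}/bin/conda" ]; then
    # conda is installed at the chosen prefix but not on PATH
    echo "${prefix}/bin/conda"
  else
    return 1
  fi
}

if conda_path="$(find_existing_conda "${HOME}/miniconda3")"; then
  echo "Found an existing Conda installation in: ${conda_path}"
else
  echo "No conda found; Miniconda would be installed at the prefix."
fi
```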

Step 3: Close and re-open your current shell

For changes to take effect, close and re-open your current shell.

Step 4: Run the Setup Verification Script

Check that conda info runs successfully:
```
conda info
```
Activate the python-tutorial conda environment:
```
conda activate python-tutorial
```
Run the setup verification script to confirm that everything is working as expected:
```
cd ncar-python-tutorial
./setup/check_setup
```
This step should print "Everything looks good!".

Launch Jupyter Lab

1. Cheyenne or DAV via JupyterHub (Recommended)

JupyterHub link: https://jupyterhub.ucar.edu/

To use the Cheyenne or DAV compute nodes, we recommend using JupyterLab via NCAR's JupyterHub deployment.

Open your preferred browser (Chrome, Firefox, Safari, etc.) on your local machine and go to https://jupyterhub.ucar.edu/.

You will need to authenticate with either your YubiKey or the Duo Mobile app.

2. Cheyenne or DAV via SSH Tunneling

If you are having issues with jupyterhub.ucar.edu, we've provided utility scripts for launching JupyterLab on both Cheyenne and Casper via SSH tunneling:

```
conda activate base
./setup/jlab/jlab-ch # on Cheyenne
./setup/jlab/jlab-dav # on Casper
```

3. Hobart via SSH Tunneling

To run JupyterLab on CGD's Hobart, use the SSH tunneling script provided in setup/jlab/jlab-hobart:

```
conda activate base
./setup/jlab/jlab-hobart
```

```
$ ./setup/jlab/jlab-hobart --help
Usage: launch dask
Possible options are:
 -w,--walltime: walltime [default: 08:00:00]
 -q,--queue: queue [default: medium]
 -d,--directory: notebook directory
 -p,--port: [default: 8888]
```

4. Personal Laptop

To run JupyterLab on your local machine, run the following commands and follow the instructions printed on the console:

```
conda activate base
jupyter lab
```

ncar-python-tutorial's Issues

Update Pangeo kernel on JHub and use for tutorial

Need the Pangeo environment/kernel on the JHub updated to include all of the packages that we need for this tutorial.

Also need to tailor this tutorial for this new kernel.

OHC Notebook broken

Currently, the OHC notebook is broken because the link to the data doesn't point to a file that exists. We need to fix this ASAP.

CC @jukent

Clarify which hosts tutorial setup are to be run on

@matt-long It would be good to clarify in the tutorial setup instructions in the readme which host is being used for each step. E.g. do you install conda on your laptop or on cheyenne?

brainstorm topics for June CGD tutorial

Let's enumerate topics that deserve formal instruction at the June CGD tutorial.

Python basics
git
Python package design

jupyterlab_vim

Can we add https://github.com/jwkvam/jupyterlab-vim to post_build_base?

cc @brittstephens

error from qstat in jlab-ch

When I try to run jlab-ch, I get the following (truncated) output:

```
Launching notebook server
  queue = share
  account = P93300670
  nodes = 1
  ncpus = 1
  memory = 8GB
  walltime = 06:00:00
  port = 8888

submitted job: 7186475.chadmin1.ib0.cheyenne.ucar to queue share
waiting for job to runqstat: illegally formed job identifier: 7186475.chadmin1.ib0.cheyenne.ucar
qstat: illegally formed job identifier: 7186475.chadmin1.ib0.cheyenne.ucar
..qstat: illegally formed job identifier: 7186475.chadmin1.ib0.cheyenne.ucar
qstat: illegally formed job identifier: 7186475.chadmin1.ib0.cheyenne.ucar
..qstat: illegally formed job identifier: 7186475.chadmin1.ib0.cheyenne.ucar
```

It looks like something is going wrong with extracting the job id from the output generated by qsub. It is quite possible that the format of the output from qsub has changed with the updates to cheyenne. That said, the bash for extracting the job id is sjob=${s%.*}. I don't know bash well enough to know what this is doing, or why it appears to no longer work.
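In bash, `%` removes the shortest suffix matching a pattern, while `%%` removes the longest. Applied to the job ID printed above, `${s%.*}` strips only `.ucar`. That would explain the qstat error if qsub now prints more dot-separated components than it used to (an assumption; the old qsub output isn't shown here). A minimal illustration, assuming `s` holds the full ID string:

```shell
#!/usr/bin/env bash
# % removes the SHORTEST suffix matching the pattern,
# %% removes the LONGEST suffix matching the pattern.
s="7186475.chadmin1.ib0.cheyenne.ucar"   # job ID from the qsub output above

sjob_short="${s%.*}"    # strips only ".ucar"
sjob_long="${s%%.*}"    # strips everything from the first "."

echo "$sjob_short"      # 7186475.chadmin1.ib0.cheyenne
echo "$sjob_long"       # 7186475
```

Assuming qstat accepts the bare numeric job ID, switching the script to `sjob=${s%%.*}` may be enough to fix the polling loop.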

Increase stdout from setup/configure?

I got a new laptop and after a week of use realized I still needed to install jupyter... I figured I'd go ahead and set up the recommended analysis environment as well. Per the README, I ran ./setup/configure and things seem to be stalled at:

```
$ ./setup/configure
************** Found an existing Conda installation in: /Users/mlevy/miniconda3/bin/conda **************
***************** Skipping Conda installation... *****************
************** Creating/Updating conda environments (this can take 5-10 min) ***********
```

It's been 20+ minutes, and while I would not be surprised to learn that conda is doing something behind the scenes, a little more information on how to track progress over this period of time would be nice. $ conda list isn't showing any updates to (base) yet, but something is happening to my environment:

```
$ conda list envs
# packages in environment at /Users/mlevy/miniconda3:
#
# Name                    Version                   Build  Channel
```

I'll continue to let the script run in the background, hopefully it'll finish eventually...

(Possibly related to #30?)

Remove event-specific information from repo and make stand-alone (self-guided)

Now that the event is over, we should work to make the repo "stand-alone." We need to remove any event-specific information from the repo, and we need to make the repository a "self-guided" tutorial.

jlab-dav not working

I've been using jupyter notebook on casper successfully... until today. I even did it yesterday.

I've been starting it from jlab-dav.

Today this was the error I got:

```
Launching notebook server
  partition = dav
  memory = 8GB
  constraint = casper
  account = 
  port = 8888

sbatch: error: You must specify a time limit (-t)
Contact [email protected] for assistance

sbatch: error: Batch job submission failed: Access/permission denied
waiting
```

Update ssh tunnel scripts with option to select a default environment to use

Currently, the ssh tunnel scripts in scripts/jlab require activating a conda environment with jupyter installed in it prior to launching the ./scripts/jlab/jlab-machine-name script. We can improve the user experience by embedding this into the scripts themselves. One solution is to add an option for conda_env_name to activate prior to launching the jupyter server, and to add a conda activate $conda_env_name step in the scripts themselves.
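One way the proposed option could work is sketched below. The -e/--env flag name, the "base" default, and both helper names are assumptions for illustration; the existing scripts' own option parsing is not reproduced here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: let a jlab-* script activate a conda environment itself.
# The -e/--env flag and the "base" default are assumptions.
parse_env_option() {
  conda_env_name="base"
  while [ $# -gt 0 ]; do
    case "$1" in
      -e | --env)
        if [ $# -lt 2 ]; then
          echo "error: $1 requires an environment name" >&2
          return 1
        fi
        conda_env_name="$2"
        shift 2
        ;;
      *)
        # Other options are left for the script's existing parser
        shift
        ;;
    esac
  done
}

activate_env() {
  # conda's shell functions are not loaded in non-interactive scripts,
  # so load them with the shell hook before calling `conda activate`.
  eval "$(conda shell.bash hook)"
  conda activate "$conda_env_name"
}
```

A script would then call `parse_env_option "$@" && activate_env` before starting the jupyter server, so users no longer need to activate an environment by hand.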

Short Tutorial Seminar Series

It's been proposed that we break up our tutorials into smaller chunks that are two-hour long segments. Can we develop a curriculum of "seminars" that gets people to the "I'm ready for a hackathon!" stage?

bash kernel set x?

Can we seriously turn off the set -x feature within the bash kernel? It clutters the output with lines you would not expect to see.

Remove need to change/update the user's base environment

This is a potential headache for users who already have software installed in their base environment. We need a way to make tutorials like this work without requiring users to maintain a special base environment of their own.

CC @jbaksta @andersy005 @jukent

Troubleshooting

Option 1: Flexible channel priority

```
conda config --set channel_priority flexible
```

followed by

```
./setup/configure --clobber
```

Add Python basics to Welcome notebook

Tutorial announcements

CISL WIP
Staff Notes Daily
Communique
Tweet it

Looking for plotting talks from last tutorial.

I found the advanced Xarray plotting tutorial -- archive/old-contents/notebooks/xarray/02-xarray-advanced-plotting.ipynb

but wasn't there a talk on Cartopy and data visualization in general? Is that stored in a different repo?

Virtual Tutorial To-Do List - Part 1

Identify all of the parts and create an outline
Branch/PR just for part 1
- pair programming review w @kmpaul
- get xdev team to read/add missing content/create PR
- repeat
Create Nikola md page for part 1
Reach out to beta testers of part 1
Post an announcement

Repeat for part 2

References:

#105 -- for 0 to 30 tutorial content discussion
#99 -- for overall tutorial curricula discussion

Finish 0-30 Tutorial Content

There is now a skeleton of the 0-30 tutorial in the files z230-pt1.md and z230-pt2.md.

There is still missing content in here, though, and someone needs to fill in the blanks.

What is currently missing includes:

Numpy coverage (Should this be here?)
Pandas coverage (Should this be here?)
Matplotlib + Cartopy (Should this be here?)
GitHub (last steps should be push up to GitHub?)
Git conflicts, reverting git commits, etc.
Jupyter Notebooks & Lab
netCDF4-python?

Create OHC notebook#1 - opening datasets in Xarray.

How to import modules?
What is in a dataset?
What is the difference between a Dataset and a DataArray?

Pull SSH launch scripts into separate package

Currently, the jlab-machine launch scripts (using SSH tunneling) ship with the Python tutorial content. However, these scripts are separately useful outside of the tutorial. I believe that these launch scripts should be a stand-alone package that is a dependency for the tutorial.

First, I think that we should pull the launch scripts out into a separate repo and then make this repo installable via pip. I don't think a conda install is required. We will need a good name for this package.

Second, how we specify the launch-script package as a dependency for the tutorial can be done in a couple of different ways:

We can make the tutorial a pip installable package, too, and explicitly declare the launch-scripts package as a dependency.
We can leave the tutorial content as just a repo and specify the launch-scripts package as a dependency in the tutorial's conda environment file.

Option (2) is the easiest and perhaps the first step to take, regardless.

Grammar changes

On the landing page:
"or you are tired of tutorials the leap into advanced third-party packages"
Should "the" be "that"? Or something else?

Configure script doesn't work on clean Mac

I have the perfect laptop to test this configure script with! It's brand new with nothing on it!

Currently, with no python3 installed, I get an error when trying to run the script.

update jobqueue.yaml

dashboard has changed names.

Is the slurm configuration actually a good one?

ensure configure works with existing install

Many people may already have Python setup. Is the configure script robust in this context?

Python Basics Notebook

Should be prepared to cover:

Python Basics
- Overview of basic structures
- Packages vs modules
- Scripts vs modules
- Importing
- ...
NumPy package
SciPy package
stats package
Pandas package
Where to find things?
- StackOverflow, Google, PyPI
Where should you go to ask questions?
- https://github.com/NCAR/python-toolbox-faq

Rename `sample_workflows` to `workflows`

Notebook Template

Could we adopt a template for notebooks in this repository? My proposition is to have:

Table of contents section with links to different sections in the notebook itself
A learning objectives section at the beginning of the notebook
A Going Further section at the end of the notebook. This section could have references to documentation sections that are relevant to what was covered in the notebook and/or references to other notebooks.

@kmpaul, @jukent any thoughts?

conda init tcsh results in: Illegal variable name.

I ran conda init tcsh and it modified my .tcshrc file, but when I login or source .tcshrc, the mods result in an error:

Illegal variable name. The offending file is:

/gpfs/u/home/tomas/.tcshrc

and I believe the illegal variable error is coming from this line:

```
__conda_setup="$('/glade/work/tomas/miniconda3/bin/conda' 'shell.tcsh' 'hook' 2> /dev/null)"
```

Edit tutorial notebooks to address common questions

-- OHC remove dask mentions
-- Plotting, e.g. remove "tab:red" for colors, fig-size is in inches, add how to set default values ...

slurm_load_jobs error

When I attempt to run jlab-dav, I get

slurm_load_jobs error: Socket timed out on send/recv operation

This seems to be intermittent. execdav seems to work ok. Is there something we should be doing differently in jlab-dav?
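Since the failure is intermittent, one option (not something jlab-dav is known to do today) is to retry the Slurm query a few times before giving up. The `retry` helper below is hypothetical, and the usage line assumes jlab-dav polls job state with squeue.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between failures. Returns 0 on the first success, 1 otherwise.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    if ((i < attempts)); then
      echo "attempt ${i}/${attempts} failed: $*; retrying in ${delay}s" >&2
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, `retry 5 2 squeue --job "$jobid" --noheader`, where `jobid` stands in for whatever variable the script stores the submitted job ID in.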

Change to a Nikola site

I think we should change the tutorial site to a Nikola site. The main reason being that Nikola is python, so getting the environment set up on our laptops to edit and add content is easy. And we have experience with this with the Xdev blog.

Setup Instructions

We need a setup instructions document. It should probably be placed on the Nikola site:

/site/pages/spring2020/instructions.md

...or *.rst or similar.

Support self-guided as well as in-person tutorials

It would be nice if we can point people to this repo to enable self-guided instruction. I think we have a pretty solid draft of the content we might like.

Can we add a sphinx docs assembly of (some of) the notebooks into a sensible outline?

Considerations:

I think we'd like to include NCAR-specific material, but clearly delineate it as such.
Some of the "workflow" notebooks are more pedagogical than others: we might consider reorganizing. The OHC example, for instance, provides a nice introductory, idealized example, whereas the O2 trends is really an "advanced example." In this sense, the OHC example belongs in Chapter 1, whereas the O2 trends notebook should be in an appendix.

CESMLE - Oxygen trend workflow is incomplete

@matt-long,

I managed to get the oxygen trend notebook in this repo (#68) with a few changes. It is missing a few paragraphs explaining some of the science going on and/or computations. When you get time, can you add some text explaining the important concepts?

I am going to merge #68 for the time being.

conda activate drops user in /

Same user as in #85 -- he had an old version of anaconda installed, and I ran conda update conda to bring it up to 4.7.12; when I first ran into #85 I thought the issue was related to his conda installation so we moved ~/anaconda3 -> ~/anaconda3-old and let setup/configure install miniconda. After resolving #85 by installing the command line tools, setup ran successfully but when he runs

$ conda activate python-tutorial

He gets dumped in / instead of remaining in his current working directory. I wonder if some remnant of the old install of anaconda is causing issues somewhere. Anyone have any thoughts?

(Same thing happens with conda deactivate, for what it's worth.)

Re-use "old-contents"?

There is a directory called "old-contents" in the archived directory. What is in here? Is there anything in there that we should keep?

Once everything has been retrieved from the old-contents, we should delete the archived directory.

Trouble launching Jupyter Notebook after Cheyenne update

Hi all, forgive me if this isn't the correct place for this issue, but I am having an issue launching Jupyter notebook in the pangeo environment after the Cheyenne upgrade.

I follow instructions here to launch the notebook, and when I enter:

```
jupyter lab --no-browser --ip=hostname --port=8877
```

I receive the following error:

```
Traceback (most recent call last):
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/site-packages/traitlets/traitlets.py", line 528, in get
    value = obj._trait_values[self.name]
KeyError: 'runtime_dir'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/bin/jupyter-lab", line 11, in <module>
    sys.exit(main())
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/site-packages/jupyter_core/application.py", line 266, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/site-packages/traitlets/config/application.py", line 657, in launch_instance
    app.initialize(argv)
  File "<decorator-gen-7>", line 2, in initialize
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/site-packages/traitlets/config/application.py", line 87, in catch_config_error
    return method(app, *args, **kwargs)
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/site-packages/notebook/notebookapp.py", line 1627, in initialize
    self.init_configurables()
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/site-packages/notebook/notebookapp.py", line 1317, in init_configurables
    connection_dir=self.runtime_dir,
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/site-packages/traitlets/traitlets.py", line 556, in __get__
    return self.get(obj, cls)
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/site-packages/traitlets/traitlets.py", line 535, in get
    value = self._validate(obj, dynamic_default())
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/site-packages/jupyter_core/application.py", line 99, in _runtime_dir_default
    ensure_dir_exists(rd, mode=0o700)
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/site-packages/jupyter_core/utils/__init__.py", line 13, in ensure_dir_exists
    os.makedirs(path, mode=mode)
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/os.py", line 231, in makedirs
    makedirs(head, mode, exist_ok)
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/os.py", line 241, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/run/user/24367'
```

I have never had issues with this before, which is why I think this is related to the Cheyenne upgrade. Does anyone have ideas about how to fix this? Thanks in advance.
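A possible workaround, sketched under assumptions: the path in the traceback suggests that this older jupyter_core builds its runtime directory from XDG_RUNTIME_DIR, which points at /run/user/<uid> on a node where that directory cannot be created. Setting JUPYTER_RUNTIME_DIR explicitly should take precedence over that default. Run this in the same shell before re-running the jupyter lab command:

```shell
#!/usr/bin/env bash
# Workaround sketch for "PermissionError: ... '/run/user/<uid>'".
# Assumption: XDG_RUNTIME_DIR is set in the environment, but that directory
# is missing or not writable on the node running Jupyter.
if [ -n "${XDG_RUNTIME_DIR:-}" ] && [ ! -w "$XDG_RUNTIME_DIR" ]; then
  unset XDG_RUNTIME_DIR
fi

# Point Jupyter at a runtime directory under $HOME instead;
# jupyter_core creates it on startup if it does not exist yet.
export JUPYTER_RUNTIME_DIR="${HOME}/.local/share/jupyter/runtime"
```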

Tutorial Data

Need to identify the data needed for this tutorial.
Find a public place to host data so everyone can download it
Need to provide a script to download the data to other machines
Enable the configure script to automatically download data (if not on Cheyenne or Casper)
If on Cheyenne or Casper, provide soft-links to data

Check for `analysis` environment conflict

Need to check if there already exists an analysis conda environment. If it already exists, stop execution and tell the user to either (1) rename the environment or (2) remove the environment, and tell them the commands to do so.

Then they have to rerun the configure script.
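A minimal sketch of this check, as it might run near the top of setup/configure. The helper names are hypothetical. `conda env list`, `conda create --clone`, and `conda env remove` are standard conda commands. The parsing assumes the usual `conda env list` layout: the environment name in the first column and comment lines starting with #.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: detect an existing conda environment by name.
# Reads `conda env list` output on stdin; returns 0 if NAME is listed.
env_exists() {
  local name="$1"
  # Skip comment lines; the first column of each remaining line is the env name
  awk -v n="$name" '$1 !~ /^#/ && $1 == n { found = 1 } END { exit (found ? 0 : 1) }'
}

check_env_conflict() {
  local name="$1"
  if conda env list | env_exists "$name"; then
    echo "A conda environment named '${name}' already exists." >&2
    echo "Either rename it (clone under a new name, then remove the original):" >&2
    echo "  conda create --name ${name}-old --clone ${name}" >&2
    echo "  conda env remove --name ${name}" >&2
    echo "or remove it outright:" >&2
    echo "  conda env remove --name ${name}" >&2
    echo "Then rerun ./setup/configure." >&2
    return 1
  fi
}
```

The configure script would then stop early with `check_env_conflict analysis || exit 1`.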

Announcement Update for Spring 2020 Tutorial

We need to announce the Spring 2020 Tutorial again.
We need to contact registered participants to get confirmation of their registration.

analysis environment inconsistent

I just updated environments/env-analysis.py to address a bug in MetPy. However, when I update the environment using conda, I get the following message.

```
The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

  - conda-forge/noarch::botocore==1.12.109=py_0
  - conda-forge/linux-64::s3transfer==0.2.0=py36_0
  - conda-forge/noarch::boto3==1.9.108=py_0
```

Curriculum for beginner, intermediate & advanced tracks?

We've talked about this a lot in the past, but I think we need to come to a conclusion about how to proceed on this. The issue here is that having a 2-3 day tutorial is not enough time to cover both beginner and intermediate topics. One thought for discussion:

Thought: Beginner-level participation looks very different than intermediate- or advanced-level participation. At the intermediate and advanced levels, participants can be expected to participate in a hackathon-like environment. At the beginner level, participants do not have the tools (yet) to do anything like a hackathon. This suggests the following:

Perhaps the goal of the beginner-level tutorial should be to get participants to the minimal level where they could contribute to and develop a hackathon project. This would cover git, GitHub, beginner Python, Jupyter Notebooks. This might be 1-2 days.
Perhaps the intermediate-level tutorial should focus on giving participants additional tools upon which they can find solutions via a hackathon project. This might be tools like intake, xarray, dask, etc. This might be 1-2 days.
Perhaps the advanced-level tutorial is just a hackathon. This should be 2-3 days.

All told, this is a curriculum spanning 4-7 days.

Thought: Experience trying to accommodate all levels of experience in a single tutorial does not seem to work as effectively as we would like. Namely, participants who start as beginners are rarely able to participate at the intermediate or advanced level in the same tutorial. So, it seems to me that people need time to develop their knowledge and let the concepts "sink in." This might suggest the following:

Perhaps the beginner-level tutorial precedes the intermediate-level tutorial by at least 1 week but possibly 2-4 weeks.
Perhaps the intermediate and advanced tutorials can be adjacent, such that 2-3 days are spent on technical topics followed by 2-3 days of hackathon.
Advanced-level participants would not need to show up for the intermediate topics section.
Advanced-level participants can be tapped as instructors.

This would allow for a 2-day beginner tutorial that might be followed by a week-long tutorial + hackathon about 2 weeks later.

Questions to Answer:

What do people think about this approach?
What would need to change with the material we have to make this possible?

Remove old Jekyll site

Now that we have the Nikola site live, we can get rid of the old Jekyll site which lives in docs/. However, we need to look through this directory to see if there is anything we should keep, such as the 'self-paced' guide that was previously developed.

Create Welcome.ipynb

Create a notebook that introduces Jupyter notebooks!

Are command line tools required for the `setup/configure` script on a Mac?

While I was helping a user run setup/configure on macOS 10.14.6, he got an error from pip that basically said xcrun was providing an invalid developer path. This was fixed by running:

```
$ xcode-select --install
```

I'm not sure how to check to see if the command line tools are available, and if it really is a requirement I'm not sure why it took until the morning of the tutorial for the issue to appear. (Possibly related to #41 though this was not a new install of the OS)
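One way to check for the command line tools is `xcode-select -p`, which prints the active developer directory and exits with an error if none is configured. The sketch below also checks that the directory exists, since a stale path (for example, after an OS upgrade) can produce the same kind of xcrun error about an invalid developer path. Whether configure should require the tools at all is still the open question here; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: check for the Xcode Command Line Tools on macOS.
# Returns 0 on non-macOS systems or when a valid developer directory exists.
check_macos_command_line_tools() {
  # Only relevant on macOS
  [ "$(uname -s)" = "Darwin" ] || return 0

  local dev_dir
  # xcode-select -p prints the active developer directory and fails if none
  # is set; the directory can also be set but missing on disk.
  if dev_dir="$(xcode-select -p 2>/dev/null)" && [ -d "$dev_dir" ]; then
    return 0
  fi
  echo "Xcode Command Line Tools not found. Install them with: xcode-select --install" >&2
  return 1
}
```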

Alternate text for OHC Notebook 1

```
import xarray as xr
```

Did that work for you? If not, you do not have xarray installed in your current notebook environment. Check to make sure it reads "Python [conda env: analysis]" in the top right corner of your jupyter notebook screen. If it just says "Python" or something else, then click on the text to change the selection, so that your "analysis" environment is used.

Write dask lecture material

Content should include:

NCAR & Dask jobqueue
Dask array
Adaptive scaling
Dashboard (on laptop and Cheyenne)
Xarray chunking best practices and rechunking
Possibly: map_blocks and map_overlap