
ncar-python-tutorial's Introduction


NCAR Python Tutorial


Setup

This tutorial covers the installation and setup of a Python environment on:

  • Cheyenne
  • Casper
  • CGD's Hobart
  • Personal laptop/desktop with a UNIX-variant Operating System

NOTE: The setup scripts provided in this repository do not currently work on Windows machines.

Step 1: Clone NCAR Python Tutorial Repository

Run the following command to clone this repo to your system (e.g. Cheyenne, Casper, your laptop):

git clone https://github.com/NCAR/ncar-python-tutorial.git

Step 2: Install Miniconda and Create Environments

  • Change directory to the cloned repository

    cd ncar-python-tutorial
  • Run the configure script:

    NOTE: Be prepared for the script to take up to 15 minutes to complete.

    ./setup/configure
$ ./setup/configure --help
usage: configure [-h] [--clobber] [--download] [--prefix PREFIX]

Set up tutorial environment.

optional arguments:
  -h, --help            show this help message and exit
  --clobber, -c         Whether to clobber existing environment (default:
                        False)
  --download, -d        Download tutorial data without setting environment up
                        (default: False)
  --prefix PREFIX, -p PREFIX
                        Miniconda3 install location)

Default values for --prefix argument are:

  • Personal laptop / Hobart: $HOME/miniconda3
  • Cheyenne or Casper: /glade/work/$USER/miniconda3
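The default-prefix rule above can be sketched as a small shell function. This is only an illustration: the function name `default_prefix` and the host-name patterns (`cheyenne*`, `casper*`) are assumptions and are not taken from `setup/configure`.

```shell
# Hypothetical helper: choose a default Miniconda prefix from a host name.
# The host-name patterns below are assumptions made for this sketch.
default_prefix() {
    host="${1:-$(uname -n)}"
    user="${USER:-$(id -un)}"
    case "$host" in
        cheyenne*|casper*) echo "/glade/work/$user/miniconda3" ;;
        *)                 echo "$HOME/miniconda3" ;;
    esac
}

default_prefix "cheyenne1"   # GLADE work space
default_prefix "my-laptop"   # home directory
```

Passing `--prefix` to `./setup/configure` overrides whichever default applies.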

NOTE: In case the default prefix is not appropriate for you (due to limited storage), feel free to specify a different miniconda install location. For instance, this install location may be a project workspace on a shared filesystem like GLADE or Hobart's filesystem.

The configure script does the following:

  • Installs the conda package manager if it cannot find an existing installation; otherwise, it updates the base environment.
  • Creates or updates the python-tutorial conda environment.
  • Downloads the tutorial data if not on Cheyenne, Casper, or Hobart. On those systems, it creates soft links to an existing local data repository instead.
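The first step in that list (install conda only when no existing installation is found) might look roughly like the following. This is a sketch under stated assumptions, not the logic of `setup/configure` itself; the function name `conda_action` is hypothetical, and detection is done simply by looking for the executable on `PATH`.

```shell
# Hypothetical helper: report whether conda should be installed or updated.
# Detection via PATH lookup is an assumption; the real script may check differently.
conda_action() {
    tool="${1:-conda}"
    if command -v "$tool" >/dev/null 2>&1; then
        echo "update"    # an installation exists: update the base environment
    else
        echo "install"   # nothing found: install Miniconda first
    fi
}

echo "conda step: $(conda_action)"
```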

Step 3: Close and re-open your current shell

For changes to take effect, close and re-open your current shell.

Step 4: Run the Setup Verification Script

  • Check that conda info runs successfully:

    conda info
  • From the ncar-python-tutorial directory, activate python-tutorial conda environment:

    conda activate python-tutorial
  • Run the setup verification script to confirm that everything is working as expected:

    cd ncar-python-tutorial
    ./setup/check_setup

    This step should print "Everything looks good!".
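Before running the verification script, it can help to confirm that the right environment is active. The sketch below relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation; the function `check_active_env` is a hypothetical helper and is not part of `setup/check_setup`.

```shell
# Hypothetical pre-check: is the expected conda environment active?
check_active_env() {
    expected="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" = "$expected" ]; then
        echo "OK: '$expected' is active"
    else
        echo "Active environment is '${CONDA_DEFAULT_ENV:-none}'; run: conda activate $expected"
        return 1
    fi
}

if check_active_env python-tutorial; then
    echo "Safe to run ./setup/check_setup"
fi
```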


Launch Jupyter Lab

1. Cheyenne or DAV via JupyterHub (Recommended)

To use the Cheyenne or DAV compute nodes, we recommend using JupyterLab via NCAR's JupyterHub deployment.

Open your preferred browser (Chrome, Firefox, Safari, etc...) on your local machine, and head over to https://jupyterhub.ucar.edu/.

You will need to authenticate with either your YubiKey or the Duo mobile app.

2. Cheyenne or DAV via SSH Tunneling

In case you are having issues with jupyterhub.ucar.edu, we've provided utility scripts for launching JupyterLab on both Cheyenne and Casper via SSH Tunneling:

conda activate base
./setup/jlab/jlab-ch # on Cheyenne
./setup/jlab/jlab-dav # on Casper

3. Hobart via SSH Tunneling

To run JupyterLab on CGD's Hobart, use the SSH tunneling script provided in setup/jlab/jlab-hobart:

conda activate base
./setup/jlab/jlab-hobart
$ ./setup/jlab/jlab-hobart --help
Usage: launch dask
Possible options are:
 -w,--walltime: walltime [default: 08:00:00]
 -q,--queue: queue [default: medium]
 -d,--directory: notebook directory
 -p,--port: [default: 8888]
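For readers new to SSH tunneling, the general idea is to forward a port on your local machine to the port where the notebook server listens on the remote side. The sketch below only builds and prints such a command; it does not reproduce what the `jlab-*` scripts do internally, and the node and login names are hypothetical placeholders.

```shell
# Build (but do not run) an ssh port-forwarding command.
#   -N  do not execute a remote command, just forward ports
#   -L  forward local_port to compute_node:remote_port via the login host
tunnel_command() {
    local_port="$1"
    compute_node="$2"
    remote_port="$3"
    login="$4"
    echo "ssh -N -L ${local_port}:${compute_node}:${remote_port} ${login}"
}

# Hypothetical node and login names, for illustration only.
tunnel_command 8888 compute-node-01 8888 username@login.example.org
```

With a tunnel like this running, the notebook server would be reachable at `http://localhost:8888` in a local browser.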

4. Personal Laptop

For those interested in running JupyterLab on their local machine, you can simply run the following command, and follow the printed instructions on the console:

conda activate base
jupyter lab

ncar-python-tutorial's People

Contributors

andersy005, bonnland, dcherian, dependabot[bot], jukent, mabouali-ford, matt-long, mnlevy1981, roberttomas, xdev-bot, zbruick

ncar-python-tutorial's Issues

OHC Notebook broken

Currently, the OHC notebook is broken because the link to the data doesn't point to a file that exists. We need to fix this ASAP.

CC @jukent

error from qstat in jlab-ch

When I try to run jlab-ch, I get the following (truncated) output:

Launching notebook server
  queue = share
  account = P93300670
  nodes = 1
  ncpus = 1
  memory = 8GB
  walltime = 06:00:00
  port = 8888

submitted job: 7186475.chadmin1.ib0.cheyenne.ucar to queue share
waiting for job to runqstat: illegally formed job identifier: 7186475.chadmin1.ib0.cheyenne.ucar
qstat: illegally formed job identifier: 7186475.chadmin1.ib0.cheyenne.ucar
..qstat: illegally formed job identifier: 7186475.chadmin1.ib0.cheyenne.ucar
qstat: illegally formed job identifier: 7186475.chadmin1.ib0.cheyenne.ucar
..qstat: illegally formed job identifier: 7186475.chadmin1.ib0.cheyenne.ucar

It looks like something is going wrong with extracting the job id from the output generated by qsub. It is quite possible that the format of the output from qsub has changed with the updates to cheyenne. That said, the bash for extracting the job id is sjob=${s%.*}. I don't know bash well enough to know what this is doing, or why it appears to no longer work.
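For reference, `${s%.*}` is a shell parameter expansion that removes the shortest suffix matching the pattern `.*`, that is, only the last dot-separated field. The doubled form `${s%%.*}` removes the longest such suffix and leaves only the text before the first dot. The sketch below applies both to the identifier shown in the error output; the issue does not show the raw `qsub` output, and whether `qstat` expects the purely numeric form is an assumption that is not confirmed here.

```shell
# Job identifier copied from the error output above.
s="7186475.chadmin1.ib0.cheyenne.ucar"

# ${s%.*}  removes the SHORTEST suffix matching ".*" (just the last field).
shortest=${s%.*}

# ${s%%.*} removes the LONGEST suffix matching ".*" (everything from the first dot on).
longest=${s%%.*}

echo "$shortest"   # 7186475.chadmin1.ib0.cheyenne
echo "$longest"    # 7186475
```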

Increase stdout from setup/configure?

I got a new laptop and after a week of use realized I still needed to install jupyter... I figured I'd go ahead and set up the recommended analysis environment as well. Per the README, I ran ./setup/configure and things seem to be stalled at

$ ./setup/configure
************** Found an existing Conda installation in: /Users/mlevy/miniconda3/bin/conda **************
***************** Skipping Conda installation... *****************
************** Creating/Updating conda environments (this can take 5-10 min) ***********

It's been 20+ minutes, and while I would not be surprised to learn that conda is doing something behind the scenes, a little more information on how to track progress over this period of time would be nice. $ conda list isn't showing any updates to (base) yet, but something is happening to my environment:

$ conda list envs
# packages in environment at /Users/mlevy/miniconda3:
#
# Name                    Version                   Build  Channel

I'll continue to let the script run in the background, hopefully it'll finish eventually...

(Possibly related to #30?)

jlab-dav not working

I've been using jupyter notebook on casper successfully... until today. I even did it yesterday.

I've been starting it from jlab-dav.

Today this was the error I got:

Launching notebook server
  partition = dav
  memory = 8GB
  constraint = casper
  account = 
  port = 8888

sbatch: error: You must specify a time limit (-t)
Contact [email protected] for assistance

sbatch: error: Batch job submission failed: Access/permission denied
waiting

Update ssh tunnel scripts with option to select a default environment to use

Currently, the ssh tunnel scripts in scripts/jlab require activating a conda environment with jupyter installed in it prior to launching the ./scripts/jlab/jlab-machine-name script. We can improve the user experience by embedding this into the scripts themselves. One solution is to add an option for conda_env_name, and to add a conda activate $conda_env_name step in the scripts themselves before the jupyter server is launched.
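A minimal sketch of that proposal is shown here. The flag names `-e`/`--conda-env`, the function `parse_args`, and the default of `base` are all assumptions for illustration; none of them exist in the current scripts.

```shell
conda_env_name="base"   # assumed default, mirroring the "conda activate base" step in the README

# Hypothetical option parser: picks up -e/--conda-env and ignores everything else.
parse_args() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -e|--conda-env)
                if [ "$#" -lt 2 ]; then
                    echo "error: $1 requires an environment name" >&2
                    return 1
                fi
                conda_env_name="$2"
                shift 2
                ;;
            *)
                shift   # other options are not handled in this sketch
                ;;
        esac
    done
}

parse_args --port 8888 --conda-env python-tutorial
echo "would run: conda activate $conda_env_name"
```

In a real script the final `echo` would be replaced by the actual `conda activate "$conda_env_name"` call, placed just before the notebook server is started.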

Short Tutorial Seminar Series

It's been proposed that we break up our tutorials into smaller, two-hour segments. Can we develop a curriculum of "seminars" that gets people to the "I'm ready for a hackathon!" stage?

bash kernel set x?

Can we seriously turn off the set -x feature within the bash kernel? It really clutters the output with lines you would not expect to be there.

Troubleshooting

Option 1: Flexible channel priority

conda config --set channel_priority flexible

followed by

./setup/configure --clobber 

Looking for plotting talks from last tutorial.

I found the advanced Xarray plotting tutorial -- archive/old-contents/notebooks/xarray/02-xarray-advanced-plotting.ipynb

but wasn't there a talk on Cartopy and data visualization in general? Is that stored in a different repo?

Virtual Tutorial To-Do List - Part 1

  • Identify all of the parts, create an outline

  • Branch/PR just for part 1
    - pair programming review w @kmpaul
    - get xdev team to read/add missing content/create PR
    - repeat

  • Create Nikola md page for part 1

  • Reach out to Beta testers of part 1

  • Post and Announcement

Repeat for part 2

References:

  • #105 -- for 0 to 30 tutorial content discussion

  • #99 -- for overall tutorial curricula discussion

Finish 0-30 Tutorial Content

There is now a skeleton of the 0-30 tutorial in the files z230-pt1.md and z230-pt2.md.

There is still missing content in here, though, and someone needs to fill in the blanks.

What is currently missing includes:

  • Numpy coverage (Should this be here?)
  • Pandas coverage (Should this be here?)
  • Matplotlib + Cartopy (Should this be here?)
  • GitHub (last steps should be push up to GitHub?)
  • Git conflicts, reverting git commits, etc.
  • Jupyter Notebooks & Lab
  • netCDF4-python?

Pull SSH launch scripts into separate package

Currently, the jlab-machine launch scripts (using SSH tunneling) ship with the Python tutorial content. However, these scripts are separately useful outside of the tutorial. I believe that these launch scripts should be a stand-alone package that is a dependency for the tutorial.

First, I think that we should pull the launch scripts out into a separate repo and then make this repo installable via pip. I don't think a conda install is required. We will need a good name for this package.

Second, how we specify the launch-script package as a dependency for the tutorial can be done in a couple of different ways:

  1. We can make the tutorial a pip installable package, too, and explicitly declare the launch-scripts package as a dependency.
  2. We can leave the tutorial content just as a repo and specify the launch-scripts package as a dependency in the tutorial's conda environment file.

Option (2) is the easiest and perhaps the first step to take, regardless.

Grammar changes

On the landing page:
"or you are tired of tutorials the leap into advanced third-party packages"
Should "the" be "that"? Or something else?

Configure script doesn't work on clean Mac

I have the perfect laptop to test this configure script with! It's brand new with nothing on it!

Currently, with no python3 installed, I get an error when trying to run the script.

Python Basics Notebook

Should be prepared to cover:

  • Python Basics
    • Overview of basic structures
    • Packages vs modules
    • Scripts vs modules
    • Importing
    • ...
  • NumPy package
  • SciPy package
  • stats package
  • Pandas package
  • Where to find things?
    • StackOverflow, Google, PyPI
  • Where should you go to ask questions?

Notebook Template

Could we adopt a template for notebooks in this repository? My proposition is to have:

  • Table of contents section with links to different sections in the notebook itself
  • A learning objectives section at the beginning of the notebook
  • A Going Further section at the end of the notebook. This section could have references to documentation sections that are relevant to what was covered in the notebook and/or references to other notebooks.

@kmpaul, @jukent any thoughts?

conda init tcsh results in: Illegal variable name.

I ran conda init tcsh and it modified my .tcshrc file, but when I login or source .tcshrc, the mods result in an error:

Illegal variable name. The offending file is:

/gpfs/u/home/tomas/.tcshrc

and I believe the illegal variable error is coming from this line:

__conda_setup="$('/glade/work/tomas/miniconda3/bin/conda' 'shell.tcsh' 'hook' 2> /dev/null)"

slurm_load_jobs error

When I attempt to run jlab-dav, I get

slurm_load_jobs error: Socket timed out on send/recv operation

This seems to be intermittent. execdav seems to work ok. Is there something we should be doing differently in jlab-dav?

Change to a Nikola site

I think we should change the tutorial site to a Nikola site. The main reason being that Nikola is python, so getting the environment set up on our laptops to edit and add content is easy. And we have experience with this with the Xdev blog.

Setup Instructions

We need a setup instructions document. It should probably be placed on the Nikola site:

/site/pages/spring2020/instructions.md

...or *.rst or similar.

Support self-guided as well as in-person tutorials

It would be nice if we can point people to this repo to enable self-guided instruction. I think we have a pretty solid draft of the content we might like.

Can we add a sphinx docs assembly of (some of) the notebooks into a sensible outline?

Considerations:

  • I think we'd like to include NCAR-specific material, but clearly delineate it as such.

  • Some of the "workflow" notebooks are more pedagogical than others: we might consider reorganizing. The OHC example, for instance, provides a nice introductory, idealized example, whereas the O2 trends is really an "advanced example." In this sense, the OHC example belongs in Chapter 1, whereas the O2 trends notebook should be in an appendix.

CESMLE - Oxygen trend workflow is incomplete

@matt-long,

I managed to get the oxygen trend notebook in this repo (#68) with a few changes. It is missing a few paragraphs explaining some of the science going on and/or computations. When you get time, can you add some text explaining the important concepts?

I am going to merge #68 for the time being.

conda activate drops user in /

Same user as in #85 -- he had an old version of anaconda installed, and I ran conda update conda to bring it up to 4.7.12; when I first ran into #85 I thought the issue was related to his conda installation so we moved ~/anaconda3 -> ~/anaconda3-old and let setup/configure install miniconda. After resolving #85 by installing the command line tools, setup ran successfully but when he runs

$ conda activate python-tutorial

He gets dumped in / instead of remaining in his current working directory. I wonder if some remnant of the old install of anaconda is causing issues somewhere. Anyone have any thoughts?

(Same thing happens with conda deactivate, for what it's worth)

Re-use "old-contents"?

There is a directory called "old-contents" in the archived directory. What is in here? Is there anything in there that we should keep?

Once everything has been retrieved from the old-contents, we should delete the archived directory.

Trouble launching Jupyter Notebook after Cheyenne update

Hi all, forgive me if this isn't the correct place for this issue, but I am having an issue launching Jupyter notebook in the pangeo environment after the Cheyenne upgrade.

I follow instructions here to launch the notebook, and when I enter:

jupyter lab --no-browser --ip=hostname --port=8877

I receive the following error:

Traceback (most recent call last):
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/site-packages/traitlets/traitlets.py", line 528, in get
    value = obj._trait_values[self.name]
KeyError: 'runtime_dir'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/bin/jupyter-lab", line 11, in <module>
    sys.exit(main())
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/site-packages/jupyter_core/application.py", line 266, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/site-packages/traitlets/config/application.py", line 657, in launch_instance
    app.initialize(argv)
  File "<decorator-gen-7>", line 2, in initialize
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/site-packages/traitlets/config/application.py", line 87, in catch_config_error
    return method(app, *args, **kwargs)
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/site-packages/notebook/notebookapp.py", line 1627, in initialize
    self.init_configurables()
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/site-packages/notebook/notebookapp.py", line 1317, in init_configurables
    connection_dir=self.runtime_dir,
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/site-packages/traitlets/traitlets.py", line 556, in __get__
    return self.get(obj, cls)
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/site-packages/traitlets/traitlets.py", line 535, in get
    value = self._validate(obj, dynamic_default())
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/site-packages/jupyter_core/application.py", line 99, in _runtime_dir_default
    ensure_dir_exists(rd, mode=0o700)
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/site-packages/jupyter_core/utils/__init__.py", line 13, in ensure_dir_exists
    os.makedirs(path, mode=mode)
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/os.py", line 231, in makedirs
    makedirs(head, mode, exist_ok)
  File "/glade/u/home/doughert/miniconda3/envs/pangeo/lib/python3.5/os.py", line 241, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/run/user/24367'

I have never had issues with this before, which is why I think this is related to the Cheyenne upgrade. Does anyone have ideas about how to fix this? Thanks in advance.
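One workaround sometimes used for this kind of error, though not confirmed in this issue, is to point Jupyter at a runtime directory that the user can write to. The sketch below assumes the failure comes from an unwritable default runtime directory under /run/user. `JUPYTER_RUNTIME_DIR` is the environment variable Jupyter reads for this location; the directory name chosen here is arbitrary.

```shell
# Arbitrary, user-writable location for Jupyter's runtime files (an assumption for this sketch).
runtime_dir="${TMPDIR:-/tmp}/jupyter-runtime-$(id -u)"
mkdir -p "$runtime_dir"
chmod 700 "$runtime_dir"   # same restrictive mode Jupyter requests in the traceback

# When set, Jupyter uses this path instead of deriving one from XDG_RUNTIME_DIR.
export JUPYTER_RUNTIME_DIR="$runtime_dir"
echo "Jupyter runtime dir: $JUPYTER_RUNTIME_DIR"
```

After setting the variable, the same `jupyter lab` command can be retried in that shell.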

Tutorial Data

  • Need to identify the data needed for this tutorial.
  • Find a public place to host data so everyone can download it
  • Need to provide a script to download the data to other machines
  • Enable the configure script to automatically download data (if not on Cheyenne or Casper)
  • If on Cheyenne or Casper, provide soft-links to data
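The last two items could be sketched as a single decision: link to a shared copy of the data when one exists, otherwise fall back to downloading. The function name `link_or_download` and its arguments are hypothetical, and the download itself is left out because the hosting location has not been decided.

```shell
# Hypothetical helper: soft-link shared data if present, otherwise report that a download is needed.
link_or_download() {
    shared_dir="$1"   # existing data repository on the shared filesystem
    target="$2"       # where the tutorial expects to find the data
    if [ -d "$shared_dir" ]; then
        ln -sfn "$shared_dir" "$target"
        echo "linked"
    else
        echo "download needed"
    fi
}

# Example with a path that does not exist, so no link is created.
link_or_download "/path/that/does/not/exist" "./data"
```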

Check for `analysis` environment conflict

Need to check if there already exists an analysis conda environment. If it already exists, stop execution and tell the user to either (1) rename the environment or (2) remove the environment, and tell them the commands to do so.

Then they have to rerun the configure script.
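A rough sketch of such a check follows. It reads the output of `conda env list` on standard input and looks for the name in the first column. The helper names `env_exists` and `report_conflict` are hypothetical, and the `-old` suffix used for the copy is only an example.

```shell
# Reads `conda env list` output on stdin; succeeds if the named environment is listed.
env_exists() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Print the two options described above, with the commands for each.
report_conflict() {
    name="$1"
    echo "A conda environment named '$name' already exists. Either:"
    echo "  (1) keep a copy under a new name, then remove the original:"
    echo "      conda create --name ${name}-old --clone $name"
    echo "      conda env remove --name $name"
    echo "  (2) remove it:"
    echo "      conda env remove --name $name"
    echo "Then rerun ./setup/configure."
}

# Typical use (requires conda on PATH):
#   if conda env list | env_exists analysis; then report_conflict analysis; exit 1; fi
```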

analysis environment inconsistent

I just updated environments/env-analysis.py to address a bug in MetPy. However, when I update the environment using conda, I get the following message.

The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

  - conda-forge/noarch::botocore==1.12.109=py_0
  - conda-forge/linux-64::s3transfer==0.2.0=py36_0
  - conda-forge/noarch::boto3==1.9.108=py_0

Curriculum for beginner, intermediate & advanced tracks?

We've talked about this a lot in the past, but I think we need to come to a conclusion about how to proceed on this. The issue here is that having a 2-3 day tutorial is not enough time to cover both beginner and intermediate topics. One thought for discussion:

Thought: Beginner-level participation looks very different than intermediate- or advanced-level participation. At the intermediate and advanced levels, participants can be expected to participate in a hackathon-like environment. At the beginner level, participants do not have the tools (yet) to do anything like a hackathon. This suggests the following:

  • Perhaps the goal of the beginner-level tutorial should be to get participants to the minimal level where they could contribute to and develop a hackathon project. This would cover git, GitHub, beginner Python, Jupyter Notebooks. This might be 1-2 days.
  • Perhaps the intermediate-level tutorial should focus on giving participants additional tools upon which they can find solutions via a hackathon project. This might be tools like intake, xarray, dask, etc. This might be 1-2 days.
  • Perhaps the advanced-level tutorial is just a hackathon. This should be 2-3 days.

All told, this is a curriculum spanning 4-7 days.

Thought: Experience trying to accommodate all levels of experience in a single tutorial does not seem to work as effectively as we would like. Namely, participants who start as beginners rarely are able to participate at the intermediate or advanced level in the same tutorial. So, it seems to me that people need time to develop their knowledge and let the concepts "sink in." This might suggest the following:

  • Perhaps the beginner-level tutorial precedes the intermediate-level tutorial by at least 1 week but possibly 2-4 weeks.
  • Perhaps the intermediate and advanced tutorials can be adjacent, such that 2-3 days are spent on technical topics followed by 2-3 days of hackathon.
  • Advanced-level participants would not need to show up for the intermediate topics section.
  • Advanced-level participants can be tapped as instructors.

This would allow for a 2-day beginner tutorial that might be followed by a week-long tutorial + hackathon about 2 weeks later.

Questions to Answer:

  • What do people think about this approach?
  • What would need to change with the material we have to make this possible?

Remove old Jekyll site

Now that we have the Nikola site live, we can get rid of the old Jekyll site which lives in docs/. However, we need to look through this directory to see if there is anything we should keep, such as the 'self-paced' guide that was previously developed.

Are command line tools required for the `setup/configure` script on a Mac?

While helping a user run setup/configure on macOS 10.14.6, he was getting an error from pip that basically said xcrun was providing an invalid developer path. This was fixed by running

$ xcode-select --install

I'm not sure how to check to see if the command line tools are available, and if it really is a requirement I'm not sure why it took until the morning of the tutorial for the issue to appear. (Possibly related to #41 though this was not a new install of the OS)
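One possible way to check, sketched here under assumptions, is `xcode-select -p`, which prints the active developer directory and exits with a non-zero status when none is configured. The function name `check_command_line_tools` is hypothetical, and the check is skipped on anything other than macOS.

```shell
# Hypothetical check: are the Xcode command line tools available?
# The OS name can be passed in for testing; by default it comes from `uname -s`.
check_command_line_tools() {
    os="${1:-$(uname -s)}"
    if [ "$os" != "Darwin" ]; then
        echo "not macOS; nothing to check"
        return 0
    fi
    if xcode-select -p >/dev/null 2>&1; then
        echo "command line tools found"
    else
        echo "command line tools missing; run: xcode-select --install"
        return 1
    fi
}

check_command_line_tools || echo "setup/configure may fail until the tools are installed"
```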

Alternate text for OHC Notebook 1

import xarray as xr

Did that work for you? If not, you do not have xarray installed in your current notebook environment. Check to make sure it reads "Python [conda env: analysis]" in the top right corner of your jupyter notebook screen. If it just says "Python" or something else, then click on the text to change the selection, so that your "analysis" environment is used.

Write dask lecture material

Content should include:

  • NCAR & Dask jobqueue
  • Dask array
  • Adaptive scaling
  • Dashboard (on laptop and Cheyenne)
  • Xarray chunking best practices and rechunking
  • Possibly: map_blocks and map_overlap
