Machine Learning in Python for Environmental Science Problems AMS Short Course Material

License: MIT License

Jupyter Notebook 98.48% Python 1.52%

ams-ml-python-course

Authors

David John Gagne, National Center for Atmospheric Research ([email protected])
Ryan Lagerquist, University of Oklahoma ([email protected])
Greg Herman, Amazon
Sheri Mickelson, National Center for Atmospheric Research

Requirements

The modules for this short course require Python 3.6 and the following Python libraries:

numpy
scipy
matplotlib
xarray
netcdf4
pandas
scikit-learn
tensorflow-gpu or tensorflow
keras
shapely
descartes
jupyter
ipython
jupyterlab
ipywidgets
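
A quick way to confirm that the libraries above are installed is to check whether each one can be located by the import system. The following is a minimal sketch, not part of the course code: the function name `find_missing` is hypothetical, and the import names on the right-hand side of the mapping are assumptions based on each package's usual convention (for example, scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Package name (as listed above) -> module name used at import time.
# The module names are assumptions based on common convention.
REQUIRED_MODULES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "xarray": "xarray",
    "netcdf4": "netCDF4",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "tensorflow": "tensorflow",
    "keras": "keras",
    "shapely": "shapely",
    "descartes": "descartes",
    "jupyter": "jupyter_core",  # assumed: the jupyter metapackage pulls in jupyter_core
    "ipython": "IPython",
    "jupyterlab": "jupyterlab",
    "ipywidgets": "ipywidgets",
}


def find_missing(modules):
    """Return the package names whose module cannot be found."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

The check only locates modules without importing them, so it runs quickly and does not load TensorFlow.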

The current pre-compiled version of tensorflow-gpu requires your machine to have an NVIDIA GPU, CUDA 9.0, CUDA Toolkit 9.0, and cuDNN 7. If you have a different version of CUDA installed, you will have to build TensorFlow from source, which can take a few hours.

GPUs are recommended for modules 3 and 4 but are not needed for modules 1 and 2.
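
To see whether TensorFlow can actually use a GPU before starting modules 3 and 4, something like the sketch below may help. It is an illustration under assumptions, not part of the course: the function name `count_visible_gpus` is hypothetical, and because the available API differs between TensorFlow releases, the code tries the newer `tf.config.list_physical_devices` call first and falls back to `device_lib.list_local_devices`, which older releases provide.

```python
def count_visible_gpus():
    """Return the number of GPUs TensorFlow reports, or None if
    TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None

    # Newer TensorFlow releases expose tf.config.list_physical_devices.
    config = getattr(tf, "config", None)
    if config is not None and hasattr(config, "list_physical_devices"):
        return len(config.list_physical_devices("GPU"))

    # Fallback for older releases (assumed to match the CUDA 9.0 era builds).
    from tensorflow.python.client import device_lib

    return sum(
        1 for device in device_lib.list_local_devices()
        if device.device_type == "GPU"
    )


if __name__ == "__main__":
    n_gpus = count_visible_gpus()
    if n_gpus is None:
        print("TensorFlow is not installed.")
    else:
        print("GPUs visible to TensorFlow:", n_gpus)
```

A result of 0 means the CPU build is in use or the CUDA libraries were not found; modules 1 and 2 should still run.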

Data Access

The data for the course are stored online. The download_data.py script downloads the data to the appropriate location and extracts all files. The netCDF data are contained in a 2 GB tar file, so make sure you have at least 4 GB of free storage and a fast internet connection.
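
Before running the download, the free space on the target drive can be checked from Python. This is a small sketch and not part of download_data.py: the function name `has_free_space` is hypothetical, and the 4 GB figure from the text is interpreted here as 4 GiB, which is an assumption.

```python
import shutil

GIB = 1024 ** 3  # bytes in one gibibyte


def has_free_space(path=".", required_bytes=4 * GIB):
    """Return True if the filesystem holding `path` has at least
    `required_bytes` bytes free."""
    usage = shutil.disk_usage(path)
    return usage.free >= required_bytes


if __name__ == "__main__":
    if has_free_space("."):
        print("Enough free space for the course data.")
    else:
        print("Less than 4 GiB free; clear some space before downloading.")
```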

Course Videos

Setup Instructions (Local Install; CPU Only)

These instructions assume you are using a bash shell or the Windows command prompt. Conda environments do not work in csh.

Install the miniconda Python distribution.
Create a separate conda environment for the short course: conda create -n mlpy python=3.6
Activate the environment by running source activate mlpy (bash on Linux or macOS) or activate mlpy (Windows)
Install the required base libraries: conda install pip numpy scipy matplotlib scikit-learn netcdf4 xarray pandas ipython jupyter ipywidgets shapely descartes
Install tensorflow and keras: pip install tensorflow; pip install keras
Clone the short course repository: git clone https://github.com/djgagne/ams-ml-python-course.git
Change into the ams-ml-python-course directory.
Download the course data to your local machine: python download_data.py
Start Jupyter lab: jupyter lab
Each module is in a separate folder. Open the Jupyter notebook in each folder and follow the instructions. If you have problems, please create an issue on the GitHub repository site.

Setup Instructions (Docker)

These instructions are for those who want to run the short course Docker image either on their local machine (requires Docker to be installed) or on a single cloud VM.

Install Docker.
From the command line, pull the appropriate short course Docker container:
- CPU only: docker pull djgagne/ams-ml-python-course:cpu
- GPU (requires NVIDIA GPU, CUDA and nvidia-docker): docker pull djgagne/ams-ml-python-course:gpu
To start the container: docker run -p 8888:8888 djgagne/ams-ml-python-course:cpu (replace :cpu with :gpu if you pulled the GPU version).
To access jupyter lab, open a web browser to localhost:8888 and paste in the token string from the command line.
If you are running on a remote server, you will need to forward port 8888 to your local machine. You can do this over ssh if it is a remote server or through the web if you are running on a cloud server with port 8888 opened.
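
If the browser cannot reach Jupyter, it can help to confirm that port 8888 is actually reachable, whether directly or through a forwarded port. The sketch below is illustrative only: the function names `port_is_open` and `jupyter_url` are hypothetical, and the token value shown is a placeholder for the string printed on the command line when the container starts.

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def jupyter_url(token, host="localhost", port=8888):
    """Build a Jupyter URL that carries the login token as a query parameter."""
    return "http://{}:{}/?token={}".format(host, port, token)


if __name__ == "__main__":
    if port_is_open():
        # "PASTE_TOKEN_HERE" is a placeholder, not a real token.
        print("Open:", jupyter_url("PASTE_TOKEN_HERE"))
    else:
        print("Nothing is listening on localhost:8888; check the container "
              "or the port forwarding.")
```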

Optional: Setting up GPU-enabled short course JupyterHub containers

These instructions are for creating and managing your own short course, managed by JupyterHub on Kubernetes with everything in a Docker container. You do not need to follow these instructions if you are just trying to run the short course modules locally.

Requirements for architecture

Docker
Google Compute Engine
Google Kubernetes Engine
NVIDIA CUDA docker images
jupyter docker-stacks

Recipe

Start a Google Compute Engine instance with an NVIDIA GPU and install CUDA and docker. See here.
Clone the jupyter docker-stacks repository
In the base-notebook Dockerfile, change BASE_CONTAINER to "nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04"
Build the base notebook: docker build --rm -t username/base-notebook .
Change to the docker-stacks/minimal-notebook directory and change the FROM option to username/base-notebook.
Build the minimal notebook: docker build --rm -t username/minimal-notebook .
Change to the directory containing the short course Dockerfile.
Build the short course container: docker build --rm -t username/ams-ml-short-course:gpu .
Log in to Docker Hub: docker login
Push your container to Docker Hub.
Start a Kubernetes cluster on Google Cloud with 1 CPU node and 1 GPU node. Use preemptible instances to save a lot of money.
Log into a Kubernetes node and install CUDA here.
Wait until the NVIDIA drivers have been completely installed. Check the status by running kubectl get pods --all-namespaces and wait for everything to be running.
Set up JupyterHub on Google Cloud by following instructions here.
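
The pod status check in the recipe above (kubectl get pods --all-namespaces) can also be scripted by asking kubectl for JSON output and inspecting each pod's phase. This is a rough sketch under assumptions: the function names are hypothetical, it treats the Succeeded phase as acceptable alongside Running (the recipe itself only says to wait for everything to be running), and a Running phase does not guarantee that every container in the pod is ready.

```python
import json
import subprocess


def pods_not_ready(pod_list):
    """Given the parsed output of `kubectl get pods -o json`, return a list of
    (namespace, name, phase) tuples for pods that are not Running or Succeeded."""
    waiting = []
    for item in pod_list.get("items", []):
        phase = item.get("status", {}).get("phase")
        if phase not in ("Running", "Succeeded"):
            metadata = item.get("metadata", {})
            waiting.append((metadata.get("namespace"), metadata.get("name"), phase))
    return waiting


def fetch_pods():
    """Run kubectl and return its JSON output as a dict (requires kubectl on PATH)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return json.loads(result.stdout)


# Example usage on a machine with kubectl configured:
#     for namespace, name, phase in pods_not_ready(fetch_pods()):
#         print(namespace, name, phase)
```

An empty list from `pods_not_ready` suggests the cluster has settled and the JupyterHub setup can proceed.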


ams-ml-python-course's Issues

ML and Pangeo!

Hello David,

I am Chiara Lepore, a Research Scientist at LDEO-Columbia University. We briefly met this summer over lunch at the Hail Workshop held in Boulder.
I saw your tweet about your repo with the ML classes, and I am excited to give it a try.

I am contacting you, right now, to introduce you to a great open source effort brought forward by a large community, called Pangeo. Since you have shared your course (thank you!) I gather you are keen to share your efforts in an open source community, and I think you will find Pangeo very interesting and helpful.

Pangeo is an NSF EarthCube-funded project and, as of now (its scope has changed over the past year), it is very much interested in enabling simple Big Data analysis in the cloud, with a specific focus on the geosciences. You can read more about it here on its website and check out the GitHub repo where we use the issue section to communicate and discuss further developments.

I thought about introducing you to Pangeo when I looked at the section in which you carefully describe how to set up the JupyterHub containers.
In fact, one of the main goals of Pangeo is to develop cloud computing platforms that are already set up for people, like me for example, who could find it complicated to set them up on their own. Moreover, Pangeo is right now, although it will soon be discontinued, providing a cloud computing platform open to everyone (after they request access through an issue on the repo) to try out Jupyter in the cloud.

Yesterday a new issue was opened in which folks interested in ML are going to discuss ways Pangeo can help simplify the workflow and provide support to people interested in ML. In fact, one of the goals of Pangeo, being an NSF-funded proposal, is also to provide open source tutorials, workflows, and anything else that can help people get spun up and over the initial hump that setting up environments, dealing with large data, preprocessing, etc. can sometimes create.

I am not the best person to describe the technical details of Pangeo infrastructure, but I hope you will join us for a chat and that Pangeo can help you with your ML course.
One specific thing we could try is to turn your course into a binder that runs on the pangeo binder service.

SEA Conference Changes

I would like to make the following changes to the short course before the SEA Conference:

Overall

Add an interactive example for each module.
Move low-level code out of notebooks to the library.

Module 1 (DJ):

Fix PCA example and add visualization of PCA.

Module 2 (Ryan):

Move some of the material, especially training/test set split to module 1.
DJ will take the pre-processing items out of module 2 and then Ryan will finish other edits.

Module 3 (DJ):

Add headings.
Do actual backwards optimization on 32 subplot image.
Potentially switch out the ridge and lasso plot with https://stats.stackexchange.com/questions/341816/why-the-contour-of-lasso-and-ridge-regression-are-drawn-only-at-that-position-an.

Module 4 (Ryan):

Remove pre-processing and training.
Redo ordering:

Permutation Variable Importance
Saliency
Grad-CAM
Backwards Optimization
Bonus: Novelty Detection.

Ryan's Notes

Will be 6 hours (0900-1600 with one-hour lunch break).
We plan to cover the same topics roughly.
Just Ryan and DJ teaching.

I will put in Grad-CAM before novelty detection and leave novelty detection as a bonus topic.
Replace “Citation” with actual title of my notebook.
Remove the long list of “pip install”.
Put constants in dictionary?
Don’t cover plotting and norm/denorm.
I should just have one block that loads all the data, normalizes, and binarizes.
I also will focus less on CNN setup.
Do motivation for interpretation at the beginning.
I can probably include more details on evaluation.
Olah figure with all the different chunking options might be confusing, since we use only output neuron.
DJ has a flow chart that explains BWO (I will use this).
Put saliency before BWO.
Order: saliency, Grad-CAM, BWO, novelty detection.

installation quirks on windows

I ran into a few quirks installing this on Windows (7 x64). First, an existing installation of Python through Visual Studio was causing all kinds of problems, so I had to completely remove everything Python and reinstall Miniconda.

I also needed to run the anaconda command prompt as administrator for certain packages to install properly.

Finally, there were a few packages missing after walking through your setup instructions (local install). I had to add the following through conda install:

geos
cartopy
jupyterlab

Now it seems to be working.