Automatically deploy a BinderHub to Microsoft Azure

License: MIT

BinderHub is a cloud-based, multi-server technology used for hosting reproducible computing environments and interactive Jupyter Notebooks built from code repositories.

This repo contains a set of scripts to automatically deploy a BinderHub onto Microsoft Azure, and connect a DockerHub container registry, so that you can host your own Binder service.

This repo is based on the following set of deployment scripts for Google Cloud: nicain/binder-deploy

You will require a Microsoft Azure account and subscription. A Free Trial subscription can be obtained here. You will be asked to provide a credit card for verification purposes. You will not be charged. Your resources will be frozen once your subscription expires, then deleted if you do not reactivate your account within a given time period. If you are building a BinderHub as a service for an organisation, your institution may already have an Azure account. You should contact your IT Services for further information regarding permissions and access (see the Service Principal Creation section below).


Usage

These scripts can be run either locally or as "Platform as a Service" through the "Deploy to Azure" button (see the "Deploy to Azure" Button section).

To use these scripts locally, clone this repo and change into the directory.

git clone https://github.com/alan-turing-institute/binderhub-deploy.git
cd binderhub-deploy

To make the scripts executable and then run them, do the following:

chmod 700 <script-name>.sh
./<script-name>.sh

[NOTE: The above commands are UNIX-specific. If you are running Windows 10, this blog post discusses using a bash shell in Windows.]

To build the BinderHub, you should run setup.sh first (to install the required command line tools), then deploy.sh (which will build the BinderHub). Once the BinderHub is deployed, you can run logs.sh and info.sh to get the JupyterHub logs and IP addresses respectively. teardown.sh should only be used to delete your BinderHub deployment.
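
The ordering described above can be sketched as a small wrapper that stops at the first failing script. This is only an illustration: run_in_order is a hypothetical helper and is not part of this repo.

```shell
# Hypothetical helper (not part of the repo): run the named scripts from a
# directory in the order given, stopping at the first failure.
run_in_order() {
  local dir="$1"
  shift
  local script
  for script in "$@"; do
    chmod 700 "$dir/$script" || return 1   # make the script executable
    echo "Running $script"
    "$dir/$script" || {
      echo "$script failed; stopping" >&2
      return 1
    }
  done
}

# Typical first deployment, run from the repo root:
#   run_in_order . setup.sh deploy.sh info.sh
```

Stopping early matters here because deploy.sh relies on the command line tools that setup.sh installs.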

You need to create a file called config.json which has the format described in the code block below. Fill the quotation marks with your desired namespaces, etc. config.json is git-ignored so that sensitive information, such as passwords and Service Principals, cannot be pushed to GitHub.

  • For a list of available data centre regions, see here. This should be a region and not a location, for example "West Europe" or "Central US". These can be equivalently written as westeurope and centralus, respectively.
  • For a list of available Linux Virtual Machines, see here. This should be a VM size name such as Standard_D2s_v3.
  • The versions of the BinderHub Helm Chart can be found here and are of the form 0.2.0-<commit-hash>. It is advised to select the most recent version unless you specifically require an older one.
{
  "azure": {
    "subscription": "",   // Azure subscription name or ID (a hex-string)
    "res_grp_name": "",   // Azure Resource Group name
    "location": "",       // Azure Data Centre region
    "node_count": 1,      // Number of nodes to deploy. 3 is preferable for a stable cluster, but may be liable to caps.
    "vm_size": "",        // Azure virtual machine type to deploy
    "sp_app_id": null,    // Azure service principal ID (optional)
    "sp_app_key": null,   // Azure service principal password (optional)
    "sp_tenant_id": null  // Azure tenant ID (optional)
  },
  "binderhub": {
    "name": "",           // Name of your BinderHub
    "version": "",        // Helm chart version to deploy, should be 0.2.0-<commit-hash>
    "contact_email": ""   // Email for letsencrypt https certificate. CANNOT be left blank.
  },
  "docker": {
    "username": null,     // Docker username (can be supplied at runtime)
    "password": null,     // Docker password (can be supplied at runtime)
    "org": null,          // A DockerHub organisation to push images to (optional)
    "image_prefix": ""    // The prefix to prepend to Docker images (e.g. "binder-prod")
  }
}

You can copy template-config.json as a starting point if you wish.

Please note that all entries in template-config.json must be surrounded by double quotation marks ("), with the exception of node_count.
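
As noted for the location field, a region such as "West Europe" can equivalently be written as westeurope. A minimal sketch of that conversion follows; normalise_region is a hypothetical helper, and it assumes the short form is simply the display name lower-cased with spaces removed, as in the two examples given above.

```shell
# Hypothetical helper: convert a region display name to its short form by
# removing spaces and lower-casing. Assumes the pattern shown by the two
# examples in this README holds for the region you choose.
normalise_region() {
  printf '%s\n' "$1" | tr -d ' ' | tr '[:upper:]' '[:lower:]'
}

normalise_region "West Europe"   # prints westeurope
normalise_region "Central US"    # prints centralus
```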

Important for Free Trial subscriptions

If you have signed up to an Azure Free Trial subscription, you are not allowed to deploy more than 4 cores. How many cores you deploy depends on your choice of node_count and vm_size.

For example, a Standard_D2s_v3 machine has 2 cores. Therefore, setting node_count to 2 will deploy 4 cores and you will have reached your quota for cores on your Free Trial subscription.
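
The arithmetic above can be sketched as a quick pre-deployment check. The helper names are hypothetical, and the number of cores per machine must be looked up for your chosen vm_size; the script does not query Azure.

```shell
# Hypothetical helper: total cores = node_count * cores per VM.
total_cores() {
  local node_count="$1" cores_per_vm="$2"
  echo $(( node_count * cores_per_vm ))
}

# Succeeds when the deployment fits within the core limit.
# The limit defaults to 4, the Free Trial cap described above.
within_core_quota() {
  local node_count="$1" cores_per_vm="$2" limit="${3:-4}"
  [ "$(total_cores "$node_count" "$cores_per_vm")" -le "$limit" ]
}

# Example from the text: two Standard_D2s_v3 nodes with 2 cores each.
total_cores 2 2   # prints 4
within_core_quota 2 2 && echo "fits within the Free Trial quota"
within_core_quota 3 2 || echo "exceeds the Free Trial quota"
```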

setup.sh

This script checks whether the required command line tools are already installed. If any are missing, the script uses the system package manager or curl to install the command line interfaces (CLIs). The CLIs to be installed are:

  • Microsoft Azure CLI (az)
  • Kubernetes CLI (kubectl)
  • Helm (helm)

Any dependencies that are not automatically installed by these packages will also be installed.

deploy.sh

This script reads in values from config.json and deploys a Kubernetes cluster. It then creates config.yaml and secret.yaml from config-template.yaml and secret-template.yaml, respectively.

The script will ask for your Docker ID and password if you haven't supplied them in the config file. The ID is your Docker username, NOT the associated email. If you have provided a Docker organisation in config.json, then the Docker ID MUST be a member of this organisation.

Both a JupyterHub and BinderHub are installed via a Helm Chart onto the deployed Kubernetes cluster and the config.yaml file is updated with the JupyterHub IP address. The BinderHub is then linked to the provided DockerHub account to store the created images.

config.yaml and secret.yaml are both git-ignored so that secrets cannot be pushed back to GitHub.

The script also outputs log files (<file-name>.log) for each stage of the deployment. These files are also git-ignored.

logs.sh

This script will print the JupyterHub logs to the terminal to assist with debugging issues with the BinderHub. It reads from config.json in order to get the BinderHub name.

info.sh

This script will print the pod status of the Kubernetes cluster and the IP addresses of both the JupyterHub and BinderHub to the terminal. It reads the BinderHub name from config.json.

upgrade.sh

This script will automatically upgrade the Helm Chart deployment that configures the BinderHub and then print the Kubernetes pods. It reads the BinderHub name and Helm Chart version from config.json.

teardown.sh

This script will purge the Helm Chart release, delete the Kubernetes namespace and then delete the Azure Resource Group containing the computational resources. It will read the namespaces from config.json. The user should check the Azure Portal to verify the resources have been deleted. It will also purge the cluster information from your kubectl configuration file.

"Deploy to Azure" Button

To deploy BinderHub to Azure in a single click (and some form-filling), use the deploy button below.

[Button: Deploy to Azure]

Service Principal Creation

You will be asked to provide a Service Principal in the form launched when you click the "Deploy to Azure" button above.

[NOTE: The following instructions can also be run in a local terminal session. They will require the Azure command line to be installed, so make sure to run setup.sh first.]

To create a Service Principal, go to the Azure Portal (and log in!) and open the Cloud Shell:

[Image: Open Shell in Azure]

You may be asked to create storage when you open the shell. This is expected; click "Create".

Make sure the shell is set to Bash, not PowerShell.

[Image: Bash Shell]

Set the subscription you'd like to deploy your BinderHub on.

az account set -s <subscription>

This image shows the command being executed for an "Azure Pass - Sponsorship" subscription.

[Image: Set Subscription]

You will need the subscription ID, which you can retrieve by running:

az account list --refresh --output table

[Image: List Subscriptions]

Next, create the Service Principal with the following command. Make sure to give it a sensible name!

az ad sp create-for-rbac --name binderhub-sp --role contributor --scopes /subscriptions/<subscription ID from above>

[Image: Create Service Principal]

The fields appId, password and tenant are the required pieces of information. These should be copied into the "Service Principal App ID", "Service Principal App Key" and "Service Principal Tenant ID" fields in the form, respectively.

Keep this information safe as the password cannot be recovered after this step!

Monitoring Deployment Progress

To monitor the progress of the "Deploy to Azure" button deployment, go to the Azure Portal and select "Resource Groups" from the left-hand pane. Then, in the central pane, select the resource group you chose to deploy into.

[Image: Select Resource Group]

This will give you a right hand pane containing the resources within the group. You may need to "refresh" until you see a new container instance.

[Image: Select Container Instance]

When it appears, select it and then in the new pane go to "Settings->Containers". You should see your new container listed.

[Image: Container Events]

Select it, then in the lower right hand pane select "Logs". You may need to "refresh" this to display the logs until the container starts up. The logs are also not auto-updating, so keep refreshing them to see progress.

[Image: Container Logs]

Retrieving Deployment Output from Azure

When BinderHub is deployed using the "Deploy to Azure" button (or with a local container), output logs, YAML files, and ssh keys are pushed to an Azure storage account to preserve them once the container exits. The storage account is created in the same resource group as the Kubernetes cluster, and files are pushed into a storage blob within the account.

Both the storage blob name and the storage account name are derived from the name you gave to your BinderHub instance, but may be modified and/or have a random seed appended. To find the storage account name, navigate to your resource group by selecting "Resource Groups" in the left-most panel of the Azure Portal, then clicking on the resource group containing your BinderHub instance. Along with any pre-existing resources (for example, if you re-used an existing resource group), you should see three new resources: a container instance, a Kubernetes service, and a storage account. Make a note of the name of the storage account (referred to in the following commands as ACCOUNT_NAME) then select this storage account.

[Image: Storage Account]

In the new pane that opens, select "Blobs" from the "Services" section. You should see a single blob listed. Make a note of the name of this blob, which will be BLOB_NAME in the following commands.

[Image: Blob Storage]

[Image: Select Blob Storage]

The Azure CLI can be used to fetch files from the blob (either in the cloud shell in the Azure Portal, or in a local terminal session if you've run setup.sh first). Files are fetched into a local directory, which must already exist, referred to as OUTPUT_DIRECTORY in the following commands.

You can run setup.sh to install the Azure CLI or use the cloud shell on the Azure Portal.

To fetch all files:

  az storage blob download-batch --account-name <ACCOUNT_NAME> --source <BLOB_NAME> --pattern "*" -d "<OUTPUT_DIRECTORY>"

The --pattern argument can be used to fetch particular files, for example all log files:

  az storage blob download-batch --account-name <ACCOUNT_NAME> --source <BLOB_NAME> --pattern "*.log" -d "<OUTPUT_DIRECTORY>"

To fetch a single file, specify REMOTE_FILENAME for the name of the file in blob storage, and LOCAL_FILENAME for the filename it will be fetched into:

  az storage blob download --account-name <ACCOUNT_NAME> --container-name <BLOB_NAME> --name <REMOTE_FILENAME> --file <LOCAL_FILENAME>

For full documentation, see the az storage blob documentation.
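
Because OUTPUT_DIRECTORY must already exist, the batch download can be wrapped so that the directory is created first. The following is a minimal sketch: fetch_blob_files is a hypothetical helper, while the az command and its flags are the ones shown above.

```shell
# Hypothetical helper: create the output directory if needed, then fetch
# files matching a pattern (default "*", i.e. everything) from blob storage.
fetch_blob_files() {
  local account_name="$1" blob_name="$2" output_dir="$3"
  local pattern="${4:-*}"
  if ! command -v az >/dev/null 2>&1; then
    echo "Azure CLI (az) not found; run setup.sh first" >&2
    return 1
  fi
  mkdir -p "$output_dir" || return 1
  az storage blob download-batch \
    --account-name "$account_name" \
    --source "$blob_name" \
    --pattern "$pattern" \
    -d "$output_dir"
}

# Example usage, fetching only the log files:
#   fetch_blob_files <ACCOUNT_NAME> <BLOB_NAME> <OUTPUT_DIRECTORY> "*.log"
```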

Accessing your BinderHub after Deployment

Once the deployment has succeeded and you've downloaded the log files, visit the IP address of your Binder page to test it's working.

The Binder IP address can be found by running the following:

cat <OUTPUT_DIRECTORY>/binder-ip.log

A good repository to test your BinderHub with is binder-examples/requirements

Running the Container Locally

A third way to deploy BinderHub to Azure is to pull the Docker image and run it directly, passing the values you would have entered in config.json as environment variables.

You will need the Docker CLI installed. Installation instructions can be found here.

First, pull the binderhub-setup image.

docker pull sgibson91/binderhub-setup:<TAG>

where <TAG> is your chosen image tag.

A list of available tags can be found here. It is recommended to use the most recent version number. The latest tag is the most recent build from the master branch and may be subject to fluctuations.

Then, run the container with the following arguments, replacing the <> fields as necessary:

docker run \
-e "BINDERHUB_CONTAINER_MODE=true" \
-e "SP_APP_ID=<Service Principal ID>" \
-e "SP_APP_KEY=<Service Principal Key>" \
-e "SP_TENANT_ID=<Service Principal Tenant ID>" \
-e "RESOURCE_GROUP_NAME=<Chosen Resource Group name>" \
-e "RESOURCE_GROUP_LOCATION=westeurope" \
-e "AZURE_SUBSCRIPTION=<Azure Subscription ID>" \
-e "BINDERHUB_NAME=<Chosen BinderHub name>" \
-e "BINDERHUB_VERSION=<Chosen BinderHub version>" \
-e "AKS_NODE_COUNT=1" \
-e "AKS_NODE_VM_SIZE=Standard_D2s_v3" \
-e "CONTACT_EMAIL=<email address>" \
-e "DOCKER_IMAGE_PREFIX=binder-dev" \
-e "DOCKER_USERNAME=<Docker ID>" \
-e "DOCKER_PASSWORD=<Docker password>" \
-it sgibson91/binderhub-setup:<TAG>

The output will be printed to your terminal and the files will be pushed to blob storage, as in the button deployment. See the Retrieving Deployment Output from Azure section for how to retrieve these files.
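
Before launching the container, it can help to confirm that none of the values are empty (the contact email in particular cannot be left blank). The following is a minimal sketch and not part of this repo: missing_vars is a hypothetical helper, it expects the values to be set as shell variables of the same names, and treating every variable from the docker run command above as required is an assumption.

```shell
# Variable names taken from the docker run command above. Treating all of
# them as required is an assumption of this sketch.
required_vars="SP_APP_ID SP_APP_KEY SP_TENANT_ID
RESOURCE_GROUP_NAME RESOURCE_GROUP_LOCATION AZURE_SUBSCRIPTION
BINDERHUB_NAME BINDERHUB_VERSION AKS_NODE_COUNT AKS_NODE_VM_SIZE
CONTACT_EMAIL DOCKER_IMAGE_PREFIX DOCKER_USERNAME DOCKER_PASSWORD"

# Hypothetical helper: print the names of any required variables that are
# unset or empty in the current shell, separated by spaces.
missing_vars() {
  local name value missing=""
  for name in $required_vars; do
    eval "value=\${$name:-}"   # indirect lookup of the variable called $name
    [ -n "$value" ] || missing="$missing $name"
  done
  echo "${missing# }"
}

# Example usage:
#   missing="$(missing_vars)"
#   [ -z "$missing" ] || echo "Set these first: $missing" >&2
```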

Customising your BinderHub Deployment

Customising your BinderHub deployment is as simple as editing config.yaml and/or secret.yaml and then upgrading the BinderHub Helm Chart. The Helm Chart can be upgraded by running upgrade.sh (make sure you have the CLIs installed by running setup.sh first).

The Jupyter guide to customising the underlying JupyterHub can be found here.

The BinderHub guide for changing the landing page logo can be found here.

Contributors

We would like to acknowledge and thank the following people for their contributions to this project:
