Nebari Slurm

This project is being renamed from QHub HPC to Nebari Slurm.

Nebari Slurm is an opinionated open source deployment of JupyterHub based on an HPC job scheduler. Nebari Slurm is a "distribution" of these packages, much like Debian and Ubuntu are distributions of Linux. The high-level goal of this distribution is to form a cohesive set of tools that enable:

  • environment management via conda and conda-store
  • monitoring of compute infrastructure and services
  • scalable and efficient compute via JupyterLab and Dask
  • deployment of JupyterHub on-prem without requiring deep DevOps knowledge of the Slurm/HPC and Jupyter ecosystems

Features

  • Scalable compute environment based on the Slurm workload manager to take advantage of an entire fleet of nodes
  • Ansible-based provisioning on Ubuntu 18.04 and Ubuntu 20.04 nodes to deploy one master server and N workers. These workers can be pre-existing nodes in your compute environment
  • Customizable themes for JupyterHub

  • JupyterHub integration allowing users to select the memory, CPUs, and environment in which their JupyterLab instances are launched

  • Dask Gateway integration allowing users to select the memory, CPUs, and environment that the Dask scheduler and workers use

  • Monitoring of the entire cluster via Grafana, covering the nodes, JupyterHub, Slurm, and Traefik

  • Shared directories between all users for collaborative compute

Dependencies

Install the Ansible dependencies:

ansible-galaxy collection install -r requirements.yaml

Testing

There are tests for deploying Nebari Slurm on a virtual machine provisioner and in the cloud.

Virtual Machines

Vagrant is a tool for creating and provisioning VMs. It integrates conveniently with Ansible, which allows easy and effective control over configuration. Currently the Vagrantfile only supports the libvirt and virtualbox providers.

cd tests/ubuntu1804
# cd tests/ubuntu2004
vagrant up --provider=<provider-name>
# vagrant up --provider=libvirt
# vagrant up --provider=virtualbox
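
Since the Vagrantfile only supports two providers, it can help to validate the provider name before calling Vagrant. The following is a minimal sketch, not part of this repository: the function names `is_supported_provider` and `vagrant_up_checked` are hypothetical, and only the provider names `libvirt` and `virtualbox` come from this README.

```shell
# Hypothetical helper: succeed only for providers the Vagrantfile supports.
is_supported_provider() {
    case "$1" in
        libvirt|virtualbox) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: validate the provider first, then bring the VMs up.
# Run it from tests/ubuntu1804 or tests/ubuntu2004.
vagrant_up_checked() {
    if is_supported_provider "$1"; then
        vagrant up --provider="$1"
    else
        echo "unsupported provider: $1 (expected libvirt or virtualbox)" >&2
        return 1
    fi
}
```

With these functions sourced into the shell, `vagrant_up_checked libvirt` would behave like the plain `vagrant up --provider=libvirt` command above, while a mistyped provider name fails early with a message instead of reaching Vagrant.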

Notebook for testing functionality

  • tests/assets/notebook/test-dask-gateway.ipynb

Cloud

Services

The current testing environment spins up four nodes:

  • all nodes :: node_exporter for node metrics
  • master node :: slurm scheduler, munge, mysql, jupyterhub, grafana, prometheus
  • worker node :: slurm daemon, munge

JupyterHub

JupyterHub is accessible via <master node ip>:8000

You may need to find a way to port-forward, e.g. over ssh:

vagrant ssh hpc01-test -- -N -L localhost:8000:localhost:8000

Then open http://localhost:8000/ on the host.

Grafana

Grafana is accessible via <master node ip>:3000
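
If the Grafana port is not directly reachable from the host, the same ssh port-forward used for JupyterHub may work here as well. The sketch below is a hypothetical helper (`forward_cmd` is not part of this repository) that only prints the forwarding commands so they can be reviewed before running. It assumes `hpc01-test` is the master node, as in the JupyterHub example.

```shell
# Hypothetical helper: print the vagrant ssh command that forwards one port
# from the given VM to the same port on the host.
forward_cmd() {
    vm="$1"
    port="$2"
    printf 'vagrant ssh %s -- -N -L localhost:%s:localhost:%s\n' "$vm" "$port" "$port"
}

# The VM name and ports are taken from this README; nothing is executed here.
forward_cmd hpc01-test 8000   # JupyterHub
forward_cmd hpc01-test 3000   # Grafana
```

Running the printed Grafana command in a separate terminal should make the dashboard available at http://localhost:3000/ on the host, assuming Grafana listens on port 3000 on that VM.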

License

Nebari Slurm is BSD3 licensed.

Contributing

Contributions are welcome!
