
ohpc_vagrant

Project to provision an OpenHPC + Open OnDemand cluster via Vagrant using the CRI_XCBC (XSEDE basic cluster) Ansible provisioning framework.

The Vagrantfile takes inspiration from the vagrantcluster project, but it is oriented toward deploying only a master node and using standard OHPC tools to provision the rest of the cluster. It therefore favors the CRI_XCBC approach of running the Ansible scripts on the master only.

The Vagrantfile is stripped to the core rather than carrying all the cruft of a vagrant init. It leverages work from a pilot project (primarily the development of an updated CentOS 7.5 image) but prefers a clean repo slate.

Project Setup

Clone this project recursively to get the correct version of the CRI_XCBC submodule, which builds the OpenHPC (ohpc) and Open OnDemand (ood) nodes:

git clone --recursive https://gitlab.rc.uab.edu/jpr/ohpc_vagrant.git

Cluster Setup

After setting up the project above, create your single-node OpenHPC cluster with Vagrant:

vagrant up ohpc

NOTE: If the above command fails with a "kernel mismatch error", get the running kernel version with vagrant ssh ohpc -c "uname -r", then copy that version into the build_kernel_ver variable in group_vars/all.
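If you hit the kernel mismatch repeatedly, the edit can be scripted. The Python 3.7+ sketch below is not part of this project; it assumes group_vars/all is a YAML file with build_kernel_ver on a line of its own in key: value form, and the function names are hypothetical.

```python
import re
import subprocess
from pathlib import Path

# Matches a "build_kernel_ver: <value>" line (assumed layout of group_vars/all).
_KERNEL_LINE = re.compile(r"^([ \t]*build_kernel_ver:[ \t]*).*$", re.MULTILINE)


def set_build_kernel_ver(text: str, kernel: str) -> str:
    """Return `text` with the build_kernel_ver value replaced by a quoted `kernel`."""
    new_text, count = _KERNEL_LINE.subn(lambda m: m.group(1) + '"' + kernel + '"', text)
    if count == 0:
        raise ValueError("build_kernel_ver not found")
    return new_text


def running_kernel(node: str = "ohpc") -> str:
    """Return `uname -r` from the Vagrant node, using the same command as the NOTE."""
    result = subprocess.run(["vagrant", "ssh", node, "-c", "uname -r"],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def update_group_vars(path: str = "group_vars/all") -> str:
    """Rewrite build_kernel_ver in `path` to the running kernel and return that version."""
    kernel = running_kernel()
    p = Path(path)
    p.write_text(set_build_kernel_ver(p.read_text(), kernel))
    return kernel
```

Call update_group_vars() from the project root after the ohpc node is up. Writing the value quoted keeps it a valid YAML string whether or not the original was quoted.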

The Ansible config brings the master node to the point where it's ready to ingest compute nodes via wwnodescan, and then prompts you to start a compute node. You can create and start a compute node with the helper scripts:

Create node c0 (choose whatever name makes sense, c0 matches the config):

compute_create c0

When prompted start compute node c0:

compute_start c0

If you want to stop the compute node:

compute_stop c0

If you want to get rid of the compute node VM:

compute_destroy c0

Note: the compute scripts work directly with the VirtualBox hypervisor. The machine created is a basic, lightweight diskless compute node that boots via iPXE from the OpenHPC master. You may need to adjust the path to ipxe.iso in compute_create to match your local environment.

Cluster Check

After vagrant up ohpc completes, you can log into the cluster with vagrant ssh ohpc.

To confirm the system is operational, run sinfo; you should see output like the following:

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
low*         up 2-00:00:00      1   idle c0

You can run a test command on the compute node via slurm using:

srun hostname

This should return the name c0.

With these tests confirmed, you have a working OpenHPC cluster running Slurm.
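The two checks above can be combined into one script run on the ohpc node. A minimal Python 3.7+ sketch, assuming sinfo's default column layout as shown above and a compute node named c0; the function names are hypothetical and nothing here ships with the project.

```python
import subprocess
from typing import Dict, List


def parse_sinfo(output: str) -> List[Dict[str, str]]:
    """Split default `sinfo` output into one dict per row, keyed by the header names."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def node_is_idle(output: str, node: str = "c0") -> bool:
    """True if a row's NODELIST is exactly `node` and its STATE is exactly "idle".

    Node ranges such as c[0-1] and flagged states such as "idle*" are not matched.
    """
    return any(row.get("NODELIST") == node and row.get("STATE") == "idle"
               for row in parse_sinfo(output))


def check_cluster(node: str = "c0") -> bool:
    """Run sinfo and `srun hostname` on the ohpc node and verify both results."""
    sinfo = subprocess.run(["sinfo"], capture_output=True, text=True, check=True).stdout
    host = subprocess.run(["srun", "hostname"], capture_output=True,
                          text=True, check=True).stdout.strip()
    return node_is_idle(sinfo, node) and host == node
```

Splitting on whitespace works here because none of the default sinfo columns contain spaces; a custom sinfo --format would need its own parser.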

Boot the Open OnDemand node

A primary function of this project is to provide a dev/test cluster for working with Open OnDemand. After the cluster is up, boot the ood node with:

vagrant up ood

This will provision the node.

NOTE: Near the end of the ood provisioning, the Ansible scripts display several sudo commands that must be run on the ohpc node to register the ood node with the cluster. These commands ensure that system files are synchronized and that Slurm works. Copy and paste them into a shell on ohpc; the Ansible script pauses for 90 seconds to give you time to do this.

After the node is provisioned (or booted), you need to work around an NFS mount issue in the centos/7 Vagrant box by running mount -a on the ood node:

vagrant ssh ood -c "sudo mount -a"
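To confirm that mount -a actually restored the NFS mounts, you can compare the kernel's mount table with the mount points you expect. A minimal Python sketch to run on the ood node; the expected paths /home and /opt/ohpc/pub are assumptions (they are the shares OpenHPC recipes typically export from the master), so substitute whatever the ood node's /etc/fstab lists. The function names are hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def nfs_mount_points(mounts_text: str) -> List[str]:
    """Return mount points of type nfs/nfs4 from /proc/mounts-style text.

    Each line has the fields: device mountpoint fstype options dump pass.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            points.append(fields[1])
    return points


def missing_nfs_mounts(expected: List[str], mounts_text: Optional[str] = None) -> List[str]:
    """Return the expected mount points that are not currently NFS-mounted."""
    if mounts_text is None:
        mounts_text = Path("/proc/mounts").read_text()
    mounted = set(nfs_mount_points(mounts_text))
    return [p for p in expected if p not in mounted]


# Hypothetical expected shares; adjust to match the ood node's /etc/fstab.
EXPECTED = ["/home", "/opt/ohpc/pub"]
```

An empty result from missing_nfs_mounts(EXPECTED) means every expected share is mounted; anything listed suggests running mount -a again or checking the exports on ohpc.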

After this point you can connect to the web UI of the ood node, typically at the following address (the port mapping may differ in your local Vagrant environment):

http://localhost:8080

The default user name and password for the web UI is 'vagrant'.

Issues and Workarounds

If OHPC node provisioning fails with GPG key errors (see jprorama/CRI_XCBC#77), run the following command:

vagrant box update

If the nodes_vivify role fails to update the Slurm state of the nodes with the error "slurm_update error: Invalid node state specified", increase the compute node memory. For example, if the node already has 4 GB, increase it to 6 GB in VirtualBox.
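The memory change can also be made from the command line with VBoxManage modifyvm and its --memory option, which takes a size in MB. A minimal Python sketch, assuming the VirtualBox VM created by compute_create has the same name as the node (c0); that naming and the helper names are assumptions. modifyvm only works while the VM is powered off, so stop it first with compute_stop c0.

```python
import subprocess
from typing import List


def modifyvm_memory_cmd(vm: str, gigabytes: int) -> List[str]:
    """Build a VBoxManage command that sets the VM's RAM; VirtualBox expects MB."""
    if gigabytes <= 0:
        raise ValueError("memory must be a positive number of GB")
    return ["VBoxManage", "modifyvm", vm, "--memory", str(gigabytes * 1024)]


def set_vm_memory(vm: str = "c0", gigabytes: int = 6) -> None:
    """Apply the change; the VM must be powered off (e.g. after compute_stop c0)."""
    subprocess.run(modifyvm_memory_cmd(vm, gigabytes), check=True)
```

Calling set_vm_memory("c0", 6) runs VBoxManage modifyvm c0 --memory 6144; start the node again afterwards with compute_start c0.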
