This repository contains the framework for a very basic HPC cluster based on Vagrant, Ansible, and OpenHPC. It provides just enough to build four compute nodes, a frontend, and a master node on your laptop or desktop system. From there, you can customize it however you wish!
- Install Vagrant: https://www.vagrantup.com/
- Clone this repository
- Run the `gensshkeys.sh` script to generate ssh keys in the Ansible repository
- (Optional) Copy `localenv.sh.in` to `localenv.sh` and populate it with any local environment variables you need during the Vagrant provisioning step (HTTP proxy information, for example)
- Run `vagrant up` to fire up the cluster
- Once the cluster is booted, you can run `vagrant ssh master` to log in to the master node, or `vagrant ssh fe1` to log in to the frontend
- Run `sinfo` on the frontend or master node to check whether Slurm sees your nodes as up. If it does not, run `sudo scontrol update nodename=node[01-04] state=resume` to bring them back.
- Start using your cluster! At this point, you should be able to run a simple test across the cluster (`srun -N 4 /bin/hostname`) or run some more complex jobs.
- When you are done, shut down your cluster by logging out of it and running `vagrant halt`.
- If you want to completely rebuild your cluster, run `vagrant destroy`, and then run `vagrant up` again.
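The "check `sinfo`, then resume" step can be scripted so that only nodes that actually need it are resumed. The following is a minimal sketch, not part of this repository: the function names `nodes_needing_resume` and `resume_down_nodes` are hypothetical, and it assumes the compact node states printed by `sinfo -h -N -o "%N %t"` (for example `idle`, `down*`, `drain`).

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repository) for the
# "check sinfo, then resume" step.

# Reads "NODENAME STATE" lines, as printed by `sinfo -h -N -o "%N %t"`,
# and prints a comma-separated list of nodes that are down, drained,
# draining, or failed. `sinfo -N` repeats a node once per partition it
# belongs to, so the list is de-duplicated with `sort -u`.
nodes_needing_resume() {
  awk '{
    state = $2
    # Drop status suffixes such as "*" (node not responding).
    sub(/[^a-z_]+$/, "", state)
    if (state ~ /^(down|drain|drng|fail)$/) print $1
  }' | sort -u | paste -sd, -
}

# Queries Slurm and resumes only the nodes selected above.
resume_down_nodes() {
  local nodes
  nodes=$(sinfo -h -N -o "%N %t" | nodes_needing_resume)
  if [ -n "$nodes" ]; then
    echo "Resuming: $nodes"
    sudo scontrol update nodename="$nodes" state=resume
  else
    echo "No down or drained nodes found."
  fi
}
```

Source the file on the master or frontend node and call `resume_down_nodes`. Nodes that are already idle or running jobs are left alone, which is the main advantage over resuming the whole `node[01-04]` range unconditionally.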
This virtual cluster is built for convenience, not security. It uses Vagrant's default ssh keys, and it contains some private keys (for munge, for example). This is good enough for experimenting on an isolated desktop or laptop, but you shouldn't base an actual cluster configuration on its Ansible repository without first doing a thorough security review.
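One concrete part of such a review is replacing the committed MUNGE key, since every node authenticates with the same shared key. A minimal sketch, assuming the key is a 1024-byte random file as in the classic MUNGE installation instructions; the `KEY_PATH` default is hypothetical and should point to wherever your Ansible roles actually read the key from.

```shell
#!/usr/bin/env bash
# Minimal sketch: create a fresh MUNGE key to replace the one committed to
# the repository. KEY_PATH is a hypothetical default location.
set -eu

KEY_PATH="${KEY_PATH:-./munge.key}"

umask 077                # new files are created readable by the owner only
rm -f "$KEY_PATH"        # an existing read-only key would block overwriting
dd if=/dev/urandom of="$KEY_PATH" bs=1 count=1024 2>/dev/null
chmod 0400 "$KEY_PATH"   # keep the key private to its owner
```

After regenerating it, the same key still has to be distributed to every node (on real hosts it is typically owned by the `munge` user) and the `munge` service restarted.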