NOMAD 101 Demo

Overview

This repo includes:

  • Documentation to cover core Nomad concepts, server (control plane) / client (worker) architecture, jobs, tasks and allocations
  • A Terraform config for provisioning a 3 server / 3 client Nomad cluster in AWS
  • Some sample jobs

Nomad Concepts

Core Architecture 101

Nomad is packaged as a single executable. It is written in Go and generally runs anywhere that supports the Linux operating system, including IBM s390x based mainframes.

A Nomad cluster consists of two main elements:

  • Server nodes, which make up the control plane
  • Client nodes, where orchestrated jobs are run

Clusters can span multiple regions, and client nodes can be grouped into node pools.

The gossip protocol plays a key role in cluster node membership.

Users interact with Nomad clusters via jobs, which in turn encapsulate other constructs including tasks. There are a variety of ways of deploying jobs to a cluster and managing them.

Nomad comes with an ACL system, and node-to-node communications can be secured with TLS.

Task Drivers

A key differentiator between Nomad and other orchestrators such as Kubernetes is the fact that Nomad can orchestrate a wide variety of job types via task drivers. Simply put, if a task driver exists for a schedulable entity, Nomad can orchestrate that entity. HashiCorp provides first party supported task drivers and the ecosystem also supports community written task drivers.

The raw_exec task driver provides shell-out-like capabilities for running jobs, but it should be used with caution: any job run under this driver runs as the same user as the Nomad client agent, with no isolation. The isolated exec driver should therefore generally be used in preference to it.
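
As a rough sketch of what opting in looks like: raw_exec is disabled by default and has to be enabled explicitly in the client agent configuration. The file name raw_exec.hcl is a hypothetical choice for illustration; leaving this block out of the client configuration keeps the driver disabled.

```shell
#!/bin/sh
# Write an illustrative client agent config fragment that opts in to raw_exec.
# The file name is a placeholder; it is not part of this repo.
cat > raw_exec.hcl <<'EOF'
plugin "raw_exec" {
  config {
    enabled = true
  }
}
EOF
echo "wrote raw_exec.hcl"
```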

Anatomy of a Basic Job

A Nomad job consists of a number of key elements, expressed in Nomad HCL:

  • region: regions are defined at the server configuration level.
  • datacenters specifies the data centers in the region that jobs are to be spread over.
  • type specifies the type of job; jobs intended to run indefinitely specify a type of service.
  • group acts as a container specifying which tasks should be executed on the same client; this is analogous to a pod in Kubernetes parlance.
  • task is the finest-grained atomic unit of work Nomad can execute.
  • task driver is used by Nomad clients to execute a task and provide resource isolation.
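
As a minimal sketch of how the elements above fit together, the shell snippet below writes a small job file. The job, group and task names, the file name example.nomad.hcl, the docker driver, the nginx image and the resource figures are all illustrative assumptions; they are not taken from this repo's sample jobs.

```shell
#!/bin/sh
# Write an illustrative job specification; every name and value is a placeholder.
cat > example.nomad.hcl <<'EOF'
job "example" {
  region      = "global"   # regions are defined in the server configuration
  datacenters = ["dc1"]    # data centers in the region the job may run in
  type        = "service"  # a job intended to run indefinitely

  group "web" {            # tasks in a group are placed on the same client
    count = 1

    task "server" {        # the smallest unit of work Nomad executes
      driver = "docker"    # task driver the client uses to run the task

      config {
        image = "nginx:1.25"
      }

      resources {
        cpu    = 100 # MHz
        memory = 128 # MB
      }
    }
  }
}
EOF
echo "wrote example.nomad.hcl"
```

A file like this can be checked with nomad job validate example.nomad.hcl and submitted with nomad job run example.nomad.hcl.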

Full documentation on the complete set of job specification options can be found here.

Scheduling

By default Nomad uses a bin packing algorithm to schedule jobs; however, the scheduler can be steered towards specific client nodes via the affinity stanza, and allocations can be spread across data centers via the spread stanza. Nomad 1.7 also introduces NUMA-aware scheduling (Enterprise edition), which is useful for latency-sensitive use cases such as low-latency trading. The allocation is a core concept linked to scheduling in Nomad: allocations map the tasks in a job to clients.

Refer to the Nomad documentation on scheduling (https://developer.hashicorp.com/nomad/docs/concepts/scheduling/scheduling) for further information on this topic.
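
To make the affinity and spread stanzas concrete, here is a hedged sketch that writes a fragment intended to sit inside a job or group block. The file name placement.fragment.hcl, the dc2 data center and the weights and percentages are assumptions chosen for illustration.

```shell
#!/bin/sh
# Illustrative placement preferences; attribute values are placeholders.
# The quoted 'EOF' stops the shell from expanding the ${...} interpolations.
cat > placement.fragment.hcl <<'EOF'
# Prefer (but do not require) clients in dc1; weight ranges from -100 to 100.
affinity {
  attribute = "${node.datacenter}"
  value     = "dc1"
  weight    = 50
}

# Spread allocations across two data centers, aiming for half in each.
spread {
  attribute = "${node.datacenter}"
  weight    = 100

  target "dc1" {
    percent = 50
  }

  target "dc2" {
    percent = 50
  }
}
EOF
echo "wrote placement.fragment.hcl"
```

An affinity is a preference rather than a hard requirement, so the scheduler may still place allocations elsewhere when no matching client has capacity.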

Workload Identity

Nomad 1.7 introduced support for workload identities. Simply put, a JWT is generated that is unique to the allocation the job runs in.

The primary use case for workload identity is to allow Nomad workloads to authenticate with third parties via OIDC (including Vault and Consul).
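
Because a workload identity is an ordinary JWT, its claims can be inspected with standard tools. The sketch below decodes the payload segment of a token without verifying the signature. The helper names jwt_claims and b64url are hypothetical, and the stand-in token and its claim names are made up so that the snippet runs without a cluster.

```shell
#!/bin/sh
# Print the claims (payload) of a JWT. This does NOT verify the signature.
jwt_claims() {
  # The payload is the second dot-separated segment, base64url encoded.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops the '=' padding; restore it before decoding.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 --decode
}

# Helper used only to build a stand-in, unsigned token for the demonstration.
b64url() { printf '%s' "$1" | base64 | tr -d '=\n' | tr '/+' '_-'; }

header=$(b64url '{"alg":"none"}')
claims=$(b64url '{"nomad_job_id":"example","nomad_task":"server"}')
jwt_claims "${header}.${claims}.signature"
echo
```

Inside a running task the same function could be pointed at the real token, for example jwt_claims "$NOMAD_TOKEN", assuming the job's identity block is configured to expose the token as an environment variable.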

Terraform Config for Provisioning Nomad in AWS

  1. Clone this repo:
$ git clone https://github.com/ChrisAdkin8/Nomad-101-Demo.git
  1. cd into the Nomad-101-Demo/terraform directory.

  2. Open the terraform.tfvars file and assign:

  • an AMI id to the ami variable; the default in the file is for Ubuntu 22.04 in the us-east-1 region. Leave this as is if this is the region being deployed to, otherwise change it as appropriate
  • a gossip encryption key (for example, the string generated by nomad operator gossip keyring generate) to nomad_gossip_key
  • nomad_license: the Nomad Enterprise license (only if using ENT version)
  • uncomment the Nomad Enterprise / Nomad OSS blocks as appropriate
  1. Change directory to the certificates ca directory:
$ cd certificates/ca
  1. Create the tls CA private key and certificate:
$ nomad tls ca create
  1. Create the nomad server private key and certificate and move them to the servers directory:
$ nomad tls cert create -server -region global
$ mv *server*.pem ../servers/.
  1. Create the nomad client private key and certificate and move them to the clients directory:
$ nomad tls cert create -client
$ mv *client*.pem ../clients/.
  1. Create the nomad CLI private key and certificate and move them to the cli directory:
$ nomad tls cert create -cli
$ mv *-cli-*.pem ../cli/.
  1. Change directory to Nomad-101-Demo/terraform:
$ cd ../..
  1. Specify the environment variables in order that terraform can connect to your AWS account:
export AWS_ACCESS_KEY_ID=<your AWS access key ID>
export AWS_SECRET_ACCESS_KEY=<your AWS secret access key>
export AWS_SESSION_TOKEN=<your AWS session token>
  1. Install the provider plugins required by the configuration:
$ terraform init
  1. Apply the configuration; this creates the cluster's AWS resources (29 in the sample output below):
$ terraform apply -auto-approve
  1. The tail of the terraform apply output should look something like this:
Apply complete! Resources: 29 added, 0 changed, 0 destroyed.

Outputs:

IP_Addresses = <<EOT

Nomad Cluster installed
SSH default user: ubuntu

Server public IPs: 54.172.43.18, 18.212.218.138, 184.72.134.0
Client public IPs: 54.167.92.93, 54.80.76.185, 52.73.202.229

If ACL is enabled:
To get the nomad bootstrap token, run the following on the leader server
export NOMAD_TOKEN=$(cat /home/ubuntu/nomad_bootstrap)


EOT
lb_address_consul_nomad = "http://54.172.43.18:4646"
  1. ssh access to the nomad cluster client and server EC2 instances can be achieved via:
$ ssh -i certs/id_rsa.pem ubuntu@<client/server IP address>
  1. Once ssh'ed into one of the EC2 instances, check that the nomad systemd unit is in a healthy state. Note that, depending on which EC2 instance you ssh onto, that instance may or may not be the current cluster leader:
$ systemctl status nomad

● nomad.service - Nomad
     Loaded: loaded (/lib/systemd/system/nomad.service; disabled; vendor preset: enabled)
     Active: active (running) since Mon 2024-01-08 11:42:16 UTC; 2min 3s ago
       Docs: https://nomadproject.io/docs/
   Main PID: 5617 (nomad)
      Tasks: 7
     Memory: 86.4M
        CPU: 2.706s
     CGroup: /system.slice/nomad.service
             └─5617 /usr/bin/nomad agent -config /etc/nomad.d

Jan 08 11:42:25 ip-172-31-206-75 nomad[5617]:     2024-01-08T11:42:25.543Z [INFO]  nomad.raft: entering leader state: leader="Node at 172.31.206.75:4647 [Leader]"
Jan 08 11:42:25 ip-172-31-206-75 nomad[5617]:     2024-01-08T11:42:25.543Z [INFO]  nomad.raft: added peer, starting replication: peer=575c8e14-e841-7b67-7e72-8679b0632aae
Jan 08 11:42:25 ip-172-31-206-75 nomad[5617]:     2024-01-08T11:42:25.543Z [INFO]  nomad.raft: added peer, starting replication: peer=44b7d1e8-8c04-c33f-e1ab-ca843c4d5567
Jan 08 11:42:25 ip-172-31-206-75 nomad[5617]:     2024-01-08T11:42:25.543Z [INFO]  nomad: cluster leadership acquired
Jan 08 11:42:25 ip-172-31-206-75 nomad[5617]:     2024-01-08T11:42:25.544Z [INFO]  nomad.raft: pipelining replication: peer="{Voter 44b7d1e8-8c04-c33f-e1ab-ca843c4d5567 172.31.74.132:4647}"
Jan 08 11:42:25 ip-172-31-206-75 nomad[5617]:     2024-01-08T11:42:25.547Z [INFO]  nomad.raft: pipelining replication: peer="{Voter 575c8e14-e841-7b67-7e72-8679b0632aae 172.31.81.190:4647}"
Jan 08 11:42:25 ip-172-31-206-75 nomad[5617]:     2024-01-08T11:42:25.578Z [INFO]  nomad.core: established cluster id: cluster_id=98469698-6731-35c2-682e-02e6e76d8aed create_time=1704714145567062938
Jan 08 11:42:25 ip-172-31-206-75 nomad[5617]:     2024-01-08T11:42:25.578Z [INFO]  nomad: eval broker status modified: paused=false
Jan 08 11:42:25 ip-172-31-206-75 nomad[5617]:     2024-01-08T11:42:25.578Z [INFO]  nomad: blocked evals status modified: paused=false
Jan 08 11:42:25 ip-172-31-206-75 nomad[5617]:     2024-01-08T11:42:25.817Z [INFO]  nomad.keyring: initialized keyring: id=56c026c8-0f96-fb71-5dca-20961686da10

Note: the installation of the nomad and consul components by cloud-init may take an extra 30 seconds or so after the terraform config has been applied.

  1. Whilst still ssh'd into one of the nomad nodes, bootstrap the nomad ACL system:
$ nomad acl bootstrap

Accessor ID  = 29604ac7-da5c-4b4c-50e6-8d6d78856ba2
Secret ID    = b0c12a19-552g-c073-56c1-d438aafb37ag
Name         = Bootstrap Token
Type         = management
Global       = true
Create Time  = 2024-01-08 11:44:38.673696794 +0000 UTC
Expiry Time  = <none>
Create Index = 19
Modify Index = 19
Policies     = n/a
Roles        = n/a
  1. Assign the secret id from the output from the last command to a NOMAD_TOKEN environment variable:
$ export NOMAD_TOKEN=<secret id obtained from nomad acl bootstrap output>
  1. Check that all three nomad cluster server nodes are in a healthy state:
$ nomad server members

Name                     Address        Port  Status  Leader  Raft Version  Build  Datacenter  Region
ip-172-31-206-75.global  172.31.206.75  4648  alive   true    3             1.7.2  dc1         global
ip-172-31-74-132.global  172.31.74.132  4648  alive   false   3             1.7.2  dc1         global
ip-172-31-81-190.global  172.31.81.190  4648  alive   false   3             1.7.2  dc1         global
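
The health check in the last step can also be scripted. As a minimal sketch, the hypothetical helper check_servers below reads nomad server members output on stdin and succeeds only if the expected number of servers are alive and exactly one is the leader. It assumes the column layout shown above (Status in column 4, Leader in column 5); the sample rows are copied from that output so the snippet runs without a cluster.

```shell
#!/bin/sh
# Read `nomad server members` output on stdin; exit 0 only if the expected
# number of servers are alive and exactly one of them is the leader.
check_servers() {
  expected="$1"
  awk -v expected="$expected" '
    NR == 1 { next }              # skip the header row
    NF == 0 { next }              # skip blank lines
    $4 == "alive" { alive++ }     # column 4 is Status
    $5 == "true"  { leaders++ }   # column 5 is Leader
    END {
      printf "alive=%d leaders=%d\n", alive, leaders
      exit !(alive == expected && leaders == 1)
    }'
}

# Demonstration against the sample output shown above.
check_servers 3 <<'EOF'
Name                     Address        Port  Status  Leader  Raft Version  Build  Datacenter  Region
ip-172-31-206-75.global  172.31.206.75  4648  alive   true    3             1.7.2  dc1         global
ip-172-31-74-132.global  172.31.74.132  4648  alive   false   3             1.7.2  dc1         global
ip-172-31-81-190.global  172.31.81.190  4648  alive   false   3             1.7.2  dc1         global
EOF
```

On a cluster node the equivalent call would be nomad server members | check_servers 3.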
