nomad-driver-containerd


We are actively looking for contributors and maintainers for this project. If you have experience with container internals (e.g. cgroups, namespaces), have contributed to open source container projects (e.g. docker, containerd, nerdctl, podman), or have built tooling that deals with container internals, and are interested in contributing to this project, I would love to talk to you! Golang experience is preferred but not required.

Please reach out to me at @_shishir_m, or open an issue in this repository with your contact details, if you are interested in contributing to this project.

Overview

Nomad task driver for launching containers using containerd.

containerd (containerd.io) is a lightweight container daemon for running and managing the container lifecycle.
The Docker daemon itself delegates to containerd:

dockerd (docker daemon) --> containerd --> containerd-shim --> runc

nomad-driver-containerd enables a Nomad client to launch containers directly through containerd, without Docker!
The Docker daemon is not required on the host system.

(Diagram: nomad-driver-containerd architecture)

Requirements

Building nomad-driver-containerd

Make sure your $GOPATH is set up correctly.

$ mkdir -p $GOPATH/src/github.com/Roblox
$ cd $GOPATH/src/github.com/Roblox
$ git clone [email protected]:Roblox/nomad-driver-containerd.git
$ cd nomad-driver-containerd
$ make build    # builds the containerd-driver binary

If you want to compile for arm64, you can run:

make -f Makefile.arm64

Screencast

(asciinema screencast)

Wanna try it out!?

$ vagrant up

or vagrant provision if the vagrant VM is already running.

Once setup (vagrant up OR vagrant provision) is complete and the Nomad server is up and running, you can check the registered task drivers (which will include containerd-driver) using:

$ nomad node status (Note down the <node_id>)
$ nomad node status <node_id> | grep containerd-driver

NOTE: setup.sh is part of the vagrant setup and should not be executed directly.

Run example jobs

There are a few example jobs in the example directory.

$ nomad job run <job_name.nomad>

will launch the job.

More detailed instructions are in the example README.md.

To interact with images and containers directly, you can use nerdctl, a Docker-compatible CLI for containerd. nerdctl is already installed in the vagrant VM at /usr/local/bin.
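For example, to list the containers and images managed under the driver's nomad containerd namespace (a quick sketch using standard nerdctl flags; the namespace name also appears in the ctr workaround later in this document):

$ sudo nerdctl --namespace nomad ps
$ sudo nerdctl --namespace nomad images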

Supported options

Driver Config

| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | bool | no | true | Enable/disable the task driver. |
| containerd_runtime | string | yes | N/A | Runtime for containerd, e.g. io.containerd.runc.v1 or io.containerd.runc.v2. |
| stats_interval | string | no | 1s | Interval for collecting TaskStats. |
| allow_privileged | bool | no | true | If set to false, the driver will deny running privileged jobs. |
| auth | block | no | N/A | Authentication for a private registry. See Authentication for more details. |
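For reference, a plugin stanza combining the options above might look like this (a sketch; the values are illustrative):

plugin "containerd-driver" {
  config {
    enabled            = true
    containerd_runtime = "io.containerd.runc.v2"
    stats_interval     = "5s"
    allow_privileged   = false
  }
}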

Task Config

| Option | Type | Required | Description |
|---|---|---|---|
| image | string | yes | OCI image (Docker images are also OCI-compatible) for your container. |
| image_pull_timeout | string | no | How long containerd-driver will wait before cancelling an in-progress pull of the image specified in image. Defaults to "5m". |
| command | string | no | Command to override the command defined in the image. |
| args | []string | no | Arguments to the command. |
| entrypoint | []string | no | A string list overriding the image's entrypoint. |
| cwd | string | no | Working directory for the container process. If the directory does not exist, it will be created for you. |
| privileged | bool | no | Run the container in privileged mode. The container will have all Linux capabilities when running in privileged mode. |
| pids_limit | int64 | no | PID limit for the container. Defaults to unlimited. |
| pid_mode | string | no | host or unset (default). Set to host to share the PID namespace with the host. |
| hostname | string | no | Hostname to assign to the container. When launching more than one instance of a task (using count) with this option set, every container the task starts will have the same hostname. |
| host_dns | bool | no | Defaults to true: the container uses the host's /etc/resolv.conf, similar to Docker's behavior. Set host_dns=false to disable. |
| seccomp | bool | no | Enable the default seccomp profile (a list of allowed syscalls). |
| seccomp_profile | string | no | Path to a custom seccomp profile. seccomp must be set to true in order to use seccomp_profile. The default Docker seccomp profile can be used as a reference and modified to create a custom profile. |
| shm_size | string | no | Size of /dev/shm, e.g. "128M" for 128 MB of /dev/shm. |
| sysctl | map[string]string | no | A key-value map of sysctls to set on the containers on start. |
| readonly_rootfs | bool | no | Make the container root filesystem read-only. |
| host_network | bool | no | Enable host networking. Equivalent to --net=host in Docker. |
| extra_hosts | []string | no | A list of hosts, given as host:IP, to be added to /etc/hosts. |
| cap_add | []string | no | Add individual capabilities. |
| cap_drop | []string | no | Drop individual capabilities. |
| devices | []string | no | A list of devices to be exposed to the container. |
| auth | block | no | Authentication for a private registry. See Authentication for more details. |
| mounts | []block | no | A list of mounts for the container. volume, bind and tmpfs mount types are supported, with fstab-style mount options. |
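As a quick illustration, a minimal task stanza combining a few of the options above might look like this (a sketch; the image and values are illustrative):

task "redis" {
  driver = "containerd-driver"

  config {
    image           = "docker.io/library/redis:alpine"
    cwd             = "/home/redis"
    seccomp         = true
    readonly_rootfs = true
    pids_limit      = 256
  }
}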

Mount block
  {
   - type (string) (Optional): Supported values are volume, bind or tmpfs. Default: volume.
   - target (string) (Required): Target path in the container.
   - source (string) (Optional): Source path on the host.
   - options ([]string) (Optional): fstab-style mount options. NOTE: For bind mounts, at least rbind and ro are required.
  }

Bind mount example

mounts = [
           {
                type    = "bind"
                target  = "/target/t1"
                source  = "/src/s1"
                options = ["rbind", "ro"]
           }
        ]

In addition to the mounts option in Task Config, you can also mount volumes into the container using the Nomad volume_mount stanza.

See example job for volume_mount.
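For example, assuming a host volume named s1 is configured on the client (as in the agent.hcl shown later in this document), the wiring might look like this sketch:

group "example-group" {
  volume "s1" {
    type      = "host"
    source    = "s1"
    read_only = false
  }

  task "example-task" {
    driver = "containerd-driver"

    volume_mount {
      volume      = "s1"
      destination = "/data"
      read_only   = false
    }
  }
}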

Custom seccomp profile example

The default docker seccomp profile found here can be downloaded and modified (by removing/adding syscalls) to create a custom seccomp profile.
The custom seccomp profile can then be saved under /opt/seccomp/seccomp.json on the Nomad client nodes.

A nomad job can be launched using this custom seccomp profile.

config {
	seccomp         = true
	seccomp_profile = "/opt/seccomp/seccomp.json"
}

Sysctl example

config {
  sysctl = {
    "net.core.somaxconn"  = "16384"
    "net.ipv4.ip_forward" = "1"
  }
}

Authentication (Private registry)

The auth stanza lets you set credentials for your private registry, e.g. if you want to pull an image from a private repository on Docker Hub.
The auth stanza can be set in Driver Config, in Task Config, or in both.
If set in both places, the Task Config auth takes precedence over the Driver Config auth.

NOTE: In the example below, user and pass are placeholder values that need to be replaced with the actual username and password. The same auth stanza can be used in both Driver Config and Task Config.

auth {
    username = "user"
    password = "pass"
}

Networking

nomad-driver-containerd supports host and bridge networks.

NOTE: host and bridge are mutually exclusive options, and only one of them should be used at a time.

  1. Host network can be enabled by setting host_network to true in task config of the job spec (see under Supported options).

  2. Bridge network can be enabled by setting the network stanza in the task group section of the job spec.

network {
  mode = "bridge"
}

You need to install CNI plugins on Nomad client nodes under /opt/cni/bin before you can use bridge networks.

Instructions for installing CNI plugins.

 $ curl -L -o cni-plugins.tgz https://github.com/containernetworking/plugins/releases/download/v0.8.6/cni-plugins-linux-amd64-v0.8.6.tgz
 $ sudo mkdir -p /opt/cni/bin
 $ sudo tar -C /opt/cni/bin -xzf cni-plugins.tgz

Also, ensure your Linux operating system distribution has been configured to allow container traffic through the bridge network to be routed via iptables. These tunables can be set as follows:

$ echo 1 > /proc/sys/net/bridge/bridge-nf-call-arptables
$ echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables
$ echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables

To preserve these settings on startup of a Nomad client node, add a file containing the lines below to /etc/sysctl.d/, or remove the file your Linux distribution puts in that directory.

net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1

Port forwarding

Nomad supports both static and dynamic port mapping.

  1. Static ports

Static port mapping can be added in the network stanza.

network {
  mode = "bridge"
  port "lb" {
    static = 8889
    to     = 8889
  }
}

Here, host port 8889 is mapped to container port 8889.
NOTE: static ports are usually not recommended, except for system or specialized jobs like load balancers.

  2. Dynamic ports

Dynamic port mapping is also enabled in the network stanza.

network {
  mode = "bridge"
  port "http" {
    to = 8080
  }
}

Here, Nomad will allocate a dynamic port on the host and map it to port 8080 in the container.
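Inside the task, the allocated ports are exposed through Nomad's standard environment variables; a quick sketch (variable names follow Nomad's NOMAD_PORT_<label> / NOMAD_HOST_PORT_<label> convention):

$ echo $NOMAD_PORT_http        # the container-side port (8080 here, since "to" is set)
$ echo $NOMAD_HOST_PORT_http   # the dynamically allocated host port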

You can also read more about the network stanza in the official Nomad documentation.

Service discovery

Nomad schedules workloads of various types across a cluster of generic hosts. Because of this, placement is not known in advance and you will need to use service discovery to connect tasks to other services deployed across your cluster. Nomad integrates with Consul to provide service discovery and monitoring.

A service stanza can be added to your job spec to enable service discovery.

The service stanza instructs Nomad to register a service with Consul.
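As a sketch (the names are illustrative), a service stanza for the dynamic port example above might look like:

service {
  name = "http-service"
  port = "http"

  check {
    type     = "tcp"
    interval = "10s"
    timeout  = "2s"
  }
}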

Tests

If you are running the tests locally, use the vagrant VM provided in the repository.

$ vagrant up
$ vagrant ssh containerd-linux
$ sudo make test

NOTE: These are destructive tests and can leave the system in a changed state.
It is highly recommended to run these tests either as part of a CI/CD system, e.g. CircleCI, or on immutable infrastructure, e.g. Vagrant VMs.

You can also run an individual test by specifying the test name, e.g.:

$ cd tests
$ sudo ./run_tests.sh 001-test-redis.sh

Cleanup

make clean

This will delete the containerd-driver binary.

vagrant destroy

This will destroy your vagrant VM.

Currently supported environments

Ubuntu (>= 16.04)

License

Copyright 2020 Roblox Corporation

Licensed under the Apache License, Version 2.0 (the "License"). For more information read the License.


nomad-driver-containerd's Issues

[feature request] Extra hosts in the /etc/hosts

Hi,

Nomad recently fixed a long-standing issue where bridge-network Docker workloads were not able to add extra entries to /etc/hosts.

hashicorp/nomad#10766

I wonder if the containerd driver could support this?

Some applications in our case depend on specific /etc/hosts entries, and we have to do a lot of hacks to work around it.

containerd driver doesn't support passing args without command

In the docker driver, args are optional and can be passed either with or without command:
https://www.nomadproject.io/docs/drivers/docker#args

Without command, they are presumably passed to the entrypoint script.

The containerd driver produces an error if args are passed without command:

rpc error: code = Unknown desc = Error in creating container: Command is empty. Cannot set --args without --command.

This is an incompatibility with the docker driver and could block migration.
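For illustration, a config of this shape (the image is just an example) is what currently triggers the error:

config {
  image = "docker.io/library/redis:alpine"
  # no "command" set
  args  = ["--maxmemory", "64mb"]
}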

Do you think it makes sense to make args behave in the same way as in docker driver?

inline seccomp_profile

It's kind of a nightmare having to first deploy (and redeploy whenever it changes) a seccomp_profile for every container onto the filesystem of each nomad client node.

Considering it's just JSON, it would be amazing if we could supply it inline with the task specification, similar to how capabilities currently work. This is not currently possible, right?

[feature request] plugin configuration level privileged mode

Hello!

Currently, it's possible to set the privileged mode in the Nomad job definition via:

config {
  privileged = true
}

I think it's a security risk. What do you think about making this a plugin-level configuration that will prevent such job configurations?

So it will be:

plugin "containerd-driver" {
  config {
    enabled = true
    containerd_runtime = "io.containerd.runc.v2"
    stats_interval = "5s"
    privileged = false
  }
}

Running with Nomad inside containerd

I'm interested in supporting this driver within the ResinStack distribution that I have developed for a more readily deployable version of the nomad ecosystem. In this environment I have nomad itself running as a containerd task, and I'm trying to work out either what needs to be mounted in, or whether I can change the mount paths. Right now I'm stuck on this error and would appreciate advice:

2022-01-27T22:10:45-06:00  Driver Failure  rpc error: code = Unknown desc = Error in creating container: failed to mount /tmp/containerd-mount2802059906: no such file or directory

/tmp from the host is available to the container, so I'm not really sure what's wrong here.

How to use template stanza

I reconfigured tasks from the docker driver to the containerd driver without any big problems, except one: how do I configure the mount options when using a template stanza for configuration? E.g.:

task "mosquitto" {
  driver = "docker"
  config {
    image = "docker.io/eclipse-mosquitto:2"
    ports = ["mqtt", "wss"]
    volumes = [
     "local/mosquitto.conf:/mosquitto/config/mosquitto.conf",
      "/srv/nomad/mosquitto:/mosquitto/data",
    ]
  }
  template {
      destination = "local/mosquitto.conf"
      data = <<EOF
bind_address 0.0.0.0
allow_anonymous true
persistence true
persistence_location /mosquitto/data/
log_dest stdout
EOF
  }
}
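An untested sketch of a containerd-driver equivalent, using the Mount block documented earlier (whether a relative local/ source resolves depends on the driver version; see the later issue about mount sources relative to the task working directory):

task "mosquitto" {
  driver = "containerd-driver"
  config {
    image = "docker.io/eclipse-mosquitto:2"
    mounts = [
      {
        type    = "bind"
        target  = "/mosquitto/config/mosquitto.conf"
        source  = "local/mosquitto.conf"
        options = ["rbind", "ro"]
      }
    ]
  }
  # template stanza unchanged
}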

Default docker registry is not set

Setting the image config to:

    task "redis" {
      driver = "containerd-driver"
      config {
        image = "redis:3.2"
      }
    }

Works with the docker driver but fails with containerd with the error:

rpc error: code = Unknown desc = Error in pulling image redis:3.2: failed to resolve reference "redis:3.2": parse "dummy://redis:3.2": invalid port ":3.2" after host

So it looks like the default docker registry is not set. Changing the config as follows fixes the issue:

    task "redis" {
      driver = "containerd-driver"
      config {
        image = "docker.io/library/redis:3.2"
      }
    }

But it feels like an inconsistency and will break the Nomad jobs of people migrating from the docker driver to the containerd driver.

What do you think about setting the default docker registry host?

Logging doesn't work

Currently, if I launch a job (nomad job run redis.nomad) using nomad-driver-containerd and try to tail its logs (stdout/stderr) using the nomad alloc logs command, it doesn't show anything.

nomad alloc logs -f -stderr -job redis

Will show nothing.

As a workaround, I have been using ctr (the containerd command-line tool) to check the logs.

$ export CONTAINERD_NAMESPACE=nomad
$ ctr task ls (This will get you the container name, which is prefixed with the allocation ID)
$ ctr task attach <container_name> (This should tail the logs)

[feature request] windows support

We have several clients that are using nomad in combination with docker. We would like to move them from docker to containerd, though most of their servers are Windows machines.

We are already using the awesome nomad-driver-iis and it works like a charm. Is there any way, or any plan, for this driver to support installation on Windows servers?

thank you

Taking over maintainership

Hi, I am interested in moving this project forward.

I have experience with Nomad, nomad-pack, Kubernetes, Docker (though cannot say that I contributed a lot to these projects). I already maintain several FOSS projects (see my GitHub profile page). I have plans to use nomad-driver-containerd for my daily job, but it is not good enough yet in its current form.

Release 0.9.4

Would it be possible to make a new release with support for cgroups v2 included?

It is blocking the upgrade from Ubuntu 20.04 to Ubuntu 22.04, since the newer Ubuntu release comes with cgroups v2 enabled by default.

Cannot launch task: stdout.fifo and stderr.fifo already closed

Hello,

I'm trying to get this driver to work with a sample Go program that just listens on an HTTP port and prints a message. The task won't launch, and looking through the logs I see:

Feb 11 12:06:22 nomad-client nomad[74354]:     2022-02-11T12:06:22.287-0600 [WARN]  client.alloc_runner.task_runner.task_hook.logmon.nomad: failed to read from log fifo: alloc_id=2e96234e-24b1-8a05-77f2-6e6620986232 task=c-hello @module=logmon error="read /opt/nomad/alloc/2e96234e-24b1-8a05-77f2-6e6620986232/alloc/logs/.c-hello.stdout.fifo: file already closed" timestamp=2022-02-11T12:06:22.286-0600

Feb 11 12:06:22 nomad-client nomad[74354]:     2022-02-11T12:06:22.287-0600 [WARN]  client.alloc_runner.task_runner.task_hook.logmon.nomad: failed to read from log fifo: alloc_id=2e96234e-24b1-8a05-77f2-6e6620986232 task=c-hello @module=logmon error="read /opt/nomad/alloc/2e96234e-24b1-8a05-77f2-6e6620986232/alloc/logs/.c-hello.stderr.fifo: file already closed" timestamp=2022-02-11T12:06:22.286-0600

Feb 11 12:06:22 nomad-client nomad[74354]:     2022-02-11T12:06:22.294-0600 [DEBUG] client.alloc_runner.task_runner.task_hook.logmon.stdio: received EOF, stopping recv loop: alloc_id=2e96234e-24b1-8a05-77f2-6e6620986232 task=c-hello err="rpc error: code = Unavailable desc = error reading from server: EOF"
Feb 11 12:06:22 nomad-client nomad[74354]:     2022-02-11T12:06:22.296-0600 [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: plugin process exited: alloc_id=2e96234e-24b1-8a05-77f2-6e6620986232 task=c-hello path=/usr/local/bin/nomad pid=74844

Feb 11 12:06:22 nomad-client nomad[74354]:     2022-02-11T12:06:22.296-0600 [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: plugin exited: alloc_id=2e96234e-24b1-8a05-77f2-6e6620986232 task=c-hello

Feb 11 12:06:22 nomad-client nomad[74354]:     2022-02-11T12:06:22.296-0600 [DEBUG] client.alloc_runner.task_runner: task run loop exiting: alloc_id=2e96234e-24b1-8a05-77f2-6e6620986232 task=c-hello

I have verified that the image runs using nerdctl. It also runs using the Nomad docker and podman task drivers.
I was able to launch the redis example using the driver, so I feel like the driver is generally working. Any help or pointers would be greatly appreciated.

Details:

  • Nomad v1.2.5
  • Version 0.9.3 of this driver
  • Host: Debian 5.10.92-1 arm64 running in a Vagrant box on an Apple silicon Mac

Job File:

job "containerd" {
  datacenters = ["dc1"]

  group "c-service" {
    network {
      port "http" {
        to = 8080
      }
    }
    service {
      name = "c-service"
      tags = ["urlprefix-/"]
      port = "http"

      check {
        type = "http"
        path = "/health"
        interval = "2s"
        timeout  = "2s"
      }
    }


    task "c-hello" {
      driver = "containerd-driver"

      config {
        image = "docker.io/michaelerickson/go-hello-docker:latest"
        host_network = true
        # ports = ["web"]
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

The code for the service I'm trying to launch (Go 1.17):

package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net"
	"net/http"
	"os"

	"github.com/gorilla/mux"
)

// serviceStatus represents the health of our service
type serviceStatus struct {
	Status string
}

// loggingMiddleware logs all requests to our service
func loggingMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		log.Printf("%s %s", r.Method, r.RequestURI)
		next.ServeHTTP(w, r)
	})
}

// notAllowedHandler is called for all requests that are not specifically
// handled. It returns HTTP not allowed
func notAllowedHandler(w http.ResponseWriter, r *http.Request) {
	log.Printf("%s %s method not allowed", r.Method, r.RequestURI)
	http.Error(w, "Not Allowed", http.StatusMethodNotAllowed)
}

// healthCheckHandler responds to /health and verifies that the service is up
func healthCheckHandler(w http.ResponseWriter, _ *http.Request) {
	status := serviceStatus{Status: "OK"}
	response, err := json.Marshal(status)
	if err != nil {
		log.Printf("JSON error: %s", err)
		http.Error(w, "JSON error", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	w.Write(response)
}

// rootHandler responds to /
func rootHandler(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()
	srvAddr := ctx.Value(http.LocalAddrContextKey).(net.Addr)
	response := fmt.Sprintf("Hello, Docker! from: %s\n", srvAddr)
	w.Write([]byte(response))
}

func main() {
	httpPort := os.Getenv("HTTP_PORT")
	if httpPort == "" {
		httpPort = "8080"
	}

	log.Printf("Starting echo service on %s", httpPort)

	r := mux.NewRouter()

	r.HandleFunc("/health", healthCheckHandler)
	r.HandleFunc("/", rootHandler)
	r.Use(loggingMiddleware)

	log.Fatal(http.ListenAndServe(":"+httpPort, r))
}

The dockerfile that builds the image:

# syntax=docker/dockerfile:1

# Multistage build to generate the smallest possible runtime image.

##
## BUILD
##
FROM golang:1.17.6-bullseye AS build

WORKDIR /app

COPY go.mod ./
COPY go.sum ./

RUN go mod download

COPY *.go ./

# Build for linux-arm64
RUN CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -o /docker-gs-ping

##
## Deploy
##
FROM gcr.io/distroless/static

COPY --from=build /docker-gs-ping /docker-gs-ping

EXPOSE 8080

USER nonroot:nonroot

ENTRYPOINT ["/docker-gs-ping"]

Allow mount source to be relative to task working directory

Per hashicorp/nomad#13229 (comment) and #123, we can mount files from ${NOMAD_TASK_DIR}/file with the relative path local/file, but not from ${NOMAD_TASK_DIR}/secrets. I think this should be allowed, both for consistent behavior and as a convenience for containers whose config-lookup behavior is not easy to change.

For reference, here's how the docker driver does this for volume binds and simple binds:
https://github.com/hashicorp/nomad/blob/2697e63ad67c254d0d8f1be02a477807fe40c50a/drivers/docker/driver.go#L678-L689
https://github.com/hashicorp/nomad/blob/2697e63ad67c254d0d8f1be02a477807fe40c50a/drivers/docker/driver.go#L1253-L1263
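To illustrate the asymmetry, a hypothetical mounts config (the paths are examples):

mounts = [
  {
    type    = "bind"
    target  = "/app/config"
    source  = "local/config"     # relative to the task dir: works, per #123
    options = ["rbind", "ro"]
  },
  {
    type    = "bind"
    target  = "/app/secret"
    source  = "secrets/secret"   # currently rejected; this issue asks to allow it
    options = ["rbind", "ro"]
  }
]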

Running with custom containerd snapshotter

So, I am trying to set up a Nomad-in-Docker dev environment with containerd to run Nomad tasks. Because containerd runs on top of Docker here, the default "overlayfs" snapshotter doesn't work.

With some effort, I have been able to get "fuse-overlayfs" working with containerd inside a docker container. So with some config and env variable updates (namely CONTAINERD_SNAPSHOTTER), I can pull images using ctr and run nerdctl without problems.

Now I want to run containers via nomad -> nomad-driver-containerd -> containerd. With some experimentation, the only way I can get this to work is with the following patch:

--- a/containerd/containerd.go
+++ b/containerd/containerd.go
@@ -109,6 +109,7 @@ func (d *Driver) pullImage(imageName, imagePullTimeout string, auth *RegistryAut
 
 	pullOpts := []containerd.RemoteOpt{
 		containerd.WithPullUnpack,
+		containerd.WithPullSnapshotter("fuse-overlayfs"),
 		withResolver(d.parshAuth(auth)),
 	}
 
@@ -339,6 +340,7 @@ func (d *Driver) createContainer(containerConfig *ContainerConfig, config *TaskC
 	return d.client.NewContainer(
 		ctxWithTimeout,
 		containerConfig.ContainerName,
+		containerd.WithSnapshotter("fuse-overlayfs"),
 		containerd.WithRuntime(d.config.ContainerdRuntime, nil),
 		containerd.WithNewSnapshot(containerConfig.ContainerSnapshotName, containerConfig.Image),
 		containerd.WithNewSpec(opts...),

The image is pulled and the container starts without issues if I use the above version.

So, I have two questions:

  1. Am I missing something, or is this the only way to override the snapshotter setting? (I assumed that setting the default snapshotter in the containerd config would work, but it did not.)
  2. If this is the only way, would you accept a PR making the snapshotter configurable?

stdin and stdout of existing processes are lost after a restart of nomad

Issue

Nomad is successfully able to reattach a job using nomad-driver-containerd after a restart but stdin and stdout are lost in the process.

Versions

nomad-driver-containerd v0.9.1
nomad v1.1.4

How to reproduce

  1. Consider the following changes to the Vagrant agent configuration to have Nomad run in standard mode (not dev mode) so that state is persisted:
$ cat example/agent.hcl 
log_level = "DEBUG"
data_dir = "/tmp/nomad"

plugin "containerd-driver" {
  config {
    enabled = true
    containerd_runtime = "io.containerd.runc.v2"
    stats_interval = "5s"
  }
}

server {
  enabled = true
  bootstrap_expect = 1
  default_scheduler_config {
    scheduler_algorithm = "spread"
    memory_oversubscription_enabled = true

    preemption_config {
      batch_scheduler_enabled   = true
      system_scheduler_enabled  = true
      service_scheduler_enabled = true
    }
  }
}

client {
  enabled = true
  host_volume "s1" {
    path = "/tmp/host_volume/s1"
    read_only = false
  }
}
  2. Run the Hello job:
$ nomad job run example/hello.hcl
  3. See that new log lines are appended every 3 seconds:
$ nomad alloc logs -f -job hello
Hello world: sleeping for 3 seconds.
Hello world: sleeping for 3 seconds.
Hello world: sleeping for 3 seconds.
Hello world: sleeping for 3 seconds.
Hello world: sleeping for 3 seconds.
Hello world: sleeping for 3 seconds.
Hello world: sleeping for 3 seconds.
Hello world: sleeping for 3 seconds.
Hello world: sleeping for 3 seconds.
Hello world: sleeping for 3 seconds.
Hello world: sleeping for 3 seconds.
Hello world: sleeping for 3 seconds.
Hello world: sleeping for 3 seconds.
Hello world: sleeping for 3 seconds.
Hello world: sleeping for 3 seconds.
  4. Restart nomad:
$ sudo systemctl restart nomad
  5. See that logs are no longer appended, despite the loop still running:
$ nomad alloc logs -f -job hello
Hello world: sleeping for 3 seconds.
Hello world: sleeping for 3 seconds.

ps faux from the guest

$ nomad alloc exec -job hello /bin/ps faux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       221  0.0  0.1  34424  2752 pts/0    Rs+  23:11   0:00 /bin/ps faux
root         1  0.0  0.1  18040  2956 ?        Ss   23:01   0:00 /bin/bash /tmp/
root       220  0.0  0.0   4376   648 ?        S    23:11   0:00 sleep 3

ps faux from the host

root      9112  0.0  0.4 112244  8636 ?        Sl   23:01   0:00 /usr/local/bin/containerd-shim-runc-v2 -namespace nomad -id hello-task-ea92913f-1951-8117-6f4c-b2fcd28e898b -address /run/containerd/containerd.sock
root      9138  0.0  0.1  18040  2956 ?        Ss   23:01   0:00  \_ /bin/bash /tmp/print.sh
root      9632  0.0  0.0   4376   672 ?        S    23:11   0:00      \_ sleep 3

Nomad still knows about the job

vagrant@vagrant:~/go/src/github.com/Roblox/nomad-driver-containerd$ nomad job status hello
ID            = hello
Name          = hello
Submit Date   = 2021-08-30T23:01:06Z
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group   Queued  Starting  Running  Failed  Complete  Lost
hello-group  0       0         1        0       1         0

Latest Deployment
ID          = 5bfe4965
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group   Desired  Placed  Healthy  Unhealthy  Progress Deadline
hello-group  1        1       1        0          2021-08-30T23:11:18Z

Allocations
ID        Node ID   Task Group   Version  Desired  Status    Created     Modified
ea92913f  0b680b8d  hello-group  2        run      running   11m58s ago  11m47s ago

Observations

Looking at the open file descriptors of the containerd driver, it seems that the driver is not recovering the stdin and stdout named pipes for running jobs.

Before restart

root      9253  0.0  1.8 1339240 38656 ?       Sl   23:02   0:00  \_ /tmp/nomad-driver-containerd/containerd-driver
vagrant@vagrant:~/go/src/github.com/Roblox/nomad-driver-containerd$ sudo lsof -np 9253                                                                                                                     
COMMAND    PID USER   FD      TYPE             DEVICE SIZE/OFF    NODE NAME                                                                                                                                
container 9253 root  cwd       DIR              253,0     4096       2 /                                                                                                                                   
container 9253 root  rtd       DIR              253,0     4096       2 /                                                                                                                                   
container 9253 root  txt       REG              253,0 43373088 3801138 /tmp/nomad-driver-containerd/containerd-driver                                                                                      
container 9253 root  mem       REG              253,0   101168 2097669 /lib/x86_64-linux-gnu/libresolv-2.27.so                                                                                             
container 9253 root  mem       REG              253,0    26936 2097643 /lib/x86_64-linux-gnu/libnss_dns-2.27.so                                                                                            
container 9253 root  mem       REG              253,0    47568 2097645 /lib/x86_64-linux-gnu/libnss_files-2.27.so                                                                                          
container 9253 root  mem       REG              253,0  2030544 2097578 /lib/x86_64-linux-gnu/libc-2.27.so                                                                                                  
container 9253 root  mem       REG              253,0   144976 2097665 /lib/x86_64-linux-gnu/libpthread-2.27.so                                                                                            
container 9253 root  mem       REG              253,0    14560 2097595 /lib/x86_64-linux-gnu/libdl-2.27.so                                                                                                 
container 9253 root  mem       REG              253,0   170960 2097554 /lib/x86_64-linux-gnu/ld-2.27.so                                                                                                    
container 9253 root    0r      CHR                1,3      0t0       6 /dev/null                                                                                                                           
container 9253 root    1w     FIFO               0,12      0t0   54392 pipe                                                                                                                                
container 9253 root    2w     FIFO               0,12      0t0   54393 pipe                                                                                                                                
container 9253 root    3u     unix 0xffff97c5b45a4400      0t0   54398 type=STREAM                                                                                                                         
container 9253 root    4u  a_inode               0,13        0    9576 [eventpoll]                                                                                                                         
container 9253 root    5r     FIFO               0,12      0t0   54397 pipe                                                                                                                                
container 9253 root    6w     FIFO               0,12      0t0   54397 pipe                                                                                                                                
container 9253 root    7u     unix 0xffff97c5b45a6400      0t0   54401 /tmp/plugin735931456 type=STREAM                                                                                                    
container 9253 root    8r     FIFO               0,12      0t0   54402 pipe                          
container 9253 root    9w     FIFO               0,12      0t0   54402 pipe                          
container 9253 root   10r     FIFO               0,12      0t0   54403 pipe                          
container 9253 root   11w     FIFO               0,12      0t0   54403 pipe                          
container 9253 root   12u     unix 0xffff97c5b45a7c00      0t0   54242 /tmp/plugin735931456 type=STREAM
container 9253 root   15u     FIFO              253,0      0t0 3801166 /tmp/nomad/alloc/20294144-56b6-fc55-e971-02028f7c7e2a/alloc/logs/.hello-task.stdout.fifo                                            
container 9253 root   16u     FIFO              253,0      0t0 3801168 /tmp/nomad/alloc/20294144-56b6-fc55-e971-02028f7c7e2a/alloc/logs/.hello-task.stderr.fifo                                            
container 9253 root   17u     FIFO               0,22      0t0     706 /run/containerd/fifo/739129716/hello-task-20294144-56b6-fc55-e971-02028f7c7e2a-stdout
container 9253 root   18u     FIFO               0,22      0t0     710 /run/containerd/fifo/739129716/hello-task-20294144-56b6-fc55-e971-02028f7c7e2a-stderr
container 9253 root   19r     FIFO               0,22      0t0     706 /run/containerd/fifo/739129716/hello-task-20294144-56b6-fc55-e971-02028f7c7e2a-stdout
container 9253 root   20r     FIFO               0,22      0t0     710 /run/containerd/fifo/739129716/hello-task-20294144-56b6-fc55-e971-02028f7c7e2a-stderr

After restart

root     10218  0.3  1.5 1259156 32552 ?       Sl   23:21   0:00  \_ /tmp/nomad-driver-containerd/containerd-driver
vagrant@vagrant:~/go/src/github.com/Roblox/nomad-driver-containerd$ sudo lsof -np 10218
COMMAND     PID USER   FD      TYPE             DEVICE SIZE/OFF    NODE NAME
container 10218 root  cwd       DIR              253,0     4096       2 /
container 10218 root  rtd       DIR              253,0     4096       2 /
container 10218 root  txt       REG              253,0 43373088 3801138 /tmp/nomad-driver-containerd/containerd-driver
container 10218 root  mem       REG              253,0  2030544 2097578 /lib/x86_64-linux-gnu/libc-2.27.so
container 10218 root  mem       REG              253,0   144976 2097665 /lib/x86_64-linux-gnu/libpthread-2.27.so
container 10218 root  mem       REG              253,0    14560 2097595 /lib/x86_64-linux-gnu/libdl-2.27.so
container 10218 root  mem       REG              253,0   170960 2097554 /lib/x86_64-linux-gnu/ld-2.27.so
container 10218 root    0r      CHR                1,3      0t0       6 /dev/null
container 10218 root    1w     FIFO               0,12      0t0   61256 pipe
container 10218 root    2w     FIFO               0,12      0t0   61257 pipe
container 10218 root    3u     unix 0xffff97c5b16a4400      0t0   61273 type=STREAM
container 10218 root    4u  a_inode               0,13        0    9576 [eventpoll]
container 10218 root    5r     FIFO               0,12      0t0   61272 pipe
container 10218 root    6w     FIFO               0,12      0t0   61272 pipe
container 10218 root    7u     unix 0xffff97c5b16a6c00      0t0   61274 /tmp/plugin654816476 type=STREAM
container 10218 root    8r     FIFO               0,12      0t0   61275 pipe
container 10218 root    9w     FIFO               0,12      0t0   61275 pipe
container 10218 root   10r     FIFO               0,12      0t0   61276 pipe
container 10218 root   11w     FIFO               0,12      0t0   61276 pipe
container 10218 root   12u     unix 0xffff97c5b16f6800      0t0   61278 /tmp/plugin654816476 type=STREAM

Running nomad as non-root user with rootless containerd

Hi!

I would like to run nomad locally on my developer machine and connect to a locally running containerd that I have started in rootless mode.

Basically, I would not want to involve the root user, where possible, in running either containerd or nomad.

However, when I run nomad with:

nomad agent -dev -plugin-dir="/usr/lib/nomad/plugins" -config="./local-nomad.hcl"

where /usr/lib/nomad/plugins contains the containerd-driver
and ./local-nomad.hcl looks like:

plugin "containerd-driver" {
	config {
		enabled = true
		containerd_runtime = "io.containerd.runc.v2"
		stats_interval = "5s"
		allow_privileged = false
	}
}

I get the following error:

2022-08-09T17:38:00.462+0300 [ERROR] agent.plugin_loader.containerd-driver: Error in creating containerd client: plugin_dir=/usr/lib/nomad/plugins @module=containerd-driver err="failed to dial \"/run/containerd/containerd.sock\": context deadline exceeded" timestamp="2022-08-09T17:38:00.462+0300"

and nomad dies.

I suppose the containerd-driver is looking for the sock file in the wrong place. Is there a way to configure the containerd address?

hostname not populated in /etc/hosts for containerd tasks

nomad version: v1.1.2
os version: Linux archlinux 5.9.14-arch1-1 #1 SMP PREEMPT Sat, 12 Dec 2020 14:37:12 +0000 x86_64 GNU/Linux

jobspec:

job "python" {
    datacenters = ["dc1"]
    type = "service"

    group "python" {
        count = 1

        network {
            mode = "bridge"
        }

        task "python" {
            driver = "containerd-driver"
            config {
                image = "python:3.7.11-buster"
                command = "sh"
                args = ["-c", "while true; do sleep 300; done"]
            }
        }
    }
}

After running the python task, run nomad exec -task python c81d0472 bash:

root@python-c81d0472-e79f-3656-debd-97afada978d1:/# hostname
python-c81d0472-e79f-3656-debd-97afada978d1

root@python-c81d0472-e79f-3656-debd-97afada978d1:/# cat /etc/hostname
debuerreotype

root@python-c81d0472-e79f-3656-debd-97afada978d1:/# cat /etc/hosts
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters

root@python-c81d0472-e79f-3656-debd-97afada978d1:/# hostname -I
172.26.64.3

root@python-c81d0472-e79f-3656-debd-97afada978d1:/# hostname -i
hostname: Name or service not known

root@python-c81d0472-e79f-3656-debd-97afada978d1:/# hostname -f
hostname: Name or service not known

I think there are several issues:

  • the hostname python-c81d0472-e79f-3656-debd-97afada978d1 is not written to /etc/hostname
  • should 172.26.64.3 python-c81d0472-e79f-3656-debd-97afada978d1 be added to /etc/hosts?
  • shouldn't hostname -i and hostname -f work?

Changing task.driver to docker makes it work fine.

So I think this is a bug in the containerd driver.

maybe this link is helpful: hashicorp/nomad#10766

Unable to build on clean go install

Hello,

It seems that one of the dependencies is broken, and the build fails on fresh Go installs.

+ make build
/usr/bin/go build -o containerd-driver .
go: github.com/hashicorp/[email protected] requires
        github.com/hashicorp/[email protected] requires
        github.com/tencentcloud/[email protected]+incompatible: reading github.com/tencentcloud/tencentcloud-sdk-go/go.mod at revision v3.0.83: unknown revision v3.0.83

I've also added a comment to a bug report on go-discover (hashicorp/go-discover#172).

Add support for --runtime

Currently containerd-driver only allows setting the runtime at the plugin level.

It doesn't allow you to choose the runtime at the job level, similar to this.

Add a --runtime option to allow selecting the runtime per job (assuming the runtime is installed and available on the Nomad client node).
This will allow users to run containers with multiple runtimes, e.g. runc and runsc (gVisor), on the same node.
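The proposed job-level option might look something like this (hypothetical; runtime is not an existing task config option):

config {
  image   = "docker.io/library/redis:alpine"
  runtime = "io.containerd.runsc.v1"   # hypothetical per-job runtime selection (gVisor)
}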

v0.9.3 reports as v0.9.2

I don't think v0.9.3 was actually built.

$ wget https://github.com/Roblox/nomad-driver-containerd/releases/download/v0.9.3/containerd-driver
$ chmod 755 containerd-driver
$ ./containerd-driver -v
containerd-driver v0.9.2

Can you fix ASAP? Thanks!

cgroups not getting applied on containers launched using nomad-driver-containerd

When I launch a container using nomad-driver-containerd and it exceeds its limits, cgroups are not applied and the container doesn't get OOM killed. To compare the docker driver and nomad-driver-containerd:

stress.nomad

job "stress" {
  datacenters = ["dc1"]

  group "stress-group" {
    task "stress-task" {
      driver = "docker"

      config {
        image = "docker.io/shm32/stress:1.0"
      }

      restart {
        attempts = 5
        delay    = "30s"
      }

      resources {
        cpu    = 500
        memory = 256
        network {
          mbits = 10
        }
      }
    }
  }
}
$ nomad job run stress.nomad

When stress.nomad exceeds 500 MHz of CPU or 256 MB of memory, it is OOM killed.

However, when I launch the same job (stress.nomad) using nomad-driver-containerd, it keeps running and doesn't get OOM killed.

In the case of the docker driver, IIUC, docker manages the cgroups for the container.
The question probably is: how does nomad manage resource constraints (cgroups) on workloads launched by other drivers, e.g. QEMU, Java, exec, etc.?
Does nomad apply/manage cgroups at the orchestration level?

The same image seems to be pulled in parallel causing disk exhaustion

We have about 100 parameterized job definitions that use the same image config:

config {
        image = "username/backend:some_tag"
}

The problem is that disk space is exhausted on Nomad clients, and it looks like the reason is that the image is pulled individually for each job, despite each job specifying the exact same image and tag. With the docker Nomad driver this didn't happen: all jobs shared a single image that was pulled and extracted once.

I might be wrong in this explanation, but this is what I gather from multiple (hundreds of) error messages like:

[ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=62ab19a7-4e67-c941-cc39-340394800fa1 task=main error="rpc error: code = Unknown desc = Error in pulling image username/backend:some_tag: failed to prepare extraction snapshot \"extract-138110298-tmpn sha256:bf868a0e662ae83512efeacb6deb2e0f0f1694e693fab8f53c110cb503c00b99\": context deadline exceeded"

I.e. it looks like each allocation has its own extraction snapshot? Is it possible to configure the driver (or containerd) so that all jobs share a single image snapshot?

Portmap capabilities + CNI: how to use them in the driver

I just started testing how to use capabilities by passing parameters in the job description using cap_add, and I created a CNI network like the following:

{
  "cniVersion": "0.3.1",
  "name": "mycoolcnichain",
  "plugins": [
     {
        "type": "bridge",
        "isGateway": true,
        "ipMasq": false,
        "bridge": "mybridge",
        "ipam": {
            "type": "host-local",
            "subnet": "10.10.30.0/24",
            "routes": [
                { "dst": "0.0.0.0/0" }
            ],
         "dataDir": "/run/ipam-out-net"
        },
        "dns": {
          "nameservers": [ "8.8.8.8" ]
        }
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true},
      "snat": false
    }
  ]
}

I would like to know how to use CAP_ARGS to change the Redis port from the default 6379 to 8888; my question is how to pass the CAP_ARGS to achieve that.

CAP_ARGS should be

CAP_ARGS='{"portMappings":[{"hostPort":8888,"containerPort":6379,"protocol":"tcp","hostIP":"127.0.0.1"}]}'

and the job should be:

job "redis" {
  datacenters = ["dc1"]

  group "redis-group" {
    task "redis-task" {
      driver = "containerd-driver"

      config {
        image   = "docker.io/library/redis:alpine"
        seccomp = true
        cwd     = "/home/redis"
        cap_add         = ["CAP_ARGS"]
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

[question] Is it production ready?

Hello,

I know this kind of question usually gets an answer like "it depends", but I wonder: what is the current state of this module from your point of view?

Also, I see in the Nomad docs that:

filesystem isolation none

Does it mean that allocations run with this driver have full access to the underlying filesystem?

Containers (spec) not getting cleaned up after stopping the jobs.

Seeing this intermittently (not easy to reproduce at this point) when I launch multiple nomad jobs, e.g.:

$ nomad job run redis.nomad
$ nomad job run capabilities.nomad
$ nomad job run privileged.nomad
$ nomad job run stress.nomad

After the containers (tasks) are running, stop them:

$ nomad job stop redis
$ nomad job stop capabilities
$ nomad job stop privileged
$ nomad job stop stress

This stops all the tasks. However, the containers (spec) are left behind and not cleaned up properly.
This was working before, so there is likely a regression.

root@linux:/opt/gopath/src/github.com/nomad-driver-containerd/example# ctr task ls
TASK    PID    STATUS
root@linux:/opt/gopath/src/github.com/nomad-driver-containerd/example# ctr container ls
CONTAINER                     IMAGE                             RUNTIME
32522383_lucid_noyce5         docker.io/library/ubuntu:16.04    io.containerd.runc.v2
63c86855_agitated_cray6       docker.io/library/ubuntu:16.04    io.containerd.runc.v2
ba8f121f_ecstatic_volhard3    docker.io/library/ubuntu:16.04    io.containerd.runc.v2
bcddf7ad_jolly_bartik7        docker.io/library/redis:alpine    io.containerd.runc.v2
fb07ad0a_focused_thompson9    docker.io/library/redis:alpine    io.containerd.runc.v2

Forward Redis port 6379

I'm currently evaluating nomad to replace our lxd stack.

The standard docker driver works well, but I can't figure out what it brings, so I'd like to get rid of it and use containerd directly instead. I followed your instructions and hello.nomad runs without issues.

However, I'm now testing your redis.nomad and I can't find out how to forward ports. Redis runs and listens on its internal port 6379, but I can't forward this port to the host (dynamic or static). I tried many combinations without success; only host_network = true exposes port 6379.

Do you have a working example, please? Or do I really have to install the CNI plugins to get a working bridge, and have a config file in /opt/cni/config (nomad.hcl / client / cni_config_dir)? Where can I find such a config file?

Registry authentication

Hi, thanks for open-sourcing this plugin.
I'm trying to set it up with a private registry (GitLab), but I need to authenticate before pulling images. As far as I can see, the auth config stanza is not supported.
Am I missing something? Is this doable?

There is no IP address on the lo interface

There is no IP address (127.0.0.1/8) on the lo interface, so the app cannot listen on 127.0.0.1:

root@bypass-route:/# ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 2e:f4:d0:00:17:83 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.26.64.2/20 brd 172.26.79.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::2cf4:d0ff:fe00:1783/64 scope link 
       valid_lft forever preferred_lft forever

When switching to the docker driver, the lo interface is fine:

root@9b68f72d354e:/# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if118: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 32:2c:db:6a:dc:f0 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.26.64.107/20 brd 172.26.79.255 scope global eth0
       valid_lft forever preferred_lft forever

The job file:

job "test2" {
  datacenters = ["dc1"]

  group "test2" {

    network {
      mode = "bridge"
    }

    task "test2" {
      driver = "containerd-driver"
      config {
        image           = "docker.io/library/ubuntu:20.04"
        command         = "sleep"
        args            = ["600s"]
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

Are there any suggestions here?

thanks.
