Rundeck Nomad Plugin

This is early work. Use with extreme caution!

Purpose

This is a Workflow Step plugin for submitting jobs to a Nomad cluster via the Rundeck UI. The plugin interacts with a Nomad server via its HTTP API.

Rundeck is a popular and well-established automation tool. Among other things, it features a rich customizable UI, role-based access control, scheduling, logging, alerts, CLI and API support, and an extensive plugin ecosystem. It fits well into CI/CD pipelines (for instance, there is a Jenkins Rundeck integration plugin). However, Rundeck does not lend itself easily to running in HA mode or scaling worker nodes. It therefore seems a good idea to put a distributed scheduler such as Nomad behind it to offload resource-intensive jobs.

Installation

  • Download and start Rundeck. It will automatically create the necessary directories.
  • Clone this repository. Test and build using gradle wrapper:
      ./gradlew test
      ./gradlew build
    
  • Drop rundeck-nomad-plugin-<version>.jar into the libext/ directory under the Rundeck installation directory.
  • Restart Rundeck.

Usage

Download Nomad and start an agent in server mode. For evaluation purposes you may use development mode, which starts with zero configuration.

In the Rundeck UI, create a new project and a new job in that project. Under the "Add a Step" section, switch to the "Workflow Steps" tab. If the plugin was recognized successfully, you should see "Run Docker container on Nomad" in the list of available workflow step plugins. Click on the plugin entry to bring up the input form, then fill in the Nomad agent URL, the Docker image name and any other available fields. Save and run the job.

What is in scope

Currently the scope is limited to batch and service jobs of simple structure (1 job, 1 task group, 1 task). The reason is that such jobs fit well into the Rundeck operating model and map onto the available UI configuration in a straightforward way. It is possible to set the task count within the task group, thereby increasing parallelism where that matters.
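The "1 job, 1 task group, 1 task" shape can be sketched as the nested structure that would be serialized to JSON and sent to Nomad. This is only an illustration, not the plugin's actual code: the class `SimpleJobSpec`, its `build` method and the generated group and task names are hypothetical, and a real payload needs further fields (for example the list of datacenters) that are omitted here.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of the plugin: builds a one-group, one-task job payload. */
final class SimpleJobSpec {

    private SimpleJobSpec() {}

    /**
     * @param jobId job identifier, also used as the job name
     * @param type  Nomad scheduler type, e.g. "batch" or "service"
     * @param image Docker image to run
     * @param count number of task group instances to place
     */
    static Map<String, Object> build(String jobId, String type, String image, int count) {
        // Docker driver configuration for the single task.
        Map<String, Object> config = new LinkedHashMap<>();
        config.put("image", image);

        Map<String, Object> task = new LinkedHashMap<>();
        task.put("Name", jobId + "-task");   // illustrative naming scheme
        task.put("Driver", "docker");
        task.put("Config", config);

        // The single task group; Count controls parallelism.
        Map<String, Object> group = new LinkedHashMap<>();
        group.put("Name", jobId + "-group"); // illustrative naming scheme
        group.put("Count", count);
        group.put("Tasks", Collections.singletonList(task));

        Map<String, Object> job = new LinkedHashMap<>();
        job.put("ID", jobId);
        job.put("Name", jobId);
        job.put("Type", type);
        job.put("TaskGroups", Collections.singletonList(group));

        // Nomad's JSON job format wraps the definition in a top-level "Job" key.
        return Collections.singletonMap("Job", job);
    }
}
```

Calling `SimpleJobSpec.build("demo", "batch", "alpine:3.6", 3)` would describe a batch job whose single task group is placed three times.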

Nomad supports a range of drivers to execute tasks. At the moment the plugin only supports task configuration for the Docker driver. However, a best effort has been made to isolate driver-specific code and keep the extension process simple.

Job lifecycle

Monitoring of running jobs is performed in several stages, and the outcome of each stage is reported in the log output. Please consult the Nomad documentation for the relevant terminology. First, the plugin checks that the job has been successfully submitted to the scheduler. Then it verifies that the job has passed evaluation (the evaluation ID is reported). Depending on the desired task count, Nomad places the corresponding number of allocations. Some or all of the allocations may fail for various reasons (resource limitations, driver errors, etc.). However, the job as a whole can only have a pending, running or dead status, which may not reflect whether the outcome was a success or a failure. Hence, to allow for some flexibility, the plugin polls the status of the individual allocations and raises an error if more than a configurable percentage of them end up in a failed status.
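The final check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the plugin's implementation: the class `AllocationFailureCheck` and its method are hypothetical, the input is assumed to be the list of allocation client statuses already fetched from Nomad, and "more than" is read as a strict comparison.

```java
import java.util.List;

/** Hypothetical helper; names are illustrative and not taken from the plugin source. */
final class AllocationFailureCheck {

    private AllocationFailureCheck() {}

    /**
     * Returns true when the share of allocations whose status is "failed"
     * is strictly greater than maxFailedPercent (expected range 0-100).
     */
    static boolean exceedsThreshold(List<String> clientStatuses, double maxFailedPercent) {
        if (clientStatuses.isEmpty()) {
            return false; // nothing has been placed yet, so there is nothing to judge
        }
        long failed = clientStatuses.stream().filter("failed"::equals).count();
        double failedPercent = 100.0 * failed / clientStatuses.size();
        return failedPercent > maxFailedPercent;
    }
}
```

With a threshold of 25, one failed allocation out of four would still be tolerated under this reading, while two out of four would cause the step to fail.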

Note that logs from individual tasks are not streamed here. Given the arbitrary number of task instances that can be deployed, it could be challenging to read all of their streams into the Rundeck output. Some support for that may be added in the future.

Nomad supports scheduling of periodic jobs and defining restart policies, and the Nomad SDK implements time-outs and a back-off strategy for all API calls. However, all of these settings also belong to the core functionality of Rundeck. Therefore, to avoid confusion, they are delegated to the Rundeck job-level configuration. That is why API calls are configured to wait indefinitely and the periodic stanza of the Nomad job specification is not supported. It may be implemented in the future if this plugin is enhanced to deploy long-running services.

Minimal version requirements

  • Java 1.8
  • Rundeck 2.9.x
  • Nomad 0.6.0

Similar projects

Thanks

TODO

  • Better test coverage
  • Driver support
  • More detailed logging
  • TLS support
  • Constraints configuration

rundeck-nomad-plugin's Issues

Ability to create multiple groups and tasks

Hey! Awesome project, thank you.
But what about support for multiple groups, and multiple tasks within those groups?

Our goal is to "stick" Docker containers to datacenters. Because of that, in our case we have one job with two groups (one per datacenter) and one task per group.

Monitor evaluation status

If an evaluation is triggered during the job runtime (e.g. if a node is lost) we should refresh the list of allocations, as new ones may have been placed. This probably requires making a blocking query against the original evaluation, using modify index, in a separate thread.
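As a rough sketch of the blocking query mentioned above: Nomad's HTTP API accepts `index` and `wait` query parameters, so the request holds until the resource changes past the given index or the wait time elapses. The helper below only builds such a URL for an evaluation. The class `BlockingQuery`, its method name and its parameters are hypothetical, and the threading and the HTTP call itself are left out.

```java
/** Hypothetical helper, not part of the plugin: builds a blocking-query URL for an evaluation. */
final class BlockingQuery {

    private BlockingQuery() {}

    /**
     * @param nomadUrl    base URL of the Nomad agent, with or without a trailing slash
     * @param evalId      ID of the evaluation to watch
     * @param lastIndex   last modify index seen by the caller
     * @param waitSeconds maximum time the server should hold the request open
     */
    static String evaluationUrl(String nomadUrl, String evalId, long lastIndex, int waitSeconds) {
        // Normalise the base URL so that we never emit a double slash.
        String base = nomadUrl.endsWith("/")
                ? nomadUrl.substring(0, nomadUrl.length() - 1)
                : nomadUrl;
        return base + "/v1/evaluation/" + evalId
                + "?index=" + lastIndex
                + "&wait=" + waitSeconds + "s";
    }
}
```

A watcher thread could call this in a loop, passing the index returned by the previous response, and refresh the allocation list whenever the evaluation changes.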

Track deployment status

For jobs launched with the service scheduler and update_strategy defined, check whether the deployment has been created successfully; otherwise fail the Rundeck execution.

Support deployment of long-running services

By design the plugin currently only supports launching batch Nomad jobs whose lifecycle is fully contained within the lifecycle of a Rundeck job.

For the plugin to be used successfully in CI/CD pipelines, it would need to support deploying long-running services, where the Rundeck job only hands over the workload to Nomad and exits. Many settings in the job specification, for example resource limits, namespaces or port mappings, have no relevance during the build, say in Jenkins. For separation of concerns it would be good to encapsulate them in a Rundeck job definition, expose only parameters such as the image/tag name, and let Jenkins trigger the job.

We want to minimise the likelihood of scenarios where Rundeck job execution indicates success, while the job fails to run as expected on Nomad and is potentially left in an inconsistent state. We also want to limit the scope of responsibility for Rundeck, because other mechanisms, such as service health checks, should kick in to monitor the job status at runtime.

Some less obvious decisions need to be made here.

  • What would be a reasonable "handover policy" from Rundeck to Nomad? Should we wait for a configurable amount of time once all allocations are running, and indicate success if they have run without failures during that period? What if the tasks go through a number of restarts before reaching a healthy state (e.g. members of a cluster discovering each other)? Do we then reset the wait and start again?
  • What if deployment fails (according to whatever criteria we have defined)? Do we attempt to deregister the job? Should we make this optional?
  • How do we map this onto the UI? Add a drop-down to select job type? How do we minimise the number of job-type-specific controls with implicit dependencies?
  • How do we make this compatible with the existing code so that we can still support all drivers equally for all job types?
