wardroom's Introduction

wardroom

Wardroom provides tooling that helps simplify the deployment of a Kubernetes cluster. More specifically, Wardroom provides the following functionality:

  • Image Building: Building of Kubernetes-ready base operating system images using Packer and Ansible.
  • Deployment Orchestration: Ansible-based orchestration to deploy highly-available Kubernetes clusters using kubeadm.

Both use cases share a common set of Ansible roles that can be found in the ansible directory.

Image Building

Wardroom leverages Packer to build Kubernetes-ready golden images across a wide variety of operating systems and image formats. During the build phase, Wardroom uses Ansible to configure the base operating system and produce the golden image.

This functionality is used to create base images for the Heptio aws-quickstart.

Supported Image Formats

  • AMI

Supported Operating Systems

  • Ubuntu 16.04 (Xenial)
  • Ubuntu 18.04 (Bionic)
  • CentOS 7

Deployment Orchestration

The swizzle directory contains an Ansible playbook that can be used to orchestrate the deployment of a Kubernetes cluster using kubeadm.

Documentation

Documentation and usage information can be found in the docs directory.

Contributing

See our contributing guidelines and our code of conduct. Contributions are welcome from all.

Development

Vagrant may be used to test Ansible playbook development locally. In this scenario, Vagrant uses the ansible provisioner to configure the resulting operating system image. To test all supported operating systems simultaneously:

vagrant up

You may also selectively test a single operating system:

vagrant up [xenial|bionic|centos7]

To enable verbose Ansible logging, set the WARDROOM_DEBUG environment variable to 'vvvv'.
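
For example, a full provisioning pass with maximum Ansible verbosity:

WARDROOM_DEBUG=vvvv vagrant up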

The default Vagrant provider is VirtualBox, but other providers are possible by way of the vagrant-mutate plugin.

wardroom's People

Contributors

alexbrand, chuckha, craigtracey, detiber, johnharris85, johnschnake, jpweber, lander2k2, liztio, randomvariable, ryanchapple, scottslowe, stevesloka, timothysc, vincepri


wardroom's Issues

Enumerate necessary permissions

We should enumerate the permissions necessary for the Packer builds, per cloud provider. It is frustrating for users when this is unclear, and documenting it would make for a much better user experience.

AWS region now hard-coded in packer.json

Commit cab02b6 (part of PR #121) hardcodes the "aws_region" variable in packer/packer.json, setting the value to "us-east-1". This causes the instructions in packer/README.md for building AWS images to break for regions other than us-east-1.

It looks like there are (at least) three potential fixes:

  1. Update packer/README.md to instruct users to add -var aws_region=<region> when building AMIs in regions other than us-east-1 (see the example below).
  2. Update packer/packer.json to remove the hard-coded value for the "aws_region" variable.
  3. Update the AWS region variable files (aws-us-east-1.json and aws-us-west-2.json) to set the "aws_region" variable for that specific region.
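
For illustration, option 1 would amount to documenting a command along these lines (assuming builds are invoked from the packer directory; the exact invocation lives in packer/README.md):

packer build -var aws_region=us-west-2 packer.json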

If you let me know which approach you prefer for the project, I'm happy to craft a PR implementing it. Thanks!

etcd source url hardcoded

In corporate environments, GitHub may not be directly reachable. Currently the etcd URL is hardcoded as a role variable, which does not appear to be overridable by host/group vars set elsewhere. The group name "etcd" is also hardcoded, so users with differently named hosts who are just consuming the roles cannot override it either.
Moving those values to etcd/defaults/main.yml seems to allow users to override them as needed.
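
A minimal sketch of what those defaults might look like; the variable names here (etcd_download_url, etcd_group_name) are assumptions, not necessarily the role's actual names:

# etcd/defaults/main.yml (sketch; variable names are assumptions)
# Role defaults have the lowest variable precedence in Ansible,
# so host/group vars set elsewhere will override them.
etcd_download_url: "https://github.com/coreos/etcd/releases/download"
etcd_group_name: etcd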

@craigtracey

Fix deprecation warnings

There are a couple of places where we are using deprecated Ansible features/instructions/commands; the fix is sketched after the log excerpts below.

TASK [common : install baseline dependencies] **********************************************************************************************************************************************************
[DEPRECATION WARNING]: Invoking "apt" only once while using a loop via squash_actions is deprecated. Instead of using a loop to supply multiple items and specifying `name: {{ item }}`, please use
`name: '{{ common_debs }}'` and remove the loop. This feature will be removed in version 2.11. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
TASK [docker : install docker] *************************************************************************************************************************************************************************
[DEPRECATION WARNING]: Invoking "apt" only once while using a loop via squash_actions is deprecated. Instead of using a loop to supply multiple items and specifying `name: {{ item }}`, please use
`name: ['docker-ce={{ docker_debian_version }}']` and remove the loop. This feature will be removed in version 2.11. Deprecation warnings can be disabled by setting deprecation_warnings=False in
ansible.cfg.
[DEPRECATION WARNING]: Invoking "apt" only once while using a loop via squash_actions is deprecated. Instead of using a loop to supply multiple items and specifying `name: {{ item }}`, please use
`name: ['docker-ce={{ docker_debian_version }}']` and remove the loop. This feature will be removed in version 2.11. Deprecation warnings can be disabled by setting deprecation_warnings=False in
ansible.cfg.
TASK [docker : add docker dependencies] ****************************************************************************************************************************************************************
[DEPRECATION WARNING]: Invoking "yum" only once while using a loop via squash_actions is deprecated. Instead of using a loop to supply multiple items and specifying `name: {{ item }}`, please use
`name: ['device-mapper-persistent-data', 'lvm2']` and remove the loop. This feature will be removed in version 2.11. Deprecation warnings can be disabled by setting deprecation_warnings=False in
ansible.cfg
TASK [kubernetes : install kubernetes packages] ********************************************************************************************************************************************************
[DEPRECATION WARNING]: Invoking "apt" only once while using a loop via squash_actions is deprecated. Instead of using a loop to supply multiple items and specifying `name: {{ item }}`, please use
`name: ["kubelet={{ kubernetes_version | kube_platform_version('debian') }}", "kubeadm={{ kubernetes_version | kube_platform_version('debian') }}", "kubectl={{ kubernetes_version |
kube_platform_version('debian') }}", "kubernetes-cni={{kubernetes_cni_version | kube_platform_version('debian') }}"]` and remove the loop. This feature will be removed in version 2.11. Deprecation
warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
TASK [kubernetes : install the kubernetes yum packages] ************************************************************************************************************************************************
[DEPRECATION WARNING]: Invoking "yum" only once while using a loop via squash_actions is deprecated. Instead of using a loop to supply multiple items and specifying `name: {{ item }}`, please use
`name: ["kubelet-{{ kubernetes_version | kube_platform_version('redhat') }}", "kubeadm-{{ kubernetes_version | kube_platform_version('redhat') }}", "kubectl-{{ kubernetes_version |
kube_platform_version('redhat') }}", "kubernetes-cni-{{kubernetes_cni_version | kube_platform_version('redhat')}}"]` and remove the loop. This feature will be removed in version 2.11. Deprecation
warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
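
The fix is mechanical: pass the whole list to the module's name parameter instead of looping, exactly as the warnings suggest. A before/after sketch for the first apt case:

# before: one apt invocation per item via squash_actions (deprecated)
- name: install baseline dependencies
  apt:
    name: "{{ item }}"
    state: present
  with_items: "{{ common_debs }}"

# after: a single apt invocation with the whole list
- name: install baseline dependencies
  apt:
    name: "{{ common_debs }}"
    state: present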

Remove swap from fstab

Currently the code simply disables swap at runtime. We should also remove any swap entries from /etc/fstab so the change survives a reboot.

The runtime disabling currently lives in ansible/roles/kubernetes-common/tasks/main.yml.
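
A sketch of the extra task, assuming we comment entries out rather than delete them (the exact approach is up for discussion):

# comment out any active swap entries so they do not survive a reboot
- name: remove swap entries from /etc/fstab
  replace:
    path: /etc/fstab
    regexp: '^([^#].*\s+swap\s+.*)$'
    replace: '# \1'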

kubelet config file

I just noticed, while running kubeadm init, that --pod-manifest-path has been deprecated in favor of the kubelet's --config flag.

This feels like it might be an upstream thing where the kubelet package should be updated (maybe it's already under way). I want to track this somewhere since it's a deprecation warning. Feel free to close if this is not the right place for that.

The full message I saw:

Apr 30 18:22:32 ip-172-31-38-145 kubelet[3157]: Flag --pod-manifest-path has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/
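
For reference, the replacement is a KubeletConfiguration file passed via --config, in which the static pod path becomes a field. A minimal sketch (the file path shown is the common kubeadm default):

# /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
staticPodPath: /etc/kubernetes/manifests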

Clean up AWS usage

Right now the images are created and live in private Heptio dev accounts. The AMIs are made public so it's not really a big deal, but it is hard to track or clean up if we ever need to. Images are also subject to disappearing if a user's account is deleted and all of its resources are cleaned up.

cc @asenchi @stevesloka

Install docker-py only when required

We currently install docker-py unconditionally as part of the docker installation.

docker-py is required by the Ansible docker_image module. This module is used in roles/kubernetes/tasks/main.yml to download and cache Docker images on the nodes.

The caching of images is only performed when kubernetes_enable_cached_images == true. It seems like we could make the installation of docker-py conditional on that same variable.
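
A sketch of the conditional install; the pip module here is an assumption about how docker-py is currently installed:

- name: install docker-py for image caching
  pip:
    name: docker-py
  when: kubernetes_enable_cached_images | default(false) | bool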

IP routing being configured in multiple places

Right now IP routing is being configured in places it shouldn't be:

  • ansible/roles/kubernetes-cni/tasks/flannel.yml
  • ansible/roles/common/tasks/redhat.yml

As this is always required (for both image builds and swizzle), these sysctl settings should move to a common place (maybe the kubernetes role?).
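
For example, a single task in a shared role could own both settings (a sketch; the exact key list should be reconciled against the two files above):

- name: configure IP routing for kubernetes networking
  sysctl:
    name: "{{ item }}"
    value: "1"
    state: present
    sysctl_set: yes
  with_items:
    - net.ipv4.ip_forward
    - net.bridge.bridge-nf-call-iptables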

Add kubelet cloud provider arg

Wardroom should be putting the appropriate kubelet --cloud-provider flag in the systemd conf file.

We could reasonably edit the systemd drop-in directly after installing the package, rather than trying to use KUBELET_EXTRA_ARGS.
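
A sketch of the direct edit; the drop-in path, the environment-variable name, and the cloud_provider variable are all assumptions, and a daemon-reload plus kubelet restart would still be needed afterwards:

# declare a dedicated Environment line for the cloud provider flag...
- name: declare kubelet cloud provider args
  lineinfile:
    path: /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
    regexp: '^Environment="KUBELET_CLOUD_ARGS='
    line: 'Environment="KUBELET_CLOUD_ARGS=--cloud-provider={{ cloud_provider }}"'
    insertbefore: '^ExecStart='

# ...and make sure ExecStart actually expands it (the lookahead keeps this idempotent)
- name: reference the cloud provider args in ExecStart
  replace:
    path: /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
    regexp: '^(ExecStart=.*kubelet(?!.*KUBELET_CLOUD_ARGS).*)$'
    replace: '\1 $KUBELET_CLOUD_ARGS'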

[swizzle] Kubernetes packages are being installed on etcd nodes

Currently, the swizzle playbook is installing the kubernetes packages (and prereqs) to the etcd nodes. This is fine when deploying stacked masters, but seems unnecessary when deploying etcd as an external cluster on dedicated nodes.

In the external cluster scenario, there is no need for docker, kubelet, kubectl, etc on the etcd machines, given that etcd is running as a systemd service.

Set hostname to FQDN

For AWS cloud provider integration, it's necessary for the hostname of the node to match AWS' private DNS entry. Is it worth us building a task/role into Wardroom to set the hostname to the FQDN so that it matches the AWS private DNS entry?
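
If we do, a sketch might look like this, leaning on the EC2 metadata facts (module and fact names per stock Ansible; guarding this so it only runs on AWS would be needed):

- name: gather EC2 instance metadata
  ec2_metadata_facts:

- name: set the hostname to the AWS private DNS entry
  hostname:
    name: "{{ ansible_ec2_hostname }}"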

Enable TLS for etcd

Right now etcd is running without TLS. We should explore options for having wardroom generate and apply certs.

etcd role listed as dependency for k8s-master

In /roles/kubernetes-master/meta/main.yml, etcd is listed as a dependency of the master role. swizzle.yml already installs etcd on the etcd hosts via the etcd role, but this dependency also installs etcd on the masters, which is not desirable when etcd and the masters are separate machines (the Vagrantfile appears to put them on the same hosts).

@craigtracey

Dog-fooding

Ideally we use this to help create the quick-start.

I don't know if this will be ready in time for the next iteration.

/cc @chuckha

Firewall rules

Create the firewall punch-throughs up front to avoid the logistics of running around opening ports later.

Wardroom is not compatible with Ubuntu 18.04

This is a longer-term issue and not an immediate need, so feel free to defer until a later time.

I did some testing today with Ubuntu 18.04 to see if Wardroom could be used to build Kubernetes-ready images. I found problems with a number of areas:

  • Installing Docker fails (Docker CE 17.03.x cannot be installed on Ubuntu 18.04; none of the repositories have it)
  • The Kubernetes role fails when adding the apt repo (it looks for "bionic" when it should still use "xenial")

There may be other problems; this was as far as my limited testing allowed me to discover. I know that this isn't urgent/needed right now, but wanted to capture this information for use later when this is something we want to tackle.

[swizzle] Support environments that are behind a proxy

Many large organizations require all internet-bound traffic to traverse an HTTP proxy.

We should support the ability to specify proxy configuration and flow that config to the swizzle tasks that reach out to the internet.
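
A sketch of one approach: optional proxy variables threaded through the environment keyword on internet-bound tasks (the variable names and the get_url task shown are illustrative):

- name: download the etcd release tarball
  get_url:
    url: "{{ etcd_download_url }}"
    dest: /tmp/etcd.tar.gz
  environment:
    http_proxy: "{{ http_proxy | default('') }}"
    https_proxy: "{{ https_proxy | default('') }}"
    no_proxy: "{{ no_proxy | default('') }}"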

Output Artifact -> meta container

Today we produce images for VM infrastructure, but for on-prem environments there is a set of use cases that could benefit from a meta-container that bundles all of the repo's dependencies into a single local container.

@luxas created a container like this in the past, but the purpose of using wardroom is to allow others to generate the artifact(s).

/cc @detiber

Playbooks specify Docker CE 17.03.2 but actually install 18.03.1

The "docker" role in the Ansible playbooks specifies to install Docker CE 17.03.2, but when they are run against a CentOS 7.4 host (and this presumably will affect RHEL also, although I haven't tested yet) they actually install Docker CE 18.03.1. This version isn't yet validated for use with Kubernetes (although it may work just fine).

This output was taken from a CentOS 7.4 host after running the playbooks:

[centos@ip-10-16-13-114 ~]$ docker --version
Docker version 18.03.1-ce, build 9ee9f40
[centos@ip-10-16-13-114 ~]$ 

I ran into this problem recently with a customer. When I have a moment, I'll see about creating a PR to port the changes I implemented to fix this issue.
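
The likely fix is to pin the yum package to the full validated version string rather than a bare package name. A sketch (the exact RPM version string should be verified against the Docker CE repo):

- name: install pinned docker version
  yum:
    name: "docker-ce-{{ docker_rpm_version }}"
    state: present
  vars:
    docker_rpm_version: "17.03.2.ce-1.el7.centos"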

NTP knobs

We need to set up sane defaults for NTP and provide a configurable override for large-scale vendors.
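
A sketch of what the knobs could look like as role defaults plus a templated config (the variable name, pool hosts, and template are all assumptions):

# defaults/main.yml: sane public pool servers, overridable per deployment
ntp_servers:
  - 0.pool.ntp.org
  - 1.pool.ntp.org

# tasks/main.yml: render the chosen servers into place
- name: configure ntp
  template:
    src: ntp.conf.j2
    dest: /etc/ntp.conf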

create a python module for the scripts

As we script this with Python, we will have dependencies that need to be managed. Likewise, we may want to take advantage of some of the features of being an actual module (i.e., entry points). So, make a Python module of this code.

[dev branch] kubeadm template depends on kubeadm token which gets created after template is rendered

The kubeadm.conf template contains the kubeadm token, but it looks like the token is no longer generated before the template is rendered.

Changeset of interest: fc7cce9

TASK [kubernetes-master : drop kubeadm template] ************************************************************************************************************************************************************************************************************
fatal: [master-0 -> 54.202.136.38]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'generated_token' is undefined"}

Intermittent kubeadm failures due to /proc/sys/net/bridge/bridge-nf-call-iptables

Using Wardroom to generate a CentOS 7.4-based AMI results in intermittent failures running kubeadm on instances launched from said AMI. The kubeadm error is:

	[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1

This is using kubeadm version 1.11.1 on CentOS 7.4.
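
A likely root cause is that the br_netfilter module is not loaded at boot on CentOS, so /proc/sys/net/bridge/* either doesn't exist or reverts; whether it happens to be loaded by kubeadm time depends on what else has touched the bridge stack, which would explain the intermittence. A sketch of baking the fix into the image:

# ensure the bridge netfilter module is loaded now and at every boot
- name: load br_netfilter at boot
  copy:
    dest: /etc/modules-load.d/br_netfilter.conf
    content: "br_netfilter\n"

- name: load br_netfilter now
  modprobe:
    name: br_netfilter
    state: present

- name: persist bridge-nf-call-iptables
  sysctl:
    name: net.bridge.bridge-nf-call-iptables
    value: "1"
    state: present
    sysctl_set: yes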

Use of "primary_master" group breaks use of playbooks apart from provision.py

The use of the "primary_master" inventory group (introduced via #29) breaks the use of the Ansible playbooks apart from running them with the provision.py script (which creates and populates the "primary_master" group).

A potential workaround is to use a nested group in the Ansible inventory file, like this:

[primary_master:children]
<name of group that contains master nodes>

This would allow use of the randomisation functionality in provision.py as well as preserve functionality when using the playbooks apart from the script, but it does require specific configuration in the inventory in order to work.
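
Concretely, assuming the master nodes live in a group named masters (the group name is illustrative), the inventory would contain:

[masters]
master-0
master-1

[primary_master:children]
masters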

What do you think about adding `jq` to the base image?

I have found this is a fairly common utility and it would simplify the aws-quickstart to have jq already present on the image.

Is this reasonable for wardroom to include, or should consumers modify the image as needed?

Add support for image building for Oracle Cloud

There is a need for enabling Wardroom to build base images on Oracle Cloud. It looks as if HashiCorp Packer (as of 1.2.0 or later) includes a builder to build base OS images for Oracle Cloud, so this should be a fairly straightforward addition.

Won't add nodes if masters have already initialized

The "kubernetes-master" role uses kubeadm token generate to output a new token but not actually create the token on the server. This generated token is captured in a variable which is then used in a templated configuration file supplied to kubeadm init on the master(s). If the masters have already been initialized, then the token is never actually created on the server, and therefore can't be used to join nodes to the cluster. The end result is that the playbooks cannot be used to configure nodes once the masters have been initialized.

Node tests fail to run

https://github.com/heptiolabs/wardroom/tree/master/packer#testing-images mentions testing nodes directly via e2e_node.test, but upstream changes appear to have broken this functionality at this time.

I see the kubelet failing to respond to readiness checks even though the cluster is otherwise healthy and passes conformance tests (via Sonobuoy).

Errors in the logs look like:

Failure [130.525 seconds]
[BeforeSuite] BeforeSuite
_output/dockerized/go/src/k8s.io/kubernetes/test/e2e_node/e2e_node_suite_test.go:142

  Node 1 disappeared before completing BeforeSuite

  _output/dockerized/go/src/k8s.io/kubernetes/test/e2e_node/e2e_node_suite_test.go:142
------------------------------

And the kubelet itself has lots of warnings:

06:12:48 ip-10-0-21-210 kubelet[2270]: W0126 06:12:48.570608    2270 container.go:409] Failed to create summary reader for "/libcontainer_9469_systemd_test_default.slice": none of the resources are being tracked.
Jan 26 06:12:58 ip-10-0-21-210 kubelet[2270]: W0126 06:12:58.309639    2270 setters.go:144] replacing cloudprovider-reported hostname of ip-10-0-21-210.ec2.internal with overridden hostname of ip-10-0-21-210.ec2.internal
Jan 26 06:12:58 ip-10-0-21-210 kubelet[2270]: W0126 06:12:58.598792    2270 container.go:523] Failed to update stats for container "/libcontainer_9542_systemd_test_default.slice": failed to parse memory.failcnt - open /sys/fs/cgroup/memory/libcontainer_9542_systemd_test_default.slice/memory.failcnt: no such file or directory, continuing to push stats
Jan 26 06:12:58 ip-10-0-21-210 kubelet[2270]: W0126 06:12:58.599096    2270 raw.go:87] Error while processing event ("/sys/fs/cgroup/memory/libcontainer_9542_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/libcontainer_9542_systemd_test_default.slice: no such file or directory
Jan 26 06:12:58 ip-10-0-21-210 kubelet[2270]: W0126 06:12:58.599243    2270 raw.go:87] Error while processing event ("/sys/fs/cgroup/devices/libcontainer_9542_systemd_test_default.slice": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/devices/libcontainer_9542_systemd_test_default.slice: no such file or directory

Add tests

This repo needs tests. Minimally, we should:

  • attempt to build images upon PR
  • use molecule to run unit tests (a minimal scenario sketch follows this list)
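
A minimal molecule scenario for one of the shared roles might start like this (the driver, images, and verifier choices are assumptions):

# molecule/default/molecule.yml
driver:
  name: docker
platforms:
  - name: xenial
    image: ubuntu:16.04
  - name: centos7
    image: centos:7
provisioner:
  name: ansible
verifier:
  name: testinfra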
