
Kubernikus



Kubernikus is "Kubernetes as a Service" for Openstack.

It makes it easy to manage Kubernetes clusters that are natively integrated with Openstack. The architecture is designed to facilitate operation as a managed service.


Features

  • Architected to be operated as a managed service
  • Masters are managed centrally
  • Nodes are decentralized in customer's projects
  • 100% Vanilla Kubernetes
  • 100% Compatible Openstack API
  • Air-Gapped Masters and Nodes
  • Full TLS encryption between all components
  • Auto-Updating nodes based on CoreOS Container Linux
  • Authentication Tooling
  • Unified Authorization Policy between Openstack and Kubernetes RBAC

Guiding Principles

  • Running Kubernetes using Kubernetes
  • Automation is driven by Operators
  • Cloud Native Tooling: Golang, Helm, Swagger, Prometheus

Prerequisites

  • Openstack (including LBaaS)
  • Kubernetes Seed-Cluster (1.7+)

Documentation

More documentation can be found at:

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.


kubernikus's Issues

Multi-Kluster Route Conflicts

Multiple Klusters on the same private network/router are currently assigned overlapping pod CIDRs. This crashes the Openstack route-controller with an array out-of-bounds exception, which in turn sends the controller-manager for all clusters into crash loops.

We need to make sure that the pod CIDRs can't overlap, or introduce a convention like the following (a validation sketch follows the list):

  • Validate PodCIDR assignment
  • 1 Kluster per Project
  • 1 Kluster per Router
  • ?
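
A validation could be as simple as a pairwise overlap check against the pod CIDRs already assigned on the same router. This is only a minimal sketch in Go; it assumes the already assigned CIDRs can be listed somewhere, and all names are illustrative:

package validation

import (
	"fmt"
	"net"
)

// overlaps reports whether two CIDRs share any addresses. The IPs returned by
// net.ParseCIDR are the network base addresses, so two valid CIDRs overlap
// exactly when one contains the other's base address.
func overlaps(a, b *net.IPNet) bool {
	return a.Contains(b.IP) || b.Contains(a.IP)
}

// validatePodCIDR rejects a candidate pod CIDR that collides with a CIDR
// already assigned to another Kluster on the same router.
func validatePodCIDR(candidate string, assigned []string) error {
	_, cand, err := net.ParseCIDR(candidate)
	if err != nil {
		return fmt.Errorf("invalid pod CIDR %q: %v", candidate, err)
	}
	for _, a := range assigned {
		_, used, err := net.ParseCIDR(a)
		if err != nil {
			continue // malformed entries are validated elsewhere
		}
		if overlaps(cand, used) {
			return fmt.Errorf("pod CIDR %s overlaps with already assigned CIDR %s", candidate, a)
		}
	}
	return nil
}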

Openstack CloudProvider Reauth

There's a bug in gophercloud that causes reauthentication to fail: the reauthentication request is missing the scope.

Naturally, this fix is missing up to at least Kubernetes 1.6.4. The symptoms are that various Openstack-related functionality stops working after 24h. It affects the apiserver, controller-manager and, to some extent, the kubelets. Manually restarting the components works around the problem.

There's an issue and patch:
kubernetes/kubernetes#45545
kubernetes/kubernetes#44461

Until this is cherry-picked (if it gets picked at all) we need to include the fix in our own image. For reference, a sketch of an explicitly scoped, re-authenticating client follows.
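
The sketch below shows what an explicitly scoped token with re-authentication enabled could look like. It follows today's github.com/gophercloud/gophercloud; the Scope and AllowReauth fields may not exist under these names in the copy vendored into Kubernetes 1.6/1.7, and all credentials are placeholders:

package openstackauth

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack"
)

// newProviderClient authenticates with an explicit project scope and keeps
// re-authentication enabled, so a renewed token carries the same scope.
func newProviderClient() (*gophercloud.ProviderClient, error) {
	opts := gophercloud.AuthOptions{
		IdentityEndpoint: "https://identity.example.com/v3", // placeholder
		Username:         "kubernikus-service-user",         // placeholder
		Password:         "...",
		DomainName:       "kubernikus",
		AllowReauth:      true,
		Scope: &gophercloud.AuthScope{
			ProjectID: "8b25871959204ff1a27605b7bcf873f7", // placeholder project
		},
	}
	return openstack.AuthenticatedClient(opts)
}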

Start Smoke/Integration Test Suite

Create a test suite for executing e2e integration tests. Ideally in Go using existing client functionality.

Initially build a smoke test that:

  1. Create a Kluster via API
  2. Wait for Kluster to become ready
  3. Wait for Nodes to become ready
  4. Fetch credentials via API
  5. Schedule Pods. Wait for ready
  6. Test Pod-to-Pod communication (maybe steal from kube-detective?)
  7. Create Service. Test internal connectivity
  8. Expose Service using LoadBalancer type. Test external connectivity
  9. Create volumes and attach them to pods.
  10. Delete Kluster via API. Test for successful removal of all debris.

Ideally, the suite consists of individual tests instead of one big monolithic test. It should be executable from a laptop and allow replaying parts of the suite to reproduce failures locally. A minimal structure is sketched below.
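
One way to get the "replay a part of the suite" property cheaply is to model the steps as sub-tests of the standard testing package, so single parts can be re-run with -run. This is only a skeleton; the API client and helpers are assumed, not existing packages:

package smoke

import (
	"testing"
	"time"
)

// TestSmoke wires the steps as sub-tests so a single part can be replayed
// locally, e.g. `go test -run TestSmoke/WaitForNodesReady`.
func TestSmoke(t *testing.T) {
	klusterName := "smoke-" + time.Now().Format("20060102-150405")

	t.Run("CreateKluster", func(t *testing.T) {
		// POST /v1/clusters with klusterName via the generated Go client
		_ = klusterName
	})
	t.Run("WaitForKlusterReady", func(t *testing.T) {
		// poll GET /v1/clusters/{name} until the state flips to Ready, with timeout
	})
	t.Run("WaitForNodesReady", func(t *testing.T) {
		// fetch credentials, build a client-go clientset, wait for Ready nodes
	})
	t.Run("PodToPodCommunication", func(t *testing.T) {
		// schedule pods on different nodes and check connectivity (kube-detective style)
	})
	t.Run("ServiceAndLoadBalancer", func(t *testing.T) {
		// create a Service, then expose it as type LoadBalancer and test externally
	})
	t.Run("Volumes", func(t *testing.T) {
		// create a PVC, attach it to a pod, verify the volume is writable
	})
	t.Run("DeleteKluster", func(t *testing.T) {
		// DELETE /v1/clusters/{name} and verify all Openstack debris is gone
	})
}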

Fix Restriction on Length of API URL

At the moment, the name of the kluster can only be 1 character long... Otherwise, the nginx-controller freaks out and completely refuses to load its config, bringing a whole region down...

  • We need our own ingress, in order not to interfere with the business-critical control plane
  • We need to configure it to accept longer hostnames
  • Ultimately, to have a clean separation, Kubernikus needs to move out of the Openstack control plane.

Bootstrapped Node Certificate Gets Deleted on Reboot

We're using the TLS bootstrap method for creating node-specific client certificates. The first time the kubelet is started, it creates a CSR and requests a certificate. The bootstrap mechanism creates:

  • /var/lib/kubelet/kubeconfig
  • /var/run/kubernetes/kubelet-client.crt

Now, as long as the kubeconfig file exists, the registration will not run again. /var/run is either mounted as tmpfs or cleared on reboot, which deletes the kubelet's client certificate on every reboot.

Bubble Up Events Log

Attach k8s events to the Kluster TPR for important events in the cluster, e.g. Quota Exceeded while spawning nodes. Persistence is handled by the events in k8s themselves.

Bubble them up via the API for display in Elektra. A recorder sketch follows.
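
A minimal sketch of how a recorder could be wired up with client-go's event machinery; the import paths are the current ones and the component name is illustrative:

package events

import (
	"k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/tools/record"
)

// newRecorder builds an event recorder. A sink towards the kube-apiserver
// (broadcaster.StartRecordingToSink) would be added so the events are
// persisted as regular k8s events and can later be listed via the API.
func newRecorder() record.EventRecorder {
	broadcaster := record.NewBroadcaster()
	return broadcaster.NewRecorder(scheme.Scheme, v1.EventSource{Component: "kubernikus"})
}

// Usage inside a controller when node creation hits a quota limit (kluster
// being the TPR object registered in the scheme):
//
//	recorder.Eventf(kluster, v1.EventTypeWarning, "QuotaExceeded",
//		"cannot spawn node in pool %s: %v", poolName, err)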

Add API call for getting kluster credentials

We need an API call to get a personalised kubeconfig.

Maybe /v1/clusters/{name}/credentials or something.

This should generate a cert which contains the user's uid as the CN or something (see the sketch below).

Requires #18
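
A rough sketch of the certificate generation behind such an endpoint: issue a short-lived client certificate whose CN is the authenticated user's ID, signed by the kluster's client CA. Package, function and group names are illustrative:

package credentials

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"crypto/x509/pkix"
	"math/big"
	"time"
)

// clientCertForUser issues a short-lived client certificate for userID. The
// CommonName becomes the k8s user name; the Organization (group) here is only
// an example.
func clientCertForUser(userID string, caCert *x509.Certificate, caKey *rsa.PrivateKey) (certDER []byte, key *rsa.PrivateKey, err error) {
	key, err = rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return nil, nil, err
	}
	template := &x509.Certificate{
		SerialNumber: big.NewInt(time.Now().UnixNano()),
		Subject: pkix.Name{
			CommonName:   userID,                       // becomes the k8s user name
			Organization: []string{"kubernikus:admin"}, // becomes the k8s group (example)
		},
		NotBefore:   time.Now(),
		NotAfter:    time.Now().Add(24 * time.Hour),
		KeyUsage:    x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
		ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}
	certDER, err = x509.CreateCertificate(rand.Reader, template, caCert, &key.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}
	return certDER, key, nil
}

The DER bytes and key would then be PEM-encoded and embedded into the returned kubeconfig.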

Create javascript client

For the Elektra plugin we need a JavaScript client.
I would like to have that generated from the swagger document. It's a little overkill now, but hopefully saves us time as the API expands.
We need to consult with @andypf regarding the Elektra integration in general. A JavaScript client they can't use would not be helpful.

Detect when kluster is ready

At the moment groundctl only implements one state transition:

KlusterPending -> KlusterCreating, which basically means helm install has returned.

We need to monitor the deployed kluster components and determine when the cluster is ready, then flip the state to KlusterReady. We should also have a timeout and flip the state to KlusterError after 30 minutes or so. A readiness check could look like the sketch below.
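
A sketch of such a readiness check: verify that the control plane deployments in the kluster's namespace have all replicas available before flipping the state. The deployment names are assumptions and need to match what the helm chart actually creates; the client-go signatures follow the 1.7-era API (no context argument, extensions/v1beta1 deployments):

package ground

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// klusterReady reports whether all control plane deployments of a kluster
// have their desired number of replicas available.
func klusterReady(client kubernetes.Interface, namespace string) (bool, error) {
	for _, name := range []string{"apiserver", "controller-manager", "scheduler"} {
		deployment, err := client.ExtensionsV1beta1().Deployments(namespace).Get(name, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		if deployment.Spec.Replicas == nil || deployment.Status.AvailableReplicas < *deployment.Spec.Replicas {
			return false, nil
		}
	}
	return true, nil
}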

Kluster Garbage Collector

When a kluster is deleted we need a controlled demolition that tears down all Openstack objects that have been created.

  • Routes
  • LoadBalancers, Listeners, Pools
  • Volumes

apiserver: Improve logs

Currently we just do a little bit of glog and a simple access log.

There are two things I would like to do (a middleware sketch follows the list):

  • Add the authenticated user and project to the access log, so that we can see which user triggered which API call.
  • Add a request_id so that we can match the glog lines to an HTTP request
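
A minimal sketch of an access-log middleware adding a request id with plain net/http. How the authenticated user/project is extracted depends on the existing auth middleware, so that part is only indicated in a placeholder:

package api

import (
	"fmt"
	"log"
	"math/rand"
	"net/http"
	"time"
)

// withRequestID tags every request with a random id, returns it in a response
// header and writes one access-log line. The user/project would be read from
// whatever the token validation middleware puts into the request context.
func withRequestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := fmt.Sprintf("%08x", rand.Uint32())
		w.Header().Set("X-Request-Id", id)
		start := time.Now()
		next.ServeHTTP(w, r)
		log.Printf("request_id=%s method=%s path=%s duration=%s user=%s",
			id, r.Method, r.URL.Path, time.Since(start), "TODO-from-auth-context")
	})
}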

Seed ClusterRoleBindings

For some yet unknown reason, the kube-controller-manager goes insane when using the dedicated service account credentials via --use-service-account-credentials. It gets into an endless loop adding hundreds of invalid tokens to the created service accounts. This needs to be fixed, but in the meantime we run it without the individual service accounts. Unfortunately, the system:kube-controller-manager user is then missing the roles to approve CSRs. To "fix"/hack this, add the admin role:

kubectl create clusterrolebinding hack-cmadmin --clusterrole=cluster-admin --user=system:kube-controller-manager

The bootstrap token rolled out to nodes is entangled with the kubelet-bootstrap user. It needs permissions to create the CSR for bootstrapping a node:

kubectl create clusterrolebinding kubelet-bootstrap --clusterrole=system:node-bootstrapper --user=kubelet-bootstrap

Now the nodes will create the CSR and hang until it is approved. In order to automatically approve the request the certificate-controller needs to have permissions:

cat <<EOF | kubectl create -f -
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: approve-node-client-csr
rules:
- apiGroups: ["certificates.k8s.io"]
  resources: ["certificatesigningrequests/nodeclient"]
  verbs: ["create"]
EOF 

kubectl create clusterrolebinding auto-approve-csrs-for-group --clusterrole=approve-node-client-csr --group=system:bootstrappers

Fix Continuous Watch Error

Running groundctl we get this error every few minutes or so:

W0721 10:45:47.755885   19415 reflector.go:323] github.com/sapcc/kubernikus/pkg/controller/ground/controller.go:101: watch of *v1.Kluster ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [492650215/492637872]) [492651214]
I0721 10:45:48.756031   19415 reflector.go:236] Listing and watching *v1.Kluster from github.com/sapcc/kubernikus/pkg/controller/ground/controller.go:101

It annoys me like hell and clutters up the otherwise nice log of groundctl.

It's not clear to me why I get this error. It seems to be related to the client-go version, because I had not seen this with operators using older versions.

Cluster-State Aware LaunchController

The LaunchController needs to take the kluster state into account. Not all information is available as soon as the Kluster resource is created.

GroundControl enriches it before the Kluster goes into state Creating.

We need to wait for at least this preparation to have happened, as in the sketch below.
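
A small sketch of such a gate. The phase names follow the states mentioned in these issues; the actual constants and types in the codebase may differ:

package launch

import "fmt"

// KlusterPhase mirrors the states mentioned in the issues.
type KlusterPhase string

const (
	KlusterPending  KlusterPhase = "Pending"
	KlusterCreating KlusterPhase = "Creating"
	KlusterReady    KlusterPhase = "Ready"
)

// readyForNodeReconciliation reports whether the LaunchController may start
// creating or terminating nodes for a kluster in the given phase.
func readyForNodeReconciliation(phase KlusterPhase) (bool, error) {
	switch phase {
	case KlusterCreating, KlusterReady:
		return true, nil // GroundControl has enriched the spec
	case KlusterPending:
		return false, nil // not enriched yet: requeue and try again later
	default:
		return false, fmt.Errorf("unexpected kluster phase %q", phase)
	}
}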

Cleanup and enhance spec

Right now the TPR is very basic:

apiVersion: kubernikus.sap.cc/v1
kind: Kluster
metadata:
  annotations:
    creator: D062284
  labels:
    account: 8b25871959204ff1a27605b7bcf873f7
  name: test-8b25871959204ff1a27605b7bcf873f7

spec:
  account: 8b25871959204ff1a27605b7bcf873f7
  name: test
status:
  message: Creating Cluster
  state: Creating

So basically the user can only choose the name of the cluster.

A few things I have in mind (a sketch of possible types follows the list):

  • The Spec probably needs fields for:
    • Specifying the desired kubernetes version
    • Specifying the number of nodes (kubelets), including size, AZ
    • Specifying the Router and subnet ID all nodes should be deployed to
    • Specifying the region (in case we have more than one)
    • ...
  • We also need a place to store runtime state we generate about the cluster (probably Status, but I'm not sure this fits 100%). At minimum we need to store:
    • The Openstack service user name, password and domain
    • All CA certs and keys
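
A sketch of what the enriched types could look like. Every field below is a proposal derived from the wish list above, not the actual schema:

package v1

// NodePool describes one homogeneous group of nodes.
type NodePool struct {
	Name             string `json:"name"`
	Size             int    `json:"size"`
	Flavor           string `json:"flavor"`
	Image            string `json:"image"`
	AvailabilityZone string `json:"availabilityZone"`
}

// KlusterSpec is what the user is allowed to set.
type KlusterSpec struct {
	Name      string     `json:"name"`
	Account   string     `json:"account"`
	Version   string     `json:"version"`  // desired Kubernetes version
	RouterID  string     `json:"routerID"` // router all nodes attach to
	SubnetID  string     `json:"subnetID"`
	Region    string     `json:"region"`
	NodePools []NodePool `json:"nodePools"`
}

// KlusterStatus holds operator-managed state. Generated secrets (service
// user, CA bundle, ...) arguably belong in a secret rather than here; see the
// "Kluster persistence" issue below.
type KlusterStatus struct {
	State   string `json:"state"`
	Message string `json:"message"`
}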

Add AZ to NodePools

The NodePools need to take an AZ parameter. For starters, the cluster should reside completely in a single AZ. It's unclear how to handle multi-AZ clusters, as mounting volumes will not work out of the box.

Configurable Defaults

Make the following Openstack settings configurable (at least in theory):

  • Router
  • Network
  • AVZ

Add to KlusterSpec

It's kluster, not cluster, stupid

To avoid name clashes with all other types of clusters (e.g. the etcd operator) I named the TPR Kluster. We should stick to that internally. It's not consistent at the moment; sometimes I named the variables cluster, sometimes klusters.

I don't think we should change the user facing api though.

Proper Resource Limits for the Kluster Pods

We should have resource limits for the customer components in the parent cluster in place from the start.

At the moment I just added this for every pod:

  resources:
    limits:
      cpu: 100m
      memory: 128Mi
    requests:
      cpu: 100m
      memory: 128Mi

This obviously is too low, because the components start failing with zero payload on them.
We need some sane defaults and probably also something in the TPR to modify them. The customer should not be able to modify these, so they should be outside Spec. See #3

Improve NodeAPI

  • Enforce Uniqueness of NodePool Name
  • Make NodePool Name required
  • Make NodePool Name, Flavor and Image unchangeable

Implement nodes controller

We need a control loop that creates and destroys nodes according to spec.

This requires watching the TPR on one side, but also reconciling with the state in Openstack (e.g. if instances get deleted, a replacement needs to be spawned). A reconciliation sketch follows below.

This task depends on #3, meaning we need to spec out how to describe nodes in the TPR.

I'm not clear at the moment whether this would be a separate binary called launchctl or part of groundctl.
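
The core of the reconciliation could be as simple as comparing the desired pool size from the TPR with the instances actually found in Openstack. A minimal sketch; helper and field names are made up for illustration:

package launch

// reconcilePool decides how many nodes to create or which instances to delete
// for one node pool, given the desired size and the instances found in Openstack.
func reconcilePool(desired int, runningInstanceIDs []string) (toCreate int, toDelete []string) {
	running := len(runningInstanceIDs)
	switch {
	case running < desired:
		return desired - running, nil // spawn replacements for missing instances
	case running > desired:
		return 0, runningInstanceIDs[desired:] // surplus instances to terminate
	default:
		return 0, nil // in sync, nothing to do
	}
}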

Kube-Proxy br_netfilter Missing

The kube-proxy still doesn't load the br_netfilter module. Depending on the start order of things the module might not be loaded when the proxy starts.

Results in:

Sep 22 07:05:49 kubernikus-m21-small-rczz6.novalocal rkt[1264]: E0922 07:05:49.192655    1264 proxier.go:1601] Failed to execute iptables-restore: exit status 2 (iptables-restore v1.4.21: Couldn't load target `KUBE-MARK-DROP':No such file or directory
Sep 22 07:05:49 kubernikus-m21-small-rczz6.novalocal rkt[1264]: Error occurred at line: 28
Sep 22 07:05:49 kubernikus-m21-small-rczz6.novalocal rkt[1264]: Try `iptables-restore -h' or 'iptables-restore --help' for more information.
Sep 22 07:05:49 kubernikus-m21-small-rczz6.novalocal rkt[1264]: )
Sep 22 07:06:19 kubernikus-m21-small-rczz6.novalocal rkt[1264]: E0922 07:06:19.108706    1264 proxier.go:1601] Failed to execute iptables-restore: exit status 2 (iptables-restore v1.4.21: Couldn't load target `KUBE-MARK-DROP':No such file or directory
Sep 22 07:06:19 kubernikus-m21-small-rczz6.novalocal rkt[1264]: Error occurred at line: 28
Sep 22 07:06:19 kubernikus-m21-small-rczz6.novalocal rkt[1264]: Try `iptables-restore -h' or 'iptables-restore --help' for more information.
Sep 22 07:06:19 kubernikus-m21-small-rczz6.novalocal rkt[1264]: )
...

Deleting a Kluster via API Fails

W0912 22:05:03.483088   43050 ground.go:111] Error running handler: rpc error: code = Unknown desc = release: "e2-ae63ddf2076d4342a56eb049e37a7621" not found
I0912 22:05:03.491514   43050 ground.go:285] Deleting openstack user kubernikus-e2-ae63ddf2076d4342a56eb049e37a7621@kubernikus
E0912 22:05:03.777527   43050 ground.go:190] Failed to terminate kluster e2-ae63ddf2076d4342a56eb049e37a7621: no kind "DeleteOptions" is registered for version "kubernikus.sap.cc/v1"
W0912 22:05:03.777554   43050 ground.go:111] Error running handler: no kind "DeleteOptions" is registered for version "kubernikus.sap.cc/v1"
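
A likely cause is that the TPR scheme only registers Kluster and KlusterList. A sketch of the fix with apimachinery, which also registers DeleteOptions, ListOptions and friends for the group version (the exact wiring depends on how the codebase builds its scheme, and Kluster/KlusterList are assumed to be defined elsewhere in the package):

package v1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

var SchemeGroupVersion = schema.GroupVersion{Group: "kubernikus.sap.cc", Version: "v1"}

// addKnownTypes registers the TPR types plus the meta options types for the
// group version; the missing DeleteOptions registration is what the error
// above complains about.
func addKnownTypes(scheme *runtime.Scheme) error {
	scheme.AddKnownTypes(SchemeGroupVersion,
		&Kluster{},
		&KlusterList{},
	)
	metav1.AddToGroupVersion(scheme, SchemeGroupVersion)
	return nil
}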

Kube-Proxy Can't Find NodeIP

Sep 22 07:05:49 kubernikus-m21-small-rczz6.novalocal rkt[1264]: W0922 07:05:49.039086    1264 server.go:787] Failed to retrieve node info: nodes "kubernikus-m21-small-rczz6.novalocal" not found
Sep 22 07:05:49 kubernikus-m21-small-rczz6.novalocal rkt[1264]: W0922 07:05:49.039463    1264 proxier.go:483] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP

NodePorts don't work. Load balancing is not working...

Instance Termination Strategy

What do we do with instances that go to Error, Shutdown, ... ?
What happens when a NodePool is scaled down?

  • Which Instance would be terminated?
  • Cross-Referencing with NodeStatus?

What about testing?

We need tests.

The controller logic tends to be an untestable mess that pulls everything together.

We should do some refactoring to pull some of our own logic out into testable modules and cover them with unit tests.

We should also have an integration test suite that tests the whole thing.

Kluster State Reflector

Reflect the Kluster/Node State/Resource Usage back into KlusterStatus.

Ideas:

  • Controller in Customer Namespace collects state
  • Recycle Prometheus exporters /metrics and kube-state-metrics

Remove Dependency OpenstackSeeder

The OpenstackSeeder is the only dependency Kubernikus has on Converged Cloud. It is used for the initial creation of service accounts. De-seeding/deletion of the service account is already done internally. By doing the seeding ourselves too, we can make Kubernikus independent of CCloud.

Sane Infrastructure Setup

Put namespaced Kubernikus instances into the Admin cluster. Use them to build up regional Kubernikus Control Clusters.

  • Move from Openstack Control Planes to regional virtualized Kubernikus Control Planes
  • Manage Kubernikus with Kubernikus
  • Utilise Admin cluster for managing regional Kubernikus Control Planes

Kluster persistence

We need to store generated values for a cluster somewhere:

  • CA certs and keys
  • openstack cloud provider settings
    • service user
    • router id
    • lb subnet
    • etc...

I propose we don't stuff them into the Kluster TPR resource but into a secret named the same as the kluster. We already have this secret created by the helm release, and it would be removed automatically when a cluster is terminated. A read-back sketch follows.
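
A sketch of reading the generated values back from that secret, assuming it is named after the kluster and lives in the kluster's namespace; the client-go signature follows the 1.7-era API (no context argument):

package ground

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// klusterSecret fetches the generated values (CA bundle, cloud provider
// settings, service user) from the secret that shares the kluster's name.
func klusterSecret(client kubernetes.Interface, namespace, klusterName string) (map[string][]byte, error) {
	secret, err := client.CoreV1().Secrets(namespace).Get(klusterName, metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	return secret.Data, nil
}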

Log/Tracing Utility

The operator log is already messy. Need a clear way to filter it by controller, kluster, project...

Tracing Lib?
Easy Mode Util?

Implement kluster deletion

This needs a general decision how we want to delete the Kluster:

  1. React to the TPR deletion event. That would be here
  2. Set the TPR status to something like Deleting and only delete the TPR once everything is removed from k8s and other backend systems.

I favour approach number 2 because it maintains visibility during cluster deletion (the same as it is with pods). If we do that, we should at least also think about garbage collecting clusters whose TPR was deleted manually.

Spawns Too Many Nodes

For some reason, the listing of nodes occasionally and falsely returns 0 nodes. The controller then spawns new nodes. The old ones appear again and there are too many... Hm?

I0909 09:22:22.872757   55252 launch.go:127] Handling kluster michi-8b25871959204ff1a27605b7bcf873f7
I0909 09:22:23.807830   55252 client.go:302] Listing nodes for c1370e95-e45b-4b48-80f5-c5a478118006/ff3ab
I0909 09:22:24.626139   55252 client.go:315] Found node 0232117e-125b-4c2e-afa9-a1d5e55f0a05
I0909 09:22:24.626170   55252 client.go:315] Found node 30f7d016-d794-48e1-90dd-33c9893a85e7
I0909 09:22:24.626379   55252 launch.go:153] Pool michi-8b25871959204ff1a27605b7bcf873f7/ff3ab: Running 2/2. All good. Doing nothing.
I0909 09:22:27.572206   55252 launch.go:68] Running periodic recheck. Queuing all Klusters...
I0909 09:22:27.572328   55252 launch.go:127] Handling kluster michi-8b25871959204ff1a27605b7bcf873f7
I0909 09:22:28.416157   55252 client.go:302] Listing nodes for c1370e95-e45b-4b48-80f5-c5a478118006/ff3ab
I0909 09:22:31.690324   55252 launch.go:147] Pool michi-8b25871959204ff1a27605b7bcf873f7/ff3ab: Running 0/2. Too few nodes. Need to spawn more.
I0909 09:22:31.690360   55252 launch.go:160] Pool michi-8b25871959204ff1a27605b7bcf873f7/ff3ab: Creating new node
I0909 09:22:31.690411   55252 client.go:337] Creating node kubernikus-ff3ab-vwsls
c5a478118006/ff3ab
