
kvm-operator's Introduction


kvm-operator

kvm-operator manages Kubernetes clusters running on-premises in KVM VMs within a host Kubernetes cluster.

Getting the Project

Download the latest release: https://github.com/giantswarm/kvm-operator/releases/latest

Clone the git repository: https://github.com/giantswarm/kvm-operator.git

Download the latest docker image from here: https://quay.io/repository/giantswarm/kvm-operator

How to build

go build github.com/giantswarm/kvm-operator

Pre-commit Hooks

kvm-operator uses pre-commit to ensure that only good commits are pushed to the git repository. It has no effect unless the pre-commit hooks have been installed after cloning the repository on your development machine. First, ensure that pre-commit is installed with pip install pre-commit or brew install pre-commit (macOS). Then, install the git hooks in the root of the kvm-operator directory with pre-commit install. Any future git commits will automatically run the following checks:

  • end-of-file-fixer: Adds a final newline to any files missing one.
  • trailing-whitespace: Removes trailing whitespace from all committed files.
  • no-commit-to-branch: Prevents committing to master, main, and release-* branches.
  • check-merge-conflict: Ensures that no merge conflict markers are found in source files.
  • go-test-repo-mod: Ensures that all tests pass (go test ./...).
  • go-imports: Ensures that imports are correctly sorted.
  • golangci-lint: Ensures that golangci-lint run finds no problems.
  • go-build: Ensures that go build returns no errors.
  • go-mod-tidy: Ensures that go mod tidy doesn't change go.sum.

Note: goimports and golangci-lint should be available in your $PATH for these to run.
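
For orientation, here is a minimal .pre-commit-config.yaml sketch covering the generic hooks listed above. The repository's actual config is authoritative; the Go-specific hooks are provided by a separate Go hook repository and are only referenced in a comment here.

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.0.1                 # pin to whatever revision the repository uses
    hooks:
      - id: end-of-file-fixer
      - id: trailing-whitespace
      - id: check-merge-conflict
      - id: no-commit-to-branch
        args: [--branch, master, --branch, main, --pattern, 'release-.*']
  # go-imports, golangci-lint, go-build, go-mod-tidy and go-test-repo-mod come
  # from a Go-specific hook repository; see the repo's own config for the source.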

Architecture

The operator uses our operatorkit framework. It watches KVMConfig CRs using a generated client stored in our apiextensions repo. Each workload cluster has a version, known as a "workload cluster version", which defines a tested set of component versions (such as Kubernetes and CoreDNS); these versions are managed as Release CRs on the control plane.

The operator provisions Kubernetes workload clusters running on-premises. It runs in a Kubernetes management cluster running on bare metal or virtual machines.

Controllers and Resource Handlers

kvm-operator contains four controllers each composed of one or more resource handlers:

  • cluster-controller watches KVMConfigs and has the following handlers:
    • clusterrolebinding: Manages the RBAC and PSP role bindings used by WC node pods
    • configmap: Ensures that a configmap exists for each desired node containing the rendered ignition
    • deployment: Ensures that a deployment exists for each desired node
    • ingress: Manages the Kubernetes API and etcd ingresses for the WC
    • namespace: Manages the namespace for the cluster
    • nodeindexstatus: Manages node indexes in the KVMConfig status
    • pvc: Manages the PVC used to store WC etcd state (if PVC storage is enabled)
    • service: Manages the master and worker services
    • serviceaccount: Manages the service account used by WC node pods
    • status: Manages labels reflecting calculated status on WC objects
  • deleter-controller watches KVMConfigs and has the following handlers:
    • cleanupendpointips: Ensures that worker and master Endpoints only contain IPs of Ready pods
    • node: Deletes Nodes in the WC if they have no corresponding MC node pod
  • drainer-controller watches Pods and has the following handlers:
    • endpoint: Ensures that worker and master Endpoints exist and contain IPs of Ready pods only
    • pod: Prevents node pod deletion until draining of the corresponding WC node is complete
  • unhealthy-node-terminator-controller watches KVMConfigs and has the following handlers:
    • terminateunhealthynodes: Deletes node pods when nodes are not ready for a certain period of time

Kubernetes Resources

The operator creates a Kubernetes namespace per workload cluster with a service and endpoints. These are used by the management cluster to access the workload cluster.

Certificates

Authentication for the cluster components and end-users uses TLS certificates. These are provisioned using HashiCorp Vault and are managed by our cert-operator.

Endpoint management

Every Service object in Kubernetes generally has a corresponding Endpoints object with the same name, but if a Service has no pod selector, Kubernetes will not create an Endpoints object automatically. This allows a separate controller to manage the IPs the Service points to (pod IPs or external IPs). kvm-operator uses this approach to manage the worker and master Endpoints objects for WCs.
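
As an illustration of the mechanism (not the operator's exact manifests), a selectorless Service and its manually managed Endpoints object might look like this; the name, namespace, port, and IPs are placeholders:

# A Service without a pod selector: Kubernetes will not populate Endpoints for it.
apiVersion: v1
kind: Service
metadata:
  name: worker
  namespace: abc12              # the per-cluster namespace (placeholder)
spec:
  ports:
    - name: http
      port: 30010               # illustrative port
      targetPort: 30010
---
# The matching Endpoints object, managed by the operator instead of Kubernetes.
apiVersion: v1
kind: Endpoints
metadata:
  name: worker                  # must match the Service name
  namespace: abc12
subsets:
  - addresses:
      - ip: 172.23.0.130        # IPs of Ready node pods only
      - ip: 172.23.0.246
    ports:
      - name: http
        port: 30010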

Endpoints are managed by two different resource handlers: drainer-controller's endpoint handler and deleter-controller's cleanupendpointips handler. The deleter-controller reconciles when events for a KVMConfig are received, whereas the drainer-controller updates when events for a Pod are received. This allows the operator to add or remove endpoints when the cluster changes (such as during scaling) or when a pod changes (such as when draining an MC node or when a pod becomes NotReady).

Remote testing and debugging

Using okteto, we can synchronize local files (source files and compiled binaries) with a container in a remote Kubernetes cluster to reduce the feedback loop when adding a feature or investigating a bug. Use the following commands to get started.

Start okteto

Build and run operator inside remote container

  • go build
  • ./kvm-operator daemon --config.dirs=/var/run/kvm-operator/configmap/ --config.dirs=/var/run/kvm-operator/secret/ --config.files=config --config.files=dockerhub-secret

Remote debugging with VS Code or Goland

  • Install the delve debugger in the remote container: cd /tmp && go get github.com/go-delve/delve/cmd/dlv && cd /okteto
  • Start the delve server in the remote container: dlv debug --headless --listen=:2345 --log --api-version=2 -- daemon --config.dirs=/var/run/kvm-operator/configmap/ --config.dirs=/var/run/kvm-operator/secret/ --config.files=config --config.files=dockerhub-secret
  • Wait until the debug server is up (you should see API server listening at: [::]:2345).
  • Start the local debugger.
  • If you make any changes to the source code, you will need to stop the debugging session, stop the server, and rebuild.

Clean up

  • Run make okteto-down.


Contributing & Reporting Bugs

See CONTRIBUTING for details on submitting patches, the contribution workflow, and reporting bugs.

For security issues, please see the security policy.

License

kvm-operator is under the Apache 2.0 license. See the LICENSE file for details.


kvm-operator's Issues

Use a hash id to identify the workers

We currently use the scaling group, but we need to replace this with a short hash to distinguish the workers of each customer.

ping @giantswarm/team-positive

calico certs are missing

We don't create etcd certs for calico on the master:

core@localhost ~ $ ./kubectl -n kube-system logs calico-node-bmvz0 calico-node
time="2017-06-07T09:06:37Z" level=info msg="NODENAME environment not specified - check HOSTNAME" 
time="2017-06-07T09:06:37Z" level=info msg="Loading config from environment" 
Calico node failed to start
ERROR: Error accessing the Calico datastore: open /etc/kubernetes/ssl/etcd/client-crt.pem: no such file or directory
core@localhost /etc/kubernetes/ssl/etcd $ ls
server-ca.pem   server-crt.pem  server-key.pem 

We need to add:

client-ca.pem   client-crt.pem  client-key.pem 

Sort out service types

The services we create could use some unification.

The master and worker services should be of type ClusterIP.
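
For reference, the relevant field is spec.type; a minimal sketch (name and port are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: master                  # the same applies to the worker service
spec:
  type: ClusterIP               # instead of NodePort/LoadBalancer
  ports:
    - port: 443
      targetPort: 443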

WIP Make the core of the controller scalable

We should be able to handle multiple concurrent calls, and probably also separate the reconciliation logic from the creation logic, so that we could have different routines doing their work in parallel without blocking each other.

TBD (to be defined)

New cluster is not created if deletion is in progress

I saw this a few times, most recently very clearly in anubis.

All services were healthy, as I had created a cluster 10 minutes before the issue.

How to reproduce:

  • Create a cluster and make sure its namespaces are created (better to wait a few minutes)
  • Delete the cluster
  • Wait 10 seconds and create a new cluster

Expected result:

  • The namespace (pods, etc.) for the new cluster will be created

Actual result:

  • Only the flannel namespace is created

Example cluster is not up to date

  • apiVersion should be cluster.giantswarm.io/v1 instead of kvm.cluster.giantswarm.io/v1
  • cidr in calico section should be int, not string

Allow graceful VM shutdown

Currently we terminate VMs the hard way. By default, Kubernetes sends SIGTERM to the container, and QEMU by default just shuts itself down without trying to shut down the guest OS. As a result, we get hard-to-debug networking issues (TCP connections stuck in the ESTABLISHED state) and potentially etcd filesystem corruption.

To prevent this, we can use the QEMU monitor, which exposes an interface to functions like "system_powerdown" that send ACPI signals to the guest OS. Then every kubectl delete pod will result in a graceful CoreOS shutdown.
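
A rough sketch of how that could be wired into the node pod spec, assuming the QEMU monitor is exposed on a UNIX socket inside the container and socat is available in the image (the socket path, image, and names are assumptions for illustration):

apiVersion: v1
kind: Pod
metadata:
  name: worker-example                    # hypothetical pod name
spec:
  terminationGracePeriodSeconds: 120      # give the guest OS time to shut down
  containers:
    - name: k8s-kvm
      image: quay.io/giantswarm/k8s-kvm:example   # hypothetical image/tag
      lifecycle:
        preStop:
          exec:
            command:
              - /bin/sh
              - -c
              # system_powerdown asks QEMU to send an ACPI shutdown to the guest
              - echo system_powerdown | socat - UNIX-CONNECT:/qemu-monitor.sock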

Restart of master pod loses all etcd data

The problem is that the etcdshare 9p volume is not mounted inside the VM.

For me, the following unit fixed the problem:

cat /etc/systemd/system/etc-kubernetes-data-etcd.mount
[Unit]                              
Wants=etcd3.service                 
Before=etcd3.service                

[Mount]                             
What=etcdshare                      
Where=/etc/kubernetes/data/etcd     
Type=9p                             
Options=rw   

Anyway, this is only part 1 of the problem. Second, if we reschedule the pod to another node, it will still lose all data, because the data is stored in a local host directory. I'll create a separate issue for that.

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

  • Update giantswarm modules (github.com/giantswarm/cluster-api, github.com/giantswarm/k8sclient/v5, github.com/giantswarm/k8scloudconfig/v10, github.com/giantswarm/microendpoint, github.com/giantswarm/microerror, github.com/giantswarm/microkit, github.com/giantswarm/micrologger, github.com/giantswarm/statusresource/v3, github.com/giantswarm/to, github.com/giantswarm/versionbundle, quay.io/giantswarm/golang)
  • Update misc modules (actions/checkout, alpine, github.com/google/go-cmp, github.com/prometheus/client_golang, github.com/spf13/viper, golang.org/x/sync)
  • Update giantswarm modules (major) (github.com/giantswarm/apiextensions/v3, github.com/giantswarm/badnodedetector, github.com/giantswarm/certs/v3, github.com/giantswarm/cluster-api, github.com/giantswarm/k8sclient/v5, github.com/giantswarm/k8scloudconfig/v10, github.com/giantswarm/microendpoint, github.com/giantswarm/microkit, github.com/giantswarm/micrologger, github.com/giantswarm/operatorkit/v5, github.com/giantswarm/randomkeys/v2, github.com/giantswarm/statusresource/v3, github.com/giantswarm/tenantcluster/v4, github.com/giantswarm/versionbundle)
  • Click on this checkbox to rebase all open PRs at once

Ignored or Blocked

These are blocked by an existing closed PR and will not be recreated unless you click a checkbox below.

Detected dependencies

circleci
.circleci/config.yml
dockerfile
Dockerfile
  • quay.io/giantswarm/golang 1.16.6
  • alpine 3.14.0
github-actions
.github/workflows/tidy.yml
  • actions/checkout v3.0.0
gomod
go.mod
  • go 1.16
  • github.com/giantswarm/apiextensions/v3 v3.27.1
  • github.com/giantswarm/badnodedetector v1.0.1
  • github.com/giantswarm/certs/v3 v3.1.1
  • github.com/giantswarm/errors v0.3.0
  • github.com/giantswarm/k8sclient/v5 v5.11.0
  • github.com/giantswarm/k8scloudconfig/v10 v10.8.1
  • github.com/giantswarm/microendpoint v0.2.0
  • github.com/giantswarm/microerror v0.3.0
  • github.com/giantswarm/microkit v0.2.2
  • github.com/giantswarm/micrologger v0.5.0
  • github.com/giantswarm/operatorkit/v5 v5.0.0
  • github.com/giantswarm/randomkeys/v2 v2.1.0
  • github.com/giantswarm/statusresource/v3 v3.1.0
  • github.com/giantswarm/tenantcluster/v4 v4.1.0
  • github.com/giantswarm/to v0.3.0
  • github.com/giantswarm/versionbundle v0.2.0
  • github.com/google/go-cmp v0.5.6
  • github.com/prometheus/client_golang v1.11.0
  • github.com/spf13/viper v1.8.1
  • golang.org/x/sync v0.0.0-20210220032951-036812b2e83c@036812b2e83c
  • k8s.io/api v0.18.19
  • k8s.io/apimachinery v0.18.19
  • k8s.io/client-go v0.18.19
  • sigs.k8s.io/controller-runtime v0.6.4
  • github.com/dgrijalva/jwt-go/v4 v4.0.0-preview1
  • github.com/giantswarm/cluster-api v0.3.13-gs

  • Check this box to trigger a request for Renovate to run again on this repository

define unique hostname for each kvm

Calico uses the KVM hostname to distinguish between separate machines in the k8s cluster, but in our KVM VMs the hostname is localhost for all of them.

This causes a problem because etcd looks like this:

/calico/ipam/v2
/calico/ipam/v2/host
/calico/ipam/v2/host/localhost
/calico/ipam/v2/host/localhost/ipv4
/calico/ipam/v2/host/localhost/ipv4/block
/calico/ipam/v2/host/localhost/ipv4/block/192.168.20.0-26
/calico/ipam/v2/host/localhost/ipv6
/calico/ipam/v2/host/localhost/ipv6/block

So each machine overwrites the others' data.

k8s-endpoint-updater deployment produces errors like sleep: invalid number 'inf'

I see a lot of errors like this:

kubectl logs master-ipp39-1445104292-m4s30 -n k6dxf -c k8s-endpoint-updater

sleep: invalid number 'inf'
sleep: invalid number 'inf'
sleep: invalid number 'inf'
sleep: invalid number 'inf'
sleep: invalid number 'inf'
sleep: invalid number 'inf'
sleep: invalid number 'inf'
sleep: invalid number 'inf'
sleep: invalid number 'inf'
sleep: invalid number 'inf'
sleep: invalid number 'inf'
sleep: invalid number 'inf'

Bind guest master pods to host master nodes

Guest master pods are very critical, so we should treat them accordingly. In particular, we usually cannot just recreate a master pod without issues [1].

Currently, to upgrade a host worker, we need to check which master pods will be affected and manually verify cluster health afterwards.

I propose to bind guest masters to host masters; this will at least make worker node upgrades easier.

Put the following under the pod template spec in the deployment (see the placement sketch after the snippet):

          nodeSelector:
            role: master
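
For placement, a minimal sketch of where this lands in the master Deployment manifest; everything except the nodeSelector is standard boilerplate, and the label key/value come from the proposal above:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: master                  # hypothetical name for illustration
spec:
  template:
    spec:
      nodeSelector:
        role: master            # schedule guest master pods onto host master nodes
      containers: []            # existing containers stay unchanged (omitted here)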

README

The readme should explain a couple of things.

  • Where does this project live in the over all scheme of things?
  • How do I get started with it? A simple invocation or example with output of what it does is always handy.
  • How do I run the tests? (Even if it is 'known' how to do it, it is always nice to have a reminder for others, and sometimes there are some tags, or special things you expect people to set up before running the test. This should be documented as well)

We have a template for this: https://github.com/giantswarm/example-opensource-repo

You can pick out what makes the most sense to you. Leave out the future steps and such if they're not clear yet.

Prioritize this issue as you see fit for what you are currently working on.

Race error: pods created before service causes endpoint problem

If pods for the deployments are created before the service for the ingress, then the k8s-endpoint-updater will fail and not retry.

This causes missing endpoints and a broken cluster.

calvix@00008df14a32b2b9 ~ $ kubectl -n t4rap get pods    
NAME                            READY     STATUS    RESTARTS   AGE
master-p61mj-523644078-mz7gr    2/2       Running   0          1h
worker-0dsfg-3018408861-n8g48   2/2       Running   0          1h
worker-8zojk-1871625835-mtfrh   2/2       Running   0          1h
worker-i677v-1818930606-fb9sh   2/2       Running   0          1h
worker-izln0-385779568-9vts2    2/2       Running   0          1h
worker-rdal3-3874929347-8h9j5   2/2       Running   0          1h
worker-sj3q1-2735442007-l97gv   2/2       Running   0          1h
worker-we4zo-3555175652-mn5d3   2/2       Running   0          1h
calvix@00008df14a32b2b9 ~ $ kubectl -n t4rap get ep  
No resources found.
calvix@00008df14a32b2b9 ~ $ kubectl -n t4rap logs master-p61mj-523644078-mz7gr k8s-endpoint-updater
{"caller":"github.com/giantswarm/k8s-endpoint-updater/command/update/command.go:100","info":"start updating Kubernetes endpoint","time":"17-09-20 10:55:53.033"}
{"caller":"github.com/giantswarm/k8s-endpoint-updater/command/update/command.go:184","debug":"creating in-cluster config","time":"17-09-20 10:55:53.033"}
{"IP":"172.18.1.194","caller":"github.com/giantswarm/k8s-endpoint-updater/command/update/command.go:251","debug":"found pod info","pod":"master-p61mj-523644078-mz7gr","time":"17-09-20 10:55:53.036"}
{"caller":"github.com/giantswarm/k8s-endpoint-updater/command/update/command.go:110","error":"[{/go/src/github.com/giantswarm/k8s-endpoint-updater/command/update/command.go:110: } {/go/src/github.com/giantswarm/k8s-endpoint-updater/command/update/command.go:260: } {/go/src/github.com/giantswarm/k8s-endpoint-updater/service/updater/updater.go:62: } {services \"master\" not found}]","time":"17-09-20 10:55:53.086"}
calvix@00008df14a32b2b9 ~ $ kubectl -n t4rap get svc                                          
NAME      CLUSTER-IP       EXTERNAL-IP   PORT(S)                           AGE
master    172.18.209.66    <pending>     2379:31416/TCP,443:32161/TCP      1h
worker    172.18.219.132   <pending>     30010:32203/TCP,30011:31821/TCP   1h

I still have a bad feeling about the EP updater. I would rather see an operator that watches pods in each namespace: if it sees a VM come up, it adds it to the endpoints, and if the machine goes down, it removes the endpoint. It could also do live checks on the endpoint ports to verify they should be there.

Add command-line specifiable selector

We're probably going to want to run test builds of the cluster-controller on G8s.

If we can specify a selector for the cluster-controller on the command line, which the listwatch respects, we should be able to run multiple cluster-controllers and have them work on specifically labelled cluster resources.

e.g.: the production cluster-controller runs with --selector=cluster=production
and a test cluster-controller runs with --selector=cluster=joe-test-1

The production cluster-controller only works on cluster resources labelled with cluster=production, and this example test cluster-controller only works on cluster resources labelled with cluster=joe-test-1.

Cleanup VM disks

I only have 3 KVM clusters:

~|⇒ kubectl get Kvm -n default                                
NAME      KIND
8tjx9     Kvm.v1.cluster.giantswarm.io
b6huy     Kvm.v1.cluster.giantswarm.io
n3j8e     Kvm.v1.cluster.giantswarm.io

But all the VM disks are still present on the host nodes:

1.2G    /home/core/vms/4lps5
36M     /home/core/vms/57wr9
629M    /home/core/vms/62mbj
1.6G    /home/core/vms/71pje
18M     /home/core/vms/8oqig
11M     /home/core/vms/8tjx9
1.7G    /home/core/vms/90qeh
1.6G    /home/core/vms/920jv
1.2G    /home/core/vms/9pi7f
21M     /home/core/vms/b6huy
18M     /home/core/vms/el8d6
1.2G    /home/core/vms/hayzo
18M     /home/core/vms/ieknr
275M    /home/core/vms/n3j8e
36M     /home/core/vms/npe2g
18M     /home/core/vms/px12j
1.1G    /home/core/vms/ud8f9
18M     /home/core/vms/udrvb
1.2G    /home/core/vms/v3zlm
38M     /home/core/vms/w9x70

Flannel and Master/Worker pod should be scheduled on the same node.

Currently in CarGarantie we have 8 nodes in G8s. When I create a 4-node guest cluster, I get some worker/master pods stuck in the Init state. This is because flannel pods were not scheduled on those nodes.

Inter-pod affinity should be used (see the sketch after the pod listing below).

flannel-client-2199377980-6n9ng   2/2       Running    0          1m        10.4.10.4   10.4.10.4
flannel-client-2199377980-h3w9d   2/2       Running    0          1m        10.4.10.2   10.4.10.2
flannel-client-2199377980-l7jzs   2/2       Running    0          1m        10.4.10.6   10.4.10.6
flannel-client-2199377980-nz62p   2/2       Running    0          1m        10.4.10.3   10.4.10.3
master-djhlh-1305051120-rgrj6     0/1       Init:1/2   0          1m        10.4.10.8   10.4.10.8
worker-n22vz-184020069-nq2mz      0/1       Init:0/1   0          1m        10.4.10.5   10.4.10.5
worker-vs72f-3153587355-kbnjz     1/1       Running    0          1m        10.4.10.6   10.4.10.6
worker-z16s4-2445011952-ztv82     0/1       Init:0/1   0          1m        10.4.10.7   10.4.10.7
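
A minimal sketch of what that affinity could look like on the master/worker deployments, assuming the flannel pods carry an app: flannel-client label (the label and exact placement are assumptions based on the pod names above):

spec:
  template:
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: flannel-client               # assumed label on the flannel pods
              topologyKey: kubernetes.io/hostname   # co-locate on the same host node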

add persistentVolume option for master etcd

So I want to implement another storage option for the KVM master node. At the moment we use hostPath, which is not reliable everywhere but at least works.
On lycan we have a working storage class, so the missing piece is to create a PVC and specify it in the master deployment.
My thinking is that I add an option to the configmap which will be either hostPath or persistentVolume.

My current issue is that I don't have much knowledge about all the projects and how they connect. Could someone give me hints on where I have to put the new configuration value in order to use it here: https://github.com/giantswarm/kvm-operator/blob/master/service/resource/master/deployment.go#L15.
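
For reference, a minimal sketch of the kind of PVC the master deployment could reference when the persistent-volume option is enabled; the names, namespace, storage class, and size are assumptions for illustration:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: master-etcd             # hypothetical name
  namespace: abc12              # one PVC per workload cluster namespace (placeholder)
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: default     # assumed storage class on the host cluster
  resources:
    requests:
      storage: 15Gi             # illustrative size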

Update worker endpoints

Currently, we don't update the worker endpoints.

E.g:

joseph on dev-workspace-16bg3 at 11:13:54 in /home/joseph
$ kubectl describe -n=m3e5c svc worker
Name:                   worker
Namespace:              m3e5c
Labels:                 app=worker
                        cluster=m3e5c
                        customer=api-test-dece152271
Selector:               app=worker,cluster=m3e5c,customer=api-test-dece152271
Type:                   NodePort
IP:                     172.31.35.191
Port:                   http    4194/TCP
NodePort:               http    31183/TCP
Endpoints:              10.0.4.17:4194,10.0.4.19:4194
Session Affinity:       None

These endpoints are based on the selector here: https://github.com/giantswarm/kvm-operator/blob/master/resources/worker.go#L270, which is incorrect: the IPs 10.0.4.17:4194, 10.0.4.19:4194 and 10.0.4.29:4194 are used by all the worker service endpoints.

We should update the endpoints with the actual VM IPs (as with the master VM).

Remove old worker endpoint if pod was rescheduled

After recreating all my workers, I have the following IPs from kubectl get ep/worker -n XXX -o yaml. The cluster has only 3 workers.

subsets:
- addresses:
  - ip: 172.23.0.130
  - ip: 172.23.0.246
  - ip: 172.23.0.90
  - ip: 172.23.0.94
  - ip: 172.23.1.114
  - ip: 172.23.1.158
  - ip: 172.23.1.94

kvm-operator: recover from failures or just die

After etcd broke on lycan and k8s was throwing errors, kvm-operator got stuck; even after the etcd state was recovered, it was unable to recover on its own and had to be restarted (by killing the pod).

I would expect that it can recover when all other components are fine.

The problem with this is that kvm-operator was still running and it didn't look like there was any problem; it just silently failed and stopped doing anything.

Last error message:

{"caller":"github.com/giantswarm/kvm-operator/service/operator/service.go:107","error":"[{/go/src/github.com/giantswarm/kvm-operator/vendor/github.com/giantswarm/operatorkit/tpr/tpr.go:207: creating TPR kvm.cluster.giantswarm.io} {/go/src/github.com/giantswarm/kvm-operator/vendor/github.com/giantswarm/operatorkit/tpr/tpr.go:271: creating TPR kvm.cluster.giantswarm.io} {etcdserver: mvcc: database space exceeded}]","time":"17-09-13 10:10:56.616"}

Reporting a vulnerability

Hello!

I hope you are doing well!

We are a security research team. Our tool automatically detected a vulnerability in this repository. We want to disclose it responsibly. GitHub has a feature called Private vulnerability reporting, which enables security research to privately disclose a vulnerability. Unfortunately, it is not enabled for this repository.

Can you enable it, so that we can report it?

Thanks in advance!

PS: you can read about how to enable private vulnerability reporting here: https://docs.github.com/en/code-security/security-advisories/repository-security-advisories/configuring-private-vulnerability-reporting-for-a-repository

Write tests

kvm-operator|master⚡ ⇒ go test
?       github.com/giantswarm/kvm-operator      [no test files]

flannel-client does not create bridge on LW

flannel-client logs:

I0703 16:38:33.287780       7 main.go:132] Installing signal handlers
I0703 16:38:33.288029       7 manager.go:124] Searching for interface using 10.0.4.114
I0703 16:38:33.299774       7 manager.go:149] Using interface with name bond0.3 and address 10.0.4.114
I0703 16:38:33.299823       7 manager.go:162] Using 10.0.4.114 as external address

Then k8s-network-bridge says:

Waiting for /run/flannel/networks/br-l4etk.env to be created

This is because /run/flannel/networks is empty in the container, and k8s-network-bridge waits for a file like this: https://github.com/giantswarm/k8s-network-bridge/blob/1c3c3d7292481ba8bbb959eb678401c834c83821/docker-entrypoint.sh#L10

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Error type: Cannot find preset's package (github>whitesource/merge-confidence:beta)

Flannel client should use certificate for etcd connection

Our host cluster etcd has client auth enabled.

We should use a certificate for the flannel client.

We should mount the etcd-certs volume inside the flannel-client container and set a few flannel options (see the sketch after the list):

--etcd-keyfile="": SSL key file used to secure etcd communication.
--etcd-certfile="": SSL certification file used to secure etcd communication.
--etcd-cafile="": SSL Certificate Authority file used to secure etcd communication.
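
A rough sketch of what that could look like in the flannel-client pod spec; the volume source, mount path, and image are assumptions, while the certificate file names echo the client certs mentioned earlier in this document:

containers:
  - name: flannel-client
    image: quay.io/giantswarm/flannel:example       # hypothetical image/tag
    args:
      - --etcd-cafile=/etc/kubernetes/ssl/etcd/client-ca.pem
      - --etcd-certfile=/etc/kubernetes/ssl/etcd/client-crt.pem
      - --etcd-keyfile=/etc/kubernetes/ssl/etcd/client-key.pem
    volumeMounts:
      - name: etcd-certs
        mountPath: /etc/kubernetes/ssl/etcd         # assumed mount path
        readOnly: true
volumes:
  - name: etcd-certs
    hostPath:
      path: /etc/kubernetes/ssl/etcd                # assumed source of the certs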

flannel needs to run on all master machines

When we spawn fewer KVMs than there are available nodes in G8s, we can hit a problem where an ingress controller pod is running on a node without the flannel interface for one cluster. This means that this IC pod can't forward traffic to that guest cluster's API/etcd/ingress.

Example:
We have machine nodes 1, 2, 3, 4 and 5.
We spawn a cluster with 1 master and 3 workers, VNI 10.
They are spawned on nodes 1, 3, 4 and 5.
If there is an ingress controller pod running on node 2, it won't be able to forward traffic because there is no flannel.10 interface on node 2.

We usually forward traffic from the load balancer to the master nodes.
Not sure about the solution at the moment; spawning flannel on all master nodes may not be the right one.
