amimof / kubernetes-the-right-way
Install Kubernetes with Ansible
License: MIT License
Not urgent (at least for me), but I'm using this issue to track progress on Kubernetes 1.14 support. :) It was released a few days ago (https://discuss.kubernetes.io/t/kubernetes-1-14-released/5648).
I have three nodes:
k8s-worker-01
k8s-worker-02
k8s-worker-03
When starting the kubelet on the first one, everything works fine. But on the other two nodes, I get the following errors:
Jan 07 11:41:52 k8s-worker-02 kubelet[7251]: E0107 11:41:52.642553 7251 kubelet_node_status.go:94] Unable to register node "k8s-worker-02" with API server: nodes is forbidden: User "system:anonymous" cannot create resource "nodes" in API group "" at the cluster scope
Jan 07 11:41:52 k8s-worker-02 kubelet[7251]: E0107 11:41:52.537626 7251 kubelet.go:2266] node "k8s-worker-02" not found
Jan 07 11:41:52 k8s-worker-02 kubelet[7251]: E0107 11:41:52.637743 7251 kubelet.go:2266] node "k8s-worker-02" not found
... along with a bunch of others, all complaining about the unauthorized user "system:anonymous".
Is this something you've seen before? I've tried manually using the certificates on the nodes, and that works:
$ curl \
--cacert /etc/kubernetes/pki/ca.pem \
--cert /etc/kubernetes/pki/cert.pem \
--key /etc/kubernetes/pki/key.pem \
https://k8s-master-01.viskanint.local:6443/api/v1/nodes
Would it be a good idea to clean up `~/.kube/config` when running `cleanup.yml`?
$ kubectl config unset clusters.{{ cluster_name }}
$ kubectl config unset contexts.{{ cluster_name }}
$ kubectl config unset users.{{ cluster_name }}
We don't have the `cluster_name` variable in `cleanup.yml` though. :)
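As a minimal sketch, assuming a `cluster_name` variable were introduced into `cleanup.yml` (it doesn't exist there today), the cleanup task could look something like this:

```yaml
# Hypothetical task for cleanup.yml -- cluster_name would first have to be
# added to the playbook's variables.
- name: Remove cluster from local kubeconfig
  command: "kubectl config unset {{ item }}.{{ cluster_name }}"
  loop:
    - clusters
    - contexts
    - users
  delegate_to: localhost
  ignore_errors: yes
```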
I saw that you recently added an unmounting step in `cleanup.yml`, which is nice. I had that issue a few days ago.
However, today I tried to clean up a cluster that had a few deployments with config maps and secrets, and they appear to be mounted separately, so I had to do something like this manually before continuing:
anton@node01:~$ sudo umount /var/lib/kubelet/pods/1d8b27bf-02ba-11e9-8e3f-080027281615/volumes/kubernetes.io~secret/nginx-ingress-token-qpwsq
anton@node01:~$ sudo umount /var/lib/kubelet/pods/6d3fb723-03a6-11e9-b379-080027281615/volumes/kubernetes.io~secret/default-token-9wjxq
anton@node01:~$ sudo umount /var/lib/kubelet/pods/eb91be51-02bd-11e9-8e3f-080027281615/volumes/kubernetes.io~secret/default-token-tg277
anton@node01:~$ sudo umount /var/lib/kubelet/pods/0d18545c-03ae-11e9-b379-080027281615/volumes/kubernetes.io~secret/default-token-9wjxq
anton@node01:~$ sudo umount /var/lib/kubelet/pods/0d18545c-03ae-11e9-b379-080027281615/volume-subpaths/config/kibana/0
anton@node01:~$ sudo umount /var/lib/kubelet/pods/6d3fb723-03a6-11e9-b379-080027281615/volume-subpaths/config/elasticsearch/1
Do you think it would be possible to find these in `cleanup.yml` and unmount them, in a good way?
EDIT: Or is the preferred way to make sure that the cluster is empty before running the cleanup?
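One rough sketch: instead of hardcoding paths, `cleanup.yml` could enumerate whatever is still mounted under `/var/lib/kubelet` from `/proc/mounts`. Reversing the list helps unmount nested subpath mounts before their parents. This is only an illustration, not tested against the playbook:

```shell
# Find all active mounts under the kubelet directory and unmount them,
# deepest-first. Run on the node being cleaned up.
awk '$2 ~ "^/var/lib/kubelet/" {print $2}' /proc/mounts | tac | while read -r m; do
  sudo umount "$m"
done
```

In Ansible this could live in a `shell` task right before the existing unmount step.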
kelseyhightower/kubernetes-the-hard-way/issues/78
To be able to run `kubectl port-forward`, the package `socat` needs to be installed on the workers. Should it be installed by the playbook from this repository, or should such "additional" things be installed separately? Port forwarding is mostly done during development, I would suppose, so maybe you wouldn't want extra packages in a production environment?
I'm planning on doing an LDAP integration for authentication to our cluster, so we can use a detailed permission system. To do this, I need to pass additional options to the API server:
--oidc-issuer-url=https://dex.example.com
--oidc-client-id=example-app
--oidc-ca-file=/etc/ssl/certs/openid-ca.pem
--oidc-username-claim=email
--oidc-groups-claim=groups
Any idea on how we can add a generic way of adding API-server options?
Maybe some prefixed Ansible variables? Like `kube-apiserver.oidc-issuer-url=https://dex.example.com`?
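One possible shape for this (the variable name `kube_apiserver_extra_args` is made up): collect everything under a dict in the inventory and render it in the systemd unit template:

```yaml
# Inventory sketch (hypothetical variable name):
kube_apiserver_extra_args:
  oidc-issuer-url: "https://dex.example.com"
  oidc-client-id: "example-app"
  oidc-ca-file: "/etc/ssl/certs/openid-ca.pem"
  oidc-username-claim: "email"
  oidc-groups-claim: "groups"
```

The `kube-apiserver.service.j2` template could then append something like `{% for k, v in kube_apiserver_extra_args.items() %} --{{ k }}={{ v }}{% endfor %}` to the command line.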
Kubernetes v1.14.4 has been released and is the latest version; the current version in KTRW (v1.14.1) should be bumped.
I just tried adding and removing masters to my cluster (etcd is running on my master nodes). My plan was to replace my existing three (really bad) masters with three new masters.
I added the servers to the Ansible inventory and just ran the playbook. It didn't turn out the way I expected.
After googling around, I found out that adding new nodes to (and removing them from) etcd requires you to use `etcdctl`. So to add nodes, you run (from one of the existing etcd nodes):
$ etcdctl --cert-file /etc/etcd/pki/etcd.pem --key-file /etc/etcd/pki/etcd-key.pem --ca-file /etc/etcd/pki/ca.pem --endpoints https://127.0.0.1:2379 member add k8s-master-10 https://<ip-address>:2380
Then you get some info back that you should use in your service file:
--initial-cluster k8s-master-10=https://<ip-address>:2380,k8s-master-03=https://<ip-address>:2380,k8s-master-02=https://<ip-address>:2380,k8s-master-01=https://<ip-address>:2380
--initial-cluster-state existing
Note the list of initial cluster nodes and also the initial cluster state.
After doing this manually for each node, I could run the Ansible playbook again to tweak the configuration files so they are equal on all servers. That worked.
Same goes when removing servers. I had to do:
$ etcdctl --cert-file /etc/etcd/pki/etcd.pem --key-file /etc/etcd/pki/etcd-key.pem --ca-file /etc/etcd/pki/ca.pem --endpoints https://127.0.0.1:2379 member remove <node-id>
... before actually shutting them down and excluding them from the cluster.
This requires a bit of manual work (which is just fine, if it is necessary). Do you have any suggestion on how this can be automated in a better way? Or if not, could/should it be documented within KTRW?
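A sketch of how the member-add step could be automated in the playbook; the group name `masters` and the detection logic are assumptions, and this is untested against the repo's roles:

```yaml
# Hypothetical tasks: register the current member list on an existing etcd
# node, then add any inventory host that is missing from it.
- name: List current etcd members
  command: >
    etcdctl --cert-file /etc/etcd/pki/etcd.pem
    --key-file /etc/etcd/pki/etcd-key.pem
    --ca-file /etc/etcd/pki/ca.pem
    --endpoints https://127.0.0.1:2379 member list
  register: etcd_members
  delegate_to: "{{ groups['masters'][0] }}"

- name: Add this host as an etcd member if missing
  command: >
    etcdctl --cert-file /etc/etcd/pki/etcd.pem
    --key-file /etc/etcd/pki/etcd-key.pem
    --ca-file /etc/etcd/pki/ca.pem
    --endpoints https://127.0.0.1:2379
    member add {{ inventory_hostname }} https://{{ ansible_default_ipv4.address }}:2380
  when: inventory_hostname not in etcd_members.stdout
  delegate_to: "{{ groups['masters'][0] }}"
```

The template for the etcd unit file would then have to render `--initial-cluster-state existing` for the new member.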
Currently all certificates expire 5 years after creation.
Do we want to introduce a parameter for this value? Also, maybe a separate parameter specifically for the certificate authority (CA) certificates for `kube-apiserver` and `etcd`, perhaps with a somewhat longer default?
Maybe also another parameter for forcing recreation of the CAs, `regenerate_ca_certificates=True` (in addition to `regenerate_certificates`).
When the time comes to renew certificates (the CAs specifically), it would be nice to have a zero-downtime routine. I'll see if I can test this routine (as soon as I have time). If it only means downtime for state updates (such as Ingress controller config, node updates and similar), I think it's OK, as long as traffic is still routed properly to the containers.
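The parameters could look something like this in the role defaults; all names here are only suggestions, since today the 5-year expiry is hardcoded:

```yaml
# Hypothetical defaults:
certificate_expiry_days: 1825        # leaf certificates, 5 years (current behaviour)
ca_certificate_expiry_days: 3650     # CA certificates, a longer default
regenerate_certificates: false
regenerate_ca_certificates: false    # separate, more dangerous switch
```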
The playbook does a good job of not restarting services that haven't changed. However, there are still some places that are considered changed, even if you run the playbook on a cluster where you expect no changes.
PLAY RECAP **********************************************************************************************************************************************************************************************************************************
k8s-master-1 : ok=50 changed=3 unreachable=0 failed=0
k8s-master-2 : ok=50 changed=3 unreachable=0 failed=0
k8s-node-1 : ok=33 changed=2 unreachable=0 failed=0
k8s-node-2 : ok=33 changed=2 unreachable=0 failed=0
localhost : ok=25 changed=0 unreachable=0 failed=0
I use the same machines for etcd and the masters, so the above is a bit misleading. It's actually etcd, not the masters, causing the unwanted changes. For etcd, the following tasks are considered changed:
TASK [etcd : Download etcd] *****************************************************************************************************************************************************************************************************************
changed: [k8s-master-1]
TASK [etcd : Unarchive etcd tarball] ********************************************************************************************************************************************************************************************************
changed: [k8s-master-1]
TASK [etcd : Remove tmp download files] *****************************************************************************************************************************************************************************************************
changed: [k8s-master-1] => (item=etcd-v3.3.12-linux-amd64)
changed: [k8s-master-1] => (item=etcd-v3.3.12-linux-amd64.tar.gz)
For the nodes, these are the ones that are considered changed:
TASK [cni : Ensure directories exist] *******************************************************************************************************************************************************************************************************
ok: [k8s-node-1] => (item=/etc/cni/net.d)
changed: [k8s-node-1] => (item=/opt/cni/bin)
TASK [cni : Download cni] *******************************************************************************************************************************************************************************************************************
changed: [k8s-node-1]
Can we keep the downloaded etcd tarball? It would consume unnecessary disk space, but at least it wouldn't trigger a change. Or maybe there is a way of ignoring changes for certain steps in the playbook? In theory, only the step "Move etcd binaries into place" is important (and it doesn't trigger any changes).
Not really sure what the problem is here. It looks like it triggers a change on the `/opt/cni/bin` directory... then it triggers another change when we extract CNI into that directory.
The task attempts to set the directory permissions to `755`, but after CNI is extracted, the permissions appear to be `775`. Not sure why though...
EDIT: It seems to work the same way when I extract the archive locally using `tar`. Maybe we should just set `775` for `/opt/cni/bin` instead?
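The `775` is most likely just the mode recorded inside the CNI tarball: extracting with `-p` (the default when running as root) restores the archived mode, overriding whatever the directory was set to beforehand. A quick local demonstration of that behavior:

```shell
# Create a directory with mode 775, archive it, and extract it elsewhere:
# the extracted copy comes back with the archived mode, not the umask mode.
mkdir -p /tmp/cni-demo/bin && chmod 775 /tmp/cni-demo/bin
tar -czf /tmp/cni-demo.tgz -C /tmp/cni-demo bin
mkdir -p /tmp/cni-out && chmod 755 /tmp/cni-out
tar -xzf /tmp/cni-demo.tgz -C /tmp/cni-out -p
stat -c '%a' /tmp/cni-out/bin   # prints 775
```

So setting `775` in the "Ensure directories exist" task would make the two steps agree and stop the spurious change.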
It's probably not KTRW's responsibility, but would it be a good idea to install additional tools on the nodes, if specific parameters are set in the inventory of course?
For example, I feel it would be great to have `crictl` installed so it's easier to debug when issues occur. There are maybe other useful tools that could be optional in KTRW.
Again, it's not really KTRW's responsibility. Thoughts?
The kubelet is configured with the following resolv config:
resolvConf: "/run/systemd/resolve/resolv.conf"
This implies that one is using `systemd-resolved`, right? I just had `systemd-resolved` disabled due to a temporary local DNS issue and configured my `/etc/resolv.conf` manually. This means that the directory `/run/systemd/resolve/` does not exist, and the kubelet complains about that.
It feels like it would be better to use `/etc/resolv.conf` directly? I'm not sure what implications this might have, though. If I have `systemd-resolved` enabled, `/etc/resolv.conf` is a symlink to `/run/systemd/resolve/stub-resolv.conf`, which does not have the same contents.
We'll go back to using `systemd-resolved` as soon as the issue is fixed, so it's not a huge deal. It's just an idea/discussion.
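One hedged way to handle this in the kubelet config template would be to only point at the `systemd-resolved` path when it actually exists; the fact name `resolved_conf` is made up and would come from a prior `stat` task:

```yaml
# Sketch for the kubelet config template, assuming a task like:
#   - stat: path=/run/systemd/resolve/resolv.conf
#     register: resolved_conf
# Fall back to /etc/resolv.conf when systemd-resolved's runtime file is absent.
resolvConf: "{{ '/run/systemd/resolve/resolv.conf' if resolved_conf.stat.exists else '/etc/resolv.conf' }}"
```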
I'm trying to install the metrics server into my cluster. It requires you to add an `APIService` that registers itself as an API extension in the API server. However, my masters need to be able to access this using a `Service` `clusterIP`, which they currently cannot, so the `APIService` fails.
Reading around a bit:
kubernetes/kubernetes#66231
It looks like people install `kube-proxy` on the masters to achieve this, but it feels a bit weird.
Have you got any idea on how to do this best with Kubernetes The Right Way? There is also a discussion here where they recommend adding an additional API server as a pod inside the cluster, but I'm not quite sure...
EDIT: Running `kube-proxy` on the masters feels really odd. It's not something that I'd want to do.
Note: Technically not related to this repository, other than the fact that I might need custom switches on components other than `kube-apiserver`. But I'll give it a go here anyway; maybe it's a good discussion topic. :)
I noticed that all Kubernetes components and `etcd` expose a `/metrics` path with Prometheus metrics. So I was thinking that I should start scraping these, and see if I can find any pre-built dashboards for Grafana.
I just have something to ask/discuss here.
`kube-apiserver` should be easily accessible by my Prometheus pod, as long as I give the service account access to the `/metrics` path (not sure how to do that though; I'll need to investigate).
Regarding `kube-scheduler` and `kube-controller-manager`, I can access them on ports 10259 and 10257 respectively. However, they present rather strange CA certificates, and I'm not able to use my own access token. I suppose the switches `--tls-cert-file` and `--tls-private-key-file` will solve the strange CA certificate, but I'm not sure how to actually authenticate (avoiding 401 Unauthorized). Do you have any ideas?
When it comes to `etcd`, I can access it pretty easily. However, I need to use the client certificate and key stored on the masters (`etcd.pem` and `etcd-key.pem`), and I can't really access them from my Prometheus pod. I'm not sure I want to, either. I guess this is the interesting part here.
`kube-proxy` should be fairly simple. It only listens on `127.0.0.1:10249` by default, but that's changeable with a switch, so it should be fine.
Finally: I wouldn't want to hardcode all server IPs in my Prometheus configuration file. It would be great if I could use Kubernetes services for this. I see that I have some endpoints (`kubectl get endpoints -n kube-system`), like `kube-controller-manager`, but they're set to `<none>`. I guess I could create my services manually (once) and utilize them. But I wouldn't want Prometheus to round-robin requests to them; I would want it to perform a DNS lookup and scrape all targets of that lookup. Somehow... :) Ideas? For the worker nodes, it would be nice if I could utilize `kubectl get nodes` to find the IP addresses of the nodes, and reach `kube-proxy` there.
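For the node-discovery part specifically, Prometheus can avoid hardcoded IPs with `kubernetes_sd_configs`; a rough scrape-config sketch for reaching `kube-proxy` on every node (the port assumes it's been rebound from 127.0.0.1, as discussed above):

```yaml
# prometheus.yml fragment (sketch): discover nodes via the API server and
# rewrite each discovered node address to kube-proxy's metrics port.
- job_name: kube-proxy
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    - source_labels: [__address__]
      regex: '(.*):\d+'
      replacement: '${1}:10249'
      target_label: __address__
```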
Just close this if you feel it's too off-topic, and I'll try elsewhere.
Even though we've enabled some kind of HA solution with `serial_all=50%`, I still get some downtime during upgrades. I assume this is because when containerd is up again on node 1, we almost immediately take containerd down on node 2. If a deployment only has two replicas (which is a common case), it makes sense that it will have downtime.
What can we do in order to properly wait between nodes? One idea would be to actually ask the API server. We could check node status with `kubectl get node <node-name>`. Could we also check something like the number of non-ready pods on a specific node? Then again, maybe you have non-ready pods intentionally?
Another idea could be a configurable delay between nodes? Maybe that's sufficient?
Does Ansible have a way of waiting for user input here? If so, that would be even better, so I can manually confirm that the node is back up at 100%.
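All three ideas are expressible in Ansible; a sketch combining the API-server check with a manual confirmation (the task placement and `kubectl` context are assumptions, and Ansible's `pause` module does exactly the user-input wait asked about):

```yaml
# After upgrading a node, wait until the API server reports it Ready,
# then pause for manual confirmation before the next node.
- name: Wait for node to become Ready
  command: >
    kubectl get node {{ inventory_hostname }}
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
  register: node_ready
  until: node_ready.stdout == "True"
  retries: 30
  delay: 10
  delegate_to: localhost

- name: Confirm before moving on
  pause:
    prompt: "Node {{ inventory_hostname }} reports Ready. Press enter to continue"
```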
I know pod networking isn't something that should be handled by this repository, but I have a minor question. I've just gotten into pod networking, I've previously worked with a single worker node, meaning that the bridge supplied by this repository has worked wonders.
This repository supplies a bridge using the subnet `10.19.0.0/16`.
This repository installs `kube-controller-manager` with the argument `--cluster-cidr=10.19.0.0/16`, which I believe is in charge of appointing IP addresses to each new pod.
I've now installed Flannel, which should be used as an overlay network for proper pod networking. Its configuration uses the subnet `10.244.0.0/16` (the default from the Flannel repository).
My question: How am I supposed to tell my cluster to use the Flannel subnet instead? One idea would be to have an Ansible parameter for this, but I'm kind of clueless here. Is that the right way to go? Or am I missing something? :)
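The Ansible-parameter idea seems like the natural fit: one variable as the single source of truth that both the bridge CNI config and the `kube-controller-manager` flag consume (the variable name is a suggestion):

```yaml
# Inventory sketch: one source of truth for the pod network range.
# Set to Flannel's default here; KTRW currently hardcodes 10.19.0.0/16.
cluster_cidr: "10.244.0.0/16"
```

The templates would then render `--cluster-cidr={{ cluster_cidr }}` and use the same variable in the CNI bridge config, so the two can never disagree.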
According to the kube-proxy.service.j2 we're running kube-proxy with this command:
/usr/local/bin/kube-proxy --config=/etc/kubernetes/config/kube-proxy.yml
When I do that I get the following error:
F1214 16:56:03.067953 14535 server.go:361] invalid configuration: no configuration has been provided
I googled around a bit, and found this, which shows a different command. If I run this, my kube-proxy runs:
/usr/local/bin/kube-proxy --master=https://{{ cluster_hostname }}:{{ cluster_port }} --kubeconfig=/etc/kubernetes/config/kube-proxy.kubeconfig --v=2
The above command makes kube-proxy.yml a bit useless. Am I missing something? It's worth noting that I haven't gotten my cluster fully set up yet, so I haven't verified that everything works with the above command. :)
EDIT: Probably same issue as kelseyhightower/kubernetes-the-hard-way/issues/391
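For what it's worth, the `--config` route can also work if the YAML file itself carries the kubeconfig path; a minimal `KubeProxyConfiguration` sketch (untested against this repo's templates):

```yaml
# /etc/kubernetes/config/kube-proxy.yml sketch: the "no configuration has
# been provided" error suggests the file is empty or missing this structure.
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/config/kube-proxy.kubeconfig
mode: iptables
```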
The playbook sets a bunch of permissions to 755, see here.
We had the idea of using a controller host (a very simple VM) from which we execute the playbook for different clusters. This way, we make sure we always have the correct `~/.ktrw` directory, we can easily back it up, and we avoid the risk of re-creating certificates, etc. It also seems a bit quicker to run it like that compared to running from localhost over VPN (which we do a lot these days).
The fact that KTRW wants 755 makes it a bit difficult to work with these with different users. It would be nice if they could be 775 instead, so we could have group permissions. But maybe that's not optimal for when they actually reach the destination servers... There, we'd want 755 I guess?
Do you have any ideas or suggestions, @amimof?
The service cluster IP range is hardcoded to `10.32.0.0/24`. I'm guessing we should make this a parameter with a default value instead?
Also, why `/24`? Doesn't this limit the number of services in the cluster to roughly 250? I see that the Kubernetes default is `10.0.0.0/24` though, so maybe it's common.
We previously added support for configuring `kube-apiserver` parameters dynamically. This is nice, and we should probably have this for all components?
However, how do we do it for the `kubelet`, where the configuration comes from a file instead of command-line arguments? Maybe the exact same way?
For example, one parameter that I might want to change is `maxPods` (or `--max-pods`).
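Since the kubelet reads a `KubeletConfiguration` file, the same dynamic-parameter idea could merge a dict of overrides into the templated YAML using Ansible's standard `combine` filter; the variable names here are made up:

```yaml
# Inventory sketch:
kubelet_extra_config:
  maxPods: 250

# Task sketch: kubelet_base_config would be the dict the role templates today.
- name: Write kubelet config with overrides merged in
  copy:
    content: "{{ kubelet_base_config | combine(kubelet_extra_config) | to_nice_yaml }}"
    dest: /var/lib/kubelet/config.yaml
```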
The test playbook `tests/main.yml` outputs deprecation warnings which need to be addressed. The warnings are related to the `docker_container` module. An example warning is the following:
TASK [Bring up etcd] ***********************************************************
[DEPRECATION WARNING]: Please note that docker_container handles networks
slightly different than docker CLI. If you specify networks, the default
network will still be attached as the first network. (You can specify
purge_networks to remove all networks not explicitly listed.) This behavior
will change in Ansible 2.12. You can change the behavior now by setting the new
`networks_cli_compatible` option to `yes`, and remove this warning by setting
it to `no`. This feature will be removed in version 2.12. Deprecation warnings
can be disabled by setting deprecation_warnings=False in ansible.cfg.
Also
[DEPRECATION WARNING]: Param 'ipam_options' is deprecated. See the module docs
for more information. This feature will be removed in version 2.12. Deprecation
warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
See this build for more details
I started messing around with pod security policies to easily notice if my containers are running as root (and doing more complex rules of course). I couldn't get it working, and I ended up in this issue: kubernetes/kubernetes#53063
The argument value for `--enable-admission-plugins` is hardcoded in KTRW. What about making it an Ansible setting, with a default value matching the currently hardcoded value?
Note: I haven't tested if this actually works for me yet, I just stumbled across it. What's your take on this?
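A default-valued variable would keep today's behaviour while allowing overrides such as `PodSecurityPolicy`; the variable name is a suggestion:

```yaml
# defaults/main.yml sketch -- the default should be set to whatever value
# is currently hardcoded in the kube-apiserver unit template.
enable_admission_plugins: "NodeRestriction"

# Inventory override example:
# enable_admission_plugins: "NodeRestriction,PodSecurityPolicy"
```

The unit template would then render `--enable-admission-plugins={{ enable_admission_plugins }}`.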
https://github.com/containerd/containerd/blob/8706a355dd603338d4a3026e66f6477fbdb18ef9/docs/ops.md
Looks like `containerd` can expose Prometheus metrics using this configuration snippet:
[metrics]
# tcp address!
address = "127.0.0.1:1234"
Can we get this into KTRW? By default, or optionally somehow? It would be nice to be able to see these metrics. Also, I can use this port to check whether containerd is up and running (I'm experimenting with sequential Ansible).
Background:
I have a working cluster with 3 masters and 5 nodes. I ask my colleague to add another node into the cluster. He clones KTRW and our repository that contains our inventory file. He adds the new node into the inventory file.
If he runs the Ansible playbook now, it will destroy the cluster, since he has no keys on his machine.
I was thinking that we could have a validation parameter. If the parameter is set to `true`, it could check whether vital keys are missing (for example `service-account-key.pem`) and, if so, simply fail the playbook, explaining that the user needs the keys to continue.
What do you think? Just as a safety measure.
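A sketch of such a safety check using `stat` plus `assert`; the exact key path and the parameter name `validate_existing_keys` are assumptions:

```yaml
# Fail early if vital existing-cluster keys are absent locally, so a fresh
# clone can't accidentally regenerate them and break the cluster.
- name: Check for existing service account key
  stat:
    path: "{{ lookup('env', 'HOME') }}/.ktrw/service-account-key.pem"
  register: sa_key
  delegate_to: localhost

- name: Abort if keys are missing and validation is enabled
  assert:
    that: sa_key.stat.exists
    fail_msg: "service-account-key.pem not found -- copy the cluster's keys before running the playbook"
  when: validate_existing_keys | default(false)
```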