Hello. I have been studying with the book you published in 2020 and am getting a lot out of it. Thank you for the great book.
I'm writing because I ran into something I couldn't get working during the hands-on exercises.
Specifically, the cluster setup on the control plane with kubeadm in Part 2, Chapter 5 keeps failing. I've searched extensively, but even though the problem occurs at a very basic step, I couldn't find anything that helped.
First, my test environment, in brief:
Platform: Ubuntu 22.04 LTS on AWS, and Ubuntu 22.04 LTS in a VM on my local machine (tested on both)
CRI: containerd
Versions: latest for all components, including Kubernetes 1.26 (no versions were pinned explicitly)
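If the exact component versions matter, these are the standard version queries I can run on the node and attach the output of (nothing book-specific):

kubeadm version -o short
kubelet --version
containerd --version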
For reference, a (very small) portion of the book's examples no longer works as of this writing, so I followed the official guides below instead:
Preparation before installing containerd: https://kubernetes.io/docs/setup/production-environment/container-runtimes/
Installing containerd (packaged together with Docker Engine): https://github.com/containerd/containerd/blob/main/docs/getting-started.md
Installing kubeadm: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
Creating a cluster on a control plane node with kubeadm: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
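For reference, the prerequisite steps I applied from the first guide were, as best I recall, the standard kernel module and sysctl settings it lists:

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system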
The symptom: immediately after the cluster comes up via kubeadm init, the core Kubernetes containers (kube-apiserver, kube-controller-manager, kube-proxy, kube-scheduler) appear to start normally, but then they die and restart over and over, and past a certain point kube-apiserver's port 6443 is closed and no control is possible at all. I haven't measured it precisely, but it seems to take roughly 5-10 minutes from startup to complete unresponsiveness. Note that this happens before any CNI plugin is installed, and no worker nodes have been joined.
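While the API server is still intermittently reachable, I can also collect events and the crashed containers' previous logs if that would help, e.g.:

kubectl get events -n kube-system --sort-by=.metadata.creationTimestamp
kubectl -n kube-system logs kube-apiserver-kube-control --previous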
Below is the output at startup; nothing looks out of the ordinary:
root@kube-control:/var/log/containers# kubeadm init --apiserver-advertise-address 192.168.132.131 --pod-network-cidr=192.167.0.0/16
[init] Using Kubernetes version: v1.26.0
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kube-control kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.132.131]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kube-control localhost] and IPs [192.168.132.131 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [kube-control localhost] and IPs [192.168.132.131 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 4.501740 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node kube-control as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node kube-control as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: ys42dt.iy4mfur22loefwat
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.132.131:6443 --token ys42dt.iy4mfur22loefwat \
--discovery-token-ca-cert-hash sha256:0b6e9766c12499bad2ef737a3022ae5ab1bee839d0c44801f9025a92895222e7
And below you can see it go from looking healthy right after startup to accumulating restarts and dying containers. (The restart counts on some containers are inflated because of my many repeated attempts.)
root@kube-control:/var/log/containers# kubectl get pods --all-namespaces
NAMESPACE     NAME                                   READY   STATUS    RESTARTS        AGE
kube-system   coredns-787d4945fb-tjmxl               0/1     Pending   0               42s
kube-system   coredns-787d4945fb-z4p7r               0/1     Pending   0               42s
kube-system   etcd-kube-control                      1/1     Running   31 (108s ago)   75s
kube-system   kube-apiserver-kube-control            1/1     Running   31 (78s ago)    111s
kube-system   kube-controller-manager-kube-control   1/1     Running   14 (108s ago)   111s
kube-system   kube-proxy-nvfqf                       1/1     Running   1 (41s ago)     42s
kube-system   kube-scheduler-kube-control            1/1     Running   34 (108s ago)   110s
root@kube-control:/var/log/containers# kubectl get pods --all-namespaces
NAMESPACE     NAME                                   READY   STATUS             RESTARTS         AGE
kube-system   coredns-787d4945fb-tjmxl               0/1     Pending            0                2m18s
kube-system   coredns-787d4945fb-z4p7r               0/1     Pending            0                2m18s
kube-system   etcd-kube-control                      1/1     Running            31 (3m24s ago)   2m51s
kube-system   kube-apiserver-kube-control            1/1     Running            31 (2m54s ago)   3m27s
kube-system   kube-controller-manager-kube-control   1/1     Running            14 (3m24s ago)   3m27s
kube-system   kube-proxy-nvfqf                       1/1     Running            3 (34s ago)      2m18s
kube-system   kube-scheduler-kube-control            0/1     CrashLoopBackOff   36 (14s ago)     3m26s
root@kube-control:/var/log/containers# kubectl get pods --all-namespaces
NAMESPACE     NAME                                   READY   STATUS    RESTARTS         AGE
kube-system   coredns-787d4945fb-tjmxl               0/1     Pending   0                4m14s
kube-system   coredns-787d4945fb-z4p7r               0/1     Pending   0                4m14s
kube-system   etcd-kube-control                      1/1     Running   31 (5m20s ago)   4m47s
kube-system   kube-apiserver-kube-control            1/1     Running   32 (84s ago)     5m23s
kube-system   kube-controller-manager-kube-control   1/1     Running   15 (102s ago)    5m23s
kube-system   kube-proxy-nvfqf                       1/1     Running   4 (48s ago)      4m14s
kube-system   kube-scheduler-kube-control            1/1     Running   37 (2m10s ago)   5m22s
root@kube-control:/var/log/containers# kubectl get pods --all-namespaces
NAMESPACE     NAME                                   READY   STATUS             RESTARTS         AGE
kube-system   coredns-787d4945fb-tjmxl               0/1     Pending            0                5m24s
kube-system   coredns-787d4945fb-z4p7r               0/1     Pending            0                5m24s
kube-system   etcd-kube-control                      1/1     Running            32 (62s ago)     5m57s
kube-system   kube-apiserver-kube-control            1/1     Running            32 (2m34s ago)   6m33s
kube-system   kube-controller-manager-kube-control   0/1     CrashLoopBackOff   16 (10s ago)     6m33s
kube-system   kube-proxy-nvfqf                       0/1     CrashLoopBackOff   4 (14s ago)      5m24s
kube-system   kube-scheduler-kube-control            0/1     CrashLoopBackOff   37 (46s ago)     6m32s
root@kube-control:/var/log/containers# kubectl get pods --all-namespaces
The connection to the server 192.168.132.131:6443 was refused - did you specify the right host or port?
(completely inaccessible from this point on)
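Once it reaches this state kubectl is unusable, so the only diagnostics I can still gather are at the node level, for example:

# kubectl no longer responds, but the node-level tools still do
sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a
sudo journalctl -u kubelet --no-pager --since "10 minutes ago"
sudo journalctl -u containerd --no-pager --since "10 minutes ago"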
I'm blocked at such a basic step that I've spent several days trying to resolve it on my own, but I can't find the cause, so I'm reaching out.
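One thing I haven't been able to rule out myself is a cgroup driver mismatch, which I've seen mentioned for similar symptoms: kubeadm 1.26 configures the kubelet for the systemd cgroup driver, while the containerd package from the Docker repository ships an /etc/containerd/config.toml that does not enable SystemdCgroup. If that could be the cause, the check and fix I would try is roughly the following (my own guess, not from the book):

# inspect the shipped config for a disabled CRI plugin and the cgroup driver setting
grep disabled_plugins /etc/containerd/config.toml
grep SystemdCgroup /etc/containerd/config.toml
# regenerate the default config (CRI plugin enabled) and switch to the systemd cgroup driver
containerd config default | sudo tee /etc/containerd/config.toml >/dev/null
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd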
Please let me know if there are any additional logs you need. Thank you.