
kubernetes-ansible's Introduction

Kubernetes-ansible

See the wiki for the changelog:

https://github.com/zhangguanzhang/Kubernetes-ansible/wiki

About this project (must read)

https://github.com/zhangguanzhang/Kubernetes-ansible/wiki/What-I-did

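Deploying Kubernetes with Ansible

Supported systems are Ubuntu 16.x (not finished yet) and CentOS 7.x (the latest 7.x release is recommended). Versions installed by this release:

  • Kubernetes v1.17.11 (HA, highly available)
  • CNI plugins v0.8.6
  • Etcd v3.4.10
  • flanneld v0.11.0
  • Calico (not covered here; find a YAML manifest and deploy it yourself)
  • Docker CE 19.03 (19.06+ may work; test it yourself)

Advice on choosing a version: if the patch number of the newest release has not reached 5 yet, use the previous release line instead of blindly chasing the newest version. Beyond that, most of the differences between versions are in the configuration parameters of the three control-plane (cs) components.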

Usage guide

Before deploying

https://github.com/zhangguanzhang/Kubernetes-ansible/wiki/Before-deploy

Deployment documentation

https://github.com/zhangguanzhang/Kubernetes-ansible/wiki/Deploy

Going forward, README information will mostly be updated on the wiki pages, since there is no longer just one branch.

kubernetes-ansible's People

Contributors

zhangguanzhang


kubernetes-ansible's Issues

ansible-playbook deploy.yml --tags tls fails

Running ansible-playbook deploy.yml --tags tls fails (I am using a separate Ansible machine, not master0). The output is:

TASK [tls : apiserver-etcd-client --- part.1] *************************************************************************************************************************************************
changed: [localhost]

TASK [tls : apiserver-etcd-client --- part.2] *************************************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "openssl x509 -in apiserver-etcd-client.csr -req -CA etcd/ca.crt -CAkey etcd/ca.key -CAcreateserial -extensions v3_req_etcd -extfile openssl.cnf -out apiserver-etcd-client.crt -days 10000\n", "delta": "0:00:00.013557", "end": "2019-07-10 15:03:09.043126", "msg": "non-zero return code", "rc": 1, "start": "2019-07-10 15:03:09.029569", "stderr": "Error Loading extension section v3_req_etcd\n140280250644368:error:220A4076:X509 V3 routines:a2i_GENERAL_NAME:bad ip address:v3_alt.c:476:value=etcd001.k8s.local\n140280250644368:error:22098080:X509 V3 routines:X509V3_EXT_nconf:error in extension:v3_conf.c:95:name=subjectAltName, value=@alt_names_etcd", "stderr_lines": ["Error Loading extension section v3_req_etcd", "140280250644368:error:220A4076:X509 V3 routines:a2i_GENERAL_NAME:bad ip address:v3_alt.c:476:value=etcd001.k8s.local", "140280250644368:error:22098080:X509 V3 routines:X509V3_EXT_nconf:error in extension:v3_conf.c:95:name=subjectAltName, value=@alt_names_etcd"], "stdout": "", "stdout_lines": []}
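
The `bad ip address ... value=etcd001.k8s.local` message indicates that a hostname ended up under an `IP.n` key in the `alt_names_etcd` section of openssl.cnf, where OpenSSL accepts only literal addresses. The following is a minimal sketch, assuming etcd members may be listed either by IP or by hostname, of how entries could be sorted into `IP.n` and `DNS.n` keys. The function name `build_alt_names` and the sample host list are hypothetical and are not taken from the playbook.

```python
import ipaddress


def build_alt_names(hosts):
    """Render the body of an openssl.cnf alt_names section.

    Literal IPv4/IPv6 addresses become IP.n entries; everything else
    is treated as a hostname and becomes a DNS.n entry.
    """
    lines = []
    ip_index = 0
    dns_index = 0
    for host in hosts:
        try:
            ipaddress.ip_address(host)
        except ValueError:
            # Not a literal address, so it must not be written as IP.n.
            dns_index += 1
            lines.append(f"DNS.{dns_index} = {host}")
        else:
            ip_index += 1
            lines.append(f"IP.{ip_index} = {host}")
    return "\n".join(lines)


# Hypothetical host list mixing addresses and a hostname.
print(build_alt_names(["127.0.0.1", "etcd001.k8s.local", "10.0.0.5"]))
```

With a list like this, the hostname is emitted as `DNS.1` while the two addresses keep `IP.1` and `IP.2`.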

{"changed": false, "msg": "Could not find or access 'common/time/ntp.conf.j2'

On the master branch, this error occurs:

{"changed": false, "msg": "Could not find or access 'common/time/ntp.conf.j2'"}

The cause is in /tasks/time/chrony.yml (or ntp.yml):

- name: Send ntp configuration file
  template: src=common/time/ntp.conf.j2 dest=/etc/ntp.conf

Modify it to:

- name: Send ntp configuration file
  template: src=ntp.conf.j2 dest=/etc/ntp.conf

At the end, coredns stays stuck in Creating; please help, many thanks

Events:
Type Reason Age From Message


Normal Scheduled 46m default-scheduler Successfully assigned kube-system/coredns-58b448b5d9-hhlxt to 158.143.70.21
Warning FailedCreatePodSandBox 33m (x19 over 46m) kubelet, 158.143.70.21 Failed create pod sandbox: rpc error: code = Unknown desc = failed pulling image "gcr.azk8s.cn/google_containers/pause-amd64:3.1": Error response from daemon: Get http://gcr.azk8s.cn/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning FailedCreatePodSandBox 33m kubelet, 158.143.70.21 Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "aa57fe03f3b22c54a17783490252498e67bcdd7bb4c226c7055df91625d4551c" network for pod "coredns-58b448b5d9-hhlxt": NetworkPlugin cni failed to set up pod "coredns-58b448b5d9-hhlxt_kube-system" network: failed to find plugin "loopback" in path [/opt/cni/bin], failed to clean up sandbox container "aa57fe03f3b22c54a17783490252498e67bcdd7bb4c226c7055df91625d4551c" network for pod "coredns-58b448b5d9-hhlxt": NetworkPlugin cni failed to teardown pod "coredns-58b448b5d9-hhlxt_kube-system" network: failed to find plugin "portmap" in path [/opt/cni/bin]]
Normal SandboxChanged 11m (x104 over 33m) kubelet, 158.143.70.21 Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 3m9s (x26 over 8m43s) kubelet, 158.143.70.21 Pod sandbox changed, it will be killed and re-created.
[root@node1 .kube]# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-58b448b5d9-hhlxt 0/1 ContainerCreating 0 47m
coredns-58b448b5d9-mptg5 0/1 ContainerCreating 0 47m
metrics-server-86c9cbd9f5-zblwz 0/1 ContainerCreating 0 47m
[root@node1 .kube]#
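
The events above report that the `loopback` and `portmap` plugins cannot be found in `/opt/cni/bin`. A small sketch, assuming that directory is where kubelet looks for CNI binaries on the node (as the log suggests), that lists which expected plugin files are absent or not executable. The helper name `missing_cni_plugins` is hypothetical.

```python
import os


def missing_cni_plugins(required, cni_bin_dir="/opt/cni/bin"):
    """Return the names in `required` that have no executable file in cni_bin_dir."""
    missing = []
    for name in required:
        path = os.path.join(cni_bin_dir, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Plugin names taken from the error messages in the events above.
    print(missing_cni_plugins(["loopback", "portmap"]))
```

An empty list would mean both binaries are present; any name printed points at a plugin that still has to be installed on that node.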

Installation fails after changing the docker version in group_vars/all.yml to the latest 18.09

vim group_vars/all.yml
version: '18.06'
After changing this to version: '18.09', the following error appears:

fatal: [192.168.11.172]: FAILED! => {"changed": false, "msg": "No package matching 'docker-ce-3:18.09.7-3.el7' found available, installed or updated", "rc": 126, "results": ["No package matching 'docker-ce-3:18.09.7-3.el7' found available, installed or updated"]}
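
The name yum rejected, `docker-ce-3:18.09.7-3.el7`, has an epoch (`3:`) placed right after the package name. One possible way around this, sketched below, follows the naming scheme in Docker's CentOS installation guide: take the version column from `yum list docker-ce --showduplicates`, keep the part after the first colon and before the first hyphen, and append it to the package name. The helper `yum_package_spec` is hypothetical, and whether the playbook builds the name this way is an assumption.

```python
def yum_package_spec(name, listed_version):
    """Build an install name such as 'docker-ce-18.09.7' from a yum version column.

    The epoch (text before the first ':') and the release (text after the
    first '-') are dropped; a version without an epoch passes through.
    """
    version = listed_version.split(":", 1)[-1]  # drop the epoch if present
    version = version.split("-", 1)[0]          # drop the release suffix
    return f"{name}-{version}"


print(yum_package_spec("docker-ce", "3:18.09.7-3.el7"))
print(yum_package_spec("docker-ce", "18.06.3.ce-3.el7"))
```

The same function covers both the 18.06 listing, which has no epoch, and the 18.09 listing, which does.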

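The metrics-server Service on port 443 cannot communicate with the API server; please help

Hello Mr. Zhang, I need help with the following problem. When you have time, could you help analyze what went wrong?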

failing or missing response from https://10.96.18.80:443/apis/metrics.k8s.io/v1beta1: Get https://10.96.18.80:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

v1beta1.metrics.k8s.io                 kube-system/metrics-server   False (FailedDiscoveryCheck)   7m20s

[root@ CoreAddons]#kubectl get all -n kube-system -o wide
NAME                                  READY   STATUS    RESTARTS   AGE     IP           NODE            NOMINATED NODE   READINESS GATES
pod/coredns-59fc9fcd9b-4fcgl          1/1     Running   0          11h     192.1.0.12   10.249.13.160   <none>           <none>
pod/coredns-59fc9fcd9b-rms4b          1/1     Running   0          11h     192.1.3.19   10.249.13.162   <none>           <none>
pod/metrics-server-576f8588d9-fcbhv   1/1     Running   0          7m46s   192.1.3.21   10.249.13.162   <none>           <none>

NAME                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE     SELECTOR
service/kube-controller-manager   ClusterIP   10.96.13.224    <none>        10252/TCP                11d     <none>
service/kube-dns                  ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   11h     k8s-app=kube-dns
service/kube-scheduler            ClusterIP   10.96.112.106   <none>        10251/TCP                11d     <none>
service/metrics-server            ClusterIP   10.96.18.80     <none>        443/TCP                  7m46s   k8s-app=metrics-server

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE     CONTAINERS       IMAGES                                            SELECTOR
deployment.apps/coredns          2/2     2            2           11h     coredns          10.249.12.47/k8sv17/coredns:1.6.5                 k8s-app=kube-dns
deployment.apps/metrics-server   1/1     1            1           7m46s   metrics-server   10.249.12.47/k8sv17/metrics-server-amd64:v0.3.6   k8s-app=metrics-server

NAME                                        DESIRED   CURRENT   READY   AGE     CONTAINERS       IMAGES                                            SELECTOR
replicaset.apps/coredns-59fc9fcd9b          2         2         2       11h     coredns          10.249.12.47/k8sv17/coredns:1.6.5                 k8s-app=kube-dns,pod-template-hash=59fc9fcd9b
replicaset.apps/metrics-server-576f8588d9   1         1         1       7m46s   metrics-server   10.249.12.47/k8sv17/metrics-server-amd64:v0.3.6   k8s-app=metrics-server,pod-template-hash=576f8588d9
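
The `FailedDiscoveryCheck` above is a timeout while connecting to the Service ClusterIP, even though the pod itself is Running. A first diagnostic step could be to test plain TCP reachability of that address from each master. The sketch below uses only the Python standard library; the function name `can_connect` is hypothetical, and the address in the usage line is the ClusterIP from the output above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return False


if __name__ == "__main__":
    print(can_connect("10.96.18.80", 443))
```

If this returns False on a master while the pod IP is reachable from the node that hosts it, the problem is more likely in Service routing or the overlay network on the masters than in metrics-server itself; that reading is a guess, not a confirmed diagnosis.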

Wrong letter case for the master group

Kubernetes-ansible/roles/master/tasks/main.yml

when: inventory_hostname in groups['Master']

Kubernetes-ansible/roles/master/templates/kube-apiserver.service.j2

--apiserver-count={{ groups['Master'] | length }} \
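
Ansible group names are case-sensitive, so `groups['Master']` only resolves if the inventory defines a group spelled exactly `Master`. A small sketch of a check for this kind of mismatch follows; the function name `unknown_groups` and the sample inventory group names are hypothetical, since the issue does not show the inventory file.

```python
def unknown_groups(referenced, inventory_groups):
    """Map each referenced group missing from the inventory to a case-insensitive match.

    The value is the inventory's spelling when only the case differs,
    or None when no similarly named group exists at all.
    """
    by_lower = {name.lower(): name for name in inventory_groups}
    result = {}
    for name in referenced:
        if name not in inventory_groups:
            result[name] = by_lower.get(name.lower())
    return result


# Hypothetical inventory that spells the group in lower case.
print(unknown_groups(["Master"], ["master", "node", "etcd"]))
```

A non-empty result points at references in tasks or templates whose spelling should be aligned with the inventory.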

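Needs improvement

With the links set up this way, and everything relying on separate branches, there is really no way to push new changes to you. I had wanted to modify a few things and contribute them back.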

flanneld error

Nov 26 13:41:09 master01 sshd[16199]: pam_unix(sshd:session): session opened for user root by (uid=0)
Nov 26 13:41:10 master01 kubelet[6675]: E1126 13:41:10.001237 6675 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.CSIDriver: Get https://172.16.128.24
Nov 26 13:41:10 master01 kubelet[6675]: E1126 13:41:10.004250 6675 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:442: Failed to list *v1.Service: Get https://172.16.128.240:84
Nov 26 13:41:10 master01 kubelet[6675]: E1126 13:41:10.009359 6675 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.RuntimeClass: Get https://172.16.128
Nov 26 13:41:10 master01 kubelet[6675]: E1126 13:41:10.010773 6675 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://172.16.128.24
Nov 26 13:41:10 master01 kubelet[6675]: E1126 13:41:10.013051 6675 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Node: Get https://172.16.128.240:8443/
Nov 26 13:41:10 master01 kubelet[6675]: E1126 13:41:10.344709 6675 controller.go:125] failed to ensure node lease exists, will retry in 7s, error: Get https://172.16.128.240:8443/apis/coor
Nov 26 13:41:10 master01 kube-apiserver[2256]: E1126 13:41:10.411097 2256 authentication.go:65] Unable to authenticate the request due to an error: [invalid bearer token, square/go-jose: e
Nov 26 13:41:10 master01 flanneld[15541]: E1126 13:41:10.828927 15541 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:310: Failed to list *v1.Node: Unauthorized

flanneld fails to start

Following your guide, I tried several times and it fails at flannel every time.

TASK [KubernetesCoreAddons : 开机并启动flanneld] *********************************************************************
fatal: [192.168.1.17]: FAILED! => {"changed": false, "msg": "Unable to restart service flanneld: Job for flanneld.service failed because a timeout was exceeded. See "systemctl status flanneld.service" and "journalctl -xe" for details.\n"}
fatal: [192.168.1.18]: FAILED! => {"changed": false, "msg": "Unable to restart service flanneld: Job for flanneld.service failed because a timeout was exceeded. See "systemctl status flanneld.service" and "journalctl -xe" for details.\n"}
fatal: [192.168.1.38]: FAILED! => {"changed": false, "msg": "Unable to restart service flanneld: Job for flanneld.service failed because a timeout was exceeded. See "systemctl status flanneld.service" and "journalctl -xe" for details.\n"}
fatal: [192.168.1.28]: FAILED! => {"changed": false, "msg": "Unable to restart service flanneld: Job for flanneld.service failed because a timeout was exceeded. See "systemctl status flanneld.service" and "journalctl -xe" for details.\n"}
fatal: [192.168.1.27]: FAILED! => {"changed": false, "msg": "Unable to restart service flanneld: Job for flanneld.service failed because a timeout was exceeded. See "systemctl status flanneld.service" and "journalctl -xe" for details.\n"}

[root@vm17 CoreAddons]# journalctl -xe |grep flanneld
Jul 08 16:36:05 vm17.suibian.int flanneld[32209]: E0708 16:36:05.632948 32209 reflector.go:201] github.com/coreos/flannel/subnet/kube/kube.go:310: Failed to list *v1.Node: Unauthorized

ifconfig
[root@vm17 CoreAddons]# ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:8d:03:72:29 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.17 netmask 255.255.255.0 broadcast 192.168.1.255
ether 52:54:00:4c:eb:28 txqueuelen 1000 (Ethernet)
RX packets 4056110 bytes 543595368 (518.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 3506639 bytes 782355415 (746.1 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
RX packets 809083 bytes 110526361 (105.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 809083 bytes 110526361 (105.4 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

Question: on two of the three masters flanneld will not start; please advise.

Searching for interface using 10.249.21.198
Using interface with name eth0 and address 10.249.21.198
Using 10.249.21.198 as external address
Waiting 10m0s for node controller to sync
Starting kube subnet manager
Node controller sync successful
Created subnet manager: Kubernetes Subnet Manager - 10.249.21.198
Installing signal handlers
Found network config - Backend type: vxlan
VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
VXLAN device already exists
Returning existing device
Error registering network: failed to acquire lease: node "10.249.21.198" p
Stopping shutdownHandler...
Start healthz server on 10.249.21.198:8471

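Question about the validity period of the kubelet.crt certificate, thanks!

Question:

Why is the kubelet certificate valid for only one year, when the parameter file used for the Ansible install is configured for 10 years?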
/etc/kubernetes/pki/kubelet.crt
notBefore=Aug 20 11:52:15 2020 GMT
notAfter=Aug 20 11:52:15 2021 GMT

The other certificates are fine. Also, what is admin.crt used for? Sometimes it is valid for only one year as well.

/etc/kubernetes/pki/front-proxy-client.crt
notBefore=Aug 20 12:47:08 2020 GMT
notAfter=Jun 5 12:47:08 2294 GMT
/etc/kubernetes/pki/kube-scheduler.crt
notBefore=Aug 20 12:47:09 2020 GMT
notAfter=Jun 5 12:47:09 2294 GMT
/etc/kubernetes/pki/sa.crt
notBefore=Aug 20 12:47:09 2020 GMT
notAfter=Jun 5 12:47:09 2294 GMT
/etc/kubernetes/pki/admin.crt
notBefore=Aug 20 12:47:10 2020 GMT
notAfter=Jun 5 12:47:10 2294 GMT
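
To compare the validity periods quoted above, the `notBefore`/`notAfter` pairs can be converted into day counts. This is a minimal sketch that parses the date format printed by `openssl x509 -noout -dates`; the function name `validity_days` is hypothetical, and the inputs are copied from the output in this issue.

```python
from datetime import datetime

# Date layout used in openssl's notBefore=/notAfter= lines.
OPENSSL_DATE = "%b %d %H:%M:%S %Y GMT"


def validity_days(not_before, not_after):
    """Return the number of whole days between two openssl-style timestamps."""
    start = datetime.strptime(not_before, OPENSSL_DATE)
    end = datetime.strptime(not_after, OPENSSL_DATE)
    return (end - start).days


# kubelet.crt
print(validity_days("Aug 20 11:52:15 2020 GMT", "Aug 20 11:52:15 2021 GMT"))  # 365
# front-proxy-client.crt
print(validity_days("Aug 20 12:47:08 2020 GMT", "Jun 5 12:47:08 2294 GMT"))   # 100000
```

The second figure suggests the other certificates were issued with a 100000-day lifetime rather than ten years, while kubelet.crt was issued for exactly 365 days, so it was probably not generated with the same setting.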

k8s metrics Request

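  • Version: v1.16.9
  • Problem description: after restarting the cluster, metrics stopped working. The metrics-server log shows no errors, and kubelet shows only a few unrelated errors. The YAML below is extracted from the metrics-server you preinstall; please check whether anything is wrong with it.
  • Screenshot of the problem

  • What I tried: deleting and recreating it had no effect, so I am asking the maintainer for advice.
  • Side question: kubectl get cs reports everything as unknown. Is that a bug? describe shows a normal status.
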
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: metrics-server
      name: metrics-server
    spec:
      containers:
      - args:
        - --kubelet-preferred-address-types=InternalIP
        - --kubelet-insecure-tls
        image: zhangguanzhang/metrics-server:v0.3.6
        imagePullPolicy: IfNotPresent
        name: metrics-server
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
        - mountPath: /etc/localtime
          name: host-time
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: metrics-server
      serviceAccountName: metrics-server
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /etc/localtime
          type: ""
        name: host-time
      - emptyDir: {}
        name: tmp-dir
