
kubeasz's People

Contributors

antergone, change2, code2life, eamonzhang, fonttian, gj19910723, gjmzj, gsealy, heyanyanchina123, itswl, j4ckzh0u, jmgao1983, justmine66, kenwoodjw, klgd, luolizhi, lushenle, lusyoe, luyx30, newer027, oilbeater, panhongyin, pingod, shaohq, sunerpy, uuwang, waitingsong, xieyanker, yuyicai, zhoufwind


kubeasz's Issues

Installing in ALLinONE mode

I used the all-in-one mode and followed the steps in your tutorial. I installed on a VirtualBox VM with two NICs, one NAT and one Host-Only with the address 192.168.56.100, so I set the address of every node in the hosts file to that address. ssh-copy-id then fails and asks for root's password on 192.168.56.100; Ubuntu has no root password by default, so I created one myself, but it keeps being rejected with "Permission denied". If I ignore this, the later installation steps cannot ssh to 192.168.56.100. Can I avoid using the root user?

Problems with the all-in-one deployment

Hello,
"msg": "Failed to connect to the host via ssh: Permission denied (publickey,password).\r\n", "unreachable": true
I ran into this authentication problem when deploying in all-in-one mode.
For a single-machine deployment, do I still need to run the following step on that same machine:
run ssh-copy-id $IP (where $IP is the VM's own address) and answer yes and enter the root password when prompted?

Does the machine really need to authenticate against itself? @gjmzj
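For reference, a minimal sketch of the key-based login step the docs describe, applied to the local host itself for the all-in-one case ($IP stands for the machine's own address, exactly as in the step quoted above):

ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa    # skip if a key pair already exists
ssh-copy-id root@$IP                        # answer yes and enter the root password once
ssh root@$IP hostname                       # should now log in without a password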

Only two master nodes are shown; can it not be three?

As far as I know, the Kubernetes scheduler and controller-manager elect a leader among themselves. If the two nodes lose network connectivity to each other, wouldn't that cause split-brain? Have you considered having the nodes use nginx purely as a reverse proxy to the apiserver, so that configuring HA with keepalived would not be necessary?

Abnormal state after a machine reboot

----- dashboard ----

2017/12/14 06:25:31 Starting overwatch
2017/12/14 06:25:31 Using in-cluster config to connect to apiserver
2017/12/14 06:25:31 Using service account token for csrf signing
2017/12/14 06:25:31 No request provided. Skipping authorization
2017/12/14 06:25:34 Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get https://10.68.0.1:443/version: dial tcp 10.68.0.1:443: getsockopt: connection timed out
Refer to our FAQ and wiki pages for more information: https://github.com/kubernetes/dashboard/wiki/FAQ

The etcd version recommended by Kubernetes upstream is 3.1.10

Details can be found here:

External Dependencies
The supported etcd server version is 3.1.10, as compared to 3.0.17 in v1.8 (#49393, @hongchaodeng)
The validated docker versions are the same as for v1.8: 1.11.2 to 1.13.1 and 17.03.x
The Go version was upgraded from go1.8.3 to go1.9.2 (#51375, @cblecker)
The minimum supported go version bumps to 1.9.1. (#55301, @xiangpengzhao)
Kubernetes has been upgraded to go1.9.2 (#55420, @cblecker)

kubectl logs fails with "error: You must be logged in to the server"

error: You must be logged in to the server (the server has asked for the client to provide credentials ( pods/log tomcat-69574bf8d5-47nm7))

The container starts normally, but running kubectl logs gives the error above. How do I resolve this?

[root@k8s-master-1 ssl]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-master-1 Ready master 54m v1.8.5 CentOS Linux 7 (Core) 4.11.8-1.el7.elrepo.x86_64 docker://17.11.0-ce
k8s-master-2 Ready master 54m v1.8.5 CentOS Linux 7 (Core) 4.11.8-1.el7.elrepo.x86_64 docker://17.11.0-ce
k8s-master-3 Ready master 54m v1.8.5 CentOS Linux 7 (Core) 4.11.8-1.el7.elrepo.x86_64 docker://17.11.0-ce
k8s-node-1 Ready node 54m v1.8.5 CentOS Linux 7 (Core) 4.11.8-1.el7.elrepo.x86_64 docker://17.11.0-ce
k8s-node-2 Ready node 54m v1.8.5 CentOS Linux 7 (Core) 4.11.8-1.el7.elrepo.x86_64 docker://17.11.0-ce

[root@k8s-master-1 ssl]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default hello-nginx-5d47cdc4b7-5p5jg 1/1 Running 0 7m
default tomcat-69574bf8d5-2s96b 1/1 Running 0 41m
default tomcat-69574bf8d5-47nm7 1/1 Running 0 41m
default tomcat-69574bf8d5-8kd8m 1/1 Running 0 41m
default tomcat-69574bf8d5-lp4gh 1/1 Running 0 41m
kube-system calico-policy-controller-8487549c89-kpjkg 1/1 Running 0 50m
kube-system kube-apiserver-k8s-master-1 1/1 Running 0 51m
kube-system kube-apiserver-k8s-master-2 1/1 Running 0 51m
kube-system kube-apiserver-k8s-master-3 1/1 Running 0 51m
kube-system kube-controller-manager-k8s-master-1 1/1 Running 0 52m
kube-system kube-controller-manager-k8s-master-2 1/1 Running 1 52m
kube-system kube-controller-manager-k8s-master-3 1/1 Running 0 52m
kube-system kube-dns-5595fc64b9-sbqn4 3/3 Running 0 50m
kube-system kube-proxy-9fj5w 1/1 Running 0 50m
kube-system kube-proxy-ctbnd 1/1 Running 0 50m
kube-system kube-scheduler-k8s-master-1 1/1 Running 0 52m
kube-system kube-scheduler-k8s-master-2 1/1 Running 0 52m
kube-system kube-scheduler-k8s-master-3 1/1 Running 0 52m
kube-system kubernetes-dashboard-574bdff9f-hjbbp 1/1 Running 0 50m

The master VIP never appears

First of all, thank you for providing such a good set of installation instructions.
When I run the playbooks nothing reports an error, but when I run ip a I cannot find the VIP address. On the two LB servers I can see:
netstat -antlp|grep 8443
tcp 0 0 0.0.0.0:8443 0.0.0.0:* LISTEN 899/haproxy

But ip a still does not show my VIP address:
root@k8-d-1:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:a9:d0:df brd ff:ff:ff:ff:ff:ff
inet 10.10.2.70/23 brd 10.233.3.255 scope global dynamic ens33
valid_lft 86084sec preferred_lft 86084sec
inet6 fe80::f68:8c63:a546:8f1e/64 scope link
valid_lft forever preferred_lft forever

What could be the reason? Any pointers would be much appreciated, thank you!
My configured VIP is the default:
MASTER_IP="10.10.2.17" # api-server virtual address
MASTER_PORT="8443" # api-server service port

Question about the etcd certificates

A question about the part of the scripts that generates the etcd certificates: I noticed that the etcd certificates are generated on the etcd nodes themselves, so with three etcd nodes there would be three separate sets of certificates. In that case, how do the three nodes authenticate to each other, which set do kube-apiserver and calico use, and how does normal communication work?

Opening the Kubernetes master VIP address shows an error

After my cluster was built, I opened the LB's VIP address directly, https://10.10.2.17:8443/, and it popped up a dialog asking for a username and password.
After entering them, all I can see is the following page text:
The error shown is:
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"admin\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}

10.10.2.17 is the Kubernetes master virtual IP, and it can be pinged.
I have also reinstalled the whole thing repeatedly without any error messages, so the installation itself seems to have gone fine.
Could anyone give me some pointers? Thanks!

root@k8-d-2:~# kubectl cluster-info
Kubernetes master is running at https://10.10.2.17:8443
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

How do I specify the NodePort mapping?

@jmgao1983 @gjmzj @277270678
A question for you all: I ran an nginx pod and can reach it successfully via node IP + port, as follows:
kubectl run nginx --image=nginx --port=80
kubectl expose deployment nginx --target-port=80 --type=NodePort

kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.68.0.1 443/TCP 1h
nginx NodePort 10.68.97.9 80:4098/TCP 2m

But by default a random port is assigned, 4098 in the output above. I wanted to fix it to port 4488 but failed; I have tried several ways of specifying the mapped port without success.
For example:
cat nginx.yml
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 80
    nodePort: 34455
  selector:
    name: nginx

Is there any way to achieve what I want? Many thanks!
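For reference, a quick way to see why a fixed nodePort might be rejected is to compare it against the range the apiserver allows; this is a minimal sketch, and the unit file path is an assumption based on the paths mentioned elsewhere in these issues:

# NodePort values outside --service-node-port-range (upstream default 30000-32767) are rejected.
grep -- '--service-node-port-range' /etc/systemd/system/kube-apiserver.service
# If 34455 (or 4488) falls outside that range, pick a port inside it, or widen the range
# in the apiserver flags and restart kube-apiserver.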

Suggestions for the environment preparation stage

1. The way SELinux is disabled is too heavy-handed:
setenforce 0 && echo SELINUX=disabled > /etc/selinux/config
This overwrites the original comments and the SELINUXTYPE setting; a sed replacement is recommended instead (see the sketch after the file excerpt below).

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
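A minimal sketch of the suggested sed approach, which changes only the SELINUX= line and leaves the comments and SELINUXTYPE= untouched:

# Disable SELinux now and make the change persistent without clobbering /etc/selinux/config:
setenforce 0
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config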

2. Add the following settings to /etc/sysctl.conf:

net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-arptables = 1

sysctl -p

This clears the following docker info warnings:
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

Question about nginx-ingress-controller

Hi gjmzj, it's great to see someone in China working on automated K8S deployment, and doing it this well!
Now I have a question: with this deployment method, does running nginx-ingress-controller cause any problems?
My current situation: I deployed a K8S cluster with kubeadm, and pods and services all run normally. But when deploying nginx-ingress-controller (https://github.com/kubernetes/ingress-nginx), the pod keeps going into CrashLoopBackOff. The logs show it cannot reach the api server. Verifying further with the method from https://github.com/kubernetes/ingress-nginx/blob/master/docs/troubleshooting.md: kubectl exec test-701078429-s5kca -- curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H "Authorization: Bearer $TOKEN_VALUE" https://10.96.0.1 always reports Failed to connect to 10.96.0.1 port 443: Connection timed out

10.96.0.1 is my kubernetes service address:
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.96.0.1 443/TCP 26d
ingress-nginx default-http-backend ClusterIP 10.102.71.61 80/TCP 2d
kube-system kube-dns ClusterIP 10.96.0.10 53/UDP,53/TCP 26d

All-in-One deployment: how to access a service through ingress?

Test environment

root@node6:/etc/ansible# uname -r
4.4.0-101-generic

cat /etc/issue
Ubuntu 16.04.3 LTS \n \l

IP  10.1.88.46

Problem encountered

root@node6:/etc/ansible# curl -H:traefik.tf56.lo 10.1.88.46
curl: (7) Failed to connect to 10.1.88.46 port 80: Connection refused
root@node6:/etc/ansible# curl -H:traefik.tf56.lo 127.0.0.1
curl: (7) Failed to connect to 127.0.0.1 port 80: Connection refused

Access via NodePort works

root@node6:/etc/ansible# curl -I -L 10.1.88.46:7417
HTTP/1.1 405 Method Not Allowed
Date: Thu, 07 Dec 2017 01:57:10 GMT
Content-Type: text/plain; charset=utf-8
root@node6:/etc/ansible# curl -I 10.68.184.120:8080
HTTP/1.1 405 Method Not Allowed
Date: Thu, 07 Dec 2017 01:59:37 GMT
Content-Type: text/plain; charset=utf-8

Current state

root@node6:/etc/ansible# kubectl get svc --all-namespaces
NAMESPACE     NAME                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                     AGE
default       kubernetes                ClusterIP   10.68.0.1       <none>        443/TCP                     20h
kube-system   kube-dns                  ClusterIP   10.68.0.2       <none>        53/UDP,53/TCP               20h
kube-system   kubernetes-dashboard      NodePort    10.68.35.96     <none>        80:2819/TCP                 20h
kube-system   traefik-ingress-service   NodePort    10.68.184.120   <none>        80:6640/TCP,8080:7417/TCP   26s
root@node6:/etc/ansible# kubectl get ing --all-namespaces
NAMESPACE     NAME             HOSTS             ADDRESS   PORTS     AGE
kube-system   traefik-web-ui   traefik.tf56.lo             80        1m
root@node6:/etc/ansible# kubectl describe  ing traefik-web-ui -n kube-system
Name:             traefik-web-ui
Namespace:        kube-system
Address:          
Default backend:  default-http-backend:80 (<none>)
Rules:
  Host             Path  Backends
  ----             ----  --------
  traefik.tf56.lo  
                   /   traefik-ingress-service:8080 (172.20.139.10:8080)
Annotations:
Events:  <none>
/root/local/bin/calicoctl node status
Calico process is running.
IPv4 BGP status
No IPv4 peers found.
IPv6 BGP status
No IPv6 peers found.
root@node6:/etc/ansible# /root/local/bin/calicoctl get ipPool
CIDR            
172.20.0.0/16   

centos7.2 run error

TASK [prepare : 分发CA 证书] *****************************************************************************************************************************************************************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: /etc/ansible/kubeasz/ca.pem
failed: [10.0.12.220] (item=ca.pem) => {"changed": false, "item": "ca.pem", "msg": "Could not find or access 'ca.pem'\nSearched in:\n\t/etc/ansible/kubeasz/roles/prepare/files/ca.pem\n\t/etc/ansible/kubeasz/roles/prepare/ca.pem\n\t/etc/ansible/kubeasz/roles/prepare/tasks/files/ca.pem\n\t/etc/ansible/kubeasz/roles/prepare/tasks/ca.pem\n\t/etc/ansible/kubeasz/files/ca.pem\n\t/etc/ansible/kubeasz/ca.pem"}
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: /etc/ansible/kubeasz/ca-key.pem
failed: [10.0.12.220] (item=ca-key.pem) => {"changed": false, "item": "ca-key.pem", "msg": "Could not find or access 'ca-key.pem'\nSearched in:\n\t/etc/ansible/kubeasz/roles/prepare/files/ca-key.pem\n\t/etc/ansible/kubeasz/roles/prepare/ca-key.pem\n\t/etc/ansible/kubeasz/roles/prepare/tasks/files/ca-key.pem\n\t/etc/ansible/kubeasz/roles/prepare/tasks/ca-key.pem\n\t/etc/ansible/kubeasz/files/ca-key.pem\n\t/etc/ansible/kubeasz/ca-key.pem"}
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: /etc/ansible/kubeasz/ca.csr
failed: [10.0.12.220] (item=ca.csr) => {"changed": false, "item": "ca.csr", "msg": "Could not find or access 'ca.csr'\nSearched in:\n\t/etc/ansible/kubeasz/roles/prepare/files/ca.csr\n\t/etc/ansible/kubeasz/roles/prepare/ca.csr\n\t/etc/ansible/kubeasz/roles/prepare/tasks/files/ca.csr\n\t/etc/ansible/kubeasz/roles/prepare/tasks/ca.csr\n\t/etc/ansible/kubeasz/files/ca.csr\n\t/etc/ansible/kubeasz/ca.csr"}
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: /etc/ansible/kubeasz/ca-config.json
failed: [10.0.12.220] (item=ca-config.json) => {"changed": false, "item": "ca-config.json", "msg": "Could not find or access 'ca-config.json'\nSearched in:\n\t/etc/ansible/kubeasz/roles/prepare/files/ca-config.json\n\t/etc/ansible/kubeasz/roles/prepare/ca-config.json\n\t/etc/ansible/kubeasz/roles/prepare/tasks/files/ca-config.json\n\t/etc/ansible/kubeasz/roles/prepare/tasks/ca-config.json\n\t/etc/ansible/kubeasz/files/ca-config.json\n\t/etc/ansible/kubeasz/ca-config.json"}
to retry, use: --limit @/etc/ansible/kubeasz/90.setup.retry

Question about harbor

@gjmzj Hello. In "cluster planning and basic parameter settings" Harbor needs to be configured:
# Private registry harbor server (domain name or IP) [optional]
# The harbor server certificate needs to be copied to roles/harbor/files/harbor-ca.crt
HARBOR_SERVER="harbor.mydomain.com"

This requires harbor to be installed in advance with its CA certificate generated, right? But before running the kubeasz automated installation scripts, docker is in principle not yet installed. Can harbor be installed properly at that point? Please advise.

v1.9: FailedGetResourceMetric, autoscaling cannot be used

resource cpu on pods (as a percentage of request): / 50%
With the new version (both upgraded and freshly installed), autoscaling cannot be used. The log shows the following; I don't know how to handle it.

kubectl describe hpa
[2017-12-26_110934]Name: nginx-hpa
[2017-12-26_110934]Namespace: default
[2017-12-26_110934]Labels:
[2017-12-26_110934]Annotations:
[2017-12-26_110934]CreationTimestamp: Tue, 26 Dec 2017 11:08:51 +0800
[2017-12-26_110934]Reference: Deployment/nginx
[2017-12-26_110934]Metrics: ( current / target )
[2017-12-26_110934] resource cpu on pods (as a percentage of request): / 50%
[2017-12-26_110934]Min replicas: 1
[2017-12-26_110934]Max replicas: 10
[2017-12-26_110934]Conditions:
[2017-12-26_110934] Type Status Reason Message
[2017-12-26_110934] ---- ------ ------ -------
[2017-12-26_110934] AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
[2017-12-26_110934] ScalingActive False FailedGetResourceMetric the HPA was unable to compute the replica count: unable to get metrics for resource cpu: unable to fetch metrics from API: the server could not find the requested resource (get pods.metrics.k8s.io)
[2017-12-26_110934]Events:
[2017-12-26_110934] Type Reason Age From Message
[2017-12-26_110934] ---- ------ ---- ---- -------
[2017-12-26_110934] Warning FailedGetResourceMetric 23s horizontal-pod-autoscaler unable to get metrics for resource cpu: unable to fetch metrics from API: the server could not find the requested resource (get pods.metrics.k8s.io)
[2017-12-26_110934] Warning FailedComputeMetricsReplicas 23s horizontal-pod-autoscaler failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from API: the server could not find the requested resource (get pods.metrics.k8s.io)

kubelet fails to start when adding a node, thanks!

Dec 12 14:29:06 vm4 systemd: Starting Kubernetes Kubelet...
Dec 12 14:29:06 vm4 iptables: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Dec 12 14:29:06 vm4 systemd: kubelet.service: control process exited, code=exited status=4
Dec 12 14:29:06 vm4 systemd: Failed to start Kubernetes Kubelet.
Dec 12 14:29:06 vm4 systemd: Unit kubelet.service entered failed state.
Dec 12 14:29:06 vm4 systemd: kubelet.service failed.
Dec 12 14:29:11 vm4 systemd: kubelet.service holdoff time over, scheduling restart.
Dec 12 14:29:11 vm4 systemd: Starting Kubernetes Kubelet...
Dec 12 14:29:11 vm4 kubelet: I1212 14:29:11.753694 27943 feature_gate.go:156] feature gates: map[]
Dec 12 14:29:11 vm4 kubelet: I1212 14:29:11.753790 27943 controller.go:114] kubelet config controller: starting controller
Dec 12 14:29:11 vm4 kubelet: I1212 14:29:11.753799 27943 controller.go:118] kubelet config controller: validating combination of defaults and flags
Dec 12 14:29:11 vm4 systemd: Started Kubernetes systemd probe.
Dec 12 14:29:11 vm4 systemd: Starting Kubernetes systemd probe.
Dec 12 14:29:11 vm4 kubelet: I1212 14:29:11.760317 27943 mount_linux.go:205] Detected OS with systemd
Dec 12 14:29:11 vm4 kubelet: I1212 14:29:11.760346 27943 client.go:75] Connecting to docker on unix:///var/run/docker.sock
Dec 12 14:29:11 vm4 kubelet: I1212 14:29:11.760375 27943 client.go:95] Start docker client with request timeout=2m0s
Dec 12 14:29:11 vm4 systemd: kubelet.service: main process exited, code=exited, status=1/FAILURE
Dec 12 14:29:11 vm4 kubelet: I1212 14:29:11.768808 27943 feature_gate.go:156] feature gates: map[]
Dec 12 14:29:11 vm4 kubelet: error: failed to run Kubelet: unknown cloud provider "''"
Dec 12 14:29:12 vm4 systemd: Failed to start Kubernetes Kubelet.
Dec 12 14:29:12 vm4 systemd: Unit kubelet.service entered failed state.
Dec 12 14:29:12 vm4 systemd: kubelet.service failed.

heapster configuration problem

@jmgao1983 @gjmzj
Hello to you both.
After setting up the k8s cluster I added kubedns and the dashboard, but I ran into a problem configuring heapster at the end: the dashboard does not show CPU usage or graphs for the nodes, and the heapster-related pages cannot be opened either, as shown below:

root@k8-d-1:/etc/ansible/kubeasz/kubeasz# ls -al manifests/heapster/
drwxr-xr-x 2 root root 4096 12月 11 10:20 .
drwxr-xr-x 6 root root 4096 12月 11 10:20 ..
-rw-r--r-- 1 root root 2238 12月 11 10:20 grafana.yaml
-rw-r--r-- 1 root root 1450 12月 11 10:20 heapster.yaml
-rw-r--r-- 1 root root 4471 12月 11 10:20 influxdb.yaml

kubectl create -f manifests/heapster/
deployment "monitoring-grafana" created
service "monitoring-grafana" created
serviceaccount "heapster" created
clusterrolebinding "heapster" created
deployment "heapster" created
service "heapster" created
deployment "monitoring-influxdb" created
service "monitoring-influxdb" created
configmap "influxdb-config" created

kubectl get pod -n kube-system | grep heapster
heapster-6956dd7956-gfpmw 1/1 Running 0 1m

kubectl get svc -n kube-system|grep heapster
heapster ClusterIP 10.68.48.43 80/TCP 1m

root@k8-d-2:# kubectl get pods -n kube-system | grep -E 'heapster|monitoring'
heapster-6956dd7956-gfpmw 1/1 Running 0 47m
monitoring-grafana-747cd57c4b-kk9jr 1/1 Running 0 47m
monitoring-influxdb-6755bd4788-6454k 1/1 Running 0 47m
root@k8-d-2:
# kubectl get deployments -n kube-system | grep -E 'heapster|monitoring'
heapster 1 1 1 1 49m
monitoring-grafana 1 1 1 1 49m
monitoring-influxdb 1 1 1 1 49m

root@k8-d-2:~# kubectl cluster-info
Kubernetes master is running at https://10.10.2.17:8443
Heapster is running at https://10.10.2.17:8443/api/v1/namespaces/kube-system/services/heapster/proxy
KubeDNS is running at https://10.10.2.17:8443/api/v1/namespaces/kube-system/services/kube-dns/proxy
kubernetes-dashboard is running at https://10.10.2.17:8443/api/v1/namespaces/kube-system/services/kubernetes-dashboard/proxy
monitoring-grafana is running at https://10.10.2.17:8443/api/v1/namespaces/kube-system/services/monitoring-grafana/proxy
monitoring-influxdb is running at https://10.10.2.17:8443/api/v1/namespaces/kube-system/services/monitoring-influxdb/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

Where is my problem? What do I need to do to access the heapster-related resources (Heapster + InfluxDB + Grafana)? Thanks again!

The kubernetes master web page cannot be opened

OS: CentOS Linux release 7.2.1511 (Core)

I found that the kubernetes master web page cannot be opened; the specific steps are below:

1. Get the cluster info

kubectl cluster-info              
Kubernetes master is running at https://192.168.31.100:8443

2. Fetch the page with wget

wget https://192.168.31.100:8443
--2017-12-09 13:44:35--  https://192.168.31.100:8443/
Connecting to 192.168.31.100:8443... connected.
ERROR: cannot verify 192.168.31.100's certificate, issued by "/C=CN/ST=HangZhou/L=XS/O=k8s/OU=System/CN=kubernetes":
  Unable to locally verify the issuer's authority.
To connect to 192.168.31.100 insecurely, use '--no-check-certificate'.

3.

wget https://192.168.31.100:8443 --no-check-certificate
--2017-12-09 13:44:45--  https://192.168.31.100:8443/
Connecting to 192.168.31.100:8443... connected.
WARNING: cannot verify 192.168.31.100's certificate, issued by "/C=CN/ST=HangZhou/L=XS/O=k8s/OU=System/CN=kubernetes":
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 401 Unauthorized
Authorization failed.

I then checked the master and node information locally, and the cluster status is normal:

 kubectl get nodes
NAME             STATUS    ROLES     AGE       VERSION
192.168.31.151   Ready     <none>    9m        v1.8.4
192.168.31.152   Ready     <none>    9m        v1.8.4
kubectl get componentstatus 
NAME                 STATUS    MESSAGE              ERROR
scheduler            Healthy   ok                   
controller-manager   Healthy   ok                   
etcd-2               Healthy   {"health": "true"}   
etcd-0               Healthy   {"health": "true"}   
etcd-1               Healthy   {"health": "true"}

What is the reason for this?

Small error in kubeasz/docs/quickStart.md

kubectl clusterinfo # shows the kubernetes master (apiserver) component running

should be kubectl cluster-info
That is what I see on my centos7 test machine.

This is also my first time learning k8s; several commands on my machine report the error below. Where did I misconfigure?

[root@mnode ~]# kubectl cluster-info
Kubernetes master is running at https://192.168.21.3:6443
[root@mnode ~]# kubectl get node 
The connection to the server 192.168.21.3:6443 was refused - did you specify the right host or port?

The cluster info shows the address and port are correct.
Thanks in advance.

Multi-master multi-node cluster: applications cannot be reached via master IP + NodePort

Hello. I deployed a cluster using the multi-master example and everything looks normal, but after exposing an application's port with NodePort, it can only be reached via the IP and port of the node where the application runs, not via master IP + port. netstat on the three master machines shows no corresponding NodePort either. This seems different from clusters I deployed before, where a service could be reached directly via master IP + NodePort. Is something misconfigured? Please advise.
Thanks

Can't extend the cluster with a node running Ubuntu 14.04

Hi,
I tried to run 20.addnode.yml to add an additional node server, but it fails.
Ubuntu 14.04 has no systemd/systemctl by default, so I cannot enable the calico service with the command "systemctl daemon-reload && systemctl enable calico-node && systemctl restart calico-node".

May I know how to fix this issue for Ubuntu 14.04?

Thanks so much!

Optimization

Thank you very much for providing such a good deployment method for initializing nodes; however, adding nodes to an already deployed cluster does not feel particularly well handled.

Reactivating the cluster

Could you offer a yml file that supports reactivating the cluster?
Sometimes the machine gets shut down,
and when I restart the computer and set up the cluster again with "ansible-playbook 90.setup.yml",
it fails with authorization errors.

error logs:

TASK [kube-node : 运行calico-kube-controllers] ***********************************************************************************************
fatal: [192.168.3.38]: FAILED! => {"changed": true, "cmd": "sleep 15 && /root/local/bin/kubectl create -f /root/local/kube-system/calico/rbac.yaml && /root/local/bin/kubectl create -f /root/local/kube-system/calico/calico-kube-controllers.yaml", "delta": "0:00:15.602011", "end": "2017-12-07 02:43:09.558077", "msg": "non-zero return code", "rc": 1, "start": "2017-12-07 02:42:53.956066", "stderr": "Error from server (AlreadyExists): error when creating \"/root/local/kube-system/calico/rbac.yaml\": clusterroles.rbac.authorization.k8s.io \"calico-kube-controllers\" already exists\nError from server (AlreadyExists): error when creating \"/root/local/kube-system/calico/rbac.yaml\": clusterrolebindings.rbac.authorization.k8s.io \"calico-kube-controllers\" already exists\nError from server (AlreadyExists): error when creating \"/root/local/kube-system/calico/rbac.yaml\": serviceaccounts \"calico-kube-controllers\" already exists", "stderr_lines": ["Error from server (AlreadyExists): error when creating \"/root/local/kube-system/calico/rbac.yaml\": clusterroles.rbac.authorization.k8s.io \"calico-kube-controllers\" already exists", "Error from server (AlreadyExists): error when creating \"/root/local/kube-system/calico/rbac.yaml\": clusterrolebindings.rbac.authorization.k8s.io \"calico-kube-controllers\" already exists", "Error from server (AlreadyExists): error when creating \"/root/local/kube-system/calico/rbac.yaml\": serviceaccounts \"calico-kube-controllers\" already exists"], "stdout": "", "stdout_lines": []}
...ignoring

/var/log/syslog

deployment_controller.go:483] Error syncing deployment kube-system/calico-kube-controllers: Operation cannot be fulfilled on deployments.extensions "calico-kube-controllers": the object has been modified; please apply your changes to the latest version and try again
kube-apiserver[13592]: E1207 02:03:03.751358   13592 status.go:62] apiserver received an error that is not an metav1.Status: etcdserver: request timed out
Dec  7 02:03:04 ubuntu-xenial kube-apiserver[13592]: k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.ErrorNegotiated(0x7fe6fb0c1e48, 0xc426b751d0, 0x814ac00, 0xc4236d9740, 0x81712c0, 0xc420429200, 0x0, 0x0, 0x38b9253, 0x2, ...)
Dec  7 02:03:04 ubuntu-xenial kube-apiserver[13592]: logging error output: "k8s\x00\n\f\n\x02v1\x12\x06Status\x125\n\x06\n\x00\x12\x00\x1a\x00\x12\aFailure\x1a\x1detcdserver: request timed out\"\x000\xf4\x03\x1a\x00\"\x00"
Dec  7 02:03:19 ubuntu-xenial kube-controller-manager[13673]: I1207 02:03:19.283552   13673 deployment_controller.go:483] Error syncing deployment default/my-nginx: Operation cannot be fulfilled on deployments.extensions "my-nginx": the object has been modified; please apply your changes to the latest version and try again
Dec  7 02:04:28 ubuntu-xenial dockerd[11445]: time="2017-12-07T02:04:04.695583846Z" level=error msg="libcontainerd: failed to receive event from containerd: rpc error: code = Internal desc = transport is closing"
 kube-apiserver[15921]: I1207 02:48:05.167841   15921 logs.go:41] http: TLS handshake error from 192.168.3.38:60334: remote error: tls: bad certificate

Installation completes without errors, but no BGP peers can be found

Using single-master, multi-node mode.
master
root@ubuntu:/etc/ansible# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.47.2 0.0.0.0 UG 0 0 0 ens33
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
172.20.243.192 0.0.0.0 255.255.255.255 UH 0 0 0 cali5e6a6a095c0
172.20.243.192 0.0.0.0 255.255.255.192 U 0 0 0 *
172.20.243.193 0.0.0.0 255.255.255.255 UH 0 0 0 cali01719dd8ee3
172.20.243.194 0.0.0.0 255.255.255.255 UH 0 0 0 cali79d9335e2eb
172.20.243.196 0.0.0.0 255.255.255.255 UH 0 0 0 cali86295d33f81
192.168.47.0 0.0.0.0 255.255.255.0 U 0 0 0 ens33

node01
root@ubuntu:~# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.47.2 0.0.0.0 UG 0 0 0 ens33
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
172.20.243.192 0.0.0.0 255.255.255.192 U 0 0 0 *
172.20.243.195 0.0.0.0 255.255.255.255 UH 0 0 0 cali383215e280f
192.168.47.0 0.0.0.0 255.255.255.0 U 0 0 0 ens33

node02
root@ubuntu:~# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.47.2 0.0.0.0 UG 0 0 0 ens33
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
172.20.243.192 0.0.0.0 255.255.255.192 U 0 0 0 *
172.20.243.197 0.0.0.0 255.255.255.255 UH 0 0 0 cali5ef7fb336ac
192.168.47.0 0.0.0.0 255.255.255.0 U 0 0 0 ens33

BGP peer information on all machines:
root@ubuntu:/etc/ansible# calicoctl node status
Calico process is running.

IPv4 BGP status
No IPv4 peers found.

IPv6 BGP status
No IPv6 peers found.

root@ubuntu:/etc/ansible#

Installation completes smoothly, but the node list cannot be obtained

System: Centos 7.31
Apart from the optional part below, which does not pass on CentOS, the installation went smoothly:

Optional ------ install a small tool for querying docker image tags ----

- name: apt更新缓存刷新
  apt: update_cache=yes cache_valid_time=72000
  tags: docker-tag

- name: 安装轻量JSON处理程序
  apt: name=jq state=latest
  tags: docker-tag

- name: 下载 docker-tag
  copy: src=docker-tag dest={{ bin_dir }}/docker-tag mode=0755
  tags: docker-tag

because that part uses apt-get.

But after the installation, the node list cannot be obtained:

[root@vb-centos1 ~]# kubectl cluster-info
Kubernetes master is running at https://192.169.1.130:6443

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[root@vb-centos1 ~]# kubectl get node
No resources found.
[root@vb-centos1 ~]# kubectl get pod --all-namespaces
No resources found.
[root@vb-centos1 ~]# calicoctl node status
Calico process is running.

IPv4 BGP status
+---------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+---------------+-------------------+-------+----------+-------------+
| 192.169.1.132 | node-to-node mesh | up | 13:01:31 | Established |
| 192.169.1.131 | node-to-node mesh | up | 13:02:17 | Established |
+---------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

My setup uses three virtual machines: one master and two nodes.

How do I expose a running image's port to the outside?

@gjmzj A question for you:

I followed your tutorial and the installation went smoothly. But I have a problem now: when running a pod, how do I map the port inside the image to the outside?
I tried several approaches without success; could you give me some pointers? Thanks!

Create nginx pod:
kubectl run nginx --image=nginx --port=80

Expose nginx as a service:
kubectl expose deployment nginx --target-port=80 --type=NodePort

root@k8-d-1:~# kubectl get po
NAME READY STATUS RESTARTS AGE
nginx-7cbc4b4d9c-vkwpg 1/1 Running 0 6m

root@k8-d-2:# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8ff15f173594 nginx "nginx -g 'daemon ..." 6 minutes ago Up 6 minutes k8s_nginx_nginx-7cbc4b4d9c-vkwpg_default_5aaf73d1-db22-11e7-8d3c-000c292c98d7_0
7e18250b1b6b mirrorgooglecontainers/pause-amd64:3.0 "/pause" 6 minutes ago Up 6 minutes k8s_POD_nginx-7cbc4b4d9c-vkwpg_default_5aaf73d1-db22-11e7-8d3c-000c292c98d7_0
0473991c21e5 calico/node:v2.6.2 "start_runit" 2 hours ago Up 2 hours calico-node
root@k8-d-2:
# docker logs 8ff15f173594
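For reference, one way to check the result of the kubectl expose command above is to read the NodePort that was allocated and hit it on a node address; the node IP and port below are placeholders, not values from this cluster:

kubectl get svc nginx               # the PORT(S) column shows 80:<nodePort>/TCP
curl http://<node-ip>:<nodePort>    # should return the nginx welcome page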

The cluster cannot run after all physical machines' IP addresses were changed

@gjmzj
A question: my cluster was built on physical machines with 10.10.10.xx addresses.
Today I tried moving all physical machines to a different subnet, 172.10.10.x, updated /etc/ansible/hosts accordingly, and then ran:

ansible-playbook 99.clean.yml
ansible-playbook 90.setup.yml

In the end the cluster does not run properly:
kubectl get cs
Unable to connect to the server: dial tcp 172.10.200:8443: getsockopt: no route to host
journalctl -n 50
shows it is still looking for the old IP servers:
docker[1472]: bird: BGP: Unexpected connect from unknown address 10.10.10.84 (port 52815)

Is there any way to fix this other than reinstalling the systems?

Thanks!

After a node crashes, the cluster takes a long time to reschedule the pods

@gjmzj A question: after my cluster was built, I ran a forced power-off test to see whether its jobs would quickly switch over to another node.

Taking the dashboard service as an example, I pre-pulled its image on all three nodes.
I saw it was currently running on 172.10.10.232, so I logged in and powered that machine off directly.
It then took roughly 3 to 5 minutes before it switched over to another node.
Why is that? Ideally I would like the node failure to be detected within seconds and the work moved to another node as quickly as possible.

Thanks!

root@kuber-3:~# kubectl get pod -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE
calico-kube-controllers-86df6c7c86-g95gr 1/1 Running 0 1h 172.10.10.233 172.10.10.233
heapster-7b76dbf757-vbwpm 1/1 Running 0 32m 172.20.127.194 172.10.10.233
kube-dns-54cff6c949-88hkz 3/3 Running 0 34m 172.20.146.193 172.10.10.231
kubernetes-dashboard-5b9649685d-pnmjd 1/1 Running 0 14s 172.20.127.196 172.10.10.233
kubernetes-dashboard-5b9649685d-xwg7r 1/1 Unknown 0 6m 172.20.153.66 172.10.10.232
monitoring-grafana-64747d765f-nvhlg 1/1 Running 0 32m 172.20.146.194 172.10.10.231
monitoring-influxdb-7dd6654659-7mhd9 1/1 Running 0 32m 172.20.146.195 172.10.10.231

In step 7 (kube-node installation) I added --fail-swap-on=false \ --logtostderr=true \ --v=2 to the kubelet startup parameters to disable the swap check, but after starting it still fails with: error: failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename

1. kubelet service unit file
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service

[Service]
WorkingDirectory=/var/lib/kubelet
#--pod-infra-container-image=registry.access.redhat.com/rhel7/pod-infrastructure:latest
ExecStart=/root/local/bin/kubelet \
  --address=192.168.12.125 \
  --hostname-override=192.168.12.125 \
  --pod-infra-container-image=mirrorgooglecontainers/pause-amd64:3.0 \
  --experimental-bootstrap-kubeconfig=/etc/kubernetes/bootstrap.kubeconfig \
  --kubeconfig=/etc/kubernetes/kubelet.kubeconfig \
  --cert-dir=/etc/kubernetes/ssl \
  --network-plugin=cni \
  --cni-conf-dir=/etc/cni/net.d \
  --cni-bin-dir=/root/local/bin \
  --cluster-dns=10.68.0.2 \
  --cluster-domain=cluster.local. \
  --hairpin-mode hairpin-veth \
  --allow-privileged=true \
  --fail-swap-on=false \
  --logtostderr=true \
  --v=2
#kubelet cAdvisor listens on port 4194 on all interfaces by default; the iptables rules below restrict access to internal networks
ExecStartPost=/sbin/iptables -A INPUT -s 10.0.0.0/8 -p tcp --dport 4194 -j ACCEPT
ExecStartPost=/sbin/iptables -A INPUT -s 172.16.0.0/12 -p tcp --dport 4194 -j ACCEPT
ExecStartPost=/sbin/iptables -A INPUT -s 192.168.0.0/16 -p tcp --dport 4194 -j ACCEPT
ExecStartPost=/sbin/iptables -A INPUT -p tcp --dport 4194 -j DROP
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
2. kubelet log output after startup
I1224 18:16:58.480975 23343 feature_gate.go:156] feature gates: map[]
I1224 18:16:58.481041 23343 controller.go:114] kubelet config controller: starting controller
I1224 18:16:58.481047 23343 controller.go:118] kubelet config controller: validating combination of defaults and flags
I1224 18:16:58.485590 23343 client.go:75] Connecting to docker on unix:///var/run/docker.sock
I1224 18:16:58.485616 23343 client.go:95] Start docker client with request timeout=2m0s
I1224 18:16:58.490521 23343 feature_gate.go:156] feature gates: map[]
W1224 18:16:58.490656 23343 server.go:289] --cloud-provider=auto-detect is deprecated. The desired cloud provider should be set explicitly
W1224 18:16:58.490684 23343 server.go:324] standalone mode, no API client
I1224 18:16:58.495607 23343 manager.go:149] cAdvisor running in container: "/sys/fs/cgroup/cpu,cpuacct/user.slice"
W1224 18:16:58.506257 23343 manager.go:157] unable to connect to Rkt api service: rkt: cannot tcp Dial rkt api service: dial tcp [::1]:15441: getsockopt: connection refused
W1224 18:16:58.506380 23343 manager.go:166] unable to connect to CRI-O api service: Get http://%2Fvar%2Frun%2Fcrio.sock/info: dial unix /var/run/crio.sock: connect: no such file or directory
I1224 18:16:58.515926 23343 fs.go:139] Filesystem UUIDs: map[3067423f-69e6-46ce-8935-306bd637a39d:/dev/vda1 97cdcaff-a8e7-4ffe-84ac-722f528abba0:/dev/vda2 a46458ae-3165-48b6-a14a-2bcd29897389:/dev/vda3]
I1224 18:16:58.515947 23343 fs.go:140] Filesystem partitions: map[tmpfs:{mountpoint:/dev/shm major:0 minor:18 fsType:tmpfs blockSize:0} /dev/vda3:{mountpoint:/var/lib/docker/overlay major:253 minor:3 fsType:xfs blockSize:0} /dev/vda1:{mountpoint:/boot major:253 minor:1 fsType:xfs blockSize:0} shm:{mountpoint:/var/lib/docker/containers/6578f528ef25f02f60f593c9560019fa7b51ffc05490c17ad32a14cea587d6f4/shm major:0 minor:39 fsType:tmpfs blockSize:0}]
I1224 18:16:58.519260 23343 manager.go:216] Machine: {NumCores:4 CpuFrequency:2199998 MemoryCapacity:8203042816 HugePages:[{PageSize:1048576 NumPages:0} {PageSize:2048 NumPages:0}] MachineID:68fca30095d25e4cb832f3357e70b88a SystemUUID:DAD79F94-D5D2-4685-9E28-5B828EDF18CE BootID:7d868d39-2348-4606-963c-a4c812598ed1 Filesystems:[{Device:tmpfs DeviceMajor:0 DeviceMinor:18 Capacity:4101521408 Type:vfs Inodes:1001348 HasInodes:true} {Device:/dev/vda3 DeviceMajor:253 DeviceMinor:3 Capacity:164269654016 Type:vfs Inodes:160429824 HasInodes:true} {Device:/dev/vda1 DeviceMajor:253 DeviceMinor:1 Capacity:1063256064 Type:vfs Inodes:1048576 HasInodes:true} {Device:shm DeviceMajor:0 DeviceMinor:39 Capacity:67108864 Type:vfs Inodes:1001348 HasInodes:true}] DiskMap:map[253:0:{Name:vda Major:253 Minor:0 Size:171798691840 Scheduler:none}] NetworkDevices:[{Name:eth0 MacAddress:52:57:00:00:36:bc Speed:0 Mtu:1500} {Name:tunl0 MacAddress:00:00:00:00 Speed:0 Mtu:1480}] Topology:[{Id:0 Memory:8589373440 Cores:[{Id:0 Threads:[0] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:4194304 Type:Unified Level:2}]} {Id:1 Threads:[1] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:4194304 Type:Unified Level:2}]}] Caches:[]} {Id:1 Memory:0 Cores:[{Id:0 Threads:[2] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:4194304 Type:Unified Level:2}]} {Id:1 Threads:[3] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:4194304 Type:Unified Level:2}]}] Caches:[]}] CloudProvider:Unknown InstanceType:Unknown InstanceID:None}
I1224 18:16:58.520236 23343 manager.go:222] Version: {KernelVersion:3.10.0-327.36.3.el7.x86_64 ContainerOsVersion:CentOS Linux 7 (Core) DockerVersion:17.09.1-ce DockerAPIVersion:1.32 CadvisorVersion: CadvisorRevision:}
W1224 18:16:58.521096 23343 server.go:232] No api server defined - no events will be sent to API server.
I1224 18:16:58.521111 23343 server.go:422] --cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /
error: failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename

Master node high availability question

For example:

- name: 创建kube-scheduler的systemd unit文件
  template: src=kube-scheduler.service.j2 dest=/etc/systemd/system/kube-scheduler.service

Pods stay in ContainerCreating state

I brought up an all-in-one cluster on an openstack ubuntu16.04 VM. The deployment script ran without problems, with only two warnings:
TASK [prepare : 写入环境变量$PATH] ***********************************************************************************************************
[WARNING]: Consider using template or lineinfile module rather than running sed

TASK [prepare : 分发CA 证书] ***************************************************************************************************************
ok: [190.190.190.23] => (item=ca.pem)
changed: [190.190.190.23] => (item=ca-key.pem)
ok: [190.190.190.23] => (item=ca.csr)
ok: [190.190.190.23] => (item=ca-config.json)
[WARNING]: Could not match supplied host pattern, ignoring: lb

But after it finished, the status stays at ContainerCreating:
root@zte-host:~# kubectl get pod --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-6756bbbb4d-7k77s 0/1 ContainerCreating 0 13m

My environment cannot reach the external network; could that be the reason? How should I change things? I can download the needed resources elsewhere and upload them to the VM.

Dashboard installation fails.

root@host-192-168-10-67:~# kubectl get pod -n kube-system -o wide
NAME                                       READY     STATUS              RESTARTS   AGE       IP              NODE
calico-kube-controllers-587657658c-8sqpl   1/1       Running             0          21m       192.168.10.68   192.168.10.68
kubernetes-dashboard-69d5cddb47-wqhqz      0/1       ContainerCreating   0          13m       <none>          192.168.10.67
Dec 21 20:21:24 host-192-168-10-67 kubelet[83422]: E1221 20:21:24.150274   83422 cni.go:259] Error adding network: Unable to retreive ReadyFlag from Backend: resource does not exist
Dec 21 20:21:24 host-192-168-10-67 kubelet[83422]: E1221 20:21:24.150303   83422 cni.go:227] Error while adding to cni network: Unable to retreive ReadyFlag from Backend: resource d
Dec 21 20:21:24 host-192-168-10-67 kubelet[83422]: E1221 20:21:24.249914   83422 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = Ne
Dec 21 20:21:24 host-192-168-10-67 kubelet[83422]: E1221 20:21:24.249948   83422 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "kubernetes-dashboard-69d5cddb47-wqhqz_kube-syst
Dec 21 20:21:24 host-192-168-10-67 kubelet[83422]: E1221 20:21:24.249956   83422 kuberuntime_manager.go:647] createPodSandbox for pod "kubernetes-dashboard-69d5cddb47-wqhqz_kube-sys
Dec 21 20:21:24 host-192-168-10-67 kubelet[83422]: E1221 20:21:24.249993   83422 pod_workers.go:186] Error syncing pod dfed5bb7-e647-11e7-9b02-000c2968a591 ("kubernetes-dashboard-69
Dec 21 20:21:24 host-192-168-10-67 kubelet[83422]: I1221 20:21:24.627883   83422 kubelet.go:1881] SyncLoop (PLEG): "kubernetes-dashboard-69d5cddb47-wqhqz_kube-system(dfed5bb7-e647-1
Dec 21 20:21:24 host-192-168-10-67 kubelet[83422]: W1221 20:21:24.627956   83422 pod_container_deletor.go:77] Container "6a0f791603438e159078276cc084ed0f8ee4990b5fbbaa4169c56525bae7

How to change kube-controller-manager's default parameters

A question for the experts: after getting k8s running I want to change some of kube-controller-manager's default parameters on their own.
I noticed its startup file is /etc/systemd/system/kube-controller-manager.service.
There seems to be no separate default configuration file; how can I change defaults such as:
--node-monitor-grace-period=10s
--node-monitor-period=5s
--pod-eviction-timeout=30s

Running it directly by hand gives the following error:
root@kuber-master:~# kube-controller-manager
--master=172.10.10.223:8443
--node-monitor-grace-period=10s
--node-monitor-period=5s
--pod-eviction-timeout=30s

I1227 11:53:38.505745 16492 controllermanager.go:109] Version: v1.8.6
I1227 11:53:38.506302 16492 leaderelection.go:174] attempting to acquire leader lease...
F1227 11:53:38.506485 16492 controllermanager.go:221] listen tcp 0.0.0.0:10252: bind: address already in use

Any pointers would be appreciated, thanks! Happy New Year to everyone!
@DiamondYuan @jmgao1983 @277270678 @gjmzj
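A minimal sketch of one way to change these flags, assuming the unit file path mentioned above (edit the flags that kubeasz generated rather than starting a second copy by hand, since the running instance already holds port 10252, which is what causes the "address already in use" error):

vi /etc/systemd/system/kube-controller-manager.service
#   add or adjust lines such as:
#     --node-monitor-grace-period=10s \
#     --node-monitor-period=5s \
#     --pod-eviction-timeout=30s \
systemctl daemon-reload
systemctl restart kube-controller-manager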

Installation documentation question

@jmgao1983 @gjmzj
Hello to you both,
there is one point in the CA configuration section I don't understand:
"Download the certificate tool CFSSL"
How exactly do I download it?
Also, could the tutorial be changed to shell + commands + variables?

Thanks
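For reference, one common way at the time to fetch the cfssl binaries was from CloudFlare's published builds; the URLs and install path below are assumptions, so verify them against the current cfssl releases before relying on them:

curl -L -o /usr/local/bin/cfssl https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
curl -L -o /usr/local/bin/cfssljson https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
chmod +x /usr/local/bin/cfssl /usr/local/bin/cfssljson
cfssl version    # quick sanity check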

No calico interface appears under centos7

@gjmzj I ran the AllInOne ansible-playbook 90.setup.yml on centos7. Everything else checks out except calico; otherwise it is perfect :)

The calico symptom is that ip addr shows no calico interface, and I don't know why.
The output of calicoctl node status is:
Calico process is running.

IPv4 BGP status
No IPv4 peers found.

IPv6 BGP status
No IPv6 peers found.

The output of calicoctl get ipPool -o yaml is:

- apiVersion: v1
  kind: ipPool
  metadata:
    cidr: 172.17.0.0/16
  spec:
    nat-outgoing: true

The output of kubectl get pod -n kube-system -o wide also looks correct:
NAME READY STATUS RESTARTS AGE IP NODE
calico-kube-controllers-699495f8d4-hshk4 1/1 Running 0 10m 10.65.93.34 10.65.93.34

kubedns cannot resolve external names

@gjmzj A question: I saw your new kubedns tutorial and ran kubectl create -f /etc/ansible/manifests/kubedns/kubedns.yaml as described; the system reports everything running normally.
Finally I used kubectl run busybox --rm -it --image=busybox /bin/sh to test name resolution. The earlier examples from your tutorial all resolve fine, but resolving external names fails at the end, and it fails for the external address of any physical machine as well. Is something misconfigured on my side?

Thanks!
root@k8-d-1:~# kubectl run busybox --rm -it --image=busybox /bin/sh
If you don't see a command prompt, try pressing enter.
/ # nslookup www.baidu.com
Server: 10.68.0.
Address 1: 10.68.0.2 kube-dns.kube-system.svc.cluster.local

nslookup: can't resolve 'www.baidu.com'
/ # Session ended, resume using 'kubectl attach busybox-6bf46598ff-8lnkw -c busybox -i -t' command when the pod is running

Multi-master, multi-node: the master virtual IP

A question: in the multi-master, multi-node configuration you mention planning an extra master VIP.
Does this master VIP just need to be an address in the same subnet as the other masters, or are there other requirements?
Or could you explain this master VIP in more detail?
Thanks

dashboard

How can I upgrade the dashboard to 1.8?

TLS certificate creation: what is the purpose of the hosts field in the certificate signing request?

"CN": "kubernetes", "hosts": [ "127.0.0.1", "172.16.1.235", "172.16.1.237", "172.16.1.244", "10.254.0.1", "kubernetes", "kubernetes.default", "kubernetes.default.svc", "kubernetes.default.svc.cluster", "kubernetes.default.svc.cluster.local" ],
To clarify, in my setup etcd and k8s use the same set of certificates and run on the same machine. I tried removing the IPs 235, 237 and 244 from the list and found that when accessing with etcdctl, as long as the same ca.pem is used, the certificate and private key don't seem to matter at all.
