k8sp / sextant
Fully automatic installation of CoreOS+Kubernetes clusters
License: Apache License 2.0
Currently I just set /dev/sdb as the OSD disk. In the real world, the device name may vary, and more than one disk may be usable for OSDs. So the script should automatically find all available disks (except the one on which the system is installed) and run an OSD daemon for each of them.
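A minimal sketch of the disk selection described above, in Go. The blockDev type and osdCandidates function are hypothetical names, and the device list is assumed to come from a tool such as lsblk. It treats any disk that holds the root mount point as the system disk, which is a simplification (it ignores LVM and RAID layouts).

```go
package main

import "fmt"

// blockDev is a hypothetical, simplified description of one block device.
type blockDev struct {
	Name       string // kernel name, e.g. "sda" or "sda1"
	Parent     string // parent disk name for partitions, empty for disks
	Type       string // "disk" or "part"
	Mountpoint string // empty if not mounted
}

// osdCandidates returns the /dev paths of all whole disks that do not
// hold the root filesystem.
func osdCandidates(devs []blockDev) []string {
	system := map[string]bool{}
	for _, d := range devs {
		if d.Mountpoint != "/" {
			continue
		}
		if d.Type == "disk" {
			system[d.Name] = true
		} else {
			system[d.Parent] = true
		}
	}
	var out []string
	for _, d := range devs {
		if d.Type == "disk" && !system[d.Name] {
			out = append(out, "/dev/"+d.Name)
		}
	}
	return out
}

func main() {
	devs := []blockDev{
		{Name: "sda", Type: "disk"},
		{Name: "sda1", Parent: "sda", Type: "part", Mountpoint: "/"},
		{Name: "sdb", Type: "disk"},
		{Name: "sdc", Type: "disk"},
	}
	fmt.Println(osdCandidates(devs)) // prints [/dev/sdb /dev/sdc]
}
```

Each returned path could then be passed to the script that starts one OSD daemon per disk.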
Each run of vmtest/run may leave one or two virtual machines undeleted. After a while, the VirtualBox application shows quite a few leftover virtual machines.
@lipeng-unisound Are the following DHCP options necessary?
default-lease-time 600;
max-lease-time 7200;
authoritative;
option rfc3442-classless-static-routes code 121 = array of integer 8;
subnet 10.10.10.0 netmask 255.255.255.0 {
option rfc3442-classless-static-routes 24, 192,168,6, 10,10,10,254, 24, 192,169,100, 10,10,10,254, 16, 10,200, 10,10,10,254;
}
For example, https://github.com/k8sp/auto-install/blob/master/cloud-config-server/server.go#L28 hard-codes the IP address of the etcd endpoint. It should be a command-line flag instead.
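One possible shape for such a flag, sketched with Go's standard flag package. The flag name -etcd-endpoint, its default value, and the parseArgs helper are assumptions for illustration, not the names used in server.go.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseArgs reads the etcd endpoint from the given arguments.
// The flag name and default are hypothetical.
func parseArgs(args []string) (string, error) {
	fs := flag.NewFlagSet("cloud-config-server", flag.ContinueOnError)
	endpoint := fs.String("etcd-endpoint", "http://127.0.0.1:2379", "URL of the etcd endpoint")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *endpoint, nil
}

func main() {
	endpoint, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Println("using etcd endpoint:", endpoint)
}
```

Parsing into a FlagSet rather than the global flag set keeps the function easy to call from a unit test.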
ln -s ../testutil/in_docker_test.bash
To resolve #73.
[by @wangkuiyi] The background of this discussion is in #109.
https://ceph.com/planet/getting-started-with-the-docker-rbd-volume-plugin/
http://www.sebastien-han.fr/blog/2015/08/17/getting-started-with-the-docker-rbd-volume-plugin/
http://www.atwop.com/archives/852.html
Building this plugin fails with an error:
go get github.com/yp-engineering/rbd-docker-plugin
# github.com/yp-engineering/rbd-docker-plugin
../../../../yp-engineering/rbd-docker-plugin/main.go:83: cannot use d (type cephRBDVolumeDriver) as type volume.Driver in argument to volume.NewHandler:
cephRBDVolumeDriver does not implement volume.Driver (missing Capabilities method)
Currently, the DHCP configuration file on the PXE server of the Unisound cluster binds too many fixed IP addresses.
#
# DHCP Server Configuration file.
# see /usr/share/doc/dhcp*/dhcpd.conf.example
# see dhcpd.conf(5) man page
#
#
# DHCP Server Configuration file.
# see /usr/share/doc/dhcp*/dhcpd.conf.example
# see dhcpd.conf(5) man page
#
# create new
# specify domain name
# option domain-name "ai-labs.unisound.com";
# specify name server's hostname or IP address
# option domain-name-servers 10.10.10.1;
next-server 10.10.10.192;
filename "pxelinux.0";
# default lease time
default-lease-time 600;
# max lease time
max-lease-time 7200;
# this DHCP server to be declared valid
authoritative;
option rfc3442-classless-static-routes code 121 = array of integer 8;
# option ms-classless-static-routes code 249 = array of integer 8;
# specify network address and subnet mask
subnet 10.10.10.0 netmask 255.255.255.0 {
# specify the range of lease IP address
# range dynamic-bootp 10.10.10.206 10.10.10.212;
# specify broadcast address
option broadcast-address 10.10.10.255;
# specify default gateway
option routers 10.10.10.192;
# specify domain name
option domain-name "ailab.unisound.com";
# specify name servers
option domain-name-servers 10.10.10.192, 8.8.8.8, 8.8.4.4;
option rfc3442-classless-static-routes 24, 192,168,6, 10,10,10,254, 24, 192,169,100, 10,10,10,254, 16, 10,200, 10,10,10,254;
# option ms-classless-static-routes 32, 111, 111, 111, 254, 0, 0, 0, 0, 111, 111, 111, 254;
host zodiac-01 {
hardware ethernet 00:25:90:C0:F7:80 ;
fixed-address 10.10.10.201 ;
}
host zodiac-02 {
hardware ethernet 00:25:90:C0:F6:EE ;
fixed-address 10.10.10.202 ;
}
host zodiac-03 {
hardware ethernet 00:25:90:C0:F6:D6 ;
fixed-address 10.10.10.203 ;
}
host zodiac-04 {
hardware ethernet 00:25:90:C0:F7:AC ;
fixed-address 10.10.10.204 ;
}
host zodiac-05 {
hardware ethernet 00:25:90:C0:F7:7E ;
fixed-address 10.10.10.205 ;
}
host zodiac-06{
hardware ethernet 00:25:90:c0:f7:62;
fixed-address 10.10.10.206;
}
host zodiac-07{
hardware ethernet 00:25:90:c0:f7:68;
fixed-address 10.10.10.207;
}
host zodiac-08{
hardware ethernet 00:25:90:c0:f7:7a;
fixed-address 10.10.10.208;
}
host zodiac-09{
hardware ethernet 00:25:90:c0:f7:c8;
fixed-address 10.10.10.209;
}
host zodiac-10{
hardware ethernet 00:25:90:c0:f7:88;
fixed-address 10.10.10.210;
}
host zodiac-11{
hardware ethernet 00:25:90:c0:f7:7c;
fixed-address 10.10.10.211;
}
host zodiac-12{
hardware ethernet 00:25:90:c0:f7:86;
fixed-address 10.10.10.212;
}
host coreos-191 {
hardware ethernet 00:e0:81:ee:82:c4;
fixed-address 10.10.10.191;
}
}
skydns.go:98:1:warning: getSkyDNSFile is unused (deadcode)
skydns_test.go:17:2:warning: unused global variable systemdContent (varcheck)
skydns_test.go:30:2:warning: unused global variable upstartContent (varcheck)
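Testing is a challenge when writing the bootstrapper: we need to check that our code works correctly on both Ubuntu and CentOS.
A straightforward idea is to run it in Docker: for example, create a Dockerfile with FROM ubuntu:14.04, run our program inside it, and check whether the result is what we expect.
Technically, we can indeed write a unit test in Go that creates and runs such a Docker image, because there is a handy Go library for the Docker API: https://github.com/fsouza/go-dockerclient . I tried calling this library from unit tests, and they run fine locally (on my Mac and my Linux machine): https://github.com/wangkuiyi/learn-ci-docker/blob/master/main_test.go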
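However, this approach does not work in the GitHub + TravisCI environment, because TravisCI runs our Go unit tests inside a Docker container. When our program tries to connect to the Docker daemon, it finds that the Unix socket /var/run/docker.sock does not exist: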
$ go test -v ./...
=== RUN TestDockerAPI
--- SKIP: TestDockerAPI (0.00s)
main_test.go:18: Cannot list iamges: Get http://unix.sock/images/json: dial unix /var/run/docker.sock: connect: no such file or directory
In fact, if we had more control over TravisCI, we could use the following trick:
TravisCI can start Docker containers because the virtual machine it rents runs a Docker daemon and therefore has /var/run/docker.sock. If TravisCI started the Docker container the way this article describes:
docker run -v /var/run/docker.sock:/var/run/docker.sock ...
then the Docker container would also have /var/run/docker.sock, and our Go unit tests could run.
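Unfortunately, as TravisCI users, we cannot control how TravisCI starts containers. But if we someday build our own CI system with something like Jenkins, we can use this approach.
Because we currently use TravisCI, I use _test.bash scripts to start Docker containers and test the correctness of our code.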
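The socket problem discussed above can be detected before any Docker API call is made. The sketch below is a hypothetical helper (socketAvailable is not part of the repo) that checks whether a path exists and is a Unix socket. A Go test could call it first and use t.Skip when it returns false.

```go
package main

import (
	"fmt"
	"os"
)

const dockerSocket = "/var/run/docker.sock"

// socketAvailable reports whether path exists and is a Unix socket.
func socketAvailable(path string) bool {
	info, err := os.Stat(path)
	if err != nil {
		return false
	}
	return info.Mode()&os.ModeSocket != 0
}

func main() {
	if !socketAvailable(dockerSocket) {
		fmt.Println("skipping Docker tests:", dockerSocket, "not found")
		return
	}
	fmt.Println("Docker socket found; Docker tests can run")
}
```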
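Currently there is no script (bash or Ansible) that configures the PXE server, so PXE server configuration is not automated.
The code under the config folder uses Go to generate the DHCP, nginx, and PXE configurations, taking YAML as input.
What is missing is a run.bash that ties everything together for one-click execution: it should detect and install services such as dhcpd and nginx, and use the Go code to generate the related configurations.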
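As an illustration of generating DHCP configuration from Go, the sketch below renders dhcpd host entries with the standard text/template package. The host type, the template, and renderHosts are assumptions made for this example. The input is a plain Go slice rather than parsed YAML, so the example needs only the standard library.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// host is a hypothetical description of one machine with a fixed IP.
type host struct {
	Name, MAC, IP string
}

// hostsTmpl emits one dhcpd "host" declaration per entry.
const hostsTmpl = `{{range .}}host {{.Name}} {
  hardware ethernet {{.MAC}};
  fixed-address {{.IP}};
}
{{end}}`

// renderHosts returns the dhcpd host declarations for the given machines.
func renderHosts(hosts []host) (string, error) {
	t, err := template.New("hosts").Parse(hostsTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, hosts); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderHosts([]host{
		{Name: "zodiac-01", MAC: "00:25:90:C0:F7:80", IP: "10.10.10.201"},
	})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Print(out)
}
```

Generating the entries from a single input list is one way to keep the host data in one place.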
The error messages are as follows:
default: Inserting generated public key within guest...
default: Removing insecure key from the guest if it's present...
default: Key inserted! Disconnecting and reconnecting using new SSH key...
default: Warning: Authentication failure. Retrying...
default: Warning: Authentication failure. Retrying...
Eventually it prompts for a username and password. Enter vagrant for both to log in and run the unit tests.
liuqs@BlackTurtle:~>docker run -ti centos:7 /bin/bash
[root@7430895f5868 /]# systemctl status systemd-journald
Failed to get D-Bus connection: Operation not permitted
Solution: use /usr/sbin/init so that the dbus daemon is started automatically.
$ docker run --privileged -d -ti -e "container=docker" -v /sys/fs/cgroup:/sys/fs/cgroup centos:7 /usr/sbin/init
6dd3234f6c9d3475fd56c2996ab25269d646aca4bde219166b0d4f6c9570046e
$ docker exec -it 6dd323 /bin/bash
[root@6dd3234f6c9d /]# systemctl status systemd-journald
● systemd-journald.service - Journal Service
Loaded: loaded (/usr/lib/systemd/system/systemd-journald.service; static; vendor preset: disabled)
Active: active (running) since Sat 2016-07-30 11:35:10 UTC; 2min 22s ago
Docs: man:systemd-journald.service(8)
man:journald.conf(5)
Main PID: 21 (systemd-journal)
Status: "Processing requests..."
CGroup: /docker/6dd3234f6c9d3475fd56c2996ab25269d646aca4bde219166b0d4f6c9570046e/system.slice/systemd-journald.service
└─21 /usr/lib/systemd/systemd-journald
References:
go get golang.org/x/net needed a proxy to get past the firewall in this afternoon's test:
2016/08/02 00:12:24 Running go get -u github.com/skynetservices/skydns ...
package golang.org/x/net/context: unrecognized import path "golang.org/x/net/context" (https fetch: Get https://golang.org/x/net/context?go-get=1: dial tcp 216.58.221.241:443: i/o timeout)
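https://github.com/k8sp/auto-install/tree/master/k8s-install-systemd-unit describes how to generate master.zip and worker.zip manually. This process should be carried out by a Makefile rather than only described in the documentation.
One copy is in: https://github.com/k8sp/auto-install/blob/master/cloud-config-server/template/unisound-ailab/build_config.yml
The other copy is in the DHCP configuration file.
There should be no redundant information, because someone may modify one copy and forget to make the corresponding change in the other.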
If Go code is in a private GitHub repo, we cannot run go get github.com/account/repo
to retrieve the code. There are workarounds, such as using GitHub SSH keys, but they introduce other problems; for example, go get -u
doesn't work.
The easiest solution is to configure git, which go get relies on, to access the repo via https://<token>:x-oauth-basic@github.com/account/repo
instead of via https://github.com/account/repo.
Open the GitHub website, log in, and go to the GitHub settings page. On the "personal access tokens" tab, click the "generate personal access token" button.
Select "repo" in "scopes":
Copy the generated token and save it somewhere you won't lose it. An unsafe but convenient place is your email inbox.
Run the following command
git config --global url."https://c61axxxxxxxxxxxxxxx:x-oauth-basic@github.com/".insteadOf "https://github.com/"
where c61axxxxxxxxxxxxxxx
is your Github personal access token.
Then you will find something new was added to your ~/.gitconfig
file:
[url "https://c61axxxxxxxxxxxxxxx:x-oauth-basic@github.com/"]
insteadOf = https://github.com/
Now, run go get github.com/k8sp/auto-install/cloud-config-server
.
Follow the etcd demo in this repo.
The bootstrapper has several functions planned so far. Do we need a web UI for it? If we want to view or change the configs (scripts) in the future, managing too many APIs by logging in to the server will become messy.
@wangkuiyi @Yancey1989
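I noticed that @typhoonzero's PR changes the logic from starting flanneld on every machine to starting it only on the Kubernetes master nodes: https://github.com/k8sp/auto-install/pull/64/files#r73539623
Because I don't know flanneld well enough, and because each PR should have a single purpose, I removed this change from that PR. Please note that it may need to be added in a separate PR.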
We can refer to etcd's discovery service:
This way, CCTS can serve as a global config server.
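Step 1: generate the keypair files for the k8s cluster
Change KUBERNETES_MASTER_IPV4 in the environment file to the IP of the cluster's master node
Several subdirectories of k8s-install-systemd-unit contain an environment file. Which one is meant here?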
cloud-config-server and bootstrapper must use the same hostname convention.
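Given our current requirements, each PXE server corresponds to only one cluster, so the certificates can be generated once when CCTS starts. I think the flow could be as follows:
On startup, CCTS generates the two files ca.pem and ca-key.pem.
For a master node, the contents of the three files ca.pem, apiserver.pem, and apiserver-key.pem are written into the returned cc (cloud-config) file.
For a worker node, worker-key.pem and worker.pem are generated from ca.pem and ca-key.pem and written into the returned cc file.
The related pseudocode is as follows: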
import (
	"fmt"
	"os"
	"os/exec"
)

// fileExist reports whether the file at path exists.
func fileExist(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

// generateRootAC generates ca.pem and ca-key.pem under ./tls.
// It returns false if both files already exist or if generation fails.
func generateRootAC() bool {
	if fileExist("./tls/ca.pem") && fileExist("./tls/ca-key.pem") {
		fmt.Println("ca.pem already exists.")
		return false
	}
	// CombinedOutput also captures stderr, so the script's error text is printed on failure.
	out, err := exec.Command("/bin/bash", "./script/generate_root_ca.sh").CombinedOutput()
	if err != nil {
		fmt.Printf("Generate Root CA Failed: %s\n", out)
		return false
	}
	return true
}

// processKubeMasterCert generates apiserver.pem and apiserver-key.pem under ./tls/master-${master_ip}/.
func processKubeMasterCert(ip string) bool {
	out, err := exec.Command("/bin/bash", "./script/generate_master_ca.sh", ip).CombinedOutput()
	if err != nil {
		fmt.Printf("Generate Master CA Failed: %s\n", out)
		return false
	}
	return true
}

// processKubeNodeCert generates work.pem and work-key.pem under ./tls/work-${work_ip}/.
func processKubeNodeCert(ip string) bool {
	out, err := exec.Command("/bin/bash", "./script/generate_work_ca.sh", ip).CombinedOutput()
	if err != nil {
		fmt.Printf("Generate Work CA Failed: %s\n", out)
		return false
	}
	return true
}
For the certificate generation scripts, see https://github.com/k8sp/bigdata/tree/master/install/tls
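As the title says, k8s clusters will be deployed into cluster environments that already run a DHCP service, such as the Unisound and Baifendian environments. So we need a way to keep the IPs assigned automatically after the k8s cluster auto-installation from conflicting with the existing environment. Current proposals and ideas: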
This may be done by watching etcd until the conf and key files are present. @pineking can you solve this issue?
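When the Kubernetes processes start, rkt has to launch ACI-format images such as hyperkube and flannel, and these image addresses default to the public quay.io registry. This causes two problems:
Is there a way to set up a private ACI registry, or some other way to solve this problem?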
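The workflow and architecture I have in mind are as follows. Please check whether our understanding is consistent and whether this design has any problems.
Real network environments vary enormously. To keep our goal clear and actionable, I suggest we consider only the following network setup; please see whether this is acceptable: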
/etc/network/interfaces
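has a hard-coded IP address (static IP). This machine will be our bootstrapper server. [Discussion] How do our laptops connect to the cluster? If we want to ssh from a laptop to the bootstrapper server, does the laptop also have to be a member of the cluster?
[Discussion] Since every machine registers its IP and hostname in SkyDNS, do the etcd cluster members still need fixed IPs, or can we simply use their hostnames?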
scp ca* bootstrapper-server:/
scp $GOPATH/bin/bootstrapper bootstrapper-server:/
ssh bootstrapper-server "sudo /bootstrapper -ca-key=/ca.key -ca-crt=/ca.crt"
The bootstrapper program configures and starts each service, including cloud-config-server:
cloud-config-server -ca-key=/ca.key -ca-crt=/ca.crt
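This way, cloud-config-server also has the CA and can generate a crt for each node.
When a computer is purchased, the default boot device order in its BIOS is hard disk first, then network boot. This default order does not need to be changed, which gives the following flow: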
install.sh
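install.sh first contacts the cloud-config-server on the bootstrapper server, providing the MAC address of the node's primary NIC, and obtains the node's cloud-config file (the second-level cloud-config), which is a YAML configuration file.
install.sh then runs the coreos-install script to install the CoreOS operating system onto the local disk, and configures the second-level cloud-config file to be applied when the local CoreOS boots.
install.sh reboots the node.
Moreover, the current process does not appear to be sufficient for a fully automatic installation.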
http://stackoverflow.com/questions/28055715/running-services-upstart-init-d-in-a-container
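As a result, services created through Upstart cannot start; this is somewhat similar to the CentOS systemctl problem in #54.
Currently, https://github.com/k8sp/auto-install/blob/master/cloud-config-server/template/cloud-config.template contains steps that install components by using shell scripts to wget zip packages from the external network. We would prefer to load them through configuration files and run them as services. In addition, binaries such as command-line tools can be served directly by the nginx server on the bootstrapper.
This change also needs some functional support from the bootstrapper, as well as modifications and new fields in https://github.com/k8sp/auto-install/blob/master/cloud-config-server/template/unisound-ailab/build_config.yml and https://github.com/k8sp/auto-install/blob/master/cloud-config-server/template/template.go .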
#53
Currently, every time the system boots, the script that starts the Ceph OSD daemon checks whether a Ceph OSD container was previously run for a disk. If so, docker start
is executed; otherwise docker run
starts a new container and creates a new OSD.
However, we observed that this method fails after the system is updated: when we run ceph osd tree
we see two OSDs, one down and one up. I guess this is because the Docker containers that were running in the old system are lost after the update.
We need to find a more robust way to deal with this problem.
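In #48 I introduced an approach: to verify that the bootstrapper can correctly install, configure, and start the DHCP service on Ubuntu, I wrote in_docker_test.bash, which runs the Go unit tests inside a Docker container such as ubuntu:14.04.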
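But I ran into a problem: service isc-dhcp-server start
fails, and the DHCP service does not start. @pineking hit the same problem on a physical machine, and I hit it on a virtual machine as well.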
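According to https://github.com/k8sp/bare-metal-coreos/tree/master/pxe-on-rasppi , the DHCP service is actually started via /usr/sbin/dhcpd
. If we run /usr/sbin/dhcpd
manually, the error messages are printed on the screen. @pineking's screenshot from the physical machine is as follows: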
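As we can see, the DHCP service complains that it cannot find a matching subnet. Based on this, @pineking changed the subnet configuration used in the unit test to match that of the physical machine, and DHCP then started.
Later, I found that https://docs.docker.com/v1.7/articles/networking/ explains Docker's default subnet settings. Based on that, I changed the subnet configuration in the Go unit test so that the DHCP service can start inside a Docker container.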
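The subnet mismatch described above can be avoided by deriving the subnet declaration from the interface address instead of hard-coding it. The sketch below is a hypothetical helper (subnetDecl is not part of the repo) that works for IPv4 only. The sample address 172.17.0.2/16 is an assumed container address used for illustration.

```go
package main

import (
	"fmt"
	"net"
)

// subnetDecl returns the opening of a dhcpd subnet declaration that
// matches the network of the given IPv4 address in CIDR notation.
func subnetDecl(cidr string) (string, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", err
	}
	// ipnet.IP is the network address; the mask is printed in dotted form.
	return fmt.Sprintf("subnet %s netmask %s", ipnet.IP, net.IP(ipnet.Mask)), nil
}

func main() {
	decl, err := subnetDecl("172.17.0.2/16")
	if err != nil {
		fmt.Println("invalid address:", err)
		return
	}
	fmt.Println(decl) // prints subnet 172.17.0.0 netmask 255.255.0.0
}
```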