flannel-io / flannel
flannel is a network fabric for containers, designed for Kubernetes
License: Apache License 2.0
Right now Kolach will allocate a /24 (by default) for the host and assign x.x.x.0 to the TUN device itself. It's expected that docker0 will get x.x.x.1 and veth interfaces x.x.x.2+. However if Docker also assigns .0 to the veth interfaces, there will be a collision.
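The per-host address layout described above can be sketched with Python's ipaddress module (a hedged illustration of the default /24-per-host scheme; the concrete subnet 10.0.43.0/24 is just an example):

```python
import ipaddress

# flannel hands each host a /24; per the scheme above,
# .0 goes to the TUN device, .1 to docker0, .2+ to veth interfaces.
subnet = ipaddress.ip_network("10.0.43.0/24")

tun_addr = subnet.network_address          # 10.0.43.0 -> TUN device
docker0_addr = subnet.network_address + 1  # 10.0.43.1 -> docker0 bridge
first_veth = subnet.network_address + 2    # 10.0.43.2 -> first container

print(tun_addr, docker0_addr, first_veth)
```

The collision described above happens precisely when Docker hands out `.0` instead of starting at `.2`.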
[08:24am] dysinger: heya
[08:24am] dysinger: been trying out rudder (now renamed to flannel).
[08:24am] dysinger: How can I fix this problem ? https://twitter.com/dysinger/status/530062731664445440
[08:28am] eyakubovich: dysinger: how come etcd is bound to docker0?
[08:29am] dysinger: well I mean that it’s not bound to the host’s docker interface so that guests can access etcd.
[08:29am] dysinger: because flannel reconfigures it after etcd has already started
[08:29am] dysinger: right?
[08:30am] eyakubovich: dysinger: flannel should start before docker
[08:30am] dysinger: correct
[08:30am] dysinger: it goes etcd -> flannel -> docker
[08:31am] eyakubovich: ah, but containers in docker reference etcd via docker0 address , right?
[08:31am] dysinger: y
[08:31am] eyakubovich: but you can programmatically get docker0 address
[08:31am] dysinger: but it’s no longer listening on 172…. the network interface was changed to 10…./24
[08:32am] dysinger: I’m even handing the host’s docker0 interface to the guest. It’s just that etcd isn’t listening on that new network address.
[08:33am] eyakubovich: dysinger: i see. what ver of etcd are you using?
[08:33am] dysinger: coreos alpha
[08:33am] dysinger: whatever is on there
[08:34am] eyakubovich: dysinger: can you netstat and see what address it's bound to?
[08:34am] dysinger: sure
[08:34am] eyakubovich: i think it might be bound explicitly to 127.0.0.1
[08:43am] dysinger: eyakubovich: https://gist.github.com/dysinger/44308d025f98e0108257
[08:46am] eyakubovich: dysinger: it's bound to 0.0.0.0 so it's odd. can you ping 10.0.37.1?
[08:46am] eyakubovich: from container
[08:47am] dysinger: eyakubovich: yes
[08:49am] eyakubovich: dysinger: trying to think... wondering if iptables blocking it. let me try to spin this up
[08:50am] dysinger: eyakubovich: I can share my CFN template if you like (with the cloud init bits)
[08:51am] eyakubovich: dysinger: that would be helpful
[08:51am] eyakubovich: dysinger: if you don't want to share on IRC, email it to [email protected]
[08:53am] dysinger: eyakubovich: https://gist.github.com/dysinger/1f69b82687f70d0f080a there’s nothing private about it.
[09:15am] eyakubovich: dysinger: sorry, went down the wrong path. the cloud-init looks fine. let's do a simple ncat test: on the host do "ncat -l 9999" and then try to reach it from container via docker0 IP.
[09:15am] dysinger: k
[09:16am] dysinger: works fine
[09:17am] dysinger: this is because I’m binding to the interface after flannel has changed it
[09:17am] dysinger: etcd starts way before flannel
[09:17am] dysinger: (my guess)
[09:18am] eyakubovich: dysinger: right but etcd binds to 0.0.0.0 which should work regardless of what IPs other interfaces have. interface IP's can change
[09:19am] eyakubovich: dysinger: but we can test that by bouncing etcd
[09:19am] eyakubovich: dysinger: so etcd will rebind and see if that helps
[09:24am] dysinger: I tried restarting etcd after flannel but doing that across all the nodes in the cluster at the same time (roughly) caused probs (expected).
[09:24am] dysinger: let me bounce etcd & try again
[09:26am] eyakubovich: dysinger: sorry, i meant just bouncing one instance -- on the node where you're testing
[09:27am] dysinger: y
[09:29am] dysinger: hmm
[09:29am] dysinger: I’m trying sudo -i systemctl restart etcd.service but it’s just hanging there forever.
[09:30am] dysinger: there we go
[09:31am] dysinger: eyakubovich: works perfect now
[09:32am] eyakubovich: dysinger: so it is related to docker0 coming up after. can you file an issue on github under flannel? I can also do that but it's better if you do so you get notified of progress
[09:32am] dysinger: now the problem is we don’t really want to start etcd across a cluster of a dozen nodes & then restart etcd right after it just got started … across the whole cluster.
[09:34am] eyakubovich: dysinger: agreed, this should work without restart
[09:35am] dysinger: k I’ll make an issue
This config did not work for me:
{
  "Network": "10.99.0.0/16",
  "SubnetLen": 24,
  "SubnetMin": "10.99.0.0",
  "SubnetMax": "10.99.255.0",
  "Backend": {
    "Type": "udp",
    "Port": 9555
  }
}
This config does work:
{
  "Network": "10.99.0.0/16",
  "SubnetLen": 24,
  "SubnetMin": "10.99.0.0",
  "SubnetMax": "10.99.255.0",
  "Backend": {
    "Type": "udp"
  }
}
My flannel was built from v0.1.0 source.
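Diffing the two configs confirms that the only difference is the extra "Port" key in the backend section (a quick sketch using Python's json module):

```python
import json

failing = json.loads("""{"Network": "10.99.0.0/16", "SubnetLen": 24,
  "SubnetMin": "10.99.0.0", "SubnetMax": "10.99.255.0",
  "Backend": {"Type": "udp", "Port": 9555}}""")
working = json.loads("""{"Network": "10.99.0.0/16", "SubnetLen": 24,
  "SubnetMin": "10.99.0.0", "SubnetMax": "10.99.255.0",
  "Backend": {"Type": "udp"}}""")

# Only the backend sections differ, and only by the "Port" key.
extra = set(failing["Backend"]) - set(working["Backend"])
print(extra)  # {'Port'}
```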
It would be quite nice to have both a minimal container, as the current Dockerfile builds, and a container built from git. etcd has started doing this, and replicating it here would be nice:
https://quay.io/repository/coreos/etcd-git
https://github.com/coreos/etcd/blob/master/scripts/build-docker
I am trying to set up a Hadoop cluster using Kubernetes, CoreOS and flannel (vxlan). In Hadoop, there is a Namenode (master) and Datanodes (workers), and Datanodes register with the Namenode. When a Datanode (inside a container) registers itself with the Namenode, it sends a request, and the Namenode registers the Datanode using the source IP of that packet. However, the source IP in the request is being replaced with the IP address of the Docker interface if the Datanode lands on the same host (minion) as the Namenode, or with the flannel interface IP if it gets assigned to another minion.
Please see the following gists:
Logs of container processes: https://gist.github.com/LuqmanSahaf/fd7ee3bf9b1766e4a5ad
TCPDump(s) of different interfaces: https://gist.github.com/LuqmanSahaf/88b4b9ddf2955028bfa8
The Namenode catches the IP from the RPC request and uses it to register the Datanode.
Now, I want to ask a few questions:
Why is the IP of the container being replaced?
Is this problem related to flannel or to Docker? (Because even if the Datanode is on the same subnet (minion), the IP gets replaced with the Docker interface IP.)
Please help me out with this.
In all my work with flannel, it certainly seems that the network between any two hosts will not function until there's a bidirectional flow of traffic between those two hosts.
(i.e.:
from A ping B - fail
from B ping A - fail
from A & B simultaneously reciprocate ping - SUCCESS! :)
After some period of no traffic between A & B (not yet measured, but not a very long time), they become unreachable until the process is repeated.
I've written a keepalive unit-pair (a DNS announcer, and an fping+dig script) to handle this and keep the network running - but should this be necessary? In most situations, with a loaded network, this wouldn't be noticed, but it does present a challenge to bootstrapping the cluster AFAICT.
The cluster is running in an AWS VPC, all etcd machines are in the same AZ and same subnet, and this is on the current 522.2 beta.
Jumbo frames and virtualization can lead to all kinds of MTU sizes. Don't make any assumptions.
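The arithmetic behind this: flannel's UDP backend wraps each packet in an outer IPv4 header (20 bytes) and UDP header (8 bytes), which is why flannel0 shows mtu 1472 elsewhere in these reports when the underlying interface is 1500. A sketch, assuming those standard header sizes:

```python
# Outer headers added by UDP encapsulation (IPv4 case).
IP_HEADER = 20
UDP_HEADER = 8

def tunnel_mtu(underlying_mtu: int) -> int:
    """MTU available inside the tunnel after UDP encapsulation."""
    return underlying_mtu - IP_HEADER - UDP_HEADER

print(tunnel_mtu(1500))  # 1472, matching flannel0's mtu in the logs below
print(tunnel_mtu(9000))  # jumbo frames leave 8972 inside the tunnel
```

Hence the advice: derive the tunnel MTU from the actual underlying interface rather than hard-coding 1500.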
It appears a URL scheme is required for an etcd endpoint to be usable by flannel:
E1124 19:34:07.531877 00626 main.go:137] Failed to create SubnetManager: 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
E1124 19:34:08.534750 00626 main.go:137] Failed to create SubnetManager: 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
E1124 19:34:09.535471 00626 main.go:137] Failed to create SubnetManager: 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
^C$
Examining strace output for the above shows that no connect is even being attempted; the error should say something about the absent scheme in such a situation and then exit immediately.
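The suggested up-front validation could look like the following (a hypothetical sketch using urllib.parse; flannel itself is written in Go, so this only illustrates the check):

```python
from urllib.parse import urlparse

def check_endpoint(endpoint: str) -> str:
    """Reject etcd endpoints that lack an http:// or https:// scheme."""
    parsed = urlparse(endpoint)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"etcd endpoint {endpoint!r} is missing a scheme "
                         "(expected http:// or https://)")
    return endpoint

check_endpoint("http://127.0.0.1:4001")  # accepted
# check_endpoint("127.0.0.1:4001")       # would raise ValueError immediately
```

Failing fast with a clear message beats retrying a connection that is never attempted.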
Take advantage of the in-kernel VXLAN implementation, which would significantly reduce the overhead of the UDP tunnel.
Looking at this briefly we would need to do a few things:
References: http://en.wikipedia.org/wiki/Distributed_Overlay_Virtual_Ethernet#Implementations
If an interface has a link-local address in addition to a global address, flannel should pick the global one.
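Python's ipaddress module shows how such a preference could be expressed (a sketch of the selection rule only; flannel's actual logic is in Go):

```python
import ipaddress

def pick_global(addresses):
    """Prefer a non-link-local address over a link-local one, as proposed."""
    addrs = [ipaddress.ip_address(a) for a in addresses]
    preferred = [a for a in addrs if not a.is_link_local]
    # Fall back to whatever exists if only link-local addresses are present.
    return str((preferred or addrs)[0])

# 169.254.0.0/16 is link-local; the routable address should win.
print(pick_global(["169.254.222.1", "10.132.202.54"]))  # 10.132.202.54
```

This is exactly the failure mode in the DigitalOcean report later in this page, where flannel picked 169.254.x.x as its external interface.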
I'm seeing this error all the time.
Logs: https://gist.github.com/gregory90/371481d761b6f042f3c8
I'm running flannel on CoreOS, etcd is reachable.
Is this something I should be worried about?
Currently the default UDP port can be changed via --port. It would be better to change it through the config JSON stored in etcd. This would make it easier to move the whole cluster to a new port.
I've run into at least one place where flannel attempts to use a command/option of iptables that is not supported in older versions of iptables (< 1.4.11). Specifically, I tried to start flannel with ip-masq=true and the backend set to UDP; that's when it failed. It was trying to set iptables rules using the AppendUnique function, which in turn calls Exists.
The Exists check uses the -C flag, which isn't available.
This is currently a show stopper for me, as I am using CentOS 6.5 and iptables 1.4.7.
Kubernetes ran into this as well.
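On iptables older than 1.4.11, the -C check could be emulated by listing rules with `iptables -S` and comparing strings. A hypothetical sketch of that fallback, operating on canned listing output (pure string matching; not flannel's actual code):

```python
def rule_exists(iptables_listing: str, chain: str, rule_spec: str) -> bool:
    """Emulate `iptables -C` by scanning `iptables -S <chain>` output.

    Usable on iptables < 1.4.11, where the -C flag is unavailable.
    """
    wanted = f"-A {chain} {rule_spec}"
    return any(line.strip() == wanted
               for line in iptables_listing.splitlines())

# Example output as `iptables -t nat -S POSTROUTING` might print it.
listing = """-P POSTROUTING ACCEPT
-A POSTROUTING -s 10.0.0.0/16 ! -d 10.0.0.0/16 -j MASQUERADE"""

print(rule_exists(listing, "POSTROUTING",
                  "-s 10.0.0.0/16 ! -d 10.0.0.0/16 -j MASQUERADE"))  # True
```

The caveat with this approach is that -S output must match the rule spec token for token, so the rule would need to be composed in the same canonical form iptables prints.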
Although flannel will start renewing the lease an hour prior to expiration, the lease could still be lost: e.g. the VM getting suspended. Flannel should try to get the same subnet assignment if it's still available, but fall back to a new lease and signal the fact.
Ran into a stumbling block when trying to get rudder to bind to COREOS_PRIVATE_IPV4
:
$ source /etc/environment
$ echo $COREOS_PRIVATE_IPV4
172.17.8.100
$ sudo rudder -iface=${COREOS_PRIVATE_IPV4}
E0829 19:21:44.965886 01154 main.go:76] Error looking up interface 172.17.8.100: No interface with given IP found
$ ifconfig
...
enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.17.8.100 netmask 255.255.255.0 broadcast 172.17.8.255
inet6 fe80::a00:27ff:fe89:97b9 prefixlen 64 scopeid 0x20<link>
ether 08:00:27:89:97:b9 txqueuelen 1000 (Ethernet)
RX packets 24833 bytes 3267159 (3.1 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 26618 bytes 5217752 (4.9 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Note that using -iface=enp0s8 works fine. However, that's not really a portable solution.
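The lookup -iface=<IP> needs to perform can be reproduced by parsing `ip -o -4 addr show` style output; a sketch over a canned listing (the parsing here is an illustration, not flannel's actual implementation):

```python
def iface_for_ip(ip_addr_output: str, wanted_ip: str):
    """Map an IPv4 address to its interface name, as -iface=<IP> must do.

    Expects `ip -o -4 addr show` style lines: index, name, family, CIDR.
    """
    for line in ip_addr_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[3].split("/")[0] == wanted_ip:
            return fields[1]
    return None

listing = """1: lo    inet 127.0.0.1/8 scope host lo
3: enp0s8    inet 172.17.8.100/24 brd 172.17.8.255 scope global enp0s8"""

print(iface_for_ip(listing, "172.17.8.100"))  # enp0s8
```

The error above suggests flannel's own address-to-interface lookup failed to find 172.17.8.100, even though ifconfig shows it assigned to enp0s8.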
Does flannel forward multicast packets between peers?
I have started two containers that have software that uses multicast for discovery, and the discovery does not work.
Hi,
While I understand that the starting point of flannel was to address a specific need of how Kubernetes works, I think it would be great to be able to leverage it with Docker directly, without using Kubernetes at all. I am not exactly sure whether this is possible now or whether it is something under consideration.
I would also encourage the docs to be broader and not solely focused on Kubernetes in the intro, as I believe this project has great potential for Docker alone, so that each container gets its own IP address and all containers can potentially talk to each other no matter which server they are on.
My cloud-config:
#cloud-config
coreos:
  etcd:
    discovery: https://discovery.etcd.io/{here is my discovery id}
    addr: $private_ipv4:4001
    peer-addr: $private_ipv4:7001
  units:
    - name: etcd.service
      command: start
    - name: fleet.service
      command: start
I have 2 machines on DigitalOcean with hostnames "first" and "second", configured to use their private networking addresses in cloud-config. I want to reach a container (dockerfile/nginx image, for example) on the second host from the first.
I've installed flannel and configured docker accordingly:
docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.io
Requires=docker.socket
[Service]
Environment="TMPDIR=/var/tmp/"
EnvironmentFile=/run/flannel/subnet.env
ExecStartPre=/bin/mount --make-rprivate /
LimitNOFILE=1048576
LimitNPROC=1048576
ExecStart=/usr/bin/docker --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU} --daemon --storage-driver=btrfs --host=fd://
[Install]
WantedBy=multi-user.target
flannel.service
[Unit]
Requires=etcd.service
After=etcd.service
[Service]
ExecStartPre=-/usr/bin/etcdctl mk /coreos.com/network/config '{"Network":"10.0.0.0/16"}'
ExecStart=/opt/bin/flanneld
I wrote an installer for fresh DigitalOcean droplets with CoreOS:
https://github.com/ernado/deployer
It just stops docker, downloads flannel, creates the flannel service, waits for its env file and restarts docker with the updated config.
For example:
deploying flannel on coreos
stoping docker...Warning: Stopping docker.service, but it can still be activated by:
docker.socket
ok
creating docker service file...ok
downloading flannel...ok
creating docker service file...ok
starting flannel...ok
waiting for flannel subnet file...ok
docker0 already removed
restarting docker...ok
operations complete
So, I've installed flannel and configured docker to use IP addresses from the right subnets.
first host:
core@first ~ $ systemctl status flannel
● flannel.service
Loaded: loaded (/etc/systemd/system/flannel.service; static)
Active: active (running) since Thu 2014-10-23 15:05:31 UTC; 1h 6min ago
Process: 690 ExecStartPre=/usr/bin/etcdctl mk /coreos.com/network/config {"Network":"10.0.0.0/16"} (code=exited, status=0/SUCCESS)
Main PID: 697 (flanneld)
CGroup: /system.slice/flannel.service
└─697 /opt/bin/flanneld
Oct 23 15:05:31 first.core.cydev.ru flanneld[697]: I1023 15:05:31.086545 00697 main.go:111] Determining IP address of default interface
Oct 23 15:05:31 first.core.cydev.ru flanneld[697]: I1023 15:05:31.087167 00697 main.go:188] Using 169.254.222.1 as external interface
Oct 23 15:05:31 first.core.cydev.ru flanneld[697]: I1023 15:05:31.087863 00697 subnet.go:298] Picking subnet in range 10.0.1.0 ... 10.0.255.0
Oct 23 15:05:31 first.core.cydev.ru flanneld[697]: I1023 15:05:31.088733 00697 subnet.go:80] Subnet lease acquired: 10.0.43.0/24
Oct 23 15:05:31 first.core.cydev.ru flanneld[697]: I1023 15:05:31.139719 00697 main.go:199] UDP mode initialized
Oct 23 15:05:31 first.core.cydev.ru flanneld[697]: I1023 15:05:31.146842 00697 udp.go:239] Watching for new subnet leases
Oct 23 15:08:41 first.core.cydev.ru flanneld[697]: I1023 15:08:41.097106 00697 udp.go:264] Subnet added: 10.0.13.0/24
Oct 23 15:38:41 first.core.cydev.ru flanneld[697]: W1023 15:38:41.109773 00697 subnet.go:385] Watch of subnet leases failed because etcd index outside history window
Oct 23 16:08:41 first.core.cydev.ru flanneld[697]: W1023 16:08:41.122768 00697 subnet.go:385] Watch of subnet leases failed because etcd index outside history window
Oct 23 16:08:41 first.core.cydev.ru flanneld[697]: I1023 16:08:41.124748 00697 udp.go:275] Subnet removed: 10.0.43.0/24
second host:
core@second ~ $ systemctl status flannel
● flannel.service
Loaded: loaded (/etc/systemd/system/flannel.service; static)
Active: active (running) since Thu 2014-10-23 15:08:40 UTC; 1h 3min ago
Process: 689 ExecStartPre=/usr/bin/etcdctl mk /coreos.com/network/config {"Network":"10.0.0.0/16"} (code=exited, status=4)
Main PID: 695 (flanneld)
CGroup: /system.slice/flannel.service
└─695 /opt/bin/flanneld
Oct 23 15:08:40 second.core.cydev.ru flanneld[695]: I1023 15:08:40.819850 00695 main.go:111] Determining IP address of default interface
Oct 23 15:08:40 second.core.cydev.ru flanneld[695]: I1023 15:08:40.820215 00695 main.go:188] Using 169.254.74.205 as external interface
Oct 23 15:08:40 second.core.cydev.ru flanneld[695]: I1023 15:08:40.823645 00695 subnet.go:298] Picking subnet in range 10.0.1.0 ... 10.0.255.0
Oct 23 15:08:40 second.core.cydev.ru flanneld[695]: I1023 15:08:40.896856 00695 subnet.go:80] Subnet lease acquired: 10.0.13.0/24
Oct 23 15:08:40 second.core.cydev.ru flanneld[695]: I1023 15:08:40.934942 00695 main.go:199] UDP mode initialized
Oct 23 15:08:40 second.core.cydev.ru flanneld[695]: I1023 15:08:40.936529 00695 udp.go:239] Watching for new subnet leases
Oct 23 15:08:40 second.core.cydev.ru flanneld[695]: I1023 15:08:40.937699 00695 udp.go:264] Subnet added: 10.0.43.0/24
Oct 23 15:38:41 second.core.cydev.ru flanneld[695]: W1023 15:38:41.152606 00695 subnet.go:385] Watch of subnet leases failed because etcd index outside history window
Oct 23 16:08:41 second.core.cydev.ru flanneld[695]: W1023 16:08:41.166824 00695 subnet.go:385] Watch of subnet leases failed because etcd index outside history window
Oct 23 16:08:41 second.core.cydev.ru flanneld[695]: I1023 16:08:41.168734 00695 udp.go:275] Subnet removed: 10.0.13.0/24
Then on second host:
core@second ~ $ docker run -d --name nginx -p 80:80 dockerfile/nginx
d7dfadef8698c8dae5b858ab38ace6e3ec8cbcbfc90aacfdfab682836d7024b9
core@second ~ $ docker inspect nginx | grep IPA
"IPAddress": "10.0.13.3",
core@second ~ $ curl 10.0.13.3:80
<!DOCTYPE html>
<html>
<head>
# (output omitted)
core@second ~ $ ping 10.0.13.3
PING 10.0.13.3 (10.0.13.3) 56(84) bytes of data.
64 bytes from 10.0.13.3: icmp_seq=1 ttl=64 time=0.050 ms
#10.132.202.49 - private ip of second host
core@second ~ $ curl 10.132.202.49:80
<!DOCTYPE html>
<html>
<head>
# ...
On first host:
core@first ~ $ curl 10.0.13.3:80
# is timing out
core@first ~ $ ping 10.0.13.3
core@first ~ $ ping 10.0.13.3
PING 10.0.13.3 (10.0.13.3) 56(84) bytes of data.
^C
--- 10.0.13.3 ping statistics ---
83 packets transmitted, 0 received, 100% packet loss, time 82019ms
# directly with private ip
core@first ~ $ curl 10.132.202.49:80
<!DOCTYPE html>
<html>
<head>
<title>Welcome
Have no idea what is wrong.
ifconfig for first
core@first ~ $ ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 10.0.43.1 netmask 255.255.255.0 broadcast 0.0.0.0
ether 56:84:7a:fe:97:99 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 169.254.222.1 netmask 255.255.0.0 broadcast 169.254.255.255
inet6 fe80::601:2dff:fe15:6801 prefixlen 64 scopeid 0x20<link>
ether 04:01:2d:15:68:01 txqueuelen 1000 (Ethernet)
RX packets 202760 bytes 42332132 (40.3 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 201553 bytes 32433943 (30.9 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.132.202.54 netmask 255.255.0.0 broadcast 10.132.255.255
inet6 fe80::601:2dff:fe15:6802 prefixlen 64 scopeid 0x20<link>
ether 04:01:2d:15:68:02 txqueuelen 1000 (Ethernet)
RX packets 20082 bytes 2759448 (2.6 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 20085 bytes 3884645 (3.7 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
flannel0: flags=81<UP,POINTOPOINT,RUNNING> mtu 1472
inet 10.0.43.0 netmask 255.255.0.0 destination 10.0.43.0
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 500 (UNSPEC)
RX packets 84 bytes 4704 (4.5 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 180 bytes 34380 (33.5 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 0 (Local Loopback)
RX packets 74552 bytes 14546484 (13.8 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 74552 bytes 14546484 (13.8 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
second
core@second ~ $ ifconfig
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1472
inet 10.0.13.1 netmask 255.255.255.0 broadcast 0.0.0.0
inet6 fe80::1022:efff:fe07:21a prefixlen 64 scopeid 0x20<link>
ether 56:84:7a:fe:97:99 txqueuelen 0 (Ethernet)
RX packets 42 bytes 5772 (5.6 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 35 bytes 2574 (2.5 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 169.254.74.205 netmask 255.255.0.0 broadcast 169.254.255.255
inet6 fe80::601:2dff:fe15:8301 prefixlen 64 scopeid 0x20<link>
ether 04:01:2d:15:83:01 txqueuelen 1000 (Ethernet)
RX packets 223266 bytes 211758399 (201.9 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 212482 bytes 28694791 (27.3 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.132.202.49 netmask 255.255.0.0 broadcast 10.132.255.255
inet6 fe80::601:2dff:fe15:8302 prefixlen 64 scopeid 0x20<link>
ether 04:01:2d:15:83:02 txqueuelen 1000 (Ethernet)
RX packets 20188 bytes 3904516 (3.7 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 20209 bytes 2777168 (2.6 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
flannel0: flags=81<UP,POINTOPOINT,RUNNING> mtu 1472
inet 10.0.13.0 netmask 255.255.0.0 destination 10.0.13.0
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 500 (UNSPEC)
RX packets 82 bytes 4592 (4.4 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 82 bytes 26076 (25.4 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 0 (Local Loopback)
RX packets 39243 bytes 6578308 (6.2 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 39243 bytes 6578308 (6.2 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
veth30eb: flags=67<UP,BROADCAST,RUNNING> mtu 1472
inet6 fe80::9883:e2ff:fe1e:11d prefixlen 64 scopeid 0x20<link>
ether 9a:83:e2:1e:01:1d txqueuelen 1000 (Ethernet)
RX packets 34 bytes 5712 (5.5 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 41 bytes 3054 (2.9 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
I can provide ssh access to somebody from the flannel team; these machines are only for testing. But I think you can reproduce this case on DigitalOcean easily.
For users that require network level encryption, flannel should be able to use IPSEC as the encapsulation protocol.
Is it too early to start cutting binary releases of Rudder?
I'm having a bit of difficulty understanding how flannel sets up the network when the VXLAN backend is used. Would you mind replicating the steps using iproute2 commands?
For providers where the host can just use as many IPs as it wants, Kolach should be able to function as a DHCP server of sorts, handing out entire subnets to hosts for use with Docker or k8s.
This will appear in Linux 3.18 and would be way easier to implement when compared to VXLAN.
When etcd times out a connection, it returns a partial chunked response that is fed to the JSON parser and generates a syntax error.
I am using the vxlan backend and I am seeing the following in my logs:
Dec 07 14:04:12 ip-10-0-0-130.ec2.internal flanneld[609]: E1207 14:04:12.054686 00609 vxlan.go:217] Error decoding subnet lease JSON: json: cannot unmarshal string into Go value of type vxlan.vxlanLeaseAttrs
As a result of that no routes to other hosts are added. Looking at etcd I see that some of the backend data are base64 encoded and others aren't:
{
"key": "/coreos.com/network/subnets/10.10.43.0-24",
"value": "{\"PublicIP\":\"10.0.0.130\",\"BackendType\":\"vxlan\",\"BackendData\":{\"VtepMAC\":\"2e:67:19:06:da:c0\"}}",
"expiration": "2014-12-08T13:59:12.034566916Z",
"ttl": 81296,
"modifiedIndex": 19911696,
"createdIndex": 19911696
},
{
"key": "/coreos.com/network/subnets/10.10.33.0-24",
"value": "{\"PublicIP\":\"10.0.0.129\",\"BackendType\":\"vxlan\",\"BackendData\":\"eyJWdGVwTUFDIjoiZmU6MDM6M2M6YzA6ZjU6YTIifQ==\"}",
"expiration": "2014-12-08T01:41:57.69184428Z",
"ttl": 37062,
"modifiedIndex": 19361855,
"createdIndex": 19361855
}
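The second entry's BackendData is the base64 encoding of the same JSON shape as the first; decoding it makes the mismatch obvious (standard-library sketch, using the exact value from the etcd dump above):

```python
import base64
import json

# BackendData from the second etcd entry above, base64-encoded.
raw = "eyJWdGVwTUFDIjoiZmU6MDM6M2M6YzA6ZjU6YTIifQ=="
decoded = base64.b64decode(raw).decode()

print(decoded)              # {"VtepMAC":"fe:03:3c:c0:f5:a2"}
print(json.loads(decoded))  # same structure as the non-encoded entry
```

So the two hosts are writing the same lease attributes in two different encodings, which is consistent with the unmarshal error: a reader expecting the object form chokes on the string form.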
I followed the flannel README to install on CentOS 7; flannel0 is always in UNKNOWN state and containers can't communicate across multiple hosts. What could cause this?
flannel0: <POINTOPOINT,UP,LOWER_UP> mtu 1472 qdisc pfifo_fast state UNKNOWN qlen 500
link/none
inet 10.100.77.0/16 scope global flannel0
valid_lft forever preferred_lft forever
The old rudder repo now redirects to flannel, but I didn't see anything explaining / documenting the change... a bit confusing. A sentence or two on this could help clarify things.
Things like etcd-endpoint and etcd-prefix should be configurable via env vars.
Is there a plan to add more fine grained control over IP address assignment of containers? This is related to vxlan where two containers could have the same IP address but different vxlan ids.
When trying out flannel with vagrant configured for two coreos instances, it did not work out of the box.
The reason was that both instances had "10.0.2.15" configured and flannel used it as its identity / public IP. Since it also uses that information as the identity when defining and re-using the allocated subnets, both instances were trying to use the same range.
I resolved it easily with -iface=$public_ipv4, but maybe the IP discovery logic should be the same as how $public_ipv4 gets set by default? Additionally, maybe it should not use this IP as the mechanism to identify the subnets assigned to itself?
EtcdIndex is the store's current index and may be ahead of the event we're processing in a watch response. modifiedIndex contains the index of the event that the watch returned; increment that by one when calling watch again.
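The index rule above can be sketched as follows (a hedged illustration with a canned response dict, not a real etcd client):

```python
def next_wait_index(response: dict) -> int:
    """Compute the waitIndex for the next etcd watch call.

    Use the modifiedIndex of the returned event, not EtcdIndex: EtcdIndex
    is the store's current index and may already be ahead of the event
    being processed, so resuming from it could skip intermediate changes.
    """
    return response["node"]["modifiedIndex"] + 1

# Canned watch response; the store index (here 120) is ahead of the event.
response = {"node": {"modifiedIndex": 101}, "etcd_index": 120}
print(next_wait_index(response))  # 102, not 121
```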
etcd supports ssl certificates to authenticate clients. It would be nice if flannel supported this as well.
Should also send back an ICMP Destination Net Unreachable when no route is found.
Rudder uses port 8285 by default. That needs to be put into the docs to remind users to open it up in the firewall.
I am trying out flannel using coreos-vagrant. I start up 3 hosts and install the flannel binary on each of them. I add the network configuration using etcdctl and then start flannel on each host using
sudo /opt/bin/flanneld &
and on all hosts I get the same output
Installing signal handlers
I0926 06:56:56.933827 00733 main.go:89] Determining IP address of default interface
I0926 06:56:56.934355 00733 main.go:164] Using 10.0.2.15 as external interface
I0926 06:56:56.997207 00733 subnet.go:80] Subnet lease acquired: 10.0.78.0/24
I0926 06:56:57.001976 00733 main.go:175] UDP mode initialized
I0926 06:56:57.002595 00733 udp.go:238] Watching for new subnet leases
coreos-vagrant creates two network adapters on each host: one NAT and one host-only.
Could my problem be that flannel selects the NAT adapter (10.0.2.15)?
When flannel fails to create the backend, the error output message is incorrectly formatted.
I'm asking because I can't seem to create vxlan devices using iproute2.
Hi,
Is it possible to build and run flannel under OS X? If yes, then hopefully it can join the mesh network of docker containers (running in CoreOS under vagrant, in my case) so that the OS X host can freely reach any docker container. This would be awesome.
Currently I have an installer in Go that downloads the latest version of flannel from my CDN, adds flannel via systemd, configures Docker and restarts it.
How is flannel supposed to be updated in a CoreOS cluster? It cannot run in a container, because Docker relies on the environment file that flannel generates.
Via CoreUpdate?
Right now VXLAN does not honor the ipMasq parameter. It needs to, or that code needs to move out of the backends.
First of all, sorry for the high-level error report, but I'm not sure what is going wrong or how to best debug it. I am running a ~8 node CoreOS cluster on EC2, stable channel. My etcd is running on a separate node. Using flannel from the current master branch built within a container.
Everything works fine for about 12 hours but then the routing stops working and packets are being dropped. This is reproducible and happens every time I re-provision my cluster (restart flannel and docker). I was initially running the UDP backend and I didn't find any error messages in the journal logs. I then switched to the vxlan backend and I am seeing the following:
Nov 24 11:07:33 kalm flanneld[17341]: I1124 11:07:33.506152 17341 vxlan.go:264] L3 miss: 10.10.109.8
Nov 24 11:07:33 kalm flanneld[17341]: I1124 11:07:33.506189 17341 device.go:228] calling NeighSet: 10.10.109.8, 12:7d:bc:75:19:1f
Nov 24 11:07:33 kalm flanneld[17341]: I1124 11:07:33.506235 17341 vxlan.go:275] AddL3 succeeded
Nov 24 11:07:33 kalm flanneld[17341]: I1124 11:07:33.507063 17341 vxlan.go:248] L2 miss: 12:7d:bc:75:19:1f
Nov 24 11:07:33 kalm flanneld[17341]: I1124 11:07:33.507078 17341 device.go:202] calling NeighAdd: 10.0.0.129, 12:7d:bc:75:19:1f
Nov 24 11:07:33 kalm flanneld[17341]: I1124 11:07:33.507115 17341 vxlan.go:259] AddL2 succeeded
Nov 24 11:08:01 kalm flanneld[17341]: I1124 11:08:01.843236 17341 vxlan.go:243] Ignoring not a miss: 12:7d:bc:75:19:1f, 10.10.109.8
Nov 24 11:08:12 kalm flanneld[17341]: I1124 11:08:12.576140 17341 vxlan.go:243] Ignoring not a miss: 12:7d:bc:75:19:1f, 10.10.109.8
Nov 24 11:08:13 kalm flanneld[17341]: I1124 11:08:13.578175 17341 vxlan.go:243] Ignoring not a miss: 12:7d:bc:75:19:1f, 10.10.109.8
Nov 24 11:08:14 kalm flanneld[17341]: I1124 11:08:14.580082 17341 vxlan.go:243] Ignoring not a miss: 12:7d:bc:75:19:1f, 10.10.109.8
Nov 24 11:08:36 kalm flanneld[17341]: I1124 11:08:36.658179 17341 vxlan.go:264] L3 miss: 10.10.109.8
Nov 24 11:08:36 kalm flanneld[17341]: I1124 11:08:36.658219 17341 device.go:228] calling NeighSet: 10.10.109.8, 12:7d:bc:75:19:1f
Nov 24 11:08:36 kalm flanneld[17341]: I1124 11:08:36.658292 17341 vxlan.go:275] AddL3 succeeded
Nov 24 11:09:04 kalm flanneld[17341]: I1124 11:09:04.946135 17341 vxlan.go:243] Ignoring not a miss: 12:7d:bc:75:19:1f, 10.10.109.8
Any ideas what could be causing this? Is it perhaps something specific to EC2?
Receiving signals may cause the proxy to misbehave.
https://github.com/coreos/flannel/blob/master/backend/udp/udp.go#L209
If the FLANNEL chain does not already exist (e.g. on a fresh CoreOS VM), L209 of the setupIpMasq function will fail, as it cannot clear a chain that isn't there.
Hi there!
I've been trying to get flannel working on CentOS 6.5. If I run the prebuilt binary from here:
http://storage.googleapis.com/flannel/flanneld
it works. If I build flannel from source it does not.
Hi, This is a request-for-feature issue.
I want to run a container where there is vxLAN backed overlay. But for destinations outside the overlay, I want to NAT the container out, eg to the public internet.
I see three ways to go about this:
Thoughts? Is this in scope or out of scope for flannel?
Would the maintainers consider having OVS as an option for the tunnelling functionality?
If a plugin API were implemented, then the underlying tunnelling could be replaced by whatever users see fit.
Are there any plans to support firewalls in flannel? Or is this something that should be built on top on flannel as a separate component?
I'm thinking of something akin to GCE Firewalls where you can control traffic via source and target tags. This would work very well with Kubernetes and allow applications to be better isolated from one another.
See discussion here: kubernetes/kubernetes#2880