
submariner's Introduction

Submariner


Submariner is a tool built to connect overlay networks of different Kubernetes clusters. Submariner is designed to be network plugin (CNI) agnostic and supports both encrypted and non-encrypted tunnels between the connected clusters.

Note that Submariner is in an early stage, and while we welcome usage and experimentation, it is quite possible that you could run into bugs.

Submariner is a Cloud Native Computing Foundation sandbox project.

Architecture

See the Architecture section of Submariner's website.

Network Path

The network path of Submariner varies depending on the origin/destination of the IP traffic. In all cases, traffic between two clusters transits between the leader-elected gateway nodes (one in each cluster), through the configured cable driver.

When the source Pod is on a worker node that is not the elected gateway node, the traffic destined for the remote cluster will transit through the submariner VXLAN tunnel (vx-submariner) to the local cluster gateway node. On the gateway node, traffic is forwarded to the remote cluster over the configured tunnel. Once the traffic reaches the destination gateway node, it is routed in one of two ways, depending on the destination CIDR. If the destination CIDR is a Pod network, the traffic is routed via the CNI-programmed network. If the destination CIDR is a Service network, the traffic is routed through the facility configured via kube-proxy on the destination gateway node.
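To make the non-gateway case above slightly more concrete, the effect on a worker node can be pictured as routes that send the remote clusters' CIDRs over the vx-submariner interface toward the local gateway. The following is only a hedged illustration; the CIDR and next-hop are hypothetical placeholders, not Submariner's literal programming:

# Inspect the routes a worker node carries over the vx-submariner interface
ip route show dev vx-submariner

# Conceptually this amounts to something like (hypothetical values):
ip route add 10.45.0.0/16 via <gateway-vxlan-ip> dev vx-submariner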

Prerequisites

See the Prerequisites docs on Submariner's website.

Installation

Submariner is always deployed using a Go-based Kubernetes custom controller, called an Operator, that provides API-based installation and management. Deployment tools like the subctl command line utility and Helm charts wrap the Operator. The recommended deployment method is subctl, as it is currently the default in CI and provides diagnostic features.

See the Deployment docs on Submariner's website.

Installation using subctl

Submariner provides the subctl CLI utility to simplify the deployment and maintenance of Submariner across your clusters.

See the subctl Deployment docs on Submariner's website.
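As a rough illustration of the flow (the flags follow the examples that appear in the issues further down and may differ between subctl versions, so treat this as a sketch rather than a reference):

# On the broker cluster; writes broker-info.subm to the current directory
subctl deploy-broker

# On each cluster that should join, using the file produced above
subctl join --clusterid cluster-a ./broker-info.subm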

Installation using Helm

See the Helm Deployment docs on Submariner's website.
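For orientation, the Helm-based flow seen in several issues below uses two charts: submariner-k8s-broker on the broker cluster and submariner on each joining cluster. A minimal sketch, using the chart repository name and values exactly as they appear in those issues (your values will differ):

# Broker cluster
helm install submariner-latest/submariner-k8s-broker \
  --name submariner-k8s-broker --namespace submariner-k8s-broker

# Each joining cluster (all values are placeholders)
helm install submariner-latest/submariner \
  --name submariner --namespace submariner \
  --set ipsec.psk="${SUBMARINER_PSK}" \
  --set broker.server="${SUBMARINER_BROKER_URL}" \
  --set broker.token="${SUBMARINER_BROKER_TOKEN}" \
  --set broker.namespace="${SUBMARINER_BROKER_NS}" \
  --set broker.ca="${SUBMARINER_BROKER_CA}" \
  --set submariner.clusterId="<CLUSTER_ID>" \
  --set submariner.clusterCidr="<CLUSTER_CIDR>" \
  --set submariner.serviceCidr="<SERVICE_CIDR>" \
  --set submariner.natEnabled="<NAT_ENABLED>"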

Validate Submariner is Working

See the subctl verify docs and Automated Troubleshooting docs on Submariner's website.

Building and Testing

See the Building and Testing docs on Submariner's website.

Known Issues

See the Known Issues docs on Submariner's website.

Contributing

See the Development section of Submariner's website.

submariner's People

Contributors

aswinsuryan, billy99, cyclinder, deanlorenz, dependabot-preview[bot], dependabot[bot], dfarrell07, dragonstuff, gliptak, hunchback, jaanki, maayanf24, mangelajo, manosnoam, maxbab, mkimuram, mkolesnik, mpeterson, negashev, nyechiel, oats87, pinikomarov, roytman, skitt, sridhargaddam, submariner-bot, tpantelis, ueno, vthapar, yboaron


submariner's Issues

"Error while adding route" when trying to setup Submariner

Hi,

Following is the setup with which I am trying submariner:

  1. One 3-node cluster on AWS.
  2. Two 3-node bare-metal clusters which are reachable on the internet.

So, the thing is:

  1. I tried to set up Submariner using one bare-metal cluster as the broker. It worked. But later on, I wanted to change the instance types of the worker nodes on AWS.
  2. I changed the instance type of the worker nodes via the Rancher GUI, and new workers were added.

After that, both the submariner route pods (on both the clusters) are showing this error:

E0507 12:02:19.649758       1 route.go:385] error while adding route {Ifindex: 2 Dst: 10.45.0.0/16 Src: <nil> Gw: XX.XXX.XXX.XXX Flags: [] Table: 0}: file exists

So, I deleted the complete submariner installation on all clusters and retried it. But, I still see the same error.

Where are these things stored? How do I make Submariner work again?

UPDATE:
I tried the whole sequence one more time by deleting helm releases and recreating everything. It still shows the same error.

I even recreated the clusters with different CIDRs, and they all still show the same error message. Only the CIDR values change.

E0507 14:09:19.179172       1 route.go:385] error while adding route {Ifindex: 2 Dst: 10.53.0.0/16 Src: <nil> Gw: 192.168.0.189 Flags: [] Table: 0}: file exists
E0507 14:09:19.179210       1 route.go:385] error while adding route {Ifindex: 2 Dst: 10.54.0.0/16 Src: <nil> Gw: 192.168.0.189 Flags: [] Table: 0}: file exists
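A hedged workaround sketch, not official guidance: the "file exists" error suggests a stale route from the previous installation is still present on the node. One way to inspect and clear it manually on the affected node, using the CIDR from the error message as an example:

# Show any existing route for the remote cluster CIDR named in the error
ip route show | grep 10.53.0.0/16

# Remove the stale route so the route agent can re-add it cleanly
ip route del 10.53.0.0/16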

Deploying Submariner: Invalid value: 0x0: must be specified for an update

Deploying Submariner - must be specified for an update.txt

[nmanos@nmanos submariner-operator]$ kubconf_a  subctl join --clusterid subm-cluster-a ./broker-info.subm --ikeport 501 --nattport 4501
* ./broker-info.subm says broker is at: https://api.nmanos-cluster-a.devcluster.openshift.com:6443
* There are 1 labeled nodes in the cluster:
  - ip-10-0-71-116.ec2.internal
* Deploying the submariner operator
* The operator is up and running
* Discovering network details
    Discovered network details:
        Network plugin:  OpenShift
        ClusterIP CIDRs: [172.30.0.0/16]
        Pod CIDRs:       [10.128.0.0/14]
* Deploying Submariner
panic: submariners.submariner.io "submariner" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update

goroutine 1 [running]:
github.com/submariner-io/submariner-operator/pkg/subctl/operator/deploy.Ensure(0xc00000a3c0, 0x15366db, 0x5, 0x1546b40, 0x12, 0x1195, 0x1f5, 0x0, 0xc0002fc8c0, 0x40, ...)
	/go/src/github.com/submariner-io/submariner-operator/pkg/subctl/operator/deploy/ensure.go:51 +0x6d9
github.com/submariner-io/submariner-operator/pkg/subctl/cmd.joinSubmarinerCluster(0xc0004ccf20)
	/go/src/github.com/submariner-io/submariner-operator/pkg/subctl/cmd/join.go:137 +0x33e
github.com/submariner-io/submariner-operator/pkg/subctl/cmd.glob..func2(0x23c9040, 0xc0004d6b60, 0x1, 0x7)
	/go/src/github.com/submariner-io/submariner-operator/pkg/subctl/cmd/join.go:69 +0x14a
github.com/spf13/cobra.(*Command).execute(0x23c9040, 0xc0004d6af0, 0x7, 0x7, 0x23c9040, 0xc0004d6af0)
	/go/src/github.com/submariner-io/submariner-operator/vendor/github.com/spf13/cobra/command.go:830 +0x2ae
github.com/spf13/cobra.(*Command).ExecuteC(0x23c92c0, 0x1252269, 0xc00066df88, 0xc00003e118)
	/go/src/github.com/submariner-io/submariner-operator/vendor/github.com/spf13/cobra/command.go:914 +0x2fc
github.com/spf13/cobra.(*Command).Execute(...)
	/go/src/github.com/submariner-io/submariner-operator/vendor/github.com/spf13/cobra/command.go:864
github.com/submariner-io/submariner-operator/pkg/subctl/cmd.Execute(...)
	/go/src/github.com/submariner-io/submariner-operator/pkg/subctl/cmd/root.go:36
main.main()
	/go/src/github.com/submariner-io/submariner-operator/pkg/subctl/main.go:6 +0x2f

pod log attached.

1:1 NAT

Hi,

I have a question about the 1:1 NAT requirement. From the manual:

"Direct IP connectivity between instances through the internet (or on the same network if not running Submariner over the internet). Submariner supports 1:1 NAT setups, but has a few caveats/provider specific configuration instructions in this configuration."

Does submariner work in the following scenario?

broker cluster:

  • gateway/master with public address
  • private addresses otherwise

cluster 1:

  • gateway/master with public address
  • private addresses otherwise

cluster 2:

  • private addresses

First time deploy - crd not cleaned-up

What I did
Initial install (1):
helm install submariner-latest/submariner-k8s-broker --name "submariner-k8s-broker" --namespace "submariner-k8s-broker"
then (2):
helm delete --purge submariner-k8s-broker
and again (3):
helm install submariner-latest/submariner-k8s-broker --name "submariner-k8s-broker" --namespace "submariner-k8s-broker"

and got:

Error: customresourcedefinitions.apiextensions.k8s.io "clusters.submariner.io" already exists
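A possible cleanup sketch: helm delete --purge does not remove CRDs, so the leftover Submariner CRDs have to be deleted before reinstalling. clusters.submariner.io is the one named in the error; endpoints.submariner.io is an assumption about the other CRD the chart creates.

# List whatever Submariner CRDs are left behind
kubectl get crd | grep submariner.io

# Delete the leftovers (the second one is an assumption, delete only what the list shows)
kubectl delete crd clusters.submariner.io
kubectl delete crd endpoints.submariner.io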

E2E tests errors should print details of the cluster and node

For example, as seen in #267 - a test has failed, but it's unknown on which cluster the connection to the pod timed out:

[redundancy] Gateway fail-over tests
/home/nmanos/go/src/github.com/submariner-io/submariner/test/e2e/redundancy/gateway_failover.go:13
  when two gateway nodes are configured with one submariner engine replica and the gateway node fails
  /home/nmanos/go/src/github.com/submariner-io/submariner/test/e2e/redundancy/gateway_failover.go:22
    should start a new submariner engine pod on the second gateway node and be able to connect from another cluster [It]
    /home/nmanos/go/src/github.com/submariner-io/submariner/test/e2e/redundancy/gateway_failover.go:23

    Failed to find pods for app submariner-engine. Actual pod count 0 does not match the expected pod count 1
    Unexpected error:
        <*errors.errorString | 0xc00024eb20>: {
            s: "timed out waiting for the condition",
        }
        timed out waiting for the condition
    occurred

The error should rather print:
"Failed to find pods for app submariner-engine on cluster [name] - node [name]"

E2E on AWS + OSP: Timeouts in 6 out of 12 tests

After setting up Submariner on an AWS cluster and an OSP cluster (Upshift, behind NAT), the connection was tested and is working:

[submariner-operator]$ KUBECONFIG=/home/nmanos/automation/ocp-install/nmanos-cluster-a/auth/kubeconfig oc exec netshoot-785ffd8c8-42bxq -- curl --output /dev/null -m 30 --progress-bar --verbose --head --fail 100.96.72.226
*   Trying 100.96.72.226:80...
* TCP_NODELAY set
* Connected to 100.96.72.226 (100.96.72.226) port 80 (#0)
> HEAD / HTTP/1.1
> Host: 100.96.72.226
> User-Agent: curl/7.65.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: nginx/1.17.6
< Date: Mon, 23 Dec 2019 09:58:57 GMT
< Content-Type: text/html
< Content-Length: 612
< Last-Modified: Tue, 19 Nov 2019 15:14:41 GMT
< Connection: keep-alive
< ETag: "5dd406e1-264"
< Accept-Ranges: bytes

However, running the E2E tests failed with timeouts in 6 out of 12 tests:

Summarizing 6 Failures:

[Fail] [redundancy] Gateway fail-over tests when two gateway nodes are configured with one submariner engine replica and the gateway node fails [It] should start a new submariner engine pod on the second gateway node and be able to connect from another cluster
/home/nmanos/go/src/github.com/submariner-io/submariner/test/e2e/framework/framework.go:327

[Fail] [dataplane] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote pod when the pod is not on a gateway and the remote pod is on a gateway [It] should have sent the expected data from the pod to the other pod
/home/nmanos/go/src/github.com/submariner-io/submariner/test/e2e/framework/framework.go:327

[Fail] [redundancy] Gateway fail-over tests when one gateway node is configured and the submariner engine pod fails [It] should start a new submariner engine pod and be able to connect from another cluster
/home/nmanos/go/src/github.com/submariner-io/submariner/test/e2e/framework/framework.go:327

[Fail] [dataplane] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote pod when the pod is on a gateway and the remote pod is on a gateway [It] should have sent the expected data from the pod to the other pod
/home/nmanos/go/src/github.com/submariner-io/submariner/test/e2e/framework/framework.go:327

[Fail] [dataplane] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote service when the pod is on a gateway and the remote service is on a gateway [It] should have sent the expected data from the pod to the other pod
/home/nmanos/go/src/github.com/submariner-io/submariner/test/e2e/framework/framework.go:327

[Fail] [dataplane] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote service when the pod is not on a gateway and the remote service is on a gateway [It] should have sent the expected data from the pod to the other pod
/home/nmanos/go/src/github.com/submariner-io/submariner/test/e2e/framework/framework.go:327

Ran 12 of 13 Specs in 495.063 seconds
FAIL! -- 6 Passed | 6 Failed | 1 Pending | 0 Skipped

Full deployment + E2E output

[Doc] Add instructions to run E2E on external clusters (not in KIND)

When there is an existing Submariner deployment on external clusters, we need clear instructions on how to run the E2E tests.

For example, having Cluster A on AWS:
alias kubconf_a='KUBECONFIG=/home/nmanos/automation/ocp-install/nmanos-cluster-a/auth/kubeconfig'
And Cluster B on Upshift:
alias kubconf_b='KUBECONFIG=/home/nmanos/automation/ocp-install/ocpup/.config/cl2/auth/kubeconfig'

Currently this is my attempt to run e2e against those clusters (it has failed):
$ kubconf_a go test -v ./test/e2e -args -dp-context admin -dp-context admin -ginkgo.v -ginkgo.randomizeAllSpecs

=== RUN   TestE2E
Running Suite: Submariner E2E suite
===================================
Random Seed: 1575624426 - Will randomize all specs
Will run 11 of 11 specs

[dataplane] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote pod when the pod is not on a gateway and the remote pod is on a gateway
  should have sent the expected data from the pod to the other pod
  /home/nmanos/go/src/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:17
STEP: Creating kubernetes clients
STEP: Building namespace api objects, basename dataplane-conn-nd
STEP: Creating a namespace e2e-tests-dataplane-conn-nd-vh8s2 to execute the test in

• Failure in Spec Setup (BeforeEach) [0.895 seconds]
[dataplane] Basic TCP connectivity tests across clusters without discovery
/home/nmanos/go/src/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:12
  when a pod connects via TCP to a remote pod [BeforeEach]
  /home/nmanos/go/src/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:22
    when the pod is not on a gateway and the remote pod is on a gateway
    /home/nmanos/go/src/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:31
      should have sent the expected data from the pod to the other pod
      /home/nmanos/go/src/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:17

      Error creating namespace &Namespace{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:e2e-tests-dataplane-conn-nd-vh8s2,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{e2e-framework: dataplane-conn-nd,},Annotations:map[string]string{},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,},Spec:NamespaceSpec{Finalizers:[],},Status:NamespaceStatus{Phase:,},}
      Unexpected error:
          <*errors.StatusError | 0xc00022a360>: {
              ErrStatus: {
                  TypeMeta: {Kind: "", APIVersion: ""},
                  ListMeta: {SelfLink: "", ResourceVersion: "", Continue: ""},
                  Status: "Failure",
                  Message: "namespaces \"e2e-tests-dataplane-conn-nd-vh8s2\" already exists",
                  Reason: "AlreadyExists",
                  Details: {
                      Name: "e2e-tests-dataplane-conn-nd-vh8s2",
                      Group: "",
                      Kind: "namespaces",
                      UID: "",
                      Causes: nil,
                      RetryAfterSeconds: 0,
                  },
                  Code: 409,
              },
          }
          namespaces "e2e-tests-dataplane-conn-nd-vh8s2" already exists
      occurred

      /home/nmanos/go/src/github.com/submariner-io/submariner/test/e2e/framework/framework.go:295
------------------------------
[dataplane] Basic TCP connectivity tests across clusters without discovery when a pod connects via TCP to a remote service when the pod is on a gateway and the remote service is not on a gateway
  should have sent the expected data from the pod to the other pod
  /home/nmanos/go/src/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:17
STEP: Creating kubernetes clients
STEP: Building namespace api objects, basename dataplane-conn-nd
STEP: Creating a namespace e2e-tests-dataplane-conn-nd-h9lk8 to execute the test in

• Failure in Spec Setup (BeforeEach) [0.361 seconds]
[dataplane] Basic TCP connectivity tests across clusters without discovery
/home/nmanos/go/src/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:12
  when a pod connects via TCP to a remote service [BeforeEach]
  /home/nmanos/go/src/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:44
    when the pod is on a gateway and the remote service is not on a gateway
    /home/nmanos/go/src/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:57
      should have sent the expected data from the pod to the other pod
      /home/nmanos/go/src/github.com/submariner-io/submariner/test/e2e/dataplane/tcp_pod_connectivity.go:17

      Error creating namespace &Namespace{ObjectMeta:k8s_io_apimachinery_pkg_apis_meta_v1.ObjectMeta{Name:e2e-tests-dataplane-conn-nd-h9lk8,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{e2e-framework: dataplane-conn-nd,},Annotations:map[string]string{},OwnerReferences:[],Finalizers:[],ClusterName:,Initializers:nil,},Spec:NamespaceSpec{Finalizers:[],},Status:NamespaceStatus{Phase:,},}
      Unexpected error:
          <*errors.StatusError | 0xc000115050>: {
              ErrStatus: {
                  TypeMeta: {Kind: "", APIVersion: ""},
                  ListMeta: {SelfLink: "", ResourceVersion: "", Continue: ""},
                  Status: "Failure",
                  Message: "namespaces \"e2e-tests-dataplane-conn-nd-h9lk8\" already exists",
                  Reason: "AlreadyExists",
                  Details: {
                      Name: "e2e-tests-dataplane-conn-nd-h9lk8",
                      Group: "",
                      Kind: "namespaces",
                      UID: "",
                      Causes: nil,
                      RetryAfterSeconds: 0,
                  },
                  Code: 409,
              },
          }
          namespaces "e2e-tests-dataplane-conn-nd-h9lk8" already exists
      occurred

      /home/nmanos/go/src/github.com/submariner-io/submariner/test/e2e/framework/framework.go:295
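A plausible explanation, though not confirmed in the issue: both -dp-context flags were given the same context name ("admin"), so the framework treats the two "clusters" as the same cluster and tries to create the same namespace twice. A hedged sketch of an invocation with distinct contexts, assuming the framework reads them from a merged KUBECONFIG (paths and context names are placeholders):

export KUBECONFIG=/path/to/cluster-a/kubeconfig:/path/to/cluster-b/kubeconfig
go test -v ./test/e2e -args \
  -dp-context cluster-a-context -dp-context cluster-b-context \
  -ginkgo.v -ginkgo.randomizeAllSpecs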

Initial connection fails randomly, including when we add a cluster

There is some "warm up" or "caching" that needs to happen somewhere on the
stack that will make the first connection very slow or fail.

By warming up with test_connection, which retries, we force that "caching" to happen
before E2E. We should investigate the root cause and fix it.

Something wrong on rancher 2.2.1 (2.1.7 2.1.8)

1) Start Rancher HA on RKE and add the catalog https://github.com/rancher/submariner-charts

2) create cluster broker

  • one node (etcd, control plane, worker)
  • deploy submariner-k8s-broker
addon_job_timeout: 30
authentication: 
  strategy: "x509"
bastion_host: 
  ssh_agent_auth: false
ignore_docker_version: true
# 
#   # Currently only nginx ingress provider is supported.
#   # To disable ingress controller, set `provider: none`
#   # To enable ingress on specific nodes, use the node_selector, eg:
#      provider: nginx
#      node_selector:
#        app: ingress
# 
ingress: 
  provider: "nginx"
kubernetes_version: "v1.13.5-rancher1-2"
monitoring: 
  provider: "metrics-server"
# 
#   # If you are using calico on AWS
# 
#      network:
#        plugin: calico
#        calico_network_provider:
#          cloud_provider: aws
# 
#   # To specify flannel interface
# 
#      network:
#        plugin: flannel
#        flannel_network_provider:
#          iface: eth1
# 
#   # To specify flannel interface for canal plugin
# 
#      network:
#        plugin: canal
#        canal_network_provider:
#          iface: eth1
# 
network: 
  options: 
    flannel_backend_type: "vxlan"
  plugin: "canal"
restore: 
  restore: false
# 
#      services:
#        kube-api:
#          service_cluster_ip_range: 10.43.0.0/16
#        kube-controller:
#          cluster_cidr: 10.42.0.0/16
#          service_cluster_ip_range: 10.43.0.0/16
#        kubelet:
#          cluster_domain: cluster.local
#          cluster_dns_server: 10.43.0.10
# 
services: 
  etcd: 
    backup_config: 
      enabled: true
      interval_hours: 12
      retention: 6
    creation: "12h"
    extra_args: 
      election-timeout: "5000"
      heartbeat-interval: "500"
    retention: "72h"
    snapshot: false
  kube-api: 
    always_pull_images: false
    pod_security_policy: false
    service_node_port_range: "30000-32767"
  kubelet: 
    fail_swap_on: false
ssh_agent_auth: false
# 
#   # Rancher Config
# 
docker_root_dir: "/var/lib/docker"
enable_cluster_alerting: false
enable_cluster_monitoring: false
enable_network_policy: false
local_cluster_auth_endpoint: 
  enabled: false
name: "test-submariner-broker"

3) create east cluster

  • two nodes (1 - etcd, worker; 2 - control plane, worker)
  • deploy submariner
  • deploy simple nginx:alpine (global)
addon_job_timeout: 30
authentication: 
  strategy: "x509"
bastion_host: 
  ssh_agent_auth: false
dns: 
  provider: "kube-dns"
ignore_docker_version: true
# 
#   # Currently only nginx ingress provider is supported.
#   # To disable ingress controller, set `provider: none`
#   # To enable ingress on specific nodes, use the node_selector, eg:
#      provider: nginx
#      node_selector:
#        app: ingress
# 
ingress: 
  provider: "nginx"
kubernetes_version: "v1.13.5-rancher1-2"
monitoring: 
  provider: "metrics-server"
# 
#   # If you are using calico on AWS
# 
#      network:
#        plugin: calico
#        calico_network_provider:
#          cloud_provider: aws
# 
#   # To specify flannel interface
# 
#      network:
#        plugin: flannel
#        flannel_network_provider:
#          iface: eth1
# 
#   # To specify flannel interface for canal plugin
# 
#      network:
#        plugin: canal
#        canal_network_provider:
#          iface: eth1
# 
network: 
  options: 
    flannel_backend_type: "vxlan"
  plugin: "canal"
restore: 
  restore: false
# 
#      services:
#        kube-api:
#          service_cluster_ip_range: 10.43.0.0/16
#        kube-controller:
#          cluster_cidr: 10.42.0.0/16
#          service_cluster_ip_range: 10.43.0.0/16
#        kubelet:
#          cluster_domain: cluster.local
#          cluster_dns_server: 10.43.0.10
# 
services: 
  etcd: 
    backup_config: 
      enabled: true
      interval_hours: 12
      retention: 6
    creation: "12h"
    extra_args: 
      election-timeout: "5000"
      heartbeat-interval: "500"
    retention: "72h"
    snapshot: false
  kube-api: 
    always_pull_images: false
    pod_security_policy: false
    service_cluster_ip_range: "10.99.0.0/16"
    service_node_port_range: "30000-32767"
  kube-controller: 
    cluster_cidr: "10.98.0.0/16"
    service_cluster_ip_range: "10.99.0.0/16"
  kubelet: 
    cluster_dns_server: "10.99.0.10"
    cluster_domain: "east.local"
    fail_swap_on: false
ssh_agent_auth: false
# 
#   # Rancher Config
# 
docker_root_dir: "/var/lib/docker"
enable_cluster_alerting: false
enable_cluster_monitoring: false
enable_network_policy: false
local_cluster_auth_endpoint: 
  enabled: false
name: "test-submariner-east"

4) create west cluster

  • two nodes (1 - etcd, worker; 2 - control plane, worker)
  • deploy submariner
  • deploy simple nginx:alpine (global)
addon_job_timeout: 30
authentication: 
  strategy: "x509"
bastion_host: 
  ssh_agent_auth: false
dns: 
  provider: "kube-dns"
ignore_docker_version: true
# 
#   # Currently only nginx ingress provider is supported.
#   # To disable ingress controller, set `provider: none`
#   # To enable ingress on specific nodes, use the node_selector, eg:
#      provider: nginx
#      node_selector:
#        app: ingress
# 
ingress: 
  provider: "nginx"
kubernetes_version: "v1.13.5-rancher1-2"
monitoring: 
  provider: "metrics-server"
# 
#   # If you are using calico on AWS
# 
#      network:
#        plugin: calico
#        calico_network_provider:
#          cloud_provider: aws
# 
#   # To specify flannel interface
# 
#      network:
#        plugin: flannel
#        flannel_network_provider:
#          iface: eth1
# 
#   # To specify flannel interface for canal plugin
# 
#      network:
#        plugin: canal
#        canal_network_provider:
#          iface: eth1
# 
network: 
  options: 
    flannel_backend_type: "vxlan"
  plugin: "canal"
restore: 
  restore: false
# 
#      services:
#        kube-api:
#          service_cluster_ip_range: 10.43.0.0/16
#        kube-controller:
#          cluster_cidr: 10.42.0.0/16
#          service_cluster_ip_range: 10.43.0.0/16
#        kubelet:
#          cluster_domain: cluster.local
#          cluster_dns_server: 10.43.0.10
# 
services: 
  etcd: 
    backup_config: 
      enabled: true
      interval_hours: 12
      retention: 6
    creation: "12h"
    extra_args: 
      election-timeout: "5000"
      heartbeat-interval: "500"
    retention: "72h"
    snapshot: false
  kube-api: 
    always_pull_images: false
    pod_security_policy: false
    service_cluster_ip_range: "10.1.0.0/16"
    service_node_port_range: "30000-32767"
  kube-controller: 
    cluster_cidr: "10.0.0.0/16"
    service_cluster_ip_range: "10.1.0.0/16"
  kubelet: 
    cluster_dns_server: "10.1.0.10"
    cluster_domain: "west.local"
    fail_swap_on: false
ssh_agent_auth: false
# 
#   # Rancher Config
# 
docker_root_dir: "/var/lib/docker"
enable_cluster_alerting: false
enable_cluster_monitoring: false
enable_network_policy: false
local_cluster_auth_endpoint: 
  enabled: false
name: "test-submariner-west"

5) test!

on west we have 2 nginx pods
10.0.0.5
10.0.1.4
on east we have 2 nginx pods
10.98.1.5
10.98.0.4

Interesting things:

  • east cluster
    On the server where rancher/submariner (engine) is deployed, we exec a shell into a pod
    and wget the west pods 10.98.1.5 and 10.98.0.4, and it's OK, 200 OK!
    But on a server without submariner (engine),
    wget freezes and times out.

  • west cluster
    Same as east.

Conclusion:

Cross-cluster networking works only on the machine where the engine is running; as a result, only one server sees all the pods of the other cluster.

Deploy submariner broker with rancher fqdn cluster

Use Rancher 2.2.1 and submariner helm

  • Generate pem for broker.submariner.MY-DOMIAN.COM (openssl req -newkey rsa:2048 -nodes -keyout key.pem -x509 -days 3650 -out submariner.pem)
  • Create cluster (broker) with fqdn broker.submariner.MY-DOMIAN.COM ( 10.42.0.0/16, 10.43.0.0/16 )
  • Set dns broker.submariner.MY-DOMIAN.COM A row to nodes from cluster (broker)
  • Install the submariner broker to cluster (broker) from Helm
  • Get ca.crt and data.token (from kubectl)
  • Create cluster (cluster 1) ( 10.50.0.0/16, 10.60.0.0/16, cluster1.local )
  • Install submariner to cluster (cluster 1) from Helm with Broker Server = broker.submariner.MY-DOMIAN.COM:6443, ca.crt and data.token
  • Got error
I0329 07:31:52.430872 1 shared_informer.go:123] caches populated
I0329 07:31:52.431140 1 datastoresyncer.go:78] Ensuring we are the only endpoint active for this cluster
I0329 07:31:52.431705 1 shared_informer.go:123] caches populated
I0329 07:31:52.431739 1 tunnel.go:64] Starting workers
I0329 07:31:52.431789 1 tunnel.go:67] Started workers
F0329 07:31:52.807627 1 datastoresyncer.go:81] Error while retrieving endpoints Get https://broker.submariner.MY-DOMIAN.COM:6443/apis/submariner.io/v1/namespaces/submariner-k8s-broker/endpoints: x509: certificate is valid for broker-submariner-0, broker-submariner-1, localhost, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, not broker.submariner.MY-DOMIAN.COM
goroutine 37 [running]:
github.com/rancher/submariner/vendor/k8s.io/klog.stacks(0xc000290500, 0xc00021c000, 0x1c5, 0x4ed)
/go/src/github.com/rancher/submariner/vendor/k8s.io/klog/klog.go:828 +0xd4
github.com/rancher/submariner/vendor/k8s.io/klog.(*loggingT).output(0x1df5a60, 0xc000000003, 0xc0001108f0, 0x1d874a7, 0x12, 0x51, 0x0)
/go/src/github.com/rancher/submariner/vendor/k8s.io/klog/klog.go:779 +0x306
github.com/rancher/submariner/vendor/k8s.io/klog.(*loggingT).printf(0x1df5a60, 0x3, 0x125f2a6, 0x23, 0xc0004ebac8, 0x1, 0x1)
/go/src/github.com/rancher/submariner/vendor/k8s.io/klog/klog.go:678 +0x14b
github.com/rancher/submariner/vendor/k8s.io/klog.Fatalf(0x125f2a6, 0x23, 0xc0004ebac8, 0x1, 0x1)
/go/src/github.com/rancher/submariner/vendor/k8s.io/klog/klog.go:1207 +0x67
github.com/rancher/submariner/pkg/controllers/datastoresyncer.(*DatastoreSyncer).ensureExclusiveEndpoint(0xc0003ea540)
/go/src/github.com/rancher/submariner/pkg/controllers/datastoresyncer/datastoresyncer.go:81 +0x968
github.com/rancher/submariner/pkg/controllers/datastoresyncer.(*DatastoreSyncer).Run(0xc0003ea540, 0xc00008a6c0, 0x0, 0x0)
/go/src/github.com/rancher/submariner/pkg/controllers/datastoresyncer/datastoresyncer.go:139 +0x2af
main.main.func1.3(0xc000411410, 0xc0003ea540, 0xc00008a6c0, 0xc000273d30)
/go/src/github.com/rancher/submariner/main.go:132 +0x63
created by main.main.func1
/go/src/github.com/rancher/submariner/main.go:130 +0xb63
00[DMN] signal of type SIGHUP received. Reloading configuration

Submariner for development team

Is it possible to use Submariner as a development tool? The idea would be to have local development nodes for developers accessing services in the main shared cluster.

Thanks!

Gateway fail-over tests: should not fail if more than one label exists

The gateway label is a mandatory step for Submariner, and adding another label is not wrong (it can be done by a user on an existing environment), so the test should not fail if more than one label exists. It should only fail if no label exists.

In general, Gateway fail-over tests (like any other test scenario) should not rely on a certain environment setup (in our case, assuming only one gateway label exists), unless it initially takes care of creating that environment before the test (i.e. adding setup and teardown functions to the test scenario).

I suggest fixing this:

When("one gateway node is configured and the submariner engine pod fails", func() {

And add a BeforeEach step to list the current number of gateway labels on each cluster

clusterAName := framework.TestContext.KubeContexts[framework.ClusterA]
clusterBName := framework.TestContext.KubeContexts[framework.ClusterB]

gatewayNodesClusterA := f.FindNodesByGatewayLabel(framework.ClusterA, true)
gatewayNodesClusterB := f.FindNodesByGatewayLabel(framework.ClusterB, true)

In addition, remove this check, as this should be a condition state (Ginkgo When state), not a condition to assert and fail upon:
Expect(gatewayNodes).To(HaveLen(1), fmt.Sprintf("Expected only one gateway node on %q", clusterName))
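As a quick way to see how many nodes on a given cluster actually carry the gateway label (the label key matches the submariner.io/gateway=true node selector that appears elsewhere in these issues):

kubectl get nodes -l submariner.io/gateway=true -o name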

Re-setting up Submariner on an old VM produces an error

We re-set up the Submariner agent on a node and found:

 error while adding route {Ifindex: 2 Dst: 10.63.0.0/16 Src: <nil> Gw: 172.31.23.192 Flags: [] Table: 0}: file exists

Maybe there is a stale route left over.

Route agent error, and this node cannot access cross-cluster pods

Hi,

The route agent reports an error, and this node cannot access cross-cluster pods, but another node can:

E0818 12:41:46.281467       1 route.go:385] error while adding route {Ifindex: 2 Dst: 10.72.0.0/16 Src: <nil> Gw: 172.31.2.255 Flags: [] Table: 0}: network is unreachable
│ E0818 12:41:46.281521       1 route.go:385] error while adding route {Ifindex: 2 Dst: 10.73.0.0/16 Src: <nil> Gw: 172.31.2.255 Flags: [] Table: 0}: network is unreachable

thanks

buhe

"report-dir" argument can be removed (Ginkgo has --reportFile option)

"report-dir" argument for specifying junit tests output directory - can be removed, including any references that uses it (also in Docs):

flag.StringVar(&TestContext.ReportDir, "report-dir", "", "Path to the directory where the JUnit XML reports should be saved. Default is empty, which doesn't generate these reports.")

Ginkgo already has this feature, for example:
--ginkgo.reportFile ${WORKDIR}/e2e_junit_result.xml
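Combining the go test invocation shown in an earlier issue with Ginkgo's built-in reporting, a hedged example (contexts and output path are placeholders):

go test -v ./test/e2e -args \
  -dp-context cluster-a -dp-context cluster-b \
  -ginkgo.v --ginkgo.reportFile ${WORKDIR}/e2e_junit_result.xml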

Submariner Broker Url is a private IP. When we change it to the public IP of the instance, we get this error.

I have set up a broker and one other cluster on AWS.
When I get the Submariner broker URL using this command:

SUBMARINER_BROKER_URL=$(kubectl -n default get endpoints kubernetes -o jsonpath="{.subsets[0].addresses[0].ip}:{.subsets[0].ports[0].port}")

It returns a private address.
So I changed this to the public address of the instance.

However, on running the Submariner commands on one of the clusters,
I see that this pod has gone into an error state: submariner-b9644d584-kr7ks.

When I check the logs of this pod, I get the following:

I0325 15:03:08.685347 1 datastoresyncer.go:78] Ensuring we are the only endpoint active for this cluster
F0325 15:03:08.698528 1 datastoresyncer.go:81] Error while retrieving endpoints Get https://52.221.253.20:443/apis/submariner.io/v1/namespaces/submariner-k8s-broker/endpoints: x509: certificate is valid for 101.64.0.1, 127.0.0.1, not 52.221.253.20
goroutine 49 [running]:
github.com/rancher/submariner/vendor/k8s.io/klog.stacks(0xc000290500, 0xc0004dc000, 0xfe, 0x204)
/go/src/github.com/rancher/submariner/vendor/k8s.io/klog/klog.go:828 +0xd4
github.com/rancher/submariner/vendor/k8s.io/klog.(*loggingT).output(0x1df5a60, 0xc000000003, 0xc0001100b0, 0x1d874a7, 0x12, 0x51, 0x0)
/go/src/github.com/rancher/submariner/vendor/k8s.io/klog/klog.go:779 +0x306
github.com/rancher/submariner/vendor/k8s.io/klog.(*loggingT).printf(0x1df5a60, 0x3, 0x125f2a6, 0x23, 0xc0003bfac8, 0x1, 0x1)
/go/src/github.com/rancher/submariner/vendor/k8s.io/klog/klog.go:678 +0x14b
github.com/rancher/submariner/vendor/k8s.io/klog.Fatalf(0x125f2a6, 0x23, 0xc0003bfac8, 0x1, 0x1)
/go/src/github.com/rancher/submariner/vendor/k8s.io/klog/klog.go:1207 +0x67
github.com/rancher/submariner/pkg/controllers/datastoresyncer.(*DatastoreSyncer).ensureExclusiveEndpoint(0xc0003f1180)
/go/src/github.com/rancher/submariner/pkg/controllers/datastoresyncer/datastoresyncer.go:81 +0x968
github.com/rancher/submariner/pkg/controllers/datastoresyncer.(*DatastoreSyncer).Run(0xc0003f1180, 0xc00008a6c0, 0x0, 0x0)
/go/src/github.com/rancher/submariner/pkg/controllers/datastoresyncer/datastoresyncer.go:139 +0x2af
main.main.func1.3(0xc00041a6f0, 0xc0003f1180, 0xc00008a6c0, 0xc0004302b0)
/go/src/github.com/rancher/submariner/main.go:132 +0x63
created by main.main.func1
/go/src/github.com/rancher/submariner/main.go:130 +0xb63
00[DMN] signal of type SIGTERM received. Shutting down
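A hedged diagnostic sketch: the crash is an x509 SAN mismatch, so one way to confirm which names and IPs the API server certificate actually covers is to inspect it directly (the IP is the one from the log; openssl is assumed to be available):

echo | openssl s_client -connect 52.221.253.20:443 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'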

Pureport Cable Engine?

Hi there! I am CTO of a company called Pureport. Our Multicloud Fabric allows for private connectivity between clouds and sites to be provisioned on-demand via a simple API. We support the dedicated private connection methods (AWS DirectConnect, Azure ExpressRoute, etc.) in addition to IPSec connectivity. Our platform also supports NAT'ing of attached networks which may provide a solution for issue #1 .

We would love to explore creating a Pureport cable engine for Submariner. What would be the best path for this?

Submariner in different AZs?

We tried running Submariner in our cluster, which spans three different AZs on AWS. It looks like the route agents fail to add the route to the destination cluster CIDRs when the gateway is a local node that sits in a different AZ. This is to be expected, since you cannot route from one AZ through another in AWS, as they are not directly attached. How can this be solved?

tls: internal error on OpenShift deployment

Using subctl, I installed Submariner on 2 OpenShift v4.1 clusters. The installation finished successfully, and I did not get any errors.
However, when I later tried to create a busybox pod, I got the following error:
Error from server: error dialing backend: remote error: tls: internal error.
It looks like I hit the issue described here.

Therefore, per the above recommendations, I checked for pending CSRs. There were around 230 pending Submariner CSRs, such as
csr-z8fsp 26m system:node:submariner02-master01.submariner02.openshift.haifa.ibm.com Pending
I approved the CSRs by oc get csr -o name | xargs oc adm certificate approve
and the issue was resolved.

Should we add this resolution to the installation guide, or add automatic CSR approval?
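For reference, a quick way to check whether any CSRs are still pending before running the approval command quoted above:

oc get csr | grep Pending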

How to safely remove a k8s-cluster from submariner

Hi Guys,
I see instructions on how to install Submariner, but there are no instructions on how to remove it. Will helm delete submariner safely remove a cluster's config from an existing setup of 4 different clusters connected via Submariner?

Thanks

Cleanup vars in e2e script/libs

Follow-up on #273 review comments related to additional variable refactoring that should be done in the e2e script and related Helm/subctl libraries.

I think that the setup_subm_vars part deserves a 2nd round of cleanups.
Would it be feasible to set up the env vars somewhere in the main e2e.sh, and then make sure that any shared/needed details from there are used in each module? (for example the CIDRs that we need to maintain in two places)
Other vars like subm_ns are intrinsic to each method, but those could be declared at the top level of each .sh (helm/operator file).

#273 (comment)

yeah the namespaces are different but the libraries already expose a $sub_ns var that is used elsewhere in the base code. So it seems like it could be used here as well. But not really a big deal either way.

#273 (comment)

iptables is not present in GKE nodes

Hello,

Even after mounting host / to the pod, /usr/sbin/iptables is not available on GKE nodes.

main.go:87] Error running route controller: createIPTableChains returned error. Unable to create SUBMARINER-POSTROUTING chain in iptables: running [/usr/sbin/iptables -t nat -S]: exit status 127: chroot: failed to run command '/usr/sbin/iptables': No such file or directory

For now I've moved the agent Dockerfile to alpine.

Not sure if you want to address this.

Thank you

Submariner endpoints lose connection

I set up a three-DC Submariner cluster 9 days ago.

Today, I found that the 3 DCs lost Pod-to-Pod connectivity.

At the broker:

kubectl logs -f submariner-84c5b445db-4wp2f -n submariner

I0731 14:35:54.502386       1 leaderelection.go:249] failed to renew lease submariner/submariner-engine-lock: failed to tryAcquireOrRenew context deadline exceeded
F0731 14:35:54.502446       1 main.go:178] leaderelection lost
goroutine 1 [running]:
github.com/rancher/submariner/vendor/k8s.io/klog.stacks(0xc000266500, 0xc00036c1c0, 0x3f, 0x1ae)
	/go/src/github.com/rancher/submariner/vendor/k8s.io/klog/klog.go:828 +0xd4
github.com/rancher/submariner/vendor/k8s.io/klog.(*loggingT).output(0x1df5a60, 0xc000000003, 0xc00062e000, 0x1d80e12, 0x7, 0xb2, 0x0)
	/go/src/github.com/rancher/submariner/vendor/k8s.io/klog/klog.go:779 +0x306
github.com/rancher/submariner/vendor/k8s.io/klog.(*loggingT).printf(0x1df5a60, 0x3, 0x125080b, 0x13, 0x0, 0x0, 0x0)
	/go/src/github.com/rancher/submariner/vendor/k8s.io/klog/klog.go:678 +0x14b
github.com/rancher/submariner/vendor/k8s.io/klog.Fatalf(0x125080b, 0x13, 0x0, 0x0, 0x0)
	/go/src/github.com/rancher/submariner/vendor/k8s.io/klog/klog.go:1207 +0x67
main.startLeaderElection.func1()
	/go/src/github.com/rancher/submariner/main.go:178 +0x47
github.com/rancher/submariner/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run.func1(0xc0003a0000)
	/go/src/github.com/rancher/submariner/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:163 +0x40
github.com/rancher/submariner/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run(0xc0003a0000, 0x13ccfc0, 0xc000246cc0)
	/go/src/github.com/rancher/submariner/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:172 +0x112
github.com/rancher/submariner/vendor/k8s.io/client-go/tools/leaderelection.RunOrDie(0x13cd000, 0xc000042048, 0x13d24e0, 0xc000350480, 0x37e11d600, 0x2540be400, 0xb2d05e00, 0xc0000f9bd0, 0x12c7660, 0x0, ...)
	/go/src/github.com/rancher/submariner/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:184 +0x99
main.startLeaderElection(0x13eee60, 0xc000350360, 0x13cd500, 0xc0002db4c0, 0xc0000f9bd0)
	/go/src/github.com/rancher/submariner/main.go:170 +0x27c
main.main()
	/go/src/github.com/rancher/submariner/main.go:147 +0x580
I0731 14:35:55.601931       1 main.go:142] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"submariner", Name:"submariner-engine-lock", UID:"da65ab31-ad2d-11e9-bff9-02c47c506f46", APIVersion:"v1", ResourceVersion:"1221170", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' ip-172-31-23-192.cn-north-1.compute.internal-submariner-engine stopped leading

And another DC's log is:

kubectl logs submariner-84f57c45ff-wszbj -n submariner

07[NET] sending packet: from 172.16.152.62[500] to 52.83.57.176[500] (500 bytes)
I0802 04:05:51.256740       1 reflector.go:215] github.com/rancher/submariner/pkg/client/informers/externalversions/factory.go:117: forcing resync
I0802 04:05:51.258058       1 reflector.go:215] github.com/rancher/submariner/pkg/client/informers/externalversions/factory.go:117: forcing resync
I0802 04:05:51.261961       1 datastoresyncer.go:194] Attempting to trigger an update of the central datastore with the updated CRD
I0802 04:05:51.263240       1 tunnel.go:87] Processing endpoint object: submariner/nx-submariner-cable-nx-172-31-44-229
I0802 04:05:51.266810       1 datastoresyncer.go:236] The updated endpoint object was not for this cluster, skipping updating the datastore
I0802 04:05:51.266813       1 ipsec.go:169] Installing cable submariner-cable-nx-172-31-44-229
I0802 04:05:51.267849       1 ipsec.go:453] Found existing connection: submariner-cable-nx-172-31-44-229
I0802 04:05:51.267890       1 tunnel.go:109] endpoint processed by tunnel controller
I0802 04:05:51.267901       1 tunnel.go:87] Processing endpoint object: submariner/bj-submariner-cable-bj-172-31-23-192
I0802 04:05:51.268940       1 datastoresyncer.go:236] The updated endpoint object was not for this cluster, skipping updating the datastore
I0802 04:05:51.269919       1 ipsec.go:169] Installing cable submariner-cable-bj-172-31-23-192
I0802 04:05:51.270953       1 ipsec.go:453] Found existing connection: submariner-cable-bj-172-31-23-192
I0802 04:05:51.270990       1 tunnel.go:109] endpoint processed by tunnel controller
I0802 04:05:51.271000       1 tunnel.go:87] Processing endpoint object: submariner/hz-submariner-cable-hz-172-16-152-62
I0802 04:05:51.271023       1 datastoresyncer.go:249] Attempting to trigger an update of the central datastore with the updated endpoint CRD
I0802 04:05:51.273005       1 ipsec.go:162] Not installing cable for local cluster
I0802 04:05:51.273016       1 tunnel.go:109] endpoint processed by tunnel controller
I0802 04:05:51.294774       1 kubernetes.go:331] Cluster CRD matched what we received from k8s broker, not reconciling
I0802 04:05:51.294787       1 datastoresyncer.go:196] Update of cluster in central datastore was successful
I0802 04:05:51.303146       1 kubernetes.go:370] Endpoint CRD matched what we received from k8s broker, not reconciling
I0802 04:05:51.303157       1 datastoresyncer.go:254] Update of endpoint in central datastore was successful
I0802 04:05:51.327783       1 reflector.go:357] github.com/rancher/submariner/pkg/client/informers/externalversions/factory.go:117: Watch close - *v1.Cluster total 0 items received
I0802 04:05:53.777818       1 reflector.go:215] github.com/rancher/submariner/pkg/client/informers/externalversions/factory.go:117: forcing resync
I0802 04:05:53.780462       1 datastoresyncer.go:300] Cluster CRD matched what we received from datastore, not reconciling
I0802 04:05:53.782590       1 datastoresyncer.go:300] Cluster CRD matched what we received from datastore, not reconciling
I0802 04:05:53.784691       1 datastoresyncer.go:300] Cluster CRD matched what we received from datastore, not reconciling
I0802 04:05:53.792637       1 reflector.go:215] github.com/rancher/submariner/pkg/client/informers/externalversions/factory.go:117: forcing resync
I0802 04:05:53.794852       1 datastoresyncer.go:415] Endpoint CRD matched what we received from datastore, not reconciling
I0802 04:05:53.796762       1 datastoresyncer.go:415] Endpoint CRD matched what we received from datastore, not reconciling
I0802 04:05:53.799036       1 datastoresyncer.go:415] Endpoint CRD matched what we received from datastore, not reconciling

Thanks, buhe@netless

customresourcedefinitions.apiextensions.k8s.io "clusters.submariner.io" already exists

I am trying to install the Submariner broker in the same cluster as "cluster one":

cluster-one = broker and gateway
cluster-two = gateway

Will this work? I'm getting the following error:

Error: customresourcedefinitions.apiextensions.k8s.io "clusters.submariner.io" already exists
helm install submariner-latest/submariner \
--name submariner \
--namespace submariner \
--set ipsec.psk="${SUBMARINER_PSK}" \
--set broker.server="${SUBMARINER_BROKER_URL}" \
--set broker.token="${SUBMARINER_BROKER_TOKEN}" \
--set broker.namespace="${SUBMARINER_BROKER_NS}" \
--set broker.ca="${SUBMARINER_BROKER_CA}" \
--set submariner.clusterId="sg" \
--set submariner.clusterCidr="10.42.0.0/16" \
--set submariner.serviceCidr="10.43.0.0/16" \
--set submariner.natEnabled="false"
Error: customresourcedefinitions.apiextensions.k8s.io "clusters.submariner.io" already exists

Question on Prerequisites

As part of the Prerequisites documentation, it's mentioned that:

Submariner has a few requirements in order to get started:

At least 3 Kubernetes clusters

We have two Kubernetes clusters and we want to extend the networking across them. Can anybody confirm whether 3 clusters is a hard requirement, or whether it can also work with 2 clusters?

First time deploy - permissions

I see the following in the ReplicaSet
(when deployed on Minishift as user system:admin):

Events:
Type Reason Age From Message


Warning FailedCreate 2m (x20 over 21m) replicaset-controller Error creating: pods "submariner-6b65665876-" is forbidden: unable to validate against any security context constraint: [provider restricted: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed capabilities.add: Invalid value: "ALL": capability may not be added spec.containers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used]

Extend the scope of submariner to cluster-to-non-cluster

Currently, the scope of Submariner is to achieve cluster-to-cluster connectivity. This issue is to extend its scope to cluster-to-non-cluster connectivity.

The background of this issue is that k8s doesn't provide, in a common way, a fixed egress source IP for a pod when the pod accesses something outside the k8s cluster (please see the discussion in kubernetes/enhancements#1105). So, this could be a problem for achieving cluster-to-non-cluster connectivity.
I'm not sure that Submariner is the right component to handle this, but as I commented in kubernetes/enhancements#1105 (comment), Submariner will help achieve this goal. Therefore, I would like to bring it to the Submariner community's attention (at least to consider whether this could also be in Submariner's scope, or whether it should be handled in a different project that Submariner can collaborate with, etc.).

Refactoring E2E sh to use armada and separate helm/subctl

  1. Use armada to deploy kind clusters
  2. Separate helm and subctl logic into different "sh" libs that we can import from e2e.sh; the functions to set up stuff would have the same name in the helm and subctl versions...
  3. Remove all the conditionals: we can now call setup_broker, install_subm, setup_xxx_gateway ....

does not install on RancherOS

I've written a number of related issues... curl -sfL https://get.k3s.io | sh - does not install k3s on RancherOS and that would seem to be normal or common sense.

The submariner-engine pod always goes into Back-off

when I exec "helm install submariner-latest/submariner
--name submariner
--namespace submariner
--set ipsec.psk="${SUBMARINER_PSK}"
--set broker.server="${SUBMARINER_BROKER_URL}"
--set broker.token="${SUBMARINER_BROKER_TOKEN}"
--set broker.namespace="${SUBMARINER_BROKER_NS}"
--set broker.ca="${SUBMARINER_BROKER_CA}"

--set submariner.clusterId="<CLUSTER_ID>"
--set submariner.clusterCidr="<CLUSTER_CIDR>"
--set submariner.serviceCidr="<SERVICE_CIDR>"
--set submariner.natEnabled="<NAT_ENABLED>"
the submariner-engine pod status:
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
submariner-engine-token-llv7j:
Type: Secret (a volume populated by a Secret)
SecretName: submariner-engine-token-llv7j
Optional: false
QoS Class: BestEffort
Node-Selectors: submariner.io/gateway=true
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message


Normal Scheduled 85s default-scheduler Successfully assigned submariner/submariner-585556659b-txbb4 to instance-qynqwmj9-1
Normal Pulling 34s (x4 over 85s) kubelet, instance-qynqwmj9-1 pulling image "rancher/submariner:v0.0.1"
Normal Pulled 33s (x4 over 81s) kubelet, instance-qynqwmj9-1 Successfully pulled image "rancher/submariner:v0.0.1"
Normal Created 33s (x4 over 81s) kubelet, instance-qynqwmj9-1 Created container
Normal Started 33s (x4 over 81s) kubelet, instance-qynqwmj9-1 Started container
Warning BackOff 8s (x7 over 78s) kubelet, instance-qynqwmj9-1 Back-off restarting failed container
I want to know how to fix it. Thanks.
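A hedged first diagnostic step, using the pod name from the events above: check the logs of the crashing container, including the previous restart, to see why it keeps backing off:

kubectl -n submariner logs submariner-585556659b-txbb4
kubectl -n submariner logs submariner-585556659b-txbb4 --previous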

submariner-engine fails to start with error about /var/run/charon.vici

E0115 13:35:07.406556 1 strongswan.go:502] Failed to connect to charon: dial unix /var/run/charon.vici: connect: no such file or directory

E0115 13:35:07.500297 1 strongswan.go:502] Failed to connect to charon: dial unix /var/run/charon.vici: connect: no such file or directory

E0115 13:35:08.407199 1 strongswan.go:502] Failed to connect to charon: dial unix /var/run/charon.vici: connect: no such file or directory

E0115 13:35:08.500558 1 strongswan.go:502] Failed to connect to charon: dial unix /var/run/charon.vici: connect: no such file or directory

F0115 13:35:09.407418 1 main.go:138] Error starting the cable engine: Failed to load connections from charon: dial unix /var/run/charon.vici: connect: no such file or directory

goroutine 39 [running]:

k8s.io/klog.stacks(0xc00024a200, 0xc000186000, 0xb7, 0x279)

/go/src/github.com/submariner-io/submariner/vendor/k8s.io/klog/klog.go:828 +0xb1

k8s.io/klog.(*loggingT).output(0x1e32e80, 0xc000000003, 0xc00021d0a0, 0x1dbd204, 0x7, 0x8a, 0x0)

/go/src/github.com/submariner-io/submariner/vendor/k8s.io/klog/klog.go:779 +0x2d9

k8s.io/klog.(*loggingT).printf(0x1e32e80, 0x3, 0x129ad50, 0x23, 0xc00039ffa0, 0x1, 0x1)

/go/src/github.com/submariner-io/submariner/vendor/k8s.io/klog/klog.go:678 +0x14e

k8s.io/klog.Fatalf(...)

/go/src/github.com/submariner-io/submariner/vendor/k8s.io/klog/klog.go:1207

main.main.func1.1(0xc000462160, 0x145cac0, 0xc000456240, 0xc000458020)

/go/src/github.com/submariner-io/submariner/main.go:138 +0xe5

created by main.main.func1

/go/src/github.com/submariner-io/submariner/main.go:135 +0xa14
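A hedged check (the pod name is a placeholder): since the failure is simply that /var/run/charon.vici is missing, which usually means charon itself never started, confirm whether the socket exists inside the engine pod:

kubectl -n submariner exec <engine-pod> -- ls -l /var/run/charon.vici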

[chart] CE_IPSEC debug causes Submariner to crash

Instead of doing what it's supposed to, which is to run Charon in debug mode, setting the CE_IPSEC debug directive causes Submariner to crash. This is most likely a fault in the logic inside the bash script, in conjunction with the way we detect whether to run the cable engine in debug mode.

Use periodic Travis triggers to expand test matrix

We currently only run CI on PR-based triggers, which requires us to keep job run times short enough to provide timely feedback to developers. However, as our CI has expanded, we have parts of our test matrix that are not being automatically tested. For example, the KubeFed deploys are currently broken and no one knew, because we had to run them manually/locally to see it.

Add cron-type Travis jobs to cover additional parts of the test matrix:

  • Helm-based deploys
  • KubeFed deploys with Helm
  • Kubefed deploys with Operator

SUBMARINER-POSTROUTING chain slipped its position in POSTROUTING chain

Submariner programs certain iptables rules to preserve the source IP of Pods for cross-cluster communication.
Generally, the local cluster CNI also programs iptables rules, and when the destination IP does not belong to the local cluster, it usually performs MASQ/SNAT.
So, Submariner requires that the iptables rules it programs have higher precedence over the iptables rules programmed by the local cluster CNI.
In order to do this, Submariner inserts a rule targeting the SUBMARINER-POSTROUTING chain at the beginning of the POSTROUTING chain, as shown below (sample output with Weavenet).
The necessary iptables rules for cross-cluster communication are then added to the SUBMARINER-POSTROUTING chain.

Chain POSTROUTING (policy ACCEPT 10 packets, 600 bytes)
num   pkts bytes target     prot opt in     out             source   destination        
1     5360  329K SUBMARINER-POSTROUTING  all  --  *      * 0.0.0.0/0 0.0.0.0/0          
2     5513  342K KUBE-POSTROUTING  all  --  *      *       0.0.0.0/0 0.0.0.0/0      
3     5464  339K WEAVE      all  --  *      *              0.0.0.0/0 0.0.0.0/0 

So far we have not seen an issue with this, but on a cluster that uses Calico, the CNI seems to update (or monitor?) the POSTROUTING chain and reinsert (after a certain interval?) its own chain (aka cali-POSTROUTING) at the beginning of the POSTROUTING chain, moving the SUBMARINER-POSTROUTING rule lower, as shown below.
Because of this, cross-cluster communication fails.

Chain POSTROUTING (policy ACCEPT 2 packets, 120 bytes)
num   pkts bytes target     prot opt in     out              source  destination        
1    16254 1039K cali-POSTROUTING  all  --  *      *       0.0.0.0/0 0.0.0.0/0
2     8537  576K SUBMARINER-POSTROUTING  all  --  *      * 0.0.0.0/0 0.0.0.0/0          
3    1685K  110M KUBE-POSTROUTING  all  --  *      *       0.0.0.0/0 0.0.0.0/0

We need to identify the reason why the SUBMARINER-POSTROUTING rule slipped its position (is it because the local CNI was restarted for some reason, or because the CNI is monitoring iptables rules and updating them when necessary?) and handle such scenarios in Submariner.
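A hedged manual sketch for inspecting and restoring the ordering on the gateway node while the root cause is investigated; it mirrors the tables above and is not Submariner's actual remediation logic:

# Show the current rule order in the nat POSTROUTING chain
iptables -t nat -L POSTROUTING -n --line-numbers

# Move the SUBMARINER-POSTROUTING jump back to position 1
iptables -t nat -D POSTROUTING -j SUBMARINER-POSTROUTING
iptables -t nat -I POSTROUTING 1 -j SUBMARINER-POSTROUTING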

GKE (and other VPC-cni clusters) do not allow remote-destined traffic to route

When examining the network flow between two clusters (AWS and GKE), it was observed that while IPsec tunnels could be established between the clusters, remote destined traffic to GKE pods that were not on the gateway node was not passed. Looking at this further, the GKE gateway node was trying to pass on traffic with the source IP being the remote (AWS) cluster gateway node, and thus the GKE VPC gateway was rejecting this traffic due to the "bogus" source IP (as it should).

In order to resolve this, we should examine putting an iptables SNAT rule into place on the destination node in order to NAT traffic from the remote cluster to the gateway node IP. Before we do this, we should examine any potential issues that may occur due to this change, and see if we should universally make the change. Otherwise, the change may be introduced as a flag.

The rule that was put into place to conduct this traffic was:

iptables -t nat -A POSTROUTING -s <remote-endpoint-ip>/32 -d <cluster/service-cidr> -j SNAT --to-source <local-endpoint-ip>

Thank you to @sagargulabanicldcvr for reporting this issue and working with me to debug it

Suggestion for Architecture diagram

Hi 👋

Your project looks very cool and it reminds me a bit of inlets.dev which also bridges networks, but in a bit more of a rudimentary way.

Given that the primary example in the repo uses 3 clusters (east, west and broker), and the diagram shows two clusters, I found this a bit misleading.

Would you consider updating the image or putting together a new one to match the example?

https://raw.githubusercontent.com/rancher/submariner/master/docs/img/architecture.png
