Comments (11)

ikatson commented on May 22, 2024

Looks like this issue is still there when Calico is used for network policy, because Calico's own rules are evaluated before the ones that solve the issue. See my comment here: #231 (comment)

from amazon-vpc-cni-k8s.

lbernail commented on May 22, 2024

A solution to this issue would be to mark NodePort traffic and use conntrack to force the reverse path through the primary ENI:

iptables -t mangle -A PREROUTING -i eth0 -p tcp --dport 30000:32767 -j CONNMARK --set-mark 42
iptables -t mangle -A PREROUTING -i eni+ -j CONNMARK --restore-mark
ip rule add fwmark 42 lookup main pref 1024

liwenwu-amazon commented on May 22, 2024

@lbernail can I reproduce this by doing the following?

  • create a cluster that contains only 1 node
  • create enough Pods to exhaust all addresses on the primary ENI
  • create a service and its back-end pods, so that the back-end pods get IP addresses from the 2nd ENI
  • then Pods on the primary ENI will NOT be able to communicate with the service that is backed by the back-end pods on the 2nd ENI

Is that right? Thanks

lbernail commented on May 22, 2024

@liwenwu-amazon yes, that would work, but you need to access the service using nodeip:nodeport from outside the node. If I recall correctly, accessing a NodePort from a pod will redirect you to the standard service endpoints and bypass the NodePort iptables rule.

liwenwu-amazon commented on May 22, 2024

@lbernail, we seem to be able to reproduce this problem only when we disable external SNAT (#120). Can you confirm whether this is the same for you? With external SNAT, all traffic for the node port is sent and received on eth0, so it works. When external SNAT (#120) is enabled, the incoming traffic arrives on eth0 and the outgoing traffic is sent on eth1, so it breaks Linux connection tracking ...

lbernail commented on May 22, 2024

@liwenwu-amazon if the traffic is coming from outside the VPC CIDR it will be SNATed, so yes, that will solve the issue. However, if the request comes from inside the VPC CIDR, the reply will be routed using the additional ENI (if the target pod is not on the primary ENI) and you should hit the reverse-path issue.
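
As a rough illustration of that distinction, the sketch below checks whether a client address falls inside the VPC CIDR, which is the case where the reverse-path issue would be expected. The CIDR 10.0.0.0/16 and the client addresses are made-up examples, and `ip_to_int` and `in_cidr` are hypothetical helper names, not part of the CNI.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of amazon-vpc-cni-k8s.

ip_to_int() {
    # Split a dotted quad into four octets and pack them into one integer.
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

in_cidr() {
    # in_cidr ADDRESS NETWORK/PREFIX: succeed when ADDRESS lies inside the block.
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xffffffff << (32 - bits)) & 0xffffffff ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

VPC_CIDR="10.0.0.0/16"   # assumption: example VPC CIDR

for client in 10.0.3.7 192.168.1.1; do
    if in_cidr "$client" "$VPC_CIDR"; then
        echo "$client: inside the VPC CIDR, reply may leave via the additional ENI"
    else
        echo "$client: outside the VPC CIDR, traffic is SNATed"
    fi
done
```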

fasaxc commented on May 22, 2024

I think this issue impacts anyone who tries to use NodePorts from within the same VPC too, which seems fairly mainline. Since this is impacting my team, I've started working on a fix based on the above suggestion to use the connmark with a couple of changes:

  • only use one connmark bit so that it doesn't clash with kube-proxy or Calico's use of the mark
  • use an interface match to avoid needing to match on the port number.

Here's my current set of rules:

iptables -t mangle -A PREROUTING -i eth0 -m addrtype --dst-type LOCAL --limit-iface-in -j CONNMARK --set-mark 0x1/0x1
iptables -t mangle -A PREROUTING -i eni+ -j CONNMARK --restore-mark --mask=0x1
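
These two rules only set and restore the mark; the earlier suggestion in this thread also used an `ip rule` to send marked packets to the main routing table. A minimal sketch of how the pieces might fit together, assuming the same single bit 0x1 and the priority 1024 borrowed from that earlier comment. The function name `nodeport_rules` is hypothetical, and the script only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Dry run: print the commands instead of executing them.
MARK="0x1/0x1"   # single connmark bit, as in the rules above
PREF=1024        # assumption: priority taken from the earlier comment

nodeport_rules() {
    # Mark connections that arrive on eth0 for an address owned by the node.
    echo "iptables -t mangle -A PREROUTING -i eth0 -m addrtype --dst-type LOCAL --limit-iface-in -j CONNMARK --set-mark $MARK"
    # Copy the connection mark bit onto packets coming back from pod interfaces.
    echo "iptables -t mangle -A PREROUTING -i eni+ -j CONNMARK --restore-mark --mask=0x1"
    # Look up marked packets in the main table rather than a per-ENI table.
    echo "ip rule add fwmark $MARK lookup main pref $PREF"
}

nodeport_rules
```

Matching on `fwmark 0x1/0x1` rather than a full mark value means the routing rule looks at that one bit only, so other bits set on the packet should not affect the lookup.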

lbernail commented on May 22, 2024

@fasaxc: do you do it this way to improve performance?

fasaxc commented on May 22, 2024

@lbernail The change to use the addrtype? That does a couple of things:

  • It syncs up with kube-proxy's rule:
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS

which also has the addrtype clause; this makes sure that we don't match traffic that isn't heading to an IP assigned to the host (i.e. traffic that is going directly to a pod but happens to be going to a port in the node port range)

  • It avoids needing to hard code or configure the NodePort range, which, at least in theory, is configurable. Also, I think you can create a NodePort manually outside that range.

When you combine the two rules, you get <connection seen on eth0 heading to a local IP> AND <connection seen leaving a veth>. Putting those two together, I think it implies that the packet was heading to a NodePort.
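
To make the single-bit point concrete, here is a simplified model of masked mark updates in shell arithmetic. It assumes the value bits fall within the mask, and `apply_mask` is a hypothetical helper, not an iptables interface. The value 0x4000 stands in for a bit owned by another component.

```shell
#!/bin/sh
# apply_mask OLD VALUE MASK
# Simplified model: keep the bits of OLD outside MASK, take the bits of VALUE inside MASK.
apply_mask() {
    printf '0x%x\n' $(( ($1 & ~$3) | ($2 & $3) ))
}

# Setting 0x1/0x1 on a connection that already carries another component's bit
# leaves that bit alone.
apply_mask 0x4000 0x1 0x1

# Setting an unmasked value (mask 0xffffffff) replaces the whole mark,
# including the other component's bit.
apply_mask 0x4000 42 0xffffffff

# Restoring with mask 0x1 copies only that bit from the connection mark
# into a packet mark that starts at 0.
apply_mask 0 0x4001 0x1
```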

liwenwu-amazon commented on May 22, 2024

Today, aws-vpc-cni assigns a native VPC IP address to each Pod, so it works directly with AWS ALB and NLB without any port mapping. In other words, a Pod IP can be added to an ALB or NLB target group. This improves network performance, operation, and debugging, and removes the need to manipulate iptables or IPVS tables.

There is a work-in-progress PR, "Support routing directly to pods", in aws-alb-ingress-controller.

We are also actively working on an NLB controller, so that traffic towards a service VIP backed by an NLB can be sent directly to the Pod IP.

lbernail commented on May 22, 2024

👍
This is very good news!
We were actually thinking about working on a similar PR for the ALB ingress
