loxilb's Introduction

What is loxilb

loxilb is an open source cloud-native load-balancer based on GoLang/eBPF with the goal of achieving cross-compatibility across a wide range of on-prem, public-cloud or hybrid K8s environments.

Kubernetes with loxilb

Kubernetes defines many service constructs, such as cluster-ip, node-port and load-balancer, for pod-to-pod, pod-to-service and external-to-service communication.

All these services are provided by load-balancers/proxies operating at Layer4/Layer7. Since Kubernetes is highly modular, these services can be provided by different software modules. For example, kube-proxy is used by default to provide cluster-ip and node-port services.

Service type load-balancer is usually provided by public cloud providers as a managed entity. But for on-prem and self-managed clusters, there are only a few good options available. Even with provider-managed K8s such as EKS, many users want to bring their own LB to clusters running anywhere. loxilb provides service type load-balancer as its main use-case. loxilb can run in-cluster or ext-to-cluster as needed.

Additionally, loxilb can also support cluster-ip and node-port services, thereby providing end-to-end connectivity for Kubernetes.

Why choose loxilb?

  • Performs much better compared to its competitors across various architectures
  • Utilizes eBPF, which makes it flexible as well as customizable
  • Advanced quality of service for workloads (per LB, per end-point or per client)
  • Works with any Kubernetes distribution/CNI - k8s/k3s/k0s/kind/OpenShift + Calico/Flannel/Cilium/Weave/Multus etc
  • Extensive support for SCTP workloads (with multi-homing) on K8s
  • Dual stack with NAT66, NAT64 support for K8s
  • K8s multi-cluster support (planned 🚧)
  • Runs in any cloud (public cloud/on-prem) or standalone environments

Overall features of loxilb

  • L4/NAT stateful loadbalancer
    • NAT44, NAT66, NAT64 with One-ARM, FullNAT, DSR etc
    • Support for TCP, UDP, SCTP (w/ multi-homing), QUIC, FTP, TFTP etc
  • High-availability support with BFD detection for hitless/maglev/cgnat clustering
  • Extensive and scalable end-point liveness probes for cloud-native environments
  • Stateful firewalling and IPSEC/Wireguard support
  • Optimized implementation for features like Conntrack, QoS etc
  • Full compatibility for ipvs (ipvs policies can be auto inherited)
  • Policy oriented L7 proxy support - HTTP1.0, 1.1, 2.0 etc (planned 🚧)

Components of loxilb

  • GoLang based control plane components
  • A scalable/efficient eBPF based data-path implementation
  • Integrated goBGP based routing stack
  • A kubernetes agent kube-loxilb written in Go

Layer4 Vs Layer7

loxilb works as an L4 load-balancer/service-mesh by default. Although it provides great performance, at times L7 load-balancing might become necessary in K8s. There are many good L7 proxies already available for K8s. Still, we are working on providing a great L7 solution natively in eBPF. It is a tough endeavor, but one which should reap great benefits once completed. Please keep an eye out for updates on this.

Telco-Cloud with loxilb

For deploying telco-cloud with cloud-native functions, loxilb can be used as an SCP (service communication proxy). SCP is nothing but a glorified term for Kubernetes load-balancing/proxy. But telco-cloud requires load-balancing across various interfaces/standards like N2, N4, E2 (ORAN), S6x, 5GLAN, GTP etc. Each of these interfaces presents its own unique challenges (including DPI) for load-balancing, which loxilb aims to solve, e.g.

  • N4 requires PFCP level session-intelligence
  • N2 requires NGAP parsing capability
  • S6x requires Diameter/SCTP multi-homing LB support
  • MEC use-cases might require UL-CL understanding
  • Hitless failover support might be essential for mission-critical applications
  • E2 might require SCTP-LB with OpenVPN bundled together

How-To Guides

Getting started with different K8s distributions/tools

loxilb as ext-cluster pod

loxilb as in-cluster pod

loxilb as service-proxy

Knowledge-Base

Community

Slack

Join the loxilb Slack channel to chat with loxilb developers and other loxilb users. This is a good place to learn about loxilb, ask questions, and work collaboratively.

General Discussion

Feel free to post your queries in GitHub Discussions. If you find any issues or bugs, please raise an issue on GitHub and members of the loxilb community will be happy to help.

CICD Workflow Status

| Features(Ubuntu20.04) | Features(Ubuntu22.04)     | Features(RedHat9)      |
|-----------------------|---------------------------|------------------------|
| build workflow        | Docker-Multi-Arch         | SCTP-LB-Sanity-CI-RH9  |
| simple workflow       | Sanity-CI-Ubuntu-22       | Sanity-CI-RH9          |
| tcp-lb-sanity-CI      | tcp-lb-sanity-CI          | TCP-LB-Sanity-CI-RH9   |
| udp-lb-sanity-CI      | udp-lb-sanity-CI          | UDP-LB-Sanity-CI-RH9   |
| sctp-lb-sanity-CI     | ipsec-sanity-CI           | IPsec-Sanity-CI-RH9    |
| extlb workflow        | nat66-sanity-CI           | NAT66-LB-Sanity-CI-RH9 |
| ipsec-sanity-CI       | Scale-Sanity-CI-Ubuntu-22 | Adv-LB-Sanity-CI-RH9   |
| scale-sanity-CI       | perf-CI                   |                        |
| liveness-sanity-CI    |                           |                        |
| nat66-sanity-CI       |                           |                        |
| perf-CI               |                           |                        |

| K3s Tests                   | K8s Cluster Tests              | EKS Test |
|-----------------------------|--------------------------------|----------|
| K3s-Base-Sanity-CI          | K8s-Calico-Cluster-IPVS-CI     | EKS      |
| k3s-flannel-CI              | K8s-Calico-Cluster-IPVS2-CI    |          |
| k3s-flannel-ubuntu22-CI     | K8s-Calico-Cluster-IPVS3-CI    |          |
| k3s-flannel-cluster-CI      | K8s-Calico-Cluster-IPVS3-HA-CI |          |
| k3s-flannel-incluster-CI    |                                |          |
| k3s-flannel-incluster-l2-CI |                                |          |
| k3s-calico-CI               |                                |          |
| k3s-cilium-cluster-CI       |                                |          |
| k3s-sctpmh-CI               |                                |          |
| k3s-sctpmh-ubuntu22-CI      |                                |          |
| k3s-sctpmh-2-CI             |                                |          |

📚 Please check the loxilb website for more detailed info.

loxilb's People

Contributors

backguynn, codesnip12, cybwan, ianchen0119, inhogog2, k8sguru, krizerg, luisgerhorst, nik-netlox, packetcrunch, trekkiecoder, ultrainstinct14

loxilb's Issues

cicd, scale and long-run testing effort

Scale and performance are the defining factors for load-balancers. Although we have some performance numbers with wrk, we need to get some scale numbers, e.g. how many sessions, how many CTs etc. We need to find a proper open-source tool for this testing, e.g. TRex.

HA clustering support

Usually load-balancers need to be deployed in a cluster. So, as a first step we need two things -

  1. Integration with keepalived or something similar (need to discuss)
  2. HA state management. Most importantly eBPF conntrack data/maps need to be in proper sync during HA transition

Overall, we need to make sure there is no traffic loss in loxilb during HA transitions.

Host DNAT functionality

In certain cases, when an end-point of a load-balancer rule is the originating host itself, traffic is lost. Handling this case (host DNAT) is especially required in a K8s CNI LB implementation, but less so in an external LB deployment.

sctp processing problem in 5.4 linux kernel

I created an SCTP load-balancer rule as follows in the loxilb docker, based on the loxilb documentation -

root@5affc126b9e2:/# loxicmd  get lb -o wide
| EXTERNAL IP | PORT | PROTOCOL | SELECT | ENDPOINT IP | TARGET PORT | WEIGHT |
|-------------|------|----------|--------|-------------|-------------|--------|
| 20.20.20.1  | 2020 | sctp     |      0 | 32.32.32.1  |        5001 |      1 |
|             |      |          |        | 33.33.33.1  |        5001 |      1 |
|             |      |          |        | 34.34.34.1  |        5001 |      1 |

But when LB session packets are sent towards the VIP (20.20.20.1), nothing is shown in the conntrack table. However, the TCP rule is processed properly.

When the kernel was upgraded to 5.13, the SCTP problem went away on its own. Can somebody clarify this behavior?

Evaluate Go report card for loxilb

Go report card always shows the following

There was an error processing your request: Could not analyze the repository: could not download repo: could not get latest module version from https://proxy.golang.org/loxilb/@latest: bad request: invalid escaped module path "loxilb": malformed module path "loxilb": missing dot in first path element

The same is reported properly for loxilib.
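
The error text itself names the rule that is violated: the first element of a Go module path must contain a dot, which suggests the repository was looked up by its bare name rather than by a full path such as the `github.com/loxilb-io/loxilb` seen in the test logs. The sketch below checks only that single rule and is not the complete module path validation; the helper name `firstElementHasDot` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// firstElementHasDot reports whether the first slash-separated element of a
// module path contains a dot. It mirrors only the one rule quoted in the
// error message, not the full set of module path checks.
func firstElementHasDot(modPath string) bool {
	first, _, _ := strings.Cut(modPath, "/")
	return strings.Contains(first, ".")
}

func main() {
	for _, p := range []string{"loxilb", "github.com/loxilb-io/loxilb"} {
		fmt.Printf("%-30s first element has dot: %v\n", p, firstElementHasDot(p))
	}
}
```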

Basic Sanity-CI fails randomly

The basic sanity-CI workflow fails randomly with the following error when running the Go unit test framework

unknown flag `t'
exit status 1
FAIL	github.com/loxilb-io/loxilb	0.936s
make: *** [Makefile:19: test] Error 1
Error: Process completed with exit code 2.

Need to look into it. Additional logs

goBGP integration stability

goBGP integration is in a nascent stage. We need to test and stabilize it for both imported and exported routes.

Random LB session initiation gets dropped

After initial creation of an LB rule, the first traffic session that hits this rule gets dropped. It is further observed that some sessions randomly fail to connect.

Steps to reproduce -

  1. Create LB rule
loxicmd -p 11112 create lb 20.20.20.1 --tcp=2020:5001 --endpoints=31.31.31.1:1,32.32.32.1:1,17.17.17.1:1
  2. Send traffic to hit the LB rule
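
Because the drops are intermittent, the traffic step is easier to judge when it is scripted and the failures are counted. The following is a minimal sketch of such a probe, assuming plain TCP connects are enough to hit the rule; the program name `lbprobe`, the helper `countFailed`, the sample size and the dial timeout are all choices made for this illustration.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

// countFailed runs attempt n times and returns how many runs reported failure.
func countFailed(n int, attempt func() bool) int {
	failed := 0
	for i := 0; i < n; i++ {
		if !attempt() {
			failed++
		}
	}
	return failed
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: lbprobe <vip:port>   e.g. lbprobe 20.20.20.1:2020")
		return
	}
	addr := os.Args[1]
	const attempts = 20 // arbitrary sample size for this sketch

	failed := countFailed(attempts, func() bool {
		// Each attempt opens a new TCP session towards the VIP and closes it.
		conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
		if err != nil {
			return false
		}
		conn.Close()
		return true
	})
	fmt.Printf("%d of %d connection attempts to %s failed\n", failed, attempts, addr)
}
```

Running it right after creating the rule, and then again later, would show whether only the first session is affected or whether failures keep occurring.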

goBGP handling

We need to be able to manage the goBGP process from inside loxilb (forking, restarting etc).

loxilb crash after creating load-balancer rule

loxilb build info

loxilb version: 0.7.0 2022_08_31-main

Logs

panic: runtime error: invalid memory address or nil pointer dereference

goroutine 21 [running]:
github.com/loxilb-io/loxilb/loxinet.(*DpEbpfH).DpStat(0xc0000bcf20?, 0xc003f06100)
	/root/loxilb-io/loxilb/loxinet/dpebpf_linux.go:758 +0x429
github.com/loxilb-io/loxilb/loxinet.(*DpH).DpWorkOnStat(...)
	/root/loxilb-io/loxilb/loxinet/dpbroker.go:335
github.com/loxilb-io/loxilb/loxinet.DpWorkSingle(0xc0000bcf88?, {0xb7af80?, 0xc003f06100?})
	/root/loxilb-io/loxilb/loxinet/dpbroker.go:372 +0x1d3
github.com/loxilb-io/loxilb/loxinet.DpWorker(0x0?, 0xc00010ed20, 0xc000130c60)
	/root/loxilb-io/loxilb/loxinet/dpbroker.go:387 +0xe5
created by github.com/loxilb-io/loxilb/loxinet.DpBrokerInit
	/root/loxilb-io/loxilb/loxinet/dpbroker.go:406 +0x16e

Steps to reproduce

  1. Create any LB rule as follows
loxicmd create lb 20.20.20.1 --tcp=2020:5001 --endpoints=32.32.32.1:1

Passive stateful conntrack mode support in loxilb

loxilb provides its own alternate conntrack implementation. Some users have requested a conntrack-only mode where loxilb does nothing but connection tracking. It might be an interesting feature for quick debugging in the cloud-native networking arena without affecting anything.

ULCL classifier support

GTP is the de-facto standard tunneling protocol used in 3GPP. We need to be able to parse it (including extension headers), support encap-decap and load-balance on outer or inner header fields in the eBPF kernel datapath.
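
To make the parsing requirement concrete, here is a minimal user-space sketch of decoding a GTPv1-U header and skipping its extension headers to find the inner packet. It is written in Go for readability and is not loxilb's eBPF code; the type `GTPUHeader` and the function `parseGTPU` are hypothetical names, and only the TEID and payload offset are extracted.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// GTPUHeader holds the fields this sketch extracts from a GTPv1-U header.
type GTPUHeader struct {
	MsgType    uint8
	TEID       uint32
	PayloadOff int // offset of the inner packet within the buffer
}

var (
	errShort  = errors.New("gtpu: truncated header")
	errExtLen = errors.New("gtpu: bad extension header length")
)

// parseGTPU decodes the mandatory header and walks any extension headers.
func parseGTPU(b []byte) (GTPUHeader, error) {
	var h GTPUHeader
	if len(b) < 8 {
		return h, errShort
	}
	flags := b[0]
	if flags>>5 != 1 { // the 3-bit version field must be 1
		return h, errors.New("gtpu: unsupported version")
	}
	h.MsgType = b[1]
	h.TEID = binary.BigEndian.Uint32(b[4:8])
	off := 8
	if flags&0x07 != 0 { // any of E, S, PN set: 4 optional bytes follow
		if len(b) < 12 {
			return h, errShort
		}
		next := b[11] // next extension header type
		off = 12
		for flags&0x04 != 0 && next != 0 {
			if len(b) < off+1 {
				return h, errShort
			}
			extLen := int(b[off]) * 4 // length is given in 4-octet units
			if extLen == 0 {
				return h, errExtLen
			}
			if len(b) < off+extLen {
				return h, errShort
			}
			next = b[off+extLen-1] // last octet names the following header
			off += extLen
		}
	}
	h.PayloadOff = off
	return h, nil
}

func main() {
	// G-PDU (type 0xFF) with the E flag set and one 4-octet extension header.
	pkt := []byte{0x34, 0xFF, 0x00, 0x0C, 0x00, 0x00, 0x00, 0x01,
		0x00, 0x00, 0x00, 0x85, 0x01, 0x10, 0x09, 0x00, 0x45, 0x00, 0x00, 0x00}
	h, err := parseGTPU(pkt)
	fmt.Println(h.TEID, h.PayloadOff, err)
}
```

Load-balancing on the outer header would key on the TEID, while load-balancing on inner fields would continue parsing at `PayloadOff`.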

L7 parsing support

We need to be able to support L7 proxying, or splicing as it is popularly known.

Hang issue when failed to add new neighbor info

When I get this error log, loxilb hangs:

INFO: 2022/08/03 07:25:39 [NLP] NH 192.168.57.101 mac [8 0 39 36 110 98] dev eth0 added
INFO: 2022/08/03 07:25:39 [NLP] NH 192.168.57.101 mac [8 0 39 36 110 98] dev eth0 added
ERR:  2022/08/03 07:25:44 Neigh MAC add failed-Same FDB
ERR:  2022/08/03 07:25:44 [NLP] NH 192.168.57.104 mac [8 0 39 157 200 222] dev eth0 add failed NH mac error

This is a lock issue in loxinet/apiclient.go.

Performance issues and random drops in SCTP sessions

How to reproduce -

  • Run sctp server
sctp_test -H 32.32.32.1 -P 5001 -l
  • Run sctp client ( repeated )
sctp_test -H 100.100.100.1  -h 32.32.32.1 -p 5001 -s -c 1 -M 100
  • Check loxilb ct status
root@8b74b5ddc4d2:~/loxilb-io# loxicmd -p 11112 get ct
| DESTINATIONIP |   SOURCEIP    | DESTINATIONPORT | SOURCEPORT | PROTOCOL | STATE | ACT | PACKETS | BYTES  |
|---------------|---------------|-----------------|------------|----------|-------|-----|---------|--------|
| 32.32.32.1    | 100.100.100.1 |            5001 |      38066 | sctp     | est   |     |      47 | 207472 |
| 100.100.100.1 | 32.32.32.1    |           44888 |       5001 | sctp     | est   |     |      59 |   3204 |
| 32.32.32.1    | 100.100.100.1 |            5001 |      44888 | sctp     | est   |     |      67 | 269500 |
| 100.100.100.1 | 32.32.32.1    |           38066 |       5001 | sctp     | est   |     |      32 |   1580 |

Some sessions do not transition to the SCTP shutdown-complete state, although that is the expected behavior. Ideally, the following is expected for all SCTP sessions -

root@8b74b5ddc4d2:~/loxilb-io# loxicmd -p 11112 get ct
| DESTINATIONIP |   SOURCEIP    | DESTINATIONPORT | SOURCEPORT | PROTOCOL |     STATE     | ACT | PACKETS | BYTES  |
|---------------|---------------|-----------------|------------|----------|---------------|-----|---------|--------|
| 32.32.32.1    | 100.100.100.1 |            5001 |      46201 | sctp     | shut-complete |     |      26 | 111824 |
| 32.32.32.1    | 100.100.100.1 |            5001 |      52981 | sctp     | shut-complete |     |      28 | 120648 |
| 100.100.100.1 | 32.32.32.1    |           57020 |       5001 | sctp     | shut-complete |     |      39 |   1888 |
| 100.100.100.1 | 32.32.32.1    |           44950 |       5001 | sctp     | shut-complete |     |      31 |   1488 |
| 100.100.100.1 | 32.32.32.1    |           60093 |       5001 | sctp     | shut-complete |     |      31 |   1488 |
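
The expected progression from `est` to `shut-complete` can be pictured as a small state machine driven by SCTP chunk types. The sketch below is a simplified illustration and not loxilb's conntrack code: it follows one direction-agnostic sequence, ignores chunk bundling, and uses hypothetical state names apart from `est` and `shut-complete`, which appear in the output above. The chunk type values are the ones defined by the SCTP specification.

```go
package main

import "fmt"

// SCTP chunk type values as defined by the SCTP specification.
const (
	chunkInit             = 1
	chunkInitAck          = 2
	chunkAbort            = 6
	chunkShutdown         = 7
	chunkShutdownAck      = 8
	chunkCookieEcho       = 10
	chunkCookieAck        = 11
	chunkShutdownComplete = 14
)

type ctState string

const (
	stClosed       ctState = "closed"
	stInit         ctState = "init"
	stInitAck      ctState = "init-ack"
	stCookieEcho   ctState = "cookie-echo"
	stEst          ctState = "est"
	stShutdown     ctState = "shutdown"
	stShutAck      ctState = "shut-ack"
	stShutComplete ctState = "shut-complete"
)

// next returns the state after seeing one chunk. Chunks that do not fit the
// current state leave it unchanged.
func next(s ctState, chunk uint8) ctState {
	switch {
	case chunk == chunkAbort:
		return stClosed
	case s == stClosed && chunk == chunkInit:
		return stInit
	case s == stInit && chunk == chunkInitAck:
		return stInitAck
	case s == stInitAck && chunk == chunkCookieEcho:
		return stCookieEcho
	case s == stCookieEcho && chunk == chunkCookieAck:
		return stEst
	case s == stEst && chunk == chunkShutdown:
		return stShutdown
	case s == stShutdown && chunk == chunkShutdownAck:
		return stShutAck
	case s == stShutAck && chunk == chunkShutdownComplete:
		return stShutComplete
	}
	return s
}

// run feeds a chunk sequence through the state machine from the closed state.
func run(chunks ...uint8) ctState {
	s := stClosed
	for _, c := range chunks {
		s = next(s, c)
	}
	return s
}

func main() {
	full := run(chunkInit, chunkInitAck, chunkCookieEcho, chunkCookieAck,
		chunkShutdown, chunkShutdownAck, chunkShutdownComplete)
	fmt.Println("teardown observed:    ", full)

	// If the tracker never processes the shutdown chunks, the entry stays in est.
	stuck := run(chunkInit, chunkInitAck, chunkCookieEcho, chunkCookieAck)
	fmt.Println("teardown not observed:", stuck)
}
```

In this model an entry remains in `est` whenever the shutdown chunks are not seen or not matched, which is the symptom in the first table; whether that is the actual cause here is not established by the report.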

Configurable timeout per LB rule

We need to support a configurable timeout per LB rule, primarily for TCP connections. Normally the LB should send a TCP reset for a connection in established state once the timeout is reached.

Need garbage collection of eBPF fc-map

After a run of 1k LB sessions, it is seen that fc-map entries remain in loxilb

bpftool map dump pinned /opt/loxilb/dp/bpf/fc_v4_map  | grep -i key | wc -l
1024

fc-map eBPF entries can get reused inside eBPF logic depending on usage but that depends on incoming traffic. Hence, we need to do garbage collection of fc-map entries.

Travis-CI failing

Travis-CI is failing with the following logs :

/usr/bin/ld: cannot find -lbsd
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:27: ip] Error 1

Github CI/CD integration

We initially need a basic CI/CD pipeline based on the Go unit test framework throughout loxilb. Later we can build on this pipeline.

vMirroring support

We need to support mirroring, or SPAN as it is better known, for debugging as well as for logging as and when required.

loxilb ebpf is not working in 5.4 linux kernel

When we run loxilb on a slightly older kernel, we get the following logs -

288=mm0000mm fp-296=00000000
1420: (b7) r1 = 1
; lock_xadd(&act->ctd.pb.packets, 1);
1421: (db) lock *(u64 *)(r7 +104) += r1
 R0=map_value(id=0,off=0,ks=4,vs=16,imm=0) R1_w=invP1 R2=invP(id=0,smin_value=-4,smax_value=11,umin_value=2) R3=invP2 R6=ctx(id=0,off=0,imm=0) R7=map_value(id=111,off=0,ks=16,vs=144,imm=0) R8=fp-280 R9=fp-274 R10=fp0 fp-16=mmmmmmmm fp-24=mmmm???? fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=00000000 fp-56=000000mm fp-64=m0000000 fp-72=0000mmmm fp-80=mm0m0000 fp-88=mm0mmm00 fp-96=00000000 fp-104=00000000 fp-112=00000000 fp-120=00000000 fp-128=00000000 fp-136=00000000 fp-144=00000000 fp-152=00000000 fp-160=00000000 fp-168=00000000 fp-176=00000000 fp-184=00000000 fp-192=00000000 fp-200=00000000 fp-208=00000000 fp-216=00000000 fp-224=00000000 fp-232=00000000 fp-240=00000000 fp-248=00000000 fp-256=00000000 fp-264=00000000 fp-272=m000mmmm fp-280=mmmmmmmm fp-288=mm0000mm fp-296=00000000
 R0=map_value(id=0,off=0,ks=4,vs=16,imm=0) R1_w=invP1 R2=invP(id=0,smin_value=-4,smax_value=11,umin_value=2) R3=invP2 R6=ctx(id=0,off=0,imm=0) R7=map_value(id=111,off=0,ks=16,vs=144,imm=0) R8=fp-280 R9=fp-274 R10=fp0 fp-16=mmmmmmmm fp-24=mmmm???? fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=00000000 fp-56=000000mm fp-64=m0000000 fp-72=0000mmmm fp-80=mm0m0000 fp-88=mm0mmm00 fp-96=00000000 fp-104=00000000 fp-112=00000000 fp-120=00000000 fp-128=00000000 fp-136=00000000 fp-144=00000000 fp-152=00000000 fp-160=00000000 fp-168=00000000 fp-176=00000000 fp-184=00000000 fp-192=00000000 fp-200=00000000 fp-208=00000000 fp-216=00000000 fp-224=00000000 fp-232=00000000 fp-240=00000000 fp-248=00000000 fp-256=00000000 fp-264=00000000 fp-272=m000mmmm fp-280=mmmmmmmm fp-288=mm0000mm fp-296=00000000
1422: (05) goto pc+54
1477: safe

from 1458 to 1460: R0=map_value(id=0,off=0,ks=4,vs=16,imm=0) R1=invP1 R2=invP(id=0,smin_value=-4,smax_value=11,umin_value=2) R3=invP2 R6=ctx(id=0,off=0,imm=0) R7=map_value(id=111,off=0,ks=16,vs=144,imm=0) R8=fp-280 R9=fp-274 R10=fp0 fp-16=mmmmmmmm fp-24=mmmm???? fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=00000000 fp-56=000000mm fp-64=m0000000 fp-72=0000mmmm fp-80=mm0m0000 fp-88=mm0mmm00 fp-96=00000000 fp-104=00000000 fp-112=00000000 fp-120=00000000 fp-128=00000000 fp-136=00000000 fp-144=00000000 fp-152=00000000 fp-160=00000000 fp-168=00000000 fp-176=00000000 fp-184=00000000 fp-192=00000000 fp-200=00000000 fp-208=00000000 fp-216=00000000 fp-224=00000000 fp-232=00000000 fp-240=00000000 fp-248=00000000 fp-256=00000000 fp-264=00000000 fp-272=m000mmmm fp-280=mmmmmmmm fp-288=mm0000mm fp-296=00000000
; int z = 0;
1460: (b7) r1 = 0
1461: (63) *(u32 *)(r10 -16) = r1
; if (F->l4m.ct_sts != 0) {
1462: (71) r1 = *(u8 *)(r10 -114)
; if (F->l4m.ct_sts != 0) {
1463: (55) if r1 != 0x0 goto pc+13
1464: (bf) r2 = r10
; 
1465: (07) r2 += -16
1466: (bf) r3 = r10
1467: (07) r3 += -296
; bpf_map_update_elem(&xfis, &z, F, BPF_ANY);
1468: (18) r1 = 0xffff8dca7928ba00
1470: (b7) r4 = 0
1471: (85) call bpf_map_update_elem#2
; bpf_tail_call(ctx, &pgm_tbl, idx);
1472: (bf) r1 = r6
1473: (18) r2 = 0xffff8dca7b94d200
1475: (b7) r3 = 1
1476: (85) call bpf_tail_call#12
tail_calls are not allowed in programs with bpf-to-bpf calls

Support for SCTP connection tracking

Most telco/3GPP systems and frameworks use the SCTP protocol. We need to implement stateful connection tracking for SCTP and support proper load-balancing of SCTP traffic.
