
KubeGateway

English | 简体中文

Overview

KubeGateway is ByteDance's best practice for managing massive Kubernetes clusters.

It is a layer-7 load-balancing proxy designed and customized for the HTTP/2 traffic of kube-apiserver.

Its goal is to provide flexible and stable traffic-governance solutions for massive, large-scale Kubernetes clusters (more than 1,000 nodes).

Features

In terms of traffic governance:

  • It proactively performs request-level load balancing across multiple kube-apiservers;
  • It provides routing rules customized to kube-apiserver traffic characteristics: requests can be distinguished by verb, apiGroup, resource, user, userGroup, serviceAccounts, nonResourceURLs, and other attributes, and forwarded differently (see the sketch after this list). It also offers traffic-governance functions such as rate limiting, degradation, and circuit breaking;
  • It reduces the number of TCP connections to a single kube-apiserver instance by at least an order of magnitude;
  • Its configuration, such as routing rules, takes effect immediately, without restarting the service.
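
Below is a minimal, hypothetical Go sketch of this kind of attribute-based routing: ordered rules match on request attributes and select an upstream endpoint group. It only illustrates the idea; it is not KubeGateway's actual data model or implementation.

    package main

    import "fmt"

    // rule is a hypothetical routing rule carrying a few of the request
    // attributes listed above (verb, resource, user); KubeGateway's real
    // CRD schema differs.
    type rule struct {
        verbs, resources, users []string
        upstream                string // endpoint group to forward to
    }

    // matches treats an empty selector as "match anything".
    func matches(selector []string, value string) bool {
        if len(selector) == 0 {
            return true
        }
        for _, s := range selector {
            if s == value || s == "*" {
                return true
            }
        }
        return false
    }

    // route returns the upstream of the first matching rule.
    func route(rules []rule, verb, resource, user string) (string, bool) {
        for _, r := range rules {
            if matches(r.verbs, verb) && matches(r.resources, resource) && matches(r.users, user) {
                return r.upstream, true
            }
        }
        return "", false
    }

    func main() {
        rules := []rule{
            {verbs: []string{"list", "watch"}, resources: []string{"pods"}, upstream: "rate-limited-group"},
            {upstream: "default-group"}, // catch-all
        }
        fmt.Println(route(rules, "list", "pods", "system:serviceaccount:demo:sa"))
    }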

In terms of proxying massive numbers of clusters:

  • It can dynamically add and remove proxied clusters;
  • It serves different TLS certificates and ClientCAs for different clusters;
  • It provides allow/deny lists, monitoring and alerting, circuit breaking, and other functions.

Detailed Doc

Contributing

Please refer to Contributing.

Code of Conduct

Please refer to Code of Conduct for more details.

Contact Us

Please refer to Maintainers.

Security

If you find a potential security issue in this project, or think you may have discovered one, please notify ByteDance Security via our Security Center or the Vulnerability Report Email.

Please do not create a public GitHub issue.

License

This project is licensed under the Apache-2.0 License.


kubegateway's Issues

[bug] proxy metrics API returns an error

What happened:

Getting proxy metrics from the proxy port:
curl -k https://127.0.0.1:6443/metrics

The metrics API returns a 503 error:
    {
      "kind": "Status",
      "apiVersion": "v1",
      "metadata": {},
      "status": "Failure",
      "message": "the request cluster(127.0.0.1) is not being proxied",
      "reason": "ServiceUnavailable",
      "code": 503
    }

What you expected to happen:

The proxy metrics should be returned.

Support batched disconnection of long-lived connections on a single KubeGateway instance

When a KubeGateway instance fails or is rebuilt, all list/watch long-lived connections established on that instance are disconnected at once and reconnect to new instances. When the cluster's list/watch count is large, the kube-apiserver's CPU usage and load spike at the moment that KubeGateway instance restarts. Could KubeGateway support disconnecting connections in batches before a rebuilt pod instance restarts?

Can the ServerName in tlsClientConfig be made configurable when connecting to the backend apiserver over HTTPS?

What would you like to be added:

Make the ServerName in tlsClientConfig configurable, instead of hard-coded, when connecting to the backend apiserver over HTTPS.

Why is this needed:
pkg/clusters/util.go:49

	// When the upstream scheme is HTTPS, the gateway builds the TLS
	// client config from the UpstreamCluster spec and pins ServerName
	// to cluster.Name, forcing certificate verification against that name.
	if httpScheme == "https" {
		tlsCfg := rest.TLSClientConfig{
			ServerName: cluster.Name, // hard-coded: always the cluster's name
			KeyData:    cluster.Spec.ClientConfig.KeyData,
			CertData:   cluster.Spec.ClientConfig.CertData,
			CAData:     cluster.Spec.ClientConfig.CAData,
			Insecure:   cluster.Spec.ClientConfig.Insecure,
		}
		cfg.TLSClientConfig = tlsCfg
	}

From the code, ServerName is set to cluster.Name, which causes extra problems in our scenario.
Our setup is roughly: k8s service1/service2/... -> kubegateway -> proxy service <----tunnel----> cluster1 apiserver/cluster2 apiserver/...
What is ultimately accessed is a service name, which is matched to the CR by the CR's name, while the endpoints of the CR created in kubegateway point to the proxy service's name (we simply call it kubernetes), with different ports leading to different apiservers. Since the proxied domain is already among the DNS names in the k8s certificates by default, existing clusters normally would not need their certificates re-issued. But because kubegateway sets tlsClientConfig.ServerName when connecting, the hostname used for TLS verification changes and the backend apiserver's certificate verification fails. The workarounds are: set Insecure=true in the CR to skip server certificate verification; sign the corresponding service name into the certificates (hard to do for existing clusters); or fall back to HTTP. So could this stop being hard-coded and instead be offered as an optional configuration?
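
A minimal sketch of the requested change, assuming a hypothetical ServerNameOverride field (not in KubeGateway today): set ServerName only when an override is configured, and otherwise leave it empty so crypto/tls verifies the certificate against the host actually being dialed.

    package main

    import (
        "fmt"

        "k8s.io/client-go/rest"
    )

    // clientConfig stands in for cluster.Spec.ClientConfig; the
    // ServerNameOverride field is hypothetical.
    type clientConfig struct {
        KeyData, CertData, CAData []byte
        Insecure                  bool
        ServerNameOverride        string
    }

    // buildTLSConfig leaves ServerName empty unless explicitly
    // configured, instead of always pinning it to cluster.Name.
    func buildTLSConfig(cc clientConfig) rest.TLSClientConfig {
        tlsCfg := rest.TLSClientConfig{
            KeyData:  cc.KeyData,
            CertData: cc.CertData,
            CAData:   cc.CAData,
            Insecure: cc.Insecure,
        }
        if cc.ServerNameOverride != "" {
            tlsCfg.ServerName = cc.ServerNameOverride // opt-in only
        }
        return tlsCfg
    }

    func main() {
        fmt.Printf("%+v\n", buildTLSConfig(clientConfig{Insecure: true}))
    }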

How does KubeGateway select a cluster?

Background

  • Scenario: the kube-apiservers sit behind a layer-4 LB, and the server in the kubeConfig is the LB;
  • KubeGateway selects the k8s cluster by matching the kubeConfig's server.host against UpstreamCluster.name. The server field in the kubeconfig holds the upstream cluster's domain; DNS resolves this domain to the gateway, and the gateway uses the domain to distinguish between clusters;

Question

For the LB scenario above (without adapting DNS), how should cluster selection be done?

[bug] is not registered in FeatureGate

E0406 08:50:05.173727 1 runtime.go:78] Observed a panic: &errors.errorString{s:"feature "AllAlpha=true|false (ALPHA - default=false)" is not registered in FeatureGate """} (feature "AllAlpha=true|false (ALPHA - default=false)" is not registered in FeatureGate "")

Hi folks, good evening. I've been stuck on a question for two days.

Same files, same startup method.
The only difference is that one is started from the command line and the other is a pod created by a Deployment controller;
When I apply an upstreamcluster to the kube-gateway running in the pod on port 9443, it reports the error above. The cluster where the kube-gateway pod runs is version 1.18.9.
But when started as a command on the host, everything works fine. Have you run into this problem? How should it be solved, or is there a good workaround?

One solution is to upgrade the k8s version; another is to enable the AllAlpha feature gate. But I'd still like to know why the command-line and in-container results differ.

Sorry for the trouble; this is my first time filing an issue, so please forgive any problems with wording or formatting.
@zoumo
Thanks, looking forward to a reply.
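
For context on the panic: k8s.io/component-base/featuregate raises it when a feature string is parsed against a FeatureGate instance that never registered that feature via Add(). A minimal sketch of correct registration follows (the feature name is hypothetical); it illustrates the mechanism only and does not explain why the container and command-line runs behave differently.

    package main

    import (
        "fmt"

        "k8s.io/component-base/featuregate"
    )

    // SomeFeature is a hypothetical feature name for illustration.
    const SomeFeature featuregate.Feature = "SomeFeature"

    func main() {
        fg := featuregate.NewFeatureGate()
        // Register the feature before anything tries to set it; a
        // Set() against an unregistered feature is what produces
        // "... is not registered in FeatureGate".
        if err := fg.Add(map[featuregate.Feature]featuregate.FeatureSpec{
            SomeFeature: {Default: false, PreRelease: featuregate.Alpha},
        }); err != nil {
            panic(err)
        }
        if err := fg.Set("SomeFeature=true"); err != nil { // e.g. parsed from a flag or annotation
            panic(err)
        }
        fmt.Println(fg.Enabled(SomeFeature)) // true
    }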

Informer relists when the upstream apiserver restarts

What would you like to be added:
Send GOAWAY to the client so that the informer re-watches rather than relists.

Why is this needed:
The upstream apiserver sends GOAWAY when it restarts, but the reverse proxy panics when the long-running connection breaks; the client receives a 500 and the informer relists.
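
A hedged sketch of the requested behavior: Go's net/http server sends GOAWAY on its HTTP/2 connections during graceful shutdown, so draining via Shutdown (rather than letting long-running connections break abruptly) would let informers re-watch against another instance. This is an illustration, not KubeGateway's code; the certificate paths are placeholders.

    package main

    import (
        "context"
        "net/http"
        "time"
    )

    func main() {
        srv := &http.Server{Addr: ":6443"} // HTTP/2 is enabled by default over TLS

        go func() {
            _ = srv.ListenAndServeTLS("tls.crt", "tls.key") // placeholder cert paths
        }()

        // ... when this instance is about to restart:
        ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
        defer cancel()
        // Shutdown stops accepting new connections and sends GOAWAY on
        // HTTP/2 connections, so clients can reconnect and re-watch
        // instead of failing mid-stream and relisting.
        _ = srv.Shutdown(ctx)
    }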

Circuit-breaking a cluster via the feature-gate annotation cannot be rolled back

Background

Cluster degradation/circuit-breaking feature: using a feature gate, add the annotation proxy.kubegateway.io/feature-gates: DenyAllRequests=true to the UpstreamCluster.

Observed behavior

After removing the annotation above, the UpstreamCluster remains circuit-broken.

Global rate limiter

Description

The current rate limiting in KubeGateway is local to each instance, which requires no extra dependencies and is simple to implement, but it has some issues:

  • The quotas are inaccurate. Each gateway instance limits against its own quota, and the HTTP/2 long-lived connections between clients and the gateway can concentrate requests on certain instances, so clients may receive less quota than the total configured limit.

  • The precision of rate-limiting thresholds is poor. When the number of gateway instances is scaled up, the total quota across all instances grows, so each instance's threshold must be readjusted. For requests with small thresholds, such as "list", it is difficult to limit the flow precisely.

  • The round-robin load-balancing strategy cannot guarantee strictly balanced requests to backend apiserver instances. Even slight deviations for requests like a full "list" can put significant pressure on an apiserver.

To address these issues, we can integrate a global rate-limiting center and implement a global rate-limiting strategy (see the sketch below). The gateway supports both local rate limiting and integration with the rate-limiting center; when the center is unavailable, local rate limiting serves as the fallback. The center is a weak dependency of KubeGateway: during data-center build-out, local rate limiting is used first, and the center is integrated once its deployment is complete.
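
A minimal sketch of that fallback design, assuming a hypothetical GlobalLimiter client for the rate-limiting center and using golang.org/x/time/rate as the per-instance token bucket; any error from the center silently degrades to the local quota, keeping the center a weak dependency.

    package main

    import (
        "context"
        "fmt"

        "golang.org/x/time/rate"
    )

    // GlobalLimiter is a hypothetical client for the rate-limiting center.
    type GlobalLimiter interface {
        Acquire(ctx context.Context, key string) (allowed bool, err error)
    }

    // fallbackLimiter consults the global center first and falls back to
    // a per-instance token bucket when the center is unreachable.
    type fallbackLimiter struct {
        global GlobalLimiter
        local  *rate.Limiter
    }

    func (l *fallbackLimiter) Allow(ctx context.Context, key string) bool {
        if l.global != nil {
            if allowed, err := l.global.Acquire(ctx, key); err == nil {
                return allowed // the global decision wins while the center is healthy
            }
        }
        return l.local.Allow() // degraded mode: local quota only
    }

    func main() {
        // 100 QPS with a burst of 200 as the local fallback; no center wired up here.
        l := &fallbackLimiter{local: rate.NewLimiter(rate.Limit(100), 200)}
        fmt.Println(l.Allow(context.Background(), "list:pods"))
    }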

Support a default UpstreamCluster instead of matching forwarding by UpstreamCluster name

Currently the UpstreamCluster's name must equal the domain in the server address of the kubeconfig's cluster entry, which determines the k8s cluster whose apiservers requests are forwarded to. When a KubeGateway proxies only one UpstreamCluster, supporting forwarding to a default UpstreamCluster would mean you only need to set the upstream cluster kubeconfig's server address to the KubeGateway address to route through the proxy. That is, switching between connecting directly to the apiserver and going through KubeGateway would require changing only the server address in the kubeconfig, with no need to request a new domain and certificate specifically for the KubeGateway proxy. Wouldn't that be convenient?
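
A minimal sketch of the requested fallback, with hypothetical types: pick the UpstreamCluster whose name equals the request host, and fall back to a configured default when nothing matches.

    package main

    import "fmt"

    // upstream stands in for an UpstreamCluster; only the name matters here.
    type upstream struct{ name string }

    // pick selects by request host, falling back to a configured
    // default cluster (the defaultName parameter is hypothetical).
    func pick(clusters map[string]*upstream, host, defaultName string) *upstream {
        if c, ok := clusters[host]; ok {
            return c
        }
        return clusters[defaultName]
    }

    func main() {
        clusters := map[string]*upstream{
            "cluster-a.example.com": {name: "cluster-a.example.com"},
            "default":               {name: "default"},
        }
        fmt.Println(pick(clusters, "10.0.0.1", "default").name) // falls back to "default"
    }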

[bug] the first upstream health check runs 5s after the gateway starts

What happened:

The first upstream health check runs 5 seconds after the gateway starts.

What you expected to happen:

The upstream health check should run immediately after the gateway starts.
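
A minimal sketch of the expected behavior: run the first probe synchronously at startup, then continue on the interval. (For reference, wait.Until from k8s.io/apimachinery behaves this way: it invokes the function immediately, then sleeps.) The probe body is a placeholder.

    package main

    import (
        "fmt"
        "time"
    )

    // checkOnce is a placeholder for the real /healthz probe.
    func checkOnce() { fmt.Println("probing upstream /healthz") }

    func main() {
        checkOnce() // first check immediately, not 5s later
        ticker := time.NewTicker(5 * time.Second)
        defer ticker.Stop()
        for range ticker.C {
            checkOnce()
        }
    }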

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:

Upstream apiserver upgrades cause many 502 responses

What happened:
The upstream apiserver endpoint health-check period is 5s. While the upstream apiserver is upgrading, many 502 responses occur within the health-check interval.

What you expected to happen:
While the upstream apiserver is upgrading, "connection refused" errors should be captured and trigger an immediate health check.
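
A hedged sketch of the requested behavior: classify the forwarding error and, on "connection refused", trigger an immediate re-check instead of waiting for the next 5s tick. The recheck hook is hypothetical.

    package main

    import (
        "errors"
        "fmt"
        "net/http"
        "syscall"
    )

    // onProxyError kicks the endpoint's health check right away when
    // the upstream actively refuses connections (e.g. mid-upgrade).
    func onProxyError(err error, recheck func()) {
        if errors.Is(err, syscall.ECONNREFUSED) {
            recheck()
        }
    }

    func main() {
        _, err := http.Get("http://127.0.0.1:1") // port 1 is assumed closed
        onProxyError(err, func() { fmt.Println("triggering immediate health check") })
    }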

How to reproduce it (as minimally and precisely as possible):

How to implement a watch proxy for the apiserver?

When I proxy the apiserver with proxy := httputil.NewSingleHostReverseProxy(apiServerURL), requests with watch=true fail with httputil: ReverseProxy read error during body copy: context deadline exceeded. I see that the kubegateway project supports this. Many thanks for your reply.
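
For reference, a minimal sketch of a watch-capable reverse proxy. The usual culprits for that error are response buffering and an overall request deadline: a negative FlushInterval makes httputil.ReverseProxy flush each chunk immediately, and no deadline should wrap a long-running watch. The upstream URL here is an assumption.

    package main

    import (
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
    )

    func main() {
        apiServerURL, err := url.Parse("https://127.0.0.1:6443") // assumed upstream
        if err != nil {
            log.Fatal(err)
        }
        proxy := httputil.NewSingleHostReverseProxy(apiServerURL)
        // A watch response streams indefinitely, so flush chunks as
        // they arrive; a negative FlushInterval flushes immediately.
        proxy.FlushInterval = -1
        // Avoid wrapping requests in a context with an overall
        // deadline (client timeouts, http.TimeoutHandler, etc.): an
        // expiring deadline mid-stream surfaces as "context deadline
        // exceeded" during the body copy.
        log.Fatal(http.ListenAndServe(":8001", proxy))
    }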

How does kube-gateway implement degradation?

What would you like to be added:
I don't see a configuration for degradation. Is it implemented via dispatchPolicies.rules?
Why is this needed:

Failed to run "./hack/local-up.sh" on Mac

What happened:
When running ./hack/local-up.sh on a Mac, the following errors occurred:
(1) the kubegateway StatefulSet failed to start

$kubectl get pods
NAME            READY   STATUS   RESTARTS       AGE
kubegateway-0   0/1     Error    5 (100s ago)   3m3s

$kubectl logs kubegateway-0
exec /usr/local/bin/kube-gateway: no such file or directory

(2) invalid option for the base64 command

service/kubegateway unchanged
clusterrole.rbac.authorization.k8s.io/kubegateway unchanged
clusterrolebinding.rbac.authorization.k8s.io/kubegateway unchanged
base64: invalid option -- w
Usage:	base64 [-Ddh] [-b num] [-i in_file] [-o out_file]
  -b, --break    break encoded string into num character lines
  -Dd, --decode   decodes input
  -h, --help     display this message
  -i, --input    input file (default: "-" for stdin)
  -o, --output   output file (default: "-" for stdout)
base64: invalid option -- w
Usage:	base64 [-Ddh] [-b num] [-i in_file] [-o out_file]
  -b, --break    break encoded string into num character lines
  -Dd, --decode   decodes input
  -h, --help     display this message
  -i, --input    input file (default: "-" for stdin)
  -o, --output   output file (default: "-" for stdout)

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release): Mac
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:

How can components that connect to the apiserver via the kubernetes Service be switched to the KubeGateway proxy in bulk?

Many components (open-source components, custom controllers, and informer-based components) access the apiserver through the kubernetes.default Service by default. If the default kubernetes Service address for the apiserver must stay unchanged while some components switch to the KubeGateway proxy, does the apiserver address need to be configured for each component individually? And if some components expose no apiserver configuration parameter, they would also need modification. Is there a recommended best practice for this kind of scenario?

Running kube-gateway locally, proxying to an existing cluster

Expected result

A locally started kube-gateway that proxies to an existing (non-local) cluster.

Steps

Following the relevant steps in local-up.sh, the generated UpstreamCluster.name and the server host in the kubeConfig are both "localhost".

Error

E1123 20:00:55.545031 2984 upstream_controller.go:217] upstream health check failed, cluster="localhost" endpoint="https://x.x.x.x:6443" reason="Failure" message="Get "https://x.x.x.x:6443/healthz?timeout=5s\": x509: certificate is valid for xx, kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, not localhost"
