kubeflow-chart's Introduction

Kubeflow Chart

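Install and configure Kubeflow faster across multiple environments with a Helm chart

As an open-source alternative to deploying with kubeflow manifests, this chart lets you deploy and run Kubeflow quickly and easily in any environment (public cloud, on-premises cluster, minikube).

Open-source and enterprise editions

See https://www.alauda.cn/open/detail/id/701.html to learn more about the enterprise edition and to request a trial.

Compared with the open-source edition, the enterprise edition provides more numerous and more complete enhancements, including:

  • Feature enhancements
    • Enhanced distributed training scheduler for clusters
    • Enhanced model registry
    • Integrated and enhanced MLflow experiment tracking
    • SQLFlow, a modeling tool that uses SQL
    • Visual drag-and-drop development environment
    • Workflow orchestration of distributed training jobs
    • Full Chinese localization
    • Built-in tutorials and example notebooks
    • Multi-environment support and fast deployment
    • Support for Chinese domestic hardware
  • High performance
    • Intel TensorFlow
    • NeuralCompressor
    • vGPU support
  • High availability
    • High availability for the MLOps Control Plane
    • High availability for inference services

Quick install (local Minikube):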

  1. helm repo add alauda https://alauda.github.io/kubeflow-chart
  2. helm install kubeflow alauda/kubeflow
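
After the two commands above return, the components still need time to start. The following is a minimal sketch for waiting until they are up. It assumes the workloads land in the istio-system and kubeflow namespaces (an assumption based on upstream Kubeflow conventions; adjust it to your cluster). The helper name wait_for_kubeflow and the default timeout are illustrative and not part of the chart:

```shell
#!/bin/sh
# Hypothetical helper: block until every Deployment in the assumed
# namespaces reports the Available condition, or the timeout expires.
wait_for_kubeflow() {
  wait_timeout="${1:-600s}"
  for ns in istio-system kubeflow; do   # assumed namespaces
    kubectl wait --for=condition=Available deployment --all \
      -n "$ns" --timeout="$wait_timeout" || return 1
  done
}
```

Call wait_for_kubeflow after helm install, optionally with a longer timeout such as wait_for_kubeflow 900s. A non-zero exit status means at least one Deployment did not become available in time.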

Installing from image mirrors in mainland China

Use values-cn.yaml to override the image settings for the installation:

wget -O values-cn.yaml https://raw.githubusercontent.com/alauda/kubeflow-chart/main/values-cn.yaml
helm install kubeflow alauda/kubeflow -f values-cn.yaml

Accessing the Kubeflow UI:

Start port forwarding:

kubectl port-forward svc/istio-ingressgateway -n istio-system --address=0.0.0.0 8080:80

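Then open http://localhost:8080/ in a browser and log in with the default account and password: [email protected], 12341234.

Using the MLOps IDE

The mlops-ide directory contains the Dockerfile for building the community edition of the MLOps IDE. To build your own IDE image, run: docker build -f mlops-ide/Dockerfile . You can also modify the Dockerfile to build a customized image. For example, for an image with GPU and CUDA support, replace the base image line with FROM nvidia/cuda:11.4.3-devel-ubuntu20.04.

We also provide a prebuilt image, typhoon1986/mlops-ide:3.15.0. When creating a Notebook, check "Custom Image" and enter this image address to try it out quickly. On the JupyterLab home page you can see that the corresponding features are enabled, and you can choose the interface language under settings.

Note: the community edition of the MLOps IDE does not yet support orchestrating distributed training inside pipelines. You can follow the progress of this PR: elyra-ai/elyra#3102

To help you get started with Kubeflow, we have prepared some Notebook tutorials. Drag these files into your Notebook environment to run the examples.

Configuring a private image registry with authentication:

If you have synced the images to a private registry that requires authentication, add the following credentials to values.yaml: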

global:
  imageCredentials: ""
  useRegistryCredentials: false
  registry: quay.io
  username: someone
  password: sillyness
  email: [email protected]
minio:
  useKubeflowImagePullSecrets: true
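
For orientation, Kubernetes stores registry credentials like these in a Secret of type kubernetes.io/dockerconfigjson, whose auth field is the base64 encoding of username:password. The sketch below builds that payload by hand from the example values above. Whether the chart renders exactly this Secret from the global settings is an assumption, and the function names are illustrative:

```shell
#!/bin/sh
# Illustrative only: these values mirror the example configuration above.
REGISTRY="quay.io"
USERNAME="someone"
PASSWORD="sillyness"

# base64("username:password"), the "auth" field of a Docker config entry.
registry_auth() {
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Print the JSON document that a kubernetes.io/dockerconfigjson Secret
# carries under its .dockerconfigjson key.
# Values containing quotes or backslashes would need JSON escaping first.
docker_config_json() {
  printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}\n' \
    "$1" "$2" "$3" "$(registry_auth "$2" "$3")"
}

docker_config_json "$REGISTRY" "$USERNAME" "$PASSWORD"
```

The same kind of Secret can be created directly with kubectl create secret docker-registry, which is a convenient way to check that the credentials work before putting them in values.yaml.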

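Uninstalling Kubeflow

Run helm delete kubeflow to uninstall.

Deploying Kubeflow in a production cluster

Deploying Kubeflow in a production cluster usually requires the following configuration, based on your cluster environment:

Using HTTPS

Kubeflow depends heavily on HTTPS; only access through localhost works without it, which is why a quick Minikube deployment needs no HTTPS settings. When HTTPS is required, set tlsCrt and tlsKey in values.yaml to your HTTPS certificate and private key.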
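
For a test cluster without a CA-issued certificate, a self-signed pair can stand in. The following is a minimal sketch using openssl. It assumes the host name kubeflow.test.info (the example domain used in this README) and the output file names tls.crt and tls.key, which are arbitrary. Whether tlsCrt and tlsKey expect PEM text or base64-encoded PEM is not stated here, so check the chart's values.yaml before copying the contents in:

```shell
#!/bin/sh
# Assumed host name; replace it with the domain users will open in the browser.
KUBEFLOW_HOST="kubeflow.test.info"

# Create a 2048-bit RSA key and a self-signed certificate valid for 365 days.
# -nodes leaves the private key unencrypted so a server can load it unattended.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout tls.key -out tls.crt -days 365 \
  -subj "/CN=${KUBEFLOW_HOST}"

# Show the subject and validity period of the new certificate.
openssl x509 -in tls.crt -noout -subject -dates
```

Browsers will warn about a self-signed certificate, so a CA-issued certificate remains the better choice for production.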

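Configuring access

  • Via port-forward (not recommended):
    • With HTTP: kubectl port-forward svc/istio-ingressgateway -n istio-system --address=0.0.0.0 8080:80, then open the address of the server where the command is running: http://ip:8080/
    • With HTTPS enabled: kubectl port-forward svc/istio-ingressgateway -n istio-system --address=0.0.0.0 443:443, then open the address of the server where the command is running: https://ip/
  • Log in with the default account and password: [email protected], 12341234.
  • Via NodePort: check whether the istio ingressgateway service has a NodePort enabled: kubectl -n istio-system get svc istio-ingressgateway. Once the NodePort is configured, Kubeflow can be accessed through it.
  • Using Ingress: when Ingress is available in the cluster, set enableIngress: true in values.yaml and set kubeflowHost to the domain name you want to use, for example kubeflowHost: "kubeflow.test.info"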
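
The Ingress option above can be kept in a small override file instead of editing values.yaml in place. The following is a sketch, assuming the file name values-ingress.yaml (arbitrary) and the example domain from the text:

```shell
#!/bin/sh
# Write an override file containing only the two Ingress-related keys
# named above; every other setting keeps the chart default.
cat > values-ingress.yaml <<'EOF'
enableIngress: true
kubeflowHost: "kubeflow.test.info"
EOF

# Print the file so it can be reviewed before it is applied.
cat values-ingress.yaml
```

Passing the file with -f, for example helm upgrade --install kubeflow alauda/kubeflow -f values-ingress.yaml, applies it to the release. DNS for the chosen host must resolve to the cluster's Ingress controller.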

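Configuring Dex login authentication (optional)

If you do not use the Dex deployment bundled with this chart and instead need to connect to an existing Dex deployment:

  1. Set dex: enabled: false
  2. Modify the following options in values.yaml to connect to your existing Dex:
# Configure the integration with the Dex authentication service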
oidcAuthURL: /dex/auth
oidcProvider: http://dex.auth.svc.cluster.local:5556/dex
oidcRedirectURL: /login/oidc
skipAuthURI: "/dex"
useridClaim: email
useridHeader: kubeflow-userid
useridPrefix: "\"\""
oidcScopes: "profile email groups"

kubeflow-chart's People

Contributors

lanzhiwang, typhoonzero

kubeflow-chart's Issues

The knative-serving images cannot be pulled; please make them publicly available

[root@k8s-master01 kubeflow]# kubectl get po -n knative-serving  
NAME                                READY   STATUS             RESTARTS   AGE
activator-568477768c-47f2n          1/2     ImagePullBackOff   0          59m
autoscaler-84bd65c959-qjld4         1/2     ImagePullBackOff   0          59m
controller-6f84dd5b45-6rr72         1/2     ImagePullBackOff   0          59m
istio-webhook-6dfc675577-qwdzh      1/2     ImagePullBackOff   0          59m
networking-istio-7dcc8666d5-975zw   1/2     ImagePullBackOff   0          59m
webhook-55679d59d4-vffj8            1/2     ImagePullBackOff   0          59m
  Warning  Failed     19m (x3 over 24m)     kubelet            Failed to pull image "registry.cn-hangzhou.aliyuncs.com/kubeflow-chart/activator:v0.22.1": rpc error: code = Unknown desc = failed to pull and unpack image "registry.cn-hangzhou.aliyuncs.com/kubeflow-chart/activator:v0.22.1": failed to resolve reference "registry.cn-hangzhou.aliyuncs.com/kubeflow-chart/activator:v0.22.1": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
  Warning  Failed     19m (x4 over 20m)     kubelet            Error: ImagePullBackOff
  Normal   Pulling    18m (x4 over 25m)     kubelet            Pulling image "registry.cn-hangzhou.aliyuncs.com/kubeflow-chart/activator:v0.22.1"
  Warning  Failed     18m (x4 over 24m)     kubelet            Error: ErrImagePull
  Normal   BackOff    4m59s (x64 over 20m)  kubelet            Back-off pulling image "registry.cn-hangzhou.aliyuncs.com/kubeflow-chart/activator:v0.22.1"

Update the Knative Serving HPA to autoscaling/v2, as autoscaling/v2beta2 was deprecated in k8s 1.23

Hey, I installed this chart, but on my Kubernetes 1.27.1 cluster I found that:

[Screenshot: knative]

I checked what versions were installed in my cluster, and found that autoscaling/v2beta2 is not present.

# kubectl api-versions
...
autoscaling/v1
autoscaling/v2
...

I think the easiest fix is to bump the knative version used by the chart, since newer knative versions fix this upstream. It would also be excellent to allow knative to be disabled, so that a copy already installed in the cluster can be used, as is possible for the other resources; it is the only one that doesn't allow this:

- name: knative-serving
  version: "1.2.5"

I am going to fix these in my fork but I thought I would open an issue to let you know! Otherwise tho I hope you have a lovely weekend, and thanks for the helm chart, this is sorely needed for kubeflow!

Dashboard shows no namespace

Version information

OS: ubuntu20.04
rancher: v2.5.2
k8s: v1.18.20
kubeflow-chart: kubeflow-1.5.1

Error

Dashboard shows no namespace

request

GET http://192.168.136.128:8080/api/workgroup/exists 403 (Forbidden) vendor.bundle.js:765 

response

{"error":{}}

request

GET http://192.168.136.128:8080/pipeline/apis/v1beta1/runs?page_size=5&sort_by=created_at%20desc&resource_reference_key.type=NAMESPACE&resource_reference_key.id=undefined 403 (Forbidden) vendor.bundle.js:765 

response

{
	"error": "Failed to authorize with namespace resource reference.: Failed to authorize with API resource references: PermissionDenied: User '[email protected]' is not authorized with reason:  (request: \u0026ResourceAttributes{Namespace:undefined,Verb:list,Group:pipelines.kubeflow.org,Version:v1beta1,Resource:runs,Subresource:,Name:,}): Unauthorized access",
	"code": 7,
	"message": "Failed to authorize with namespace resource reference.: Failed to authorize with API resource references: PermissionDenied: User '[email protected]' is not authorized with reason:  (request: \u0026ResourceAttributes{Namespace:undefined,Verb:list,Group:pipelines.kubeflow.org,Version:v1beta1,Resource:runs,Subresource:,Name:,}): Unauthorized access",
	"details": [{
		"@type": "type.googleapis.com/api.Error",
		"error_message": "User '[email protected]' is not authorized with reason:  (request: \u0026ResourceAttributes{Namespace:undefined,Verb:list,Group:pipelines.kubeflow.org,Version:v1beta1,Resource:runs,Subresource:,Name:,})",
		"error_details": "Failed to authorize with namespace resource reference.: Failed to authorize with API resource references: PermissionDenied: User '[email protected]' is not authorized with reason:  (request: \u0026ResourceAttributes{Namespace:undefined,Verb:list,Group:pipelines.kubeflow.org,Version:v1beta1,Resource:runs,Subresource:,Name:,}): Unauthorized access"
	}]
}

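Deployment problem on k8s 1.22

apiextensions.k8s.io/v1beta1 was removed in k8s 1.22, so many CRD fields have changed and many of the current CRD files fail to deploy. I hope the maintainers can release a version that is compatible with 1.22. Thanks.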

error: executing "kubeflow/templates/platform-agnostic-multi-user.yaml" at <.Values.mysql.host>: nil pointer evaluating interface {}.host

helm lint kubeflow/
==> Linting kubeflow/
[INFO] Chart.yaml: icon is recommended
[ERROR] templates/: template: kubeflow/templates/platform-agnostic-multi-user.yaml:2208:20: executing "kubeflow/templates/platform-agnostic-multi-user.yaml" at <.Values.mysql.host>: nil pointer evaluating interface {}.host

After adding mysql:host:“*” to values.yaml:

(base) [root@k8s-master charts]# helm lint kubeflow
==> Linting kubeflow
[INFO] Chart.yaml: icon is recommended
[ERROR] templates/: template: kubeflow/templates/platform-agnostic-multi-user.yaml:2559:9: executing "kubeflow/templates/platform-agnostic-multi-user.yaml" at <include "kubeflow.nodeAffinity" .>: error calling include: template: no template "kubeflow.nodeAffinity" associated with template "gotpl"

Am I using it the wrong way? Any help would be appreciated.

Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists.

Install

mkdir -p ~/kubeflow_install/helm_chart_install
cd ~/kubeflow_install/helm_chart_install
wget -O values-cn.yaml https://raw.githubusercontent.com/alauda/kubeflow-chart/main/values-cn.yaml
helm install kubeflow alauda/kubeflow -f values-cn.yaml

Output

$ helm install kubeflow alauda/kubeflow -f values-cn.yaml
W0224 15:54:26.704541  154043 warnings.go:70] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W0224 15:54:26.999392  154043 warnings.go:70] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W0224 15:54:27.129135  154043 warnings.go:70] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W0224 15:54:27.151566  154043 warnings.go:70] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W0224 15:54:27.168133  154043 warnings.go:70] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W0224 15:54:27.206162  154043 warnings.go:70] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W0224 15:54:27.955386  154043 warnings.go:70] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
W0224 15:54:31.997890  154043 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0224 15:54:32.000794  154043 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0224 15:54:32.271321  154043 warnings.go:70] rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole
W0224 15:54:32.484430  154043 warnings.go:70] rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding
W0224 15:54:32.992814  154043 warnings.go:70] admissionregistration.k8s.io/v1beta1 MutatingWebhookConfiguration is deprecated in v1.16+, unavailable in v1.22+; use admissionregistration.k8s.io/v1 MutatingWebhookConfiguration
Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists. Unable to continue with install: MutatingWebhookConfiguration "cache-webhook-kubeflow" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "kubeflow"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "default"

After deployment, Volumes reports an error: User "system:serviceaccount:kubeflow:volumes-web-app-service-account" cannot list resource "tensorboards" in API group "tensorboard.kubeflow.org" in the namespace "kubeflow-user-example-com"

[403] tensorboards.tensorboard.kubeflow.org is forbidden: User "system:serviceaccount:kubeflow:volumes-web-app-service-account" cannot list resource "tensorboards" in API group "tensorboard.kubeflow.org" in the namespace "kubeflow-user-example-com" http://ip:port/volumes/api/namespaces/kubeflow-user-example-com/tensorboards

This pod fails to start: permission denied

istio-system authservice-0 0/1 CrashLoopBackOff 15 (3m16s ago) 13

authservice-0 Error opening bolt store: open /var/lib/authservice/data.db: permission denied

Flexible docker registry

[Screenshot 2022-12-21 at 07 28 14]

Hi, I found a hard-coded configuration value. I want to override it through values.yaml when running the helm install command, but I'm stuck here. Do you have any solution?

Problem pulling the storage-initializer:v0.7.0 image

[root@localhost ~]# docker pull registry.cn-hangzhou.aliyuncs.com/kubeflow-chart/storage-initializer:v0.7.0
Error response from daemon: manifest for registry.cn-hangzhou.aliyuncs.com/kubeflow-chart/storage-initializer:v0.7.0 not found: manifest unknown: manifest unknown
