terraform-aws-eks-data-addons's Introduction

Terraform Module: 🚀 Data & AI/ML Kubernetes Add-ons ⚙️

This Terraform module contains commonly used Data & AI/ML related Kubernetes add-ons that are typically included in Data on EKS blueprints. The purpose of this module is to provide users with the flexibility to select and customize the add-ons they require while leveraging the Data on EKS blueprints.

⚠️ Important Note

Users can consume this Terraform module in their projects to deploy any of the available add-ons. We will continue to maintain and update the existing Data/ML add-ons. However, we kindly request that you refrain from submitting Pull Requests (PRs) that add new add-ons at the moment, unless a supported blueprint is available in the Data on EKS repository. The Apache and CNCF communities offer numerous open-source Data and ML add-ons, and while we appreciate their value, supporting all of them poses challenges.

Your understanding and cooperation are highly appreciated. 🙏

Usage

Deploy add-ons with the following example, which pulls the module from the Terraform Registry. Check out the complete example under the test folder.

module "eks_data_addons" {
  source = "aws-ia/eks-data-addons/aws"
  version = "~> 1.0" # ensure to update this to the latest/desired version

  oidc_provider_arn = module.eks.oidc_provider_arn

  # Example to deploy AWS Neuron Device Plugin for Trainium and Inferentia instances
  enable_aws_neuron_device_plugin = true

  # Example to deploy EFA K8s Device Plugin for GPU/Neuron instances
  enable_aws_efa_k8s_device_plugin = true

  # Example to deploy NVIDIA GPU Operator
  enable_nvidia_gpu_operator = true

  # Example to deploy Spark Operator Helm Chart
  enable_spark_operator = true

  # Example to deploy Flink Operator Helm Chart
  enable_flink_operator = true

  # Example to deploy Apache YuniKorn Helm Chart
  enable_yunikorn = true

  # Example that uses ECR authentication for a particular registry ID
  enable_emr_spark_operator = var.enable_emr_spark_operator
  emr_spark_operator_helm_config = {
    repository_username = data.aws_ecr_authorization_token.token.user_name
    repository_password = data.aws_ecr_authorization_token.token.password
  }

  # Example to deploy Helm chart that uses IAM Role for ServiceAccounts. You can disable `create_irsa` and bring your own IAM role.
  enable_spark_history_server = var.enable_emr_spark_operator
  spark_history_server_helm_config = {
    create_irsa = true
    values = [
      templatefile("${path.module}/test/helm-values/spark-history-server-values.yaml", {
        s3_bucket_name   = module.s3_bucket.s3_bucket_id
        s3_bucket_prefix = aws_s3_object.this.key
      })
    ]
  }
}
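The example above reads `data.aws_ecr_authorization_token.token` without declaring it. A minimal sketch of how that data source might be declared follows; the variable `emr_ecr_registry_id` is a hypothetical name introduced only for illustration, and the registry account ID you need depends on your region and chart source.

```hcl
# Hypothetical variable: AWS account ID of the ECR registry that hosts the chart.
variable "emr_ecr_registry_id" {
  description = "AWS account ID of the ECR registry to authenticate against"
  type        = string
}

# Fetches a temporary ECR credential. The data source exports `user_name`
# and `password`, which the usage example passes to the Helm config.
data "aws_ecr_authorization_token" "token" {
  registry_id = var.emr_ecr_registry_id
}
```

Because the token is short-lived, Terraform reads it again on each plan or apply.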

Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.0.0 |
| aws | >= 3.72 |
| helm | >= 2.4.1 |
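
As a sketch, the version constraints listed here could be pinned in a root module like this. The provider source addresses `hashicorp/aws` and `hashicorp/helm` are the standard registry addresses; tighten the constraints to suit your project.

```hcl
terraform {
  # Minimum Terraform CLI version listed under Requirements.
  required_version = ">= 1.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.72"
    }
    helm = {
      source  = "hashicorp/helm"
      version = ">= 2.4.1"
    }
  }
}
```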

Providers

| Name | Version |
|------|---------|
| aws | >= 3.72 |
| helm | >= 2.4.1 |

Modules

| Name | Source | Version |
|------|--------|---------|
| spark_history_server_irsa | ./irsa | n/a |

Resources

| Name | Type |
|------|------|
| helm_release.airflow | resource |
| helm_release.aws_efa_k8s_device_plugin | resource |
| helm_release.aws_neuron_device_plugin | resource |
| helm_release.cnpg_operator | resource |
| helm_release.dask_operator | resource |
| helm_release.daskhub | resource |
| helm_release.emr_flink_operator | resource |
| helm_release.emr_spark_operator | resource |
| helm_release.flink_operator | resource |
| helm_release.jupyterhub | resource |
| helm_release.karpenter_resources | resource |
| helm_release.kubecost | resource |
| helm_release.kuberay_operator | resource |
| helm_release.mlflow_tracking | resource |
| helm_release.nvidia_device_plugin | resource |
| helm_release.nvidia_gpu_operator | resource |
| helm_release.pinot | resource |
| helm_release.spark_history_server | resource |
| helm_release.spark_operator | resource |
| helm_release.strimzi_kafka_operator | resource |
| helm_release.trino | resource |
| helm_release.volcano | resource |
| helm_release.yunikorn | resource |
| aws_partition.current | data source |
| aws_region.current | data source |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| airflow_helm_config | Airflow Helm Chart config | any | {} | no |
| aws_efa_k8s_device_plugin_helm_config | EFA K8s Plugin add-on Helm Chart config | any | {} | no |
| aws_neuron_device_plugin_helm_config | AWS Neuron Device Plugin Helm Chart config | any | {} | no |
| cnpg_operator_helm_config | CloudNative PG Operator Helm Chart config | any | {} | no |
| dask_operator_helm_config | Dask Operator add-on configurations | any | {} | no |
| daskhub_helm_config | DaskHub add-on configurations | any | {} | no |
| emr_flink_operator_helm_config | Helm configuration for Flink Operator with EMR Runtime | any | {} | no |
| emr_spark_operator_helm_config | Helm configuration for Spark Operator with EMR Runtime | any | {} | no |
| enable_airflow | Enable Airflow add-on | bool | false | no |
| enable_aws_efa_k8s_device_plugin | Enable EFA K8s Plugin add-on | bool | false | no |
| enable_aws_neuron_device_plugin | Enable AWS Neuron Device Plugin add-on | bool | false | no |
| enable_cnpg_operator | Enable CloudNative PG Operator add-on | bool | false | no |
| enable_dask_operator | Enable Dask Operator add-on | bool | false | no |
| enable_daskhub | Enable DaskHub | bool | false | no |
| enable_emr_flink_operator | Enable the Flink Operator to run Flink application with EMR Runtime | bool | false | no |
| enable_emr_spark_operator | Enable the Spark Operator to submit jobs with EMR Runtime | bool | false | no |
| enable_flink_operator | Enable Flink Operator add-on | bool | false | no |
| enable_jupyterhub | Enable Jupyterhub Add-On | bool | false | no |
| enable_karpenter_resources | Enable Karpenter Resources (NodePool and EC2NodeClass) | bool | false | no |
| enable_kubecost | Enable Kubecost add-on | bool | false | no |
| enable_kuberay_operator | Enable Kuberay Operator add-on | bool | false | no |
| enable_mlflow_tracking | Enable MLflow Tracking add-on | bool | false | no |
| enable_nvidia_device_plugin | Enable NVIDIA Device Plugin add-on | bool | false | no |
| enable_nvidia_gpu_operator | Enable NVIDIA GPU Operator add-on | bool | false | no |
| enable_pinot | Enable Apache Pinot Add-On | bool | false | no |
| enable_spark_history_server | Enable Spark History Server add-on | bool | false | no |
| enable_spark_operator | Enable Spark on K8s Operator add-on | bool | false | no |
| enable_strimzi_kafka_operator | Enable the Strimzi Kafka Operator | bool | false | no |
| enable_trino | Enable Trino add-on | bool | false | no |
| enable_volcano | Enable volcano scheduler add-on | bool | false | no |
| enable_yunikorn | Enable Apache YuniKorn K8s scheduler add-on | bool | false | no |
| flink_operator_helm_config | Flink Operator Helm Chart config | any | {} | no |
| jupyterhub_helm_config | Helm configuration for JupyterHub | any | {} | no |
| karpenter_resources_helm_config | Karpenter Resources Helm Chart config | any | {} | no |
| kubecost_helm_config | Kubecost Helm Chart config | any | {} | no |
| kuberay_operator_helm_config | Helm configuration for Kuberay Operator | any | {} | no |
| mlflow_tracking_helm_config | MLflow Tracking add-on Helm Chart config | any | {} | no |
| nvidia_device_plugin_helm_config | NVIDIA Device Plugin Helm Chart config | any | {} | no |
| nvidia_gpu_operator_helm_config | Helm configuration for NVIDIA GPU Operator | any | {} | no |
| oidc_provider_arn | The ARN of the cluster OIDC Provider | string | n/a | yes |
| pinot_helm_config | Apache Pinot Helm Chart config | any | {} | no |
| spark_history_server_helm_config | Helm configuration for Spark History Server | any | {} | no |
| spark_operator_helm_config | Helm configuration for Spark K8s Operator | any | {} | no |
| strimzi_kafka_operator_helm_config | Helm configuration for Strimzi Kafka Operator | any | {} | no |
| trino_helm_config | Trino Helm Chart config | any | {} | no |
| volcano_helm_config | Volcano scheduler add-on configurations | any | {} | no |
| yunikorn_helm_config | Helm configuration for Apache YuniKorn | any | {} | no |
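
Each add-on pairs an `enable_*` flag with a matching `*_helm_config` map. The sketch below shows that pattern for YuniKorn. It assumes the `values` key is honored for this add-on in the same way the usage example uses it for the Spark History Server, and the values file path is a hypothetical placeholder.

```hcl
module "eks_data_addons" {
  source  = "aws-ia/eks-data-addons/aws"
  version = "~> 1.0"

  oidc_provider_arn = module.eks.oidc_provider_arn

  # Turn the add-on on, then pass chart overrides through its config map.
  enable_yunikorn = true
  yunikorn_helm_config = {
    # Assumption: `values` takes a list of YAML strings, as in the
    # Spark History Server example. The file path is hypothetical.
    values = [
      file("${path.module}/helm-values/yunikorn-values.yaml")
    ]
  }
}
```

Add-ons that are not enabled keep their default of `false`, so only the flags you set produce Helm releases.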

Outputs

| Name | Description |
|------|-------------|
| airflow | Airflow Helm Chart metadata |
| aws_efa_k8s_device_plugin | AWS EFA K8s Plugin Helm Chart metadata |
| aws_neuron_device_plugin | AWS Neuron Device Plugin Helm Chart metadata |
| dask_hub | Dask Hub Helm Chart metadata |
| dask_operator | Dask Operator Helm Chart metadata |
| emr_spark_operator | EMR Spark Operator Helm Chart metadata |
| flink_operator | Flink Operator Helm Chart metadata |
| jupyterhub | Jupyterhub Helm Chart metadata |
| kubecost | Kubecost Helm Chart metadata |
| kuberay_operator | Kuberay Operator Helm Chart metadata |
| nvidia_gpu_operator | Nvidia GPU Operator Helm Chart metadata |
| pinot | Apache Pinot Helm Chart metadata |
| spark_history_server | Spark History Server Helm Chart metadata |
| spark_operator | Spark Operator Helm Chart metadata |
| strimzi_kafka_operator | Strimzi Kafka Operator Helm Chart metadata |
| volcano | Volcano Batch Scheduler Helm Chart metadata |
| yunikorn | Yunikorn Helm Chart metadata |
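
A calling configuration can surface any of these outputs. The sketch below assumes the module block is named `eks_data_addons`, as in the usage example; the output name `spark_operator_release` is a hypothetical choice.

```hcl
# Re-export the Spark Operator release metadata from the root module.
output "spark_operator_release" {
  description = "Helm release metadata for the Spark Operator add-on"
  value       = module.eks_data_addons.spark_operator
}
```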

terraform-aws-eks-data-addons's People

Contributors

alanty, askulkarni2, jihed, lusoal, ovaleanu, tbulding, vara-bonthu, wahab-io, youngjeong46

terraform-aws-eks-data-addons's Issues

[feature] update EFA plugin addon to use eks-charts location instead of local copy

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

What is the outcome that you are trying to reach?

Currently the EFA Device plugin chart is copied locally to this repo and then referenced in the addon configuration
https://github.com/aws-ia/terraform-aws-eks-data-addons/blob/main/aws-efa-k8s-device-plugin.tf#L6

  chart                      = try(var.aws_efa_k8s_device_plugin_helm_config["chart"], "${path.module}/helm-charts/aws-efa-k8s-device-plugin")

This chart is now being maintained in the eks-charts repo: https://github.com/aws/eks-charts/tree/master/stable/aws-efa-k8s-device-plugin

We should reference that helm repo instead of the local copy so it's easier to update.
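
Until the module default changes, a caller might override the chart location through the add-on's Helm config. This is only a sketch: the `chart` key is read by the module as shown in the snippet above, but support for a `repository` key is an assumption that should be checked against the module source. The URL is the public Helm repository published from eks-charts.

```hcl
module "eks_data_addons" {
  source  = "aws-ia/eks-data-addons/aws"
  version = "~> 1.0"

  oidc_provider_arn = module.eks.oidc_provider_arn

  enable_aws_efa_k8s_device_plugin = true
  aws_efa_k8s_device_plugin_helm_config = {
    # Replaces the default local chart path with the published chart name.
    chart = "aws-efa-k8s-device-plugin"
    # Assumption: the module forwards `repository` to the helm_release resource.
    repository = "https://aws.github.io/eks-charts"
  }
}
```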

Describe the solution you would like

Less management of "local" helm charts 😄

Describe alternatives you have considered

Additional context

When we switch to the eks-charts repo, we should also upgrade the chart version, since the default affinity now includes the p5.48xlarge instance type.

[Feature] Trino add-on

What is the outcome that you are trying to reach?

Add Trino as a Data Add-on.

Describe the solution you would like

Add Trino helm chart with option for user configurations.

Describe alternatives you have considered

Additional context

Upgrade Apache Airflow Helm Chart to 1.10.0

What is the outcome that you are trying to reach?

Upgrade Apache Airflow Helm Chart to 1.10.0

Describe the solution you would like

Describe alternatives you have considered

Additional context

GitHub workflow E2E tests

  • Write a GitHub workflow to run terraform apply and terraform destroy when a PR is merged to main
  • terraform apply should run in the DoEKS AWS account, which requires configuring GitHub secrets

What is the outcome that you are trying to reach?

Describe the solution you would like

Describe alternatives you have considered

Additional context

[Feature] MLflow Helm chart

What is the outcome that you are trying to reach?

MLflow Helm chart

Describe the solution you would like

Describe alternatives you have considered

Additional context

Remove S3 readonly policy from SHS

Changes to be incorporated


1. Remove S3 readonly policy from Spark history server
2. Allow users to pass the IAM policy and ensure that the policy is not null (verify that create_irsa == true and iam_policy_arns != null)
3. Leverage EKS Blueprints add-on to create IRSA and remove internal IRSA module
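
The check in item 2 could be sketched as a variable validation inside the module. This is an illustration only, not the module's current code: it assumes `iam_policy_arns` would be a key inside `spark_history_server_helm_config`, which the issue does not specify.

```hcl
variable "spark_history_server_helm_config" {
  description = "Helm configuration for Spark History Server"
  type        = any
  default     = {}

  validation {
    # Passes when IRSA creation is off, or when it is on and policy ARNs
    # are supplied. `try` covers the case where a key is absent.
    condition = (
      !try(var.spark_history_server_helm_config["create_irsa"], false) ||
      try(var.spark_history_server_helm_config["iam_policy_arns"], null) != null
    )
    error_message = "When create_irsa is true, iam_policy_arns must be set."
  }
}
```

A validation block fails at plan time, so a missing policy is reported before any IAM role or Helm release is created.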

What is the outcome that you are trying to reach?

Describe the solution you would like

Describe alternatives you have considered

Additional context
