
Data on EKS

(Pronounced: "Do.eks")

💡 Optimized Solutions for Data and AI on EKS

Build, Scale, and Optimize Data & AI/ML Platforms on Amazon EKS 🚀

Welcome to Data on EKS, your gateway to scaling Data and AI workloads on Amazon EKS. Unlock the potential of Gen AI with a rich collection of Terraform Blueprints featuring best practices for deploying robust solutions with advanced logging and observability.

Explore practical examples and patterns for running Data workloads on EKS using advanced frameworks such as Apache Spark for distributed data processing, Apache Flink for real-time stream processing, and Apache Kafka for high-throughput distributed messaging. Automate and orchestrate complex workflows with Apache Airflow and leverage the robust capabilities of Amazon EMR on EKS to build resilient clusters, seamlessly integrating Kubernetes with big data solutions for enhanced scalability and performance.

On the AI/ML front, explore practical patterns for running AI/ML workloads on EKS, leveraging the Ray ecosystem for distributed computing. Use advanced serving solutions such as NVIDIA Triton Server and vLLM for efficient, scalable model inference, and TensorRT-LLM to optimize large language models for inference.

Take advantage of high-performance NVIDIA GPUs for intensive computational tasks and leverage AWS's specialized hardware, including AWS Trainium for efficient model training and AWS Inferentia for cost-effective model inference at scale.

Note: DoEKS is in active development. For upcoming features and enhancements, check out the issues section.

πŸ—οΈ Architecture

The diagram below showcases the wide array of open-source data tools, Kubernetes operators, and frameworks used by DoEKS. It also highlights the seamless integration of AWS Data Analytics managed services with the powerful capabilities of DoEKS open-source tools.

[Figure: DoEKS architecture diagram]

🌟 Features

The Data on EKS (DoEKS) solution is organized into the following focus areas:

🎯 Data Analytics on EKS

🎯 AI/ML on EKS

🎯 Streaming Platforms on EKS

🎯 Scheduler Workflow Platforms on EKS

🎯 Distributed Databases & Query Engine on EKS

🏃‍♀️ Getting Started

In this repository, you'll find a variety of deployment blueprints for creating Data/ML platforms with Amazon EKS clusters. These examples are only a small selection of the available blueprints; visit the DoEKS website for the complete list.

🧠 AI

🚀 Trainium-Inferentia on EKS 👈 This blueprint is used for running Gen AI models on AWS Neuron accelerators.

🚀 JARK-Stack on EKS 👈 This blueprint deploys the JARK stack for AI workloads with NVIDIA GPUs.

🚀 JupyterHub on EKS 👈 This blueprint deploys a self-managed JupyterHub on EKS with Amazon Cognito authentication.

🚀 Generative AI on EKS 👈 A collection of generative AI training and inference LLM deployment patterns.

📊 Data

🚀 EMR-on-EKS with Karpenter 👈 Start here if you are new to EMR on EKS. This blueprint deploys an EMR on EKS cluster and uses Karpenter to scale Spark jobs.

🚀 Spark Operator with Apache YuniKorn on EKS 👈 This blueprint deploys an EKS cluster and uses Spark Operator and Apache YuniKorn to run self-managed Spark jobs.

🚀 Self-managed Airflow on EKS 👈 This blueprint sets up a self-managed Apache Airflow on an Amazon EKS cluster, following best practices.

🚀 Argo Workflows on EKS 👈 This blueprint sets up self-managed Argo Workflows on an Amazon EKS cluster, following best practices.

🚀 Kafka on EKS 👈 This blueprint deploys a self-managed Kafka on EKS using the popular Strimzi Kafka operator.
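The Spark Operator with Apache YuniKorn blueprint runs self-managed Spark jobs. As a minimal sketch, assuming the Kubeflow Spark Operator's `sparkoperator.k8s.io/v1beta2` `SparkApplication` resource, the Python below builds a job manifest whose `batchScheduler` field hands driver and executor pod scheduling to YuniKorn. The `build_spark_application` helper, namespace, service account, image, and queue name are hypothetical placeholders, not values taken from the blueprint.

```python
import json


def build_spark_application(
    name: str,
    image: str,
    main_file: str,
    queue: str = "root.default",
    executor_instances: int = 2,
) -> dict:
    """Return a SparkApplication manifest (hypothetical helper, sketch only)."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        # Namespace is a hypothetical placeholder
        "metadata": {"name": name, "namespace": "spark-team-a"},
        "spec": {
            "type": "Python",
            "mode": "cluster",
            "image": image,
            "mainApplicationFile": main_file,
            "sparkVersion": "3.5.0",
            # Delegate driver/executor pod scheduling to Apache YuniKorn
            "batchScheduler": "yunikorn",
            # YuniKorn queue the job is submitted to (hypothetical queue name)
            "batchSchedulerOptions": {"queue": queue},
            # Service account name is a hypothetical placeholder
            "driver": {"cores": 1, "memory": "2g", "serviceAccount": "spark"},
            "executor": {"cores": 2, "instances": executor_instances, "memory": "4g"},
        },
    }


if __name__ == "__main__":
    manifest = build_spark_application(
        name="pyspark-pi",
        image="example.com/spark-py:3.5.0",  # hypothetical image
        main_file="local:///opt/spark/examples/src/main/python/pi.py",
    )
    # kubectl accepts JSON as well as YAML, so this output can be piped to `kubectl apply -f -`
    print(json.dumps(manifest, indent=2))
```

Generating the manifest from code keeps per-team values such as the YuniKorn queue in one place; in practice the same resource is usually written directly as YAML.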

📚 Documentation

For instructions on how to deploy Data on EKS patterns and run sample tests, visit the DoEKS website.

πŸ† Motivation

Kubernetes is a widely adopted system for orchestrating containerized software at scale. As more users migrate their data and machine learning workloads to Kubernetes, they often face the complexity of managing the Kubernetes ecosystem and selecting the right tools and configurations for their specific needs.

At AWS, we understand the challenges users encounter when deploying and scaling data workloads on Kubernetes. To simplify the process and enable users to quickly conduct proof-of-concepts and build production-ready clusters, we have developed Data on EKS (DoEKS). DoEKS offers opinionated open-source blueprints that provide end-to-end logging and observability, making it easier for users to deploy and manage Spark on EKS, Kubeflow, MLflow, Airflow, Presto, Kafka, Cassandra, and other data workloads. With DoEKS, users can confidently leverage the power of Kubernetes for their data and machine learning needs without getting overwhelmed by its complexity.

🤝 Support & Feedback

DoEKS is maintained by AWS Solutions Architects and is not an AWS service. Support is provided on a best-effort basis by the Data on EKS Blueprints community. If you have feedback, feature ideas, or wish to report bugs, please use the Issues section of this GitHub repository.

πŸ” Security

See CONTRIBUTING for more information.

💼 License

This library is licensed under the Apache 2.0 License.

🙌 Community

We welcome all individuals who are enthusiastic about data on Kubernetes to become a part of this open source community. Your contributions and participation are invaluable to the success of this project.

Built with ❤️ at AWS.

