
medium-spark-k8s's Introduction

Deploying Apache Spark Jobs on Kubernetes with Helm and Spark Operator

Repository accompanying the article on Medium:

http://bit.ly/spark-k8s

Intro:

For every challenge there are many technology stacks that can provide a solution. I’m not claiming this approach is the holy grail of data processing; this is more the tale of my quest to combine widely supported tools in a maintainable fashion. From the outset I’ve tried to generate as much configuration as possible, because I’ve experienced how easy it is to drown in a sea of YAML files, conf files and incompatible versions across registries, repositories, CI/CD pipelines and deployments.

What I created is an sbt script that, when triggered, builds a fat jar, wraps it in a Dockerfile, builds an image from it, and updates the Helm values of a chart with the new config. The image is pushed to the registry, and the Helm chart is augmented with environment-specific settings and pushed to ChartMuseum.

I’ve deployed this both locally on minikube and remotely on Azure. The Azure flow is less generic to discuss, because the Azure Container Registry can be used for both the images and the Helm charts, and the remote deployments rely on Terraform scripts and CI/CD pipelines that are too specific anyway. Do note that all infrastructure is set up via brew on a Mac, but it should be easy to find equivalents for other environments.
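To make the "generate the configuration from the build" idea above a bit more concrete, here is a minimal sketch in Python. It is not the sbt script from the article, which is not reproduced here; it only illustrates deriving the image reference and the Helm values from one set of build settings, so that versions cannot drift apart. The `BuildInfo` fields, the value keys (`image.repository`, `image.tag`, `mainClass`) and the sample registry are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildInfo:
    """Hypothetical build settings; every field name here is illustrative."""
    name: str        # application name
    version: str     # application version, reused as the image tag
    registry: str    # image registry host
    main_class: str  # entry point of the Spark job inside the fat jar


def image_reference(build: BuildInfo) -> str:
    """Compose the image reference as registry/name:version."""
    return f"{build.registry}/{build.name}:{build.version}"


def helm_values(build: BuildInfo) -> dict:
    """Derive chart values from the build settings (key names are assumptions)."""
    return {
        "image": {
            "repository": f"{build.registry}/{build.name}",
            "tag": build.version,
        },
        "mainClass": build.main_class,
    }


def to_yaml(values: dict, indent: int = 0) -> str:
    """Render nested dicts of plain strings as YAML.

    Deliberately tiny: it handles only dicts and strings without quotes
    or backslashes, which is enough for this sketch.
    """
    pad = "  " * indent
    lines = []
    for key, value in values.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 1))
        else:
            lines.append(f'{pad}{key}: "{value}"')
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample values only; replace with the settings of a real build.
    build = BuildInfo("spark-job", "0.1.0", "localhost:5000", "com.example.Main")
    print(image_reference(build))
    print(to_yaml(helm_values(build)))
```

Because both the image tag and the chart values come from the same `BuildInfo`, bumping the version in one place updates everything that depends on it.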

  1. Kubernetes
  2. Helm
  3. Image Registry
  4. Helm Chart Museum
  5. Spark Operator
  6. Spark App
  7. sbt setup
  8. Base Image setup
  9. Helm config
  10. Deploying
  11. Conclusion
