Code Monkey home page Code Monkey logo

docker-spark's Introduction

Spark client docker image

DockerPulls DockerStars

This repository contains a docker image to run Apache Spark client.

To run simple spark shell :

docker run -it epahomov/docker-spark:lightweighted /spark/bin/spark-shell

To run simple python spark shell (known as pyspark) :

docker run -it epahomov/docker-spark:lightweighted /spark/bin/pyspark

Examples before used lightweighted version of this image. It's very small, so it would download very fast, but it's not very flexible. All next examples would be with default version

To run simple spark R shell :

docker run -it epahomov/docker-spark /spark/bin/sparkR

To run simple spark sql shell :

docker run -it epahomov/docker-spark /spark/bin/spark-sql

To run simple spark shell with some changed properties like here :

docker run -it epahomov/docker-spark /spark/bin/spark-shell  --master local[4]

To run simple spark shell with changed spark-defaults.conf do:

printf "spark.master local[4] \nspark.executor.cores 4" > spark-defaults.conf
sudo docker run -v $(pwd)/spark-defaults.conf:/spark/conf/spark-defaults.conf -it epahomov/docker-spark /spark/bin/spark-shell

First line write conf into file spark-defaults.conf, and second line mount this file from host file system to filesystem in container and puts it in conf directory.

To be able to use spark ui, add " -p 4040:4040 " argument:

docker run -ti -p 4040:4040 epahomov/docker-spark /spark/bin/spark-shell

To run some python script do:

echo "import pyspark\nprint(pyspark.SparkContext().parallelize(range(0, 10)).count())" > count.py
docker run -it -p 4040:4040 -v $(pwd)/count.py:/count.py epahomov/docker-spark /spark/bin/spark-submit /count.py

Hadoop

With this image you can connect to Hadoop cluster from spark. All you need is specify HADOOP_CONF_DIR and pass directory with hadoop configs as volume

docker run -v $(pwd)/hadoop:/etc/hadoop/conf -e "HADOOP_CONF_DIR=/etc/hadoop/conf" --net=host  -it epahomov/docker-spark /spark/bin/spark-shell --master yarn-client

Versions

This container exists in next versions:

  • spark_2.0_hadoop_2.7
  • spark_2.0_hadoop_2.6
  • spark_2.1_hadoop_2.7
  • spark_2.1_hadoop_2.6
  • lightweighted - lightweighted version of this image. It's based on alpine linux and downloaded binary, not build from source with all possible plags(like -Pyarn).
  • old-spark - Old functionality with setting up spark cluster. Not supported, not recommended to use.

Master has version spark_2.1_hadoop_2.7

Zeppelin

This image is base image for Apache Zeppelin Image

docker-spark's People

Contributors

arian avatar epahomov avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.