hands-on-spark-1's Introduction

Installation

  1. Visit the Apache Spark downloads page:

    http://spark.apache.org/downloads.html

  2. Select the following options:

    1. Choose a Spark release: 2.2.x or greater (I'll be using 2.2.1)
    2. Choose a package type: Pre-built for Apache Hadoop 2.7 and later
    3. Download Spark: spark-2.2.1-bin-hadoop2.7.tgz

    Download the compressed tar file to your local machine.

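  3. After downloading the compressed file, extract it to the desired location. The target directory must exist before tar can extract into it, and --strip-components=1 drops the archive's top-level spark-2.2.1-bin-hadoop2.7/ folder so that bin/ ends up directly inside the target directory:

    $ mkdir -p /home/prakshi/spark-2.2.1/
    $ tar -xvzf spark-2.2.1-bin-hadoop2.7.tgz -C /home/prakshi/spark-2.2.1/ --strip-components=1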

  4. Set up the environment for Spark:

    Add the following lines to your ~/.bashrc:

    export SPARK_HOME=/home/prakshi/spark-2.2.1/
    export PATH=$SPARK_HOME/bin:$PATH

    Change the path in SPARK_HOME to match the location of your Spark installation. Then reload your ~/.bashrc file:

    $ source ~/.bashrc
    
  5. That's all! Spark has been set up. Try running the pyspark command to use Spark from Python.
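
Before launching pyspark, it can help to confirm that the variables from step 4 point at a real Spark installation. The following is a minimal sketch, not part of Spark itself: the helper name check_spark_home is hypothetical, and it only assumes the usual layout in which the launcher scripts live under $SPARK_HOME/bin.

```python
import os


def check_spark_home(env=None):
    """Return a list of problems found with the Spark environment (empty if none)."""
    env = os.environ if env is None else env
    problems = []

    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        return ["SPARK_HOME is not set"]

    # The pyspark launcher script is expected under $SPARK_HOME/bin.
    launcher = os.path.join(spark_home, "bin", "pyspark")
    if not os.path.isfile(launcher):
        problems.append("no pyspark launcher found at " + launcher)

    # $SPARK_HOME/bin should be on PATH so that `pyspark` resolves from any directory.
    bin_dir = os.path.normpath(os.path.join(spark_home, "bin"))
    path_dirs = [os.path.normpath(p) for p in env.get("PATH", "").split(os.pathsep) if p]
    if bin_dir not in path_dirs:
        problems.append(bin_dir + " is not on PATH")

    return problems
```

Calling check_spark_home() from a Python session started after `source ~/.bashrc` should return an empty list. Any returned message points at the variable that still needs fixing.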

Pyspark in Jupyter Notebook

There are two ways to use PySpark from a Jupyter Notebook.

  1. Configure the PySpark driver

    Update the PySpark driver environment variables by adding these lines to your ~/.bashrc (or ~/.zshrc) file:

    export PYSPARK_DRIVER_PYTHON=jupyter
    export PYSPARK_DRIVER_PYTHON_OPTS='notebook'

    Restart your terminal and launch PySpark again:

    $ pyspark

    Now, this command should start a Jupyter Notebook in your web browser.

  2. Use the findspark module

    The findspark package is not specific to Jupyter Notebook; you can use this trick in your favorite IDE too.

    To install findspark:

    $ pip install findspark

    Whether you work in a Jupyter notebook or a plain Python script, all you need to do to use Spark is add the following lines to your code:

    import findspark
    findspark.init()
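
To make the effect of findspark.init() less opaque: as far as I understand it, the call locates the Spark installation (for example via SPARK_HOME) and puts Spark's bundled Python sources and Py4J archive on sys.path so that `import pyspark` works. The sketch below imitates that idea with the standard library only. It is a simplified illustration, not findspark's actual code, and the function names spark_python_paths and add_spark_to_sys_path are hypothetical.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the directory and archive that make `import pyspark` possible."""
    spark_python = os.path.join(spark_home, "python")
    # Spark bundles Py4J as a zip under python/lib; the version in the file name varies.
    py4j_archives = sorted(glob.glob(os.path.join(spark_python, "lib", "py4j-*.zip")))
    if not py4j_archives:
        raise FileNotFoundError("no py4j-*.zip found under " + spark_python)
    return [spark_python, py4j_archives[0]]


def add_spark_to_sys_path(spark_home=None):
    """Prepend Spark's Python paths to sys.path (simplified stand-in for findspark.init())."""
    spark_home = spark_home or os.environ["SPARK_HOME"]
    paths = spark_python_paths(spark_home)
    # Avoid adding duplicates if the function is called more than once.
    sys.path[:0] = [p for p in paths if p not in sys.path]
    return paths
```

In practice, prefer findspark.init() itself. The sketch is only meant to show why the call has to run before any `import pyspark` statement.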

Contributors

prakshi-epi, prakshi24
