- Visit the Apache Spark Downloads page.
- Select the following options:
  - Choose a Spark release: 2.2.x or greater (I'll be using 2.2.1)
  - Choose a package type: Pre-built for Apache Hadoop 2.7 and later
  - Download Spark: spark-2.2.1-bin-hadoop2.7.tgz
- Download the compressed tar file to your local machine.
- After downloading the compressed file, extract it to the desired location:
  $ tar -xvzf spark-2.2.1-bin-hadoop2.7.tgz -C /home/prakshi/spark-2.2.1/
  (The directory passed to -C must already exist; create it first with mkdir -p if needed.)
- Setting up the environment for Spark:
  To set up the environment variables, add the following lines to your ~/.bashrc:
  export SPARK_HOME=/home/prakshi/spark-2.2.1/
  export PATH=$SPARK_HOME/bin:$PATH
  Make sure you change the path in SPARK_HOME to wherever your Spark files are located. Reload your ~/.bashrc file using:
  $ source ~/.bashrc
- That's all! Spark has been set up. Try running the pyspark command to use Spark from Python. There are two ways to use PySpark from a Jupyter Notebook.
-
Configure PySpark driver Update PySpark driver environment variables: add these lines to your ~/.bashrc (or ~/.zshrc) file.
export PYSPARK_DRIVER_PYTHON=jupyter export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
Restart your terminal and launch PySpark again:
$ pyspark
Now, this command should start a Jupyter Notebook in your web browser.
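If you prefer JupyterLab over the classic notebook interface, the same driver trick should work with a different option value; this variant assumes JupyterLab is installed, and the 'notebook' setting above is what this setup actually uses:

```shell
# Variant (assumes JupyterLab is installed): launch JupyterLab
# instead of the classic notebook when pyspark starts the driver.
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='lab'
```
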
- Using the findspark module: the findspark package is not specific to Jupyter Notebook; you can use this trick in your favorite IDE too.
  To install findspark:
  $ pip install findspark
  Whether you are in a Jupyter notebook or a plain Python script, all you need to do to use Spark is add the following lines to your code:
  import findspark
  findspark.init()
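What findspark.init() does, in essence, is locate your Spark installation (via SPARK_HOME or an explicit path) and make Spark's bundled Python packages importable. A simplified sketch of that idea, with an illustrative function name that is not findspark's actual source:

```python
import os
import sys

def init_spark_path(spark_home=None):
    """Simplified sketch of the idea behind findspark.init():
    make Spark's bundled Python API importable via sys.path."""
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise ValueError("SPARK_HOME is not set and no path was given")
    # Spark ships its Python API under $SPARK_HOME/python
    sys.path.insert(0, os.path.join(spark_home, "python"))
    return spark_home

# Example: point it at the install location used above
home = init_spark_path("/home/prakshi/spark-2.2.1/")
print(home)
```

The real findspark also adds Spark's bundled py4j library to the path; once that is done, `import pyspark` resolves against the downloaded Spark distribution without a separate pip install of pyspark.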