-
OS: Ubuntu 19.04; required software: Spark 2.4.3, Python 3.7.3, Jupyter Notebook
-
Required Python packages are listed in requirements.txt:
$ pip install -r requirements.txt
-
Edit your ~/.bashrc file as below (SPARK_HOME should already point to your Spark installation directory):
export PATH=$SPARK_HOME/bin:$PATH
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
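After saving the file, reload it so the variables take effect in the current session (new terminals pick them up automatically); a minimal sketch:

```shell
# Reload ~/.bashrc if it exists (no-op otherwise)
if [ -f ~/.bashrc ]; then . ~/.bashrc; fi

# Print the values to confirm the exports applied
# (empty values mean the exports did not take effect)
echo "PYSPARK_PYTHON=${PYSPARK_PYTHON:-}"
echo "PYSPARK_DRIVER_PYTHON=${PYSPARK_DRIVER_PYTHON:-}"
```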
-
Execute pyspark (with the driver settings above, this launches Jupyter Notebook):
$ pyspark
We use the 'data' directory shipped with the Spark distribution.
Adjust the 'data' directory path in the example code to match your local setup.
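To avoid hard-coding that path in every notebook, one option is to derive it from the SPARK_HOME variable set above; a minimal sketch (the "/opt/spark" fallback is only an illustrative default, not part of this setup):

```python
import os

# Locate the sample 'data' directory relative to the Spark installation.
# The "/opt/spark" fallback is an assumption; replace it with your own path.
spark_home = os.environ.get("SPARK_HOME", "/opt/spark")
data_dir = os.path.join(spark_home, "data")

print(data_dir)  # e.g. /opt/spark/data
```

The resulting data_dir can then be prepended to the sample file names used in the example code.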