Comments (4)
OK, the problem seems to be that Spark 2.0 ships a newer version of Py4J than the one referenced in jupyter/kernels/pyspark/kernel.json
I was able to make it work just by applying this change:
< "PYTHONPATH": "/usr/lib/spark/python/:/usr/lib/spark/python/lib/py4j-0.9-src.zip",
---
> "PYTHONPATH": "/usr/lib/spark/python/:/usr/lib/spark/python/lib/py4j-0.10.1-src.zip",
I don't know whether it is the only broken reference, as I didn't do extensive testing.
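Rather than hard-coding the Py4J version in kernel.json's PYTHONPATH, the entry can be derived from whatever py4j-*-src.zip Spark actually ships. A minimal sketch, assuming the standard /usr/lib/spark layout (the helper name is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the PYTHONPATH value for kernel.json from the
# Py4J zip Spark ships, so a version bump (0.9 -> 0.10.1) can't break it.
py4j_pythonpath() {
  local spark_home="$1"
  # Pick the py4j-*-src.zip actually present under Spark's lib directory.
  local zip
  zip=$(ls "$spark_home"/python/lib/py4j-*-src.zip 2>/dev/null | head -n 1)
  [ -n "$zip" ] || return 1
  printf '%s/python/:%s\n' "$spark_home" "$zip"
}
```

For example, `py4j_pythonpath /usr/lib/spark` would print the full PYTHONPATH value to substitute into kernel.json.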
from initialization-actions.
Hi @gilmar, where did you make the change (master? workers?) and would you be able to walk me through the steps?
import pyspark
sc.version
If I run the code above in an SSH terminal on the master node, it returns u'2.0.0'.
When I run the same code in a new PySpark notebook with a freshly started kernel, I get the error you mentioned; if I run the cell again, I get this:
ImportError Traceback (most recent call last)
<ipython-input-2-cc2b46586f8c> in <module>()
----> 1 import pyspark
2 sc.version
/usr/lib/spark/python/pyspark/__init__.py in <module>()
42
43 from pyspark.conf import SparkConf
---> 44 from pyspark.context import SparkContext
45 from pyspark.rdd import RDD
46 from pyspark.files import SparkFiles
/usr/lib/spark/python/pyspark/context.py in <module>()
26 from tempfile import NamedTemporaryFile
27
---> 28 from pyspark import accumulators
29 from pyspark.accumulators import Accumulator
30 from pyspark.broadcast import Broadcast
ImportError: cannot import name accumulators
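The ImportError above is consistent with a stale Py4J entry: pyspark's internal imports fail because the py4j-0.9 zip on the kernel's PYTHONPATH no longer exists after the Spark 2.0 upgrade. A quick hypothetical diagnostic (the helper name is an assumption) that lists PYTHONPATH entries missing on disk:

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: print PYTHONPATH entries that don't exist on disk.
# After upgrading to Spark 2.0, the stale py4j-0.9-src.zip path shows up here.
missing_pythonpath_entries() {
  local IFS=':'
  local p
  for p in $1; do
    # Report non-empty entries that no longer resolve to a file or directory.
    [ -n "$p" ] && [ ! -e "$p" ] && printf '%s\n' "$p"
  done
  return 0
}
```

Run it as `missing_pythonpath_entries "$PYTHONPATH"` inside the kernel's environment; any py4j-0.9 path it prints confirms the broken reference.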
Hi @mobcdi ,
I was testing locally, but I have just created a pull request.
Meanwhile, you can use the script below as your initialization script. The only difference from the original is that it points to my fork instead of this repo.
Just upload it to your GCS bucket and point to it when creating your cluster, like this:
--initialization-actions gs://$YOUR_BUCKET/jupyter.sh
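For context, a hypothetical end-to-end invocation (the bucket and cluster names are placeholders):

```shell
# Upload the patched init action, then reference it at cluster creation.
gsutil cp jupyter.sh gs://my-bucket/jupyter.sh
gcloud dataproc clusters create my-cluster \
    --initialization-actions gs://my-bucket/jupyter.sh
```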
#!/usr/bin/env bash
set -e
ROLE=$(curl -f -s -H Metadata-Flavor:Google http://metadata/computeMetadata/v1/instance/attributes/dataproc-role)
INIT_ACTIONS_REPO=$(curl -f -s -H Metadata-Flavor:Google http://metadata/computeMetadata/v1/instance/attributes/INIT_ACTIONS_REPO || true)
INIT_ACTIONS_REPO="${INIT_ACTIONS_REPO:-https://github.com/gilmar/dataproc-initialization-actions.git}"
INIT_ACTIONS_BRANCH=$(curl -f -s -H Metadata-Flavor:Google http://metadata/computeMetadata/v1/instance/attributes/INIT_ACTIONS_BRANCH || true)
INIT_ACTIONS_BRANCH="${INIT_ACTIONS_BRANCH:-master}"
DATAPROC_BUCKET=$(curl -f -s -H Metadata-Flavor:Google http://metadata/computeMetadata/v1/instance/attributes/dataproc-bucket)
echo "Cloning fresh dataproc-initialization-actions from repo $INIT_ACTIONS_REPO and branch $INIT_ACTIONS_BRANCH..."
git clone -b "$INIT_ACTIONS_BRANCH" --single-branch "$INIT_ACTIONS_REPO"
# Ensure we have conda installed.
./dataproc-initialization-actions/conda/bootstrap-conda.sh
#./dataproc-initialization-actions/conda/install-conda-env.sh
source /etc/profile.d/conda_config.sh
if [[ "${ROLE}" == 'Master' ]]; then
  conda install -y jupyter
  if gsutil -q stat "gs://$DATAPROC_BUCKET/notebooks/**"; then
    echo "Pulling notebooks directory to cluster master node..."
    gsutil -m cp -r "gs://$DATAPROC_BUCKET/notebooks" /root/
  fi
  ./dataproc-initialization-actions/jupyter/internal/setup-jupyter-kernel.sh
  ./dataproc-initialization-actions/jupyter/internal/launch-jupyter-kernel.sh
fi
echo "Completed installing Jupyter!"
# Install Jupyter extensions (if desired)
# TODO: document this in readme
if [[ ! -v INSTALL_JUPYTER_EXT ]]
then
  INSTALL_JUPYTER_EXT=false
fi
if [[ "$INSTALL_JUPYTER_EXT" = true ]]
then
  echo "Installing Jupyter Notebook extensions..."
  ./dataproc-initialization-actions/jupyter/internal/bootstrap-jupyter-ext.sh
  echo "Jupyter Notebook extensions installed!"
fi
I'm getting the "ImportError: cannot import name accumulators" error as well. Has anyone solved this?