Comments (4)

gilmar commented on June 8, 2024

OK, the problem seems to be that Spark 2.0 ships a different version of Py4J from the one referenced in jupyter/kernels/pyspark/kernel.json.
I was able to make it work just by applying this change:

<     "PYTHONPATH": "/usr/lib/spark/python/:/usr/lib/spark/python/lib/py4j-0.9-src.zip",
---
>     "PYTHONPATH": "/usr/lib/spark/python/:/usr/lib/spark/python/lib/py4j-0.10.1-src.zip",

I don't know if it is the only broken reference, as I didn't do extensive testing.
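
To avoid hard-coding the Py4J version, the path could be looked up instead. The following is only a minimal sketch, assuming the zip lives under `<spark_home>/python/lib` as in the diff above; `find_py4j_zip` and `build_pythonpath` are hypothetical helper names, not part of Spark or of this repo.

```python
import glob
import os


def find_py4j_zip(spark_home):
    """Return the path of the py4j source zip bundled with Spark, or None."""
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    matches = sorted(glob.glob(pattern))
    # If several zips are present, take the last one in sorted order.
    return matches[-1] if matches else None


def build_pythonpath(spark_home):
    """Build a PYTHONPATH value similar to the one used in kernel.json."""
    py4j_zip = find_py4j_zip(spark_home)
    if py4j_zip is None:
        raise RuntimeError("no py4j-*-src.zip found under %s" % spark_home)
    return os.pathsep.join([os.path.join(spark_home, "python"), py4j_zip])
```

Calling `build_pythonpath("/usr/lib/spark")` on the master node would then give a value to paste into kernel.json, whichever Py4J version that Spark install bundles.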

from initialization-actions.

mobcdi commented on June 8, 2024

Hi @gilmar, where did you make the change (master? workers?), and would you be able to walk me through the steps?

import pyspark
sc.version

If I run the code above in an SSH terminal on the master node, it returns u'2.0.0'.
When I run the same code in a new pyspark notebook with a freshly started kernel, I get the error you mentioned, and if I run the cell again I get this one:

ImportErrorTraceback (most recent call last)
<ipython-input-2-cc2b46586f8c> in <module>()
----> 1 import pyspark
      2 sc.version

/usr/lib/spark/python/pyspark/__init__.py in <module>()
     42 
     43 from pyspark.conf import SparkConf
---> 44 from pyspark.context import SparkContext
     45 from pyspark.rdd import RDD
     46 from pyspark.files import SparkFiles

/usr/lib/spark/python/pyspark/context.py in <module>()
     26 from tempfile import NamedTemporaryFile
     27 
---> 28 from pyspark import accumulators
     29 from pyspark.accumulators import Accumulator
     30 from pyspark.broadcast import Broadcast

ImportError: cannot import name accumulators
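
If the cause is the stale Py4J path suggested in the first comment, one way to check from the notebook is to list the `sys.path` entries that do not exist on disk. This is only a rough sketch; `missing_path_entries` is a hypothetical helper, not part of PySpark.

```python
import os
import sys


def missing_path_entries(paths):
    """Return the non-empty entries in `paths` that do not exist on disk."""
    return [p for p in paths if p and not os.path.exists(p)]


if __name__ == "__main__":
    # A py4j-*-src.zip showing up here would point to a stale PYTHONPATH.
    for entry in missing_path_entries(sys.path):
        print("missing: %s" % entry)
```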

gilmar commented on June 8, 2024

Hi @mobcdi ,

I was only testing locally, but I have just created a pull request.
Meanwhile, you can try using the script below as your initialization script. The only difference from the original is that it points to my fork instead of this repo.
Copy it to your GCP bucket and point to it when creating your cluster, like this:
--initialization-actions gs://$YOUR_BUCKET/jupyter.sh


#!/usr/bin/env bash
set -e

ROLE=$(curl -f -s -H Metadata-Flavor:Google http://metadata/computeMetadata/v1/instance/attributes/dataproc-role)
INIT_ACTIONS_REPO=$(curl -f -s -H Metadata-Flavor:Google http://metadata/computeMetadata/v1/instance/attributes/INIT_ACTIONS_REPO || true)
INIT_ACTIONS_REPO="${INIT_ACTIONS_REPO:-https://github.com/gilmar/dataproc-initialization-actions.git}"
INIT_ACTIONS_BRANCH=$(curl -f -s -H Metadata-Flavor:Google http://metadata/computeMetadata/v1/instance/attributes/INIT_ACTIONS_BRANCH || true)
INIT_ACTIONS_BRANCH="${INIT_ACTIONS_BRANCH:-master}"
DATAPROC_BUCKET=$(curl -f -s -H Metadata-Flavor:Google http://metadata/computeMetadata/v1/instance/attributes/dataproc-bucket)

echo "Cloning fresh dataproc-initialization-actions from repo $INIT_ACTIONS_REPO and branch $INIT_ACTIONS_BRANCH..."
git clone -b "$INIT_ACTIONS_BRANCH" --single-branch "$INIT_ACTIONS_REPO"
# Ensure we have conda installed.
./dataproc-initialization-actions/conda/bootstrap-conda.sh
#./dataproc-initialization-actions/conda/install-conda-env.sh

source /etc/profile.d/conda_config.sh
if [[ "${ROLE}" == 'Master' ]]; then
    conda install jupyter
    if gsutil -q stat "gs://$DATAPROC_BUCKET/notebooks/**"; then
        echo "Pulling notebooks directory to cluster master node..."
        gsutil -m cp -r "gs://$DATAPROC_BUCKET/notebooks" /root/
    fi
    ./dataproc-initialization-actions/jupyter/internal/setup-jupyter-kernel.sh
    ./dataproc-initialization-actions/jupyter/internal/launch-jupyter-kernel.sh
fi
echo "Completed installing Jupyter!"

# Install Jupyter extensions (if desired)
# TODO: document this in readme
# -v takes the variable name, not its expanded value
if [[ ! -v INSTALL_JUPYTER_EXT ]]
then
    INSTALL_JUPYTER_EXT=false
fi
if [[ "$INSTALL_JUPYTER_EXT" = true ]]
then
    echo "Installing Jupyter Notebook extensions..."
    ./dataproc-initialization-actions/jupyter/internal/bootstrap-jupyter-ext.sh
    echo "Jupyter Notebook extensions installed!"
fi

grivescorbett commented on June 8, 2024

I'm getting the "ImportError: cannot import name accumulators" error as well. Has anyone solved this?
