
sparkmonitor's Introduction


Spark Monitor - An extension for Jupyter Notebook

Note: This project is now maintained at https://github.com/swan-cern/sparkmonitor

For the Google Summer of Code final report of this project, click here

About

SparkMonitor is an extension for Jupyter Notebook that enables live monitoring of Apache Spark jobs spawned from a notebook. The extension provides several features to monitor and debug a Spark job from within the notebook interface itself.

[Animated screenshot: live job display below a notebook cell]

Features

  • Automatically displays a live monitoring tool below cells that run Spark jobs in a Jupyter notebook
  • A table of jobs and stages with progress bars
  • A timeline which shows jobs, stages, and tasks
  • A graph showing the number of active tasks and executor cores versus time
  • A notebook server extension that proxies the Spark UI and displays it in an iframe popup for more details
  • For a detailed list of features see the use case notebooks
  • How it Works

Quick Installation

pip install sparkmonitor
jupyter nbextension install sparkmonitor --py --user --symlink 
jupyter nbextension enable sparkmonitor --py --user            
jupyter serverextension enable --py --user sparkmonitor
ipython profile create && echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >>  $(ipython profile locate default)/ipython_kernel_config.py

For more detailed instructions click here

To do a quick test of the extension:

docker run -it -p 8888:8888 krishnanr/sparkmonitor
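
Once installed, a minimal smoke test inside a notebook might look like the sketch below. It assumes the kernel extension has injected a SparkConf object named conf into the namespace (which it does when loaded); any Spark job run after that should make the monitoring display appear below the cell.

from pyspark import SparkContext

sc = SparkContext.getOrCreate(conf=conf)  # conf is the SparkConf injected by the kernel extension
sc.parallelize(range(1000)).map(lambda x: x * 2).sum()  # any job triggers the monitor below the cell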

Integration with ROOT and SWAN

At CERN, the SparkMonitor extension has two main use cases:

  • Distributed analysis with ROOT and Apache Spark using the DistROOT module. Here is an example demonstrating this use case.
  • Integration with SWAN, a service for web-based analysis, via a modified container image for SWAN user sessions.

sparkmonitor's People

Contributors

abdealiloko, krishnan-r


sparkmonitor's Issues

When starting the kernel, the following error is thrown

[W 11:26:28.271 NotebookApp] 404 GET /api/kernels/0ed75691-8a30-42ba-a856-8fd1f4a07446/channels?session_id=5890403E360A4F228A55153B574F22CB (::1): Kernel does not exist: 0ed75691-8a30-42ba-a856-8fd1f4a07446
[W 11:26:28.281 NotebookApp] 404 GET /api/kernels/0ed75691-8a30-42ba-a856-8fd1f4a07446/channels?session_id=5890403E360A4F228A55153B574F22CB (::1) 21.33ms referer=None
[W 11:26:32.297 NotebookApp] Replacing stale connection: 0ed75691-8a30-42ba-a856-8fd1f4a07446:5890403E360A4F228A55153B574F22CB
[I 11:26:33.917 NotebookApp] Kernel started: 0e6e9856-8288-4cc3-ae2e-2c5dd3d0d226
[W 11:26:33.932 NotebookApp] 404 GET /nbextensions/sparkmonitor/module.js?v=20180301112625 (::1) 8.88ms referer=http://localhost:8888/notebooks/Untitled7.ipynb
[W 11:26:34.104 NotebookApp] 404 GET /nbextensions/widgets/notebook/js/extension.js?v=20180301112625 (::1) 3.33ms referer=http://localhost:8888/notebooks/Untitled7.ipynb
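
The 404 for /nbextensions/sparkmonitor/module.js usually indicates the frontend files were never installed or enabled for the environment serving the notebook. A reasonable first check (the same commands as in Quick Installation above) is to reinstall and verify:

jupyter nbextension install sparkmonitor --py --user --symlink
jupyter nbextension enable sparkmonitor --py --user
jupyter nbextension list    # sparkmonitor/module.js should be listed as enabled and validated "OK"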

Not Able to Access Spark UI through Monitor

Hello,

Amazing tool! Thank you. Everything is working great except I do not see the icon to open the Spark UI:

[screenshot]

Here is my Spark configuration:

spark.driver.extraClassPath=/usr/local/tools/spark/sparkmonitor/jars/listener.jar
spark.driver.memory=5g
spark.eventLog.dir=lustre:///sparkLogging/2.4.4
spark.eventLog.enabled=true
spark.eventLog.permissions=777
spark.executor.heartbeatInterval=7500
spark.extraListeners=sparkmonitor.listener.JupyterSparkMonitorListener
spark.history.fs.cleaner.enabled=true
spark.history.fs.cleaner.interval=1d
spark.history.fs.cleaner.maxAge=3d
spark.history.fs.logDirectory=lustre:///sparkLogging/2.4.4
spark.kryoserializer.buffer.max=128m
spark.master=local[*]
spark.network.timeout=10000000
spark.network.timetout=10000000
spark.rdd.compress=True
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.serializer.objectStreamReset=100
spark.sql.parquet.enableVectorizedReader=false
spark.sql.shuffle.partitions=108
spark.submit.deployMode=client
spark.ui.showConsoleProgress=true

I can access the UI through my browser, however. Any help would be appreciated.

Error loading server extension sparkmonitor.serverextension

My best guess is incompatibility with Tornado 6.

The root cause seems to be: AttributeError: module 'tornado.web' has no attribute 'asynchronous':

$ jupyter serverextension enable --py --sys-prefix sparkmonitor
Enabling: sparkmonitor.serverextension
- Writing config: /opt/anaconda3/etc/jupyter
    - Validating...
Error loading server extension sparkmonitor.serverextension
      X is sparkmonitor.serverextension importable?
$ python
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sparkmonitor.serverextension
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/lib/python3.7/site-packages/sparkmonitor/serverextension.py", line 24, in <module>
    class SparkMonitorHandler(IPythonHandler):
  File "/opt/anaconda3/lib/python3.7/site-packages/sparkmonitor/serverextension.py", line 27, in SparkMonitorHandler
    @tornado.web.asynchronous
AttributeError: module 'tornado.web' has no attribute 'asynchronous'
$ pip freeze | egrep "tornado|jupyter|spark"
jupyter==1.0.0
jupyter-client==5.2.4
jupyter-console==6.0.0
jupyter-core==4.4.0
jupyterlab==0.35.4
jupyterlab-server==0.2.0
sparkmonitor==0.0.9
tornado==6.0.2
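
For reference, @tornado.web.asynchronous was removed in Tornado 6; a coroutine handler keeps the connection open for its whole duration instead. A minimal sketch of what a Tornado 6-compatible handler could look like (the upstream URL and handler body are illustrative, not the project's actual code):

from tornado.httpclient import AsyncHTTPClient
from notebook.base.handlers import IPythonHandler

class SparkMonitorHandler(IPythonHandler):
    async def get(self):
        # In Tornado 6 a coroutine handler stays open until it returns,
        # so the removed @tornado.web.asynchronous decorator is unnecessary.
        client = AsyncHTTPClient()
        url = "http://localhost:4040" + self.request.uri  # Spark UI default port (an assumption)
        response = await client.fetch(url)
        self.write(response.body)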

I am trying to create a game, and I have to input the following files for the game to work, but the kernel is showing this error

What do I do?
I have also changed the file format from PNG to JPG.
This is the error:

File "C:\HT Python Gaming\untitled1.py", line 20, in
walkRight = [pygame.image.load('R1.jpg'), pygame.image.load('R2.jpg'), pygame.image.load('R3.jpg'), pygame.image.load('R4.jpg'), pygame.image.load('R5.jpg'), pygame.image.load('R6.jpg'), pygame.image.load('R7.jpg'), pygame.image.load('R8.jpg'), pygame.image.load('R9.jpg')]

error: Couldn't open R1.jpg

Please help, and respond ASAP.

Python 3 Kernel Issue

Hello
I have added a Python 3 kernel to my Jupyter Docker image.
Is there a way to have sparkmonitor working with both 2.x and 3.x?

It works fine with a Python 2 kernel, but when I switch to a 3.x kernel, the conf test raises an error:


print(conf.toDebugString())

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-0a5e403cf2b8> in <module>
----> 1 print(conf.toDebugString())

NameError: name 'conf' is not defined

Thanks for your help

Originally posted by @Ftagn92 in #1 (comment)
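
A hedged first check (not from the original thread): the NameError means the kernel extension never ran in the Python 3 kernel, so conf was never injected. Loading the extension manually from a cell should show whether it works under Python 3 at all:

%load_ext sparkmonitor.kernelextension
print(conf.toDebugString())  # should print the injected SparkConf if the extension loaded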

Error loading server extension

Hi,
I am using JupyterHub on AWS EMR v5.24
JupyterHub is installed inside a docker container from AWS.
It comes with:

  • jupyter_client v5.2.3
  • jupyter_core v4.4.0
  • jupyterhub v0.9.6

I try to install sparkmonitor within the container as follows:

sudo docker exec jupyterhub bash -c "pip install sparkmonitor"
sudo docker exec jupyterhub bash -c "jupyter nbextension install sparkmonitor --py --user --symlink"
sudo docker exec jupyterhub bash -c "jupyter nbextension enable sparkmonitor --py --user"
sudo docker exec jupyterhub bash -c "jupyter serverextension enable --py --user sparkmonitor"

I get the message "- Validating: OK" for the two nbextension lines, but after the last line I get this error message:

Enabling: sparkmonitor.serverextension
- Writing config: /home/jovyan/.jupyter
    - Validating...
Error loading server extension sparkmonitor.serverextension
      X is sparkmonitor.serverextension importable?

I continue with

sudo docker exec jupyterhub bash -c "echo \"c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')\" >>  /home/jovyan/.ipython/profile_default/ipython_kernel_config.py"

When I run JupyterHub, I see a "Toggle Spark Monitoring Displays" button, but I cannot see the monitoring module... Any ideas?
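
One way to surface the underlying import error behind Jupyter's one-line "is ... importable?" summary (a suggested check, mirroring the diagnosis in the Tornado issue above) is to import the module directly inside the container:

sudo docker exec jupyterhub bash -c "python -c 'import sparkmonitor.serverextension'"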

Scala support?

This looks neat, but the test notebook only shows example usage with pyspark. Does this work at all with Scala notebooks?

Test compatibility with PYSPARK_SUBMIT_ARGS

Based on the discussion at #6 (comment)

The extension does import pyspark internally. This means that if I, as a Jupyter user, want to do something like the following:

import os

spark_pkgs=('com.amazonaws:aws-java-sdk:1.7.4',
            'org.apache.hadoop:hadoop-aws:2.7.3',
            'joda-time:joda-time:2.9.3',)

os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages {spark_pkgs} pyspark-shell'.format(spark_pkgs=",".join(spark_pkgs)))

import findspark
findspark.init()
import pyspark

spark = pyspark.sql.SparkSession.builder \
    .getOrCreate()

I cannot, because the PYSPARK_SUBMIT_ARGS environment variable is only set after pyspark has already been imported by the sparkmonitor module.

Using sparkmonitor for remote kernels in Jupyter Enterprise Gateway setup.

Hello,

We have a Jupyter setup where we spawn IPython kernels remotely on a Spark cluster via a Jupyter Enterprise Gateway setup.
This means that the kernel extension has to be installed on the remote machines before startup.
Although we have managed to install sparkmonitor on the remote machines, configuring ipython_kernel_config.py does not seem possible for us there.
Since these kernels are launched using YARN as the resource manager, providing a command-line option to load the extension also does not seem to work.

I tried %load_ext, after which I can load the extension, and the conf variable shows all the relevant details. I have also installed sparkmonitor on my local machine for the notebook UI extension and the notebook server extension.

But although the Spark jobs execute properly, the extension does not show the UI or display.

Do you have any ideas how we can fix this issue?
Let me know if you need additional info. (We have tried installing this on a local setup and things were working fine.)

UI not visible in notebook

I'm running Spark and Jupyter Notebook in a Docker container, and I configured the sparkmonitor extension to load in the config files. I am running a Spark job in the notebook using the example provided in the GitHub repo. However, even though the job completes and I can see the final results and the stages in the logs, the UI does not display the screenshots described. What could be wrong?

Disable logging to file?

In kernelextension.py and serverextension.py there are references to the sparkmonitor_kernelextension.log and sparkmonitor_serverextension.log files being created and logged to. The code specifically says this is for debugging the module. Can we get rid of the debug mode? These log files are cluttering all my folders... :P

    fh = logging.FileHandler("sparkmonitor_serverextension.log", mode="w")
    fh.setLevel(logging.DEBUG)
    formatter = logging.Formatter(
        "%(levelname)s:  %(asctime)s - %(name)s - %(process)d - %(processName)s - \
        %(thread)d - %(threadName)s\n %(message)s \n")
    fh.setFormatter(formatter)
    logger.addHandler(fh) ## Comment this line to disable logging to a file.
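
One hedged way to make the file logging opt-in (an illustration, not the project's current behavior; the variable name is hypothetical):

import logging
import os

logger = logging.getLogger(__name__)
if os.environ.get("SPARKMONITOR_DEBUG_LOG"):  # hypothetical opt-in switch
    fh = logging.FileHandler("sparkmonitor_serverextension.log", mode="w")
    fh.setLevel(logging.DEBUG)
    logger.addHandler(fh)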

Integration with nteract

nteract is a frontend for Jupyter that runs natively on the desktop, on the web, and in other places such as within the Atom editor.

The nteract desktop app runs on Electron and is implemented using React, Redux, and RxJS, along with other libraries, and also uses TypeScript.

The goal of this feature is to implement support for SparkMonitor in nteract directly, providing a seamless user experience for using Spark from nteract.

Work needs to be done on refactoring SparkMonitor to support nteract, improving Jupyter protocol support for this use case, and supporting Scala kernels.

This issue summarizes discussions on Slack with @rgbkrk and others in the nteract/spark_integration channel.

Python 3 compatibility

Seems like when I use the library, I get a lot of the following in my logs:

[E 19:06:06.771 NotebookApp] Uncaught exception GET /sparkmonitor/static/timeline-view.js (::1)
    HTTPServerRequest(protocol='http', host='localhost:8888', method='GET', uri='/sparkmonitor/static/timeline-view.js', version='HTTP/1.1', remote_ip='::1')
    Traceback (most recent call last):
      File "/Users/abdealijk/anaconda3/lib/python3.6/site-packages/tornado/web.py", line 1499, in _stack_context_handle_exception
        raise_exc_info((type, value, traceback))
      File "<string>", line 4, in raise_exc_info
      File "/Users/abdealijk/anaconda3/lib/python3.6/site-packages/tornado/stack_context.py", line 315, in wrapped
        ret = fn(*args, **kwargs)
      File "/Users/abdealijk/anaconda3/lib/python3.6/site-packages/sparkmonitor/serverextension.py", line 68, in handle_response
        "location.origin", "location.origin +'" + self.replace_path + "' ")
    TypeError: a bytes-like object is required, not 'str'
[E 19:06:06.774 NotebookApp] {
      "Host": "localhost:8888",
      "Connection": "keep-alive",
      "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36",
      "Accept": "*/*",
      "Referer": "http://localhost:8888/sparkmonitor/",
      "Accept-Encoding": "gzip, deflate, br",
      "Accept-Language": "en-US,en;q=0.9",
      "Cookie": "_ga=GA1.1.939770584.1522777592; username-localhost-8889=\"2|1:0|10:1524844138|23:username-localhost-8889|44:YTM2ZTA5MTY5ODBjNGZmYTlkMjU5NmMyZDg2ODMxZmI=|2a48a65b005e9bdee1e24a4998f5a0ecfa4403d19cd979a3999cd4c1cfc6d1e6\"; username-localhost-9990=\"2|1:0|10:1527410921|23:username-localhost-9990|44:YTYwOGU1YzY4Yjk1NDhkM2JhMWM0YzYxZTU5NTk4ZDA=|8e91be00c8de10fbb098a1e228d324a1a5d3437138d4f76b7bba04db0b159929\"; _xsrf=2|a8530e06|b62af1e2bf7923224299195c03d70803|1527410923; username-localhost-8888=\"2|1:0|10:1527427803|23:username-localhost-8888|44:YjQ1ZWRkYTE0ZjEyNDNhZGI3NDVlMmEzYjhiYmQ3Zjg=|b31f12794744caf6a4d3bdf59c5e8c64cd22c6d6c2a2e1f636fdbf92c69531f0\""
    }
[E 19:06:06.775 NotebookApp] 500 GET /sparkmonitor/static/timeline-view.js (::1) 14.24ms referer=http://localhost:8888/sparkmonitor/
SPARKMONITOR_SERVER: Request_path static/log-view.js
 Replace_path:/sparkmonitor

The bytes/str issue seems like a classic Python 2/Python 3 issue. Has this been tested with Python 3?
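
For context, the failing line in handle_response (per the traceback) passes str arguments to replace on a bytes response body, which Python 3 rejects. A minimal sketch of a Python 3-safe version (illustrative, not the project's actual patch):

# Inside handle_response, where response.body is bytes under Python 3:
body = response.body.replace(
    b"location.origin",
    b"location.origin +'" + self.replace_path.encode() + b"' ")
self.write(body)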

Things To Do

Issues and things to fix

  • When there are more than about 200 tasks to show, the timeline lags while appearing and scrolling

    • This depends on the user's browser and machine resources
    • TODO: Beyond a certain threshold, hide individual tasks entirely.
      • This needs to be done in the backend listener itself for scalability.
  • Some jobs do not have names

    • For example, when reading a Parquet file, the job name is null
    • TODO: Use the first stage's name instead, as done in the Spark UI
  • Timeline annotations do not appear when the number of tasks is too large.

    • The timeline loads asynchronously...
    • TODO: Fix this, or add an option for the user to show annotations by toggling a checkbox
  • Cases where the Spark application is started and stopped multiple times in the same cell cause display conflicts, as job IDs and stage IDs are duplicated

    • This can happen if jobs are called from an imported Python script and the context is stopped and started multiple times.
    • TODO: Either clear the previous application's display or append the appId to each jobId/stageId to make it unique.
    • TODO: Handle cases where a stage is attempted again (never encountered this, though)
  • When running multiple cells and an intermediate cell fails, further executions detect the wrong cell

    • Restart and Run All doesn't work
    • The cell queue that is used to detect the current cell needs to be cleared in the frontend
    • Further execution requests are possibly discarded in the kernel.
    • TODO: How to detect this?
  • In some browsers, like Internet Explorer, when the frontend extension fails to load, Python throws a 'comm' error

    • TODO: Suppress the error
    • TODO: Replicate the issue and identify possible causes

Pending Features

  • Handle skipped stages' names and number of tasks properly in the progress bars
  • Show failed tasks in red
    • In the timeline
    • In the table of jobs
    • Also show the reason for the failure.
  • Dynamically update executors in the task graph
  • Aggregate the number of active tasks over a finite interval to make the graph smoother
  • Add annotations to the task graph for the start and end of jobs
    • Change the current charting library, as annotations are not properly implemented.
  • Popup with more details when clicking on an item in the timeline
  • Ability to cancel jobs - the cancel button
    • TODO: What is the right API to do this?
    • Using SparkContext (see the sketch after this list)
      • setJobGroup / cancelJobGroup
      • Currently there is no access to the SparkContext
      • The current communication mechanism prevents messages to the kernel while it is busy.
    • However, the Spark UI has an internal REST API to kill individual jobs
      • This is the (kill) link that appears in the UI
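
A sketch of the setJobGroup / cancelJobGroup approach mentioned above, assuming the extension could get a handle on the SparkContext (which, per the notes, it currently cannot); the group id is illustrative:

# Tag all jobs launched from a cell with a group id so they can be cancelled together.
sc.setJobGroup("cell-1", "Jobs started from cell 1")
result = sc.parallelize(range(10**6)).map(lambda x: x * x).count()  # runs under group "cell-1"

# From another thread (e.g. triggered by a cancel button), kill the whole group:
sc.cancelJobGroup("cell-1")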

Look and Feel

  • In Firefox, prevent the table CSS from expanding rows to fill the container.

  • jQuery UI dialog CSS styles conflict with matplotlib output (can be fixed).

  • Add scrollbars to the table when the number of jobs/stages is large.

  • Add a visual indicator that shows the overall status of a cell: running/completed.

  • Possibly show the number of active executors somewhere as a number.

  • Display the overall cell execution time somewhere.

New Features

  • Add an option to remove the display altogether from a cell

    • For trivial operations like a read, or viewing a count/take, the user may prefer to hide the display.
    • Maybe a global option to hide all displays
    • Respond to "Cell -> Clear All/Current Output" and toggle options in the menu
    • Too many displays in a notebook create clutter
  • When automatically creating a SparkConf in the user's namespace in a new notebook, create a cell which displays the conf so that the user does not recreate it by mistake.

Other Possible Future Things/Ideas

  • Include a configuration system for the user to configure things
    • Option to disable the extension altogether.
    • Configure other parameters such as the refresh interval, display themes, etc.
    • Jupyter nbextension configurator integration
  • Use a package manager for JavaScript dependencies instead of storing dependencies in the repo itself
  • Build and minify JavaScript for production
  • Upload the module to the PyPI registry
  • Write tests
  • Document code
  • Future integration/compatibility with JupyterLab?

Clarify license

Hello, can you clarify whether you intend for everything in this repo to now be under the Apache-2.0 license? In particular, are the files in the js folder now under Apache-2.0 or LGPL-2.1 (which is listed in the package.json)? Thank you.
