
sparkmonitor's Introduction


Spark Monitor - An extension for Jupyter Notebook

Note: This project is now maintained at https://github.com/swan-cern/sparkmonitor

For the Google Summer of Code final report of this project, click here

About

SparkMonitor is an extension for Jupyter Notebook that enables live monitoring of Apache Spark jobs spawned from a notebook. The extension provides several features to monitor and debug a Spark job from within the notebook interface itself.

[Animated screenshot: live job display below a notebook cell]

Features

  • Automatically displays a live monitoring tool below cells that run Spark jobs in a Jupyter notebook
  • A table of jobs and stages with progress bars
  • A timeline which shows jobs, stages, and tasks
  • A graph showing the number of active tasks and executor cores versus time
  • A notebook server extension that proxies the Spark UI and displays it in an iframe popup for more details
  • For a detailed list of features see the use case notebooks
  • How it Works

Quick Installation

pip install sparkmonitor
jupyter nbextension install sparkmonitor --py --user --symlink 
jupyter nbextension enable sparkmonitor --py --user            
jupyter serverextension enable --py --user sparkmonitor
ipython profile create && echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >>  $(ipython profile locate default)/ipython_kernel_config.py

For more detailed instructions click here

To do a quick test of the extension:

docker run -it -p 8888:8888 krishnanr/sparkmonitor
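
Once installed, a minimal smoke test inside a notebook might look like the sketch below. It assumes the kernel extension has injected a SparkConf object named conf into the namespace (which it does when loaded); any Spark job run after that should make the monitoring display appear below the cell.

from pyspark import SparkContext

sc = SparkContext.getOrCreate(conf=conf)  # conf is the SparkConf injected by the kernel extension
sc.parallelize(range(1000)).map(lambda x: x * 2).sum()  # any job triggers the monitor below the cell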

Integration with ROOT and SWAN

At CERN, the SparkMonitor extension has two main use cases:

  • Distributed analysis with ROOT and Apache Spark using the DistROOT module. Here is an example demonstrating this use case.
  • Integration with SWAN, a service for web-based analysis, via a modified container image for SWAN user sessions.

sparkmonitor's People

Contributors

abdealiloko, krishnan-r


sparkmonitor's Issues

When starting the kernel, the following error is thrown

[W 11:26:28.271 NotebookApp] 404 GET /api/kernels/0ed75691-8a30-42ba-a856-8fd1f4a07446/channels?session_id=5890403E360A4F228A55153B574F22CB (::1): Kernel does not exist: 0ed75691-8a30-42ba-a856-8fd1f4a07446
[W 11:26:28.281 NotebookApp] 404 GET /api/kernels/0ed75691-8a30-42ba-a856-8fd1f4a07446/channels?session_id=5890403E360A4F228A55153B574F22CB (::1) 21.33ms referer=None
[W 11:26:32.297 NotebookApp] Replacing stale connection: 0ed75691-8a30-42ba-a856-8fd1f4a07446:5890403E360A4F228A55153B574F22CB
[I 11:26:33.917 NotebookApp] Kernel started: 0e6e9856-8288-4cc3-ae2e-2c5dd3d0d226
[W 11:26:33.932 NotebookApp] 404 GET /nbextensions/sparkmonitor/module.js?v=20180301112625 (::1) 8.88ms referer=http://localhost:8888/notebooks/Untitled7.ipynb
[W 11:26:34.104 NotebookApp] 404 GET /nbextensions/widgets/notebook/js/extension.js?v=20180301112625 (::1) 3.33ms referer=http://localhost:8888/notebooks/Untitled7.ipynb
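
The 404 for /nbextensions/sparkmonitor/module.js usually indicates the frontend files were never installed or enabled for the environment serving the notebook. A reasonable first check (the same commands as in Quick Installation above) is to reinstall and verify:

jupyter nbextension install sparkmonitor --py --user --symlink
jupyter nbextension enable sparkmonitor --py --user
jupyter nbextension list    # sparkmonitor/module.js should be listed as enabled and validated "OK"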

Not Able to Access Spark UI through Monitor

Hello,

Amazing tool! Thank you. Everything is working great except I do not see the icon to open the Spark UI:

[screenshot]

Here is my Spark configuration:

spark.driver.extraClassPath=/usr/local/tools/spark/sparkmonitor/jars/listener.jar
spark.driver.memory=5g
spark.eventLog.dir=lustre:///sparkLogging/2.4.4
spark.eventLog.enabled=true
spark.eventLog.permissions=777
spark.executor.heartbeatInterval=7500
spark.extraListeners=sparkmonitor.listener.JupyterSparkMonitorListener
spark.history.fs.cleaner.enabled=true
spark.history.fs.cleaner.interval=1d
spark.history.fs.cleaner.maxAge=3d
spark.history.fs.logDirectory=lustre:///sparkLogging/2.4.4
spark.kryoserializer.buffer.max=128m
spark.master=local[*]
spark.network.timeout=10000000
spark.network.timetout=10000000
spark.rdd.compress=True
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.serializer.objectStreamReset=100
spark.sql.parquet.enableVectorizedReader=false
spark.sql.shuffle.partitions=108
spark.submit.deployMode=client
spark.ui.showConsoleProgress=true

I can access the UI through my browser, however. Any help would be appreciated.

Error loading server extension sparkmonitor.serverextension

My best guess is incompatibility with Tornado 6.

The root cause seems to be: AttributeError: module 'tornado.web' has no attribute 'asynchronous':

$ jupyter serverextension enable --py --sys-prefix sparkmonitor
Enabling: sparkmonitor.serverextension
- Writing config: /opt/anaconda3/etc/jupyter
    - Validating...
Error loading server extension sparkmonitor.serverextension
      X is sparkmonitor.serverextension importable?
$ python
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sparkmonitor.serverextension
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/lib/python3.7/site-packages/sparkmonitor/serverextension.py", line 24, in <module>
    class SparkMonitorHandler(IPythonHandler):
  File "/opt/anaconda3/lib/python3.7/site-packages/sparkmonitor/serverextension.py", line 27, in SparkMonitorHandler
    @tornado.web.asynchronous
AttributeError: module 'tornado.web' has no attribute 'asynchronous'
$ pip freeze | egrep "tornado|jupyter|spark"
jupyter==1.0.0
jupyter-client==5.2.4
jupyter-console==6.0.0
jupyter-core==4.4.0
jupyterlab==0.35.4
jupyterlab-server==0.2.0
sparkmonitor==0.0.9
tornado==6.0.2
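
For reference, @tornado.web.asynchronous was removed in Tornado 6; a coroutine handler keeps the connection open for its whole duration instead. A minimal sketch of what a Tornado 6-compatible handler could look like (the upstream URL and handler body are illustrative, not the project's actual code):

from tornado.httpclient import AsyncHTTPClient
from notebook.base.handlers import IPythonHandler

class SparkMonitorHandler(IPythonHandler):
    async def get(self):
        # In Tornado 6 a coroutine handler stays open until it returns,
        # so the removed @tornado.web.asynchronous decorator is unnecessary.
        client = AsyncHTTPClient()
        url = "http://localhost:4040" + self.request.uri  # Spark UI default port (an assumption)
        response = await client.fetch(url)
        self.write(response.body)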

I am trying to create a game, and I have to input the following files for the game to work, but the kernel is showing this error

What do I do?
I have also changed the file format from PNG to JPG.
This is the error:

File "C:\HT Python Gaming\untitled1.py", line 20, in
walkRight = [pygame.image.load('R1.jpg'), pygame.image.load('R2.jpg'), pygame.image.load('R3.jpg'), pygame.image.load('R4.jpg'), pygame.image.load('R5.jpg'), pygame.image.load('R6.jpg'), pygame.image.load('R7.jpg'), pygame.image.load('R8.jpg'), pygame.image.load('R9.jpg')]

error: Couldn't open R1.jpg

Please help, and respond ASAP.

Python 3 Kernel Issue

Hello
I have added a Python 3 kernel to my Jupyter Docker image.
Is there a way to have sparkmonitor working with both 2.x and 3.x?

It works fine with a Python 2 kernel, but when I switch to a 3.x kernel, the conf test raises an error:


print(conf.toDebugString())

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-0a5e403cf2b8> in <module>
----> 1 print(conf.toDebugString())

NameError: name 'conf' is not defined

Thanks for your help

Originally posted by @Ftagn92 in #1 (comment)
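
A hedged first check (not from the original thread): the NameError means the kernel extension never ran in the Python 3 kernel, so conf was never injected. Loading the extension manually from a cell should show whether it works under Python 3 at all:

%load_ext sparkmonitor.kernelextension
print(conf.toDebugString())  # should print the injected SparkConf if the extension loaded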

Error loading server extension

Hi,
I am using JupyterHub on AWS EMR v5.24
JupyterHub is installed inside a docker container from AWS.
It comes with:

  • jupyter_client v5.2.3
  • jupyter_core v4.4.0
  • jupyterhub v0.9.6

I try to install sparkmonitor within the container as follows:

sudo docker exec jupyterhub bash -c "pip install sparkmonitor"
sudo docker exec jupyterhub bash -c "jupyter nbextension install sparkmonitor --py --user --symlink"
sudo docker exec jupyterhub bash -c "jupyter nbextension enable sparkmonitor --py --user"
sudo docker exec jupyterhub bash -c "jupyter serverextension enable --py --user sparkmonitor"

I get the message "- Validating: OK" for the two nbextension lines, but after the last line I get this error message:

Enabling: sparkmonitor.serverextension
- Writing config: /home/jovyan/.jupyter
    - Validating...
Error loading server extension sparkmonitor.serverextension
      X is sparkmonitor.serverextension importable?

I continue with

sudo docker exec jupyterhub bash -c "echo \"c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')\" >>  /home/jovyan/.ipython/profile_default/ipython_kernel_config.py"

When I run JupyterHub, I see a "Toggle Spark Monitoring Displays" button, but I cannot see the monitoring module... Any ideas?
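
One way to surface the underlying import error behind Jupyter's one-line "is ... importable?" summary (a suggested check, mirroring the diagnosis in the Tornado issue above) is to import the module directly inside the container:

sudo docker exec jupyterhub bash -c "python -c 'import sparkmonitor.serverextension'"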

Scala support?

This looks neat, but the test notebook only shows example usage with pyspark. Does this work at all with Scala notebooks?

Test compatibility with PYSPARK_SUBMIT_ARGS

Based on the discussion at #6 (comment)

The extension does import pyspark internally. This means that if I, as a Jupyter user, want to do something like the following:

import os

spark_pkgs=('com.amazonaws:aws-java-sdk:1.7.4',
            'org.apache.hadoop:hadoop-aws:2.7.3',
            'joda-time:joda-time:2.9.3',)

os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages {spark_pkgs} pyspark-shell'.format(spark_pkgs=",".join(spark_pkgs)))

import findspark
findspark.init()
import pyspark

spark = pyspark.sql.SparkSession.builder \
    .getOrCreate()

I cannot, because the PYSPARK_SUBMIT_ARGS environment variable is only set after pyspark has already been imported by the sparkmonitor module.

Using sparkmonitor for remote kernels in Jupyter Enterprise Gateway setup.

Hello,

We have a Jupyter setup where we spawn IPython kernels remotely on a Spark cluster via a Jupyter Enterprise Gateway setup.
This means that the kernel extension has to be installed on the remote machines before startup.
Although we have managed to install sparkmonitor on the remote machines, configuring ipython_kernel_config.py does not seem possible for us there.
Since these kernels are launched using YARN as the resource manager, providing a command-line option to load the extension also does not seem to work.

I tried %load_ext, after which I can load the extension, and the conf variable shows all the relevant details. I have also installed sparkmonitor on my local machine for the notebook UI extension and the notebook server extension.

But although the Spark jobs execute properly, the extension does not show the UI or display.

Do you have any ideas how we can fix this issue?
Let me know if you need additional info. (We have tried installing this on a local setup and things were working fine.)

UI not visible in notebook

I'm running Spark and Jupyter Notebook in a Docker container, and I configured the sparkmonitor extension to load in the config files. I am running a Spark job in the notebook using the example provided in the GitHub repo. However, even though the job completes and I can see the final results and the stages in the logs, the UI does not display the screenshots described. What could be wrong?

Disable logging to file?

In kernelextension.py and serverextension.py there are references to the sparkmonitor_kernelextension.log and sparkmonitor_serverextension.log files being created and logged to. The code specifically says this is for debugging the module. Can we get rid of the debug mode? These log files are cluttering all my folders... :P

    fh = logging.FileHandler("sparkmonitor_serverextension.log", mode="w")
    fh.setLevel(logging.DEBUG)
    formatter = logging.Formatter(
        "%(levelname)s:  %(asctime)s - %(name)s - %(process)d - %(processName)s - \
        %(thread)d - %(threadName)s\n %(message)s \n")
    fh.setFormatter(formatter)
    logger.addHandler(fh) ## Comment this line to disable logging to a file.
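
One hedged way to make the file logging opt-in (an illustration, not the project's current behavior; the variable name is hypothetical):

import logging
import os

logger = logging.getLogger(__name__)
if os.environ.get("SPARKMONITOR_DEBUG_LOG"):  # hypothetical opt-in switch
    fh = logging.FileHandler("sparkmonitor_serverextension.log", mode="w")
    fh.setLevel(logging.DEBUG)
    logger.addHandler(fh)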

Integration with nteract

nteract is a frontend for Jupyter that runs natively on the desktop, on the web, and in other places such as within the Atom editor.

The nteract desktop app runs on Electron and is implemented using React, Redux, and RxJS, along with other libraries, and also uses TypeScript.

The goal of this feature is to implement support for SparkMonitor in nteract directly, providing a seamless user experience for using Spark from nteract.

Work needs to be done on refactoring SparkMonitor to support nteract, improving Jupyter protocol support for this use case, and supporting Scala kernels.

This issue summarizes discussions on Slack with @rgbkrk and others in the nteract/spark_integration channel.

Python 3 compatibility

Seems like when I use the library, I get a lot of the following in my logs:

[E 19:06:06.771 NotebookApp] Uncaught exception GET /sparkmonitor/static/timeline-view.js (::1)
    HTTPServerRequest(protocol='http', host='localhost:8888', method='GET', uri='/sparkmonitor/static/timeline-view.js', version='HTTP/1.1', remote_ip='::1')
    Traceback (most recent call last):
      File "/Users/abdealijk/anaconda3/lib/python3.6/site-packages/tornado/web.py", line 1499, in _stack_context_handle_exception
        raise_exc_info((type, value, traceback))
      File "<string>", line 4, in raise_exc_info
      File "/Users/abdealijk/anaconda3/lib/python3.6/site-packages/tornado/stack_context.py", line 315, in wrapped
        ret = fn(*args, **kwargs)
      File "/Users/abdealijk/anaconda3/lib/python3.6/site-packages/sparkmonitor/serverextension.py", line 68, in handle_response
        "location.origin", "location.origin +'" + self.replace_path + "' ")
    TypeError: a bytes-like object is required, not 'str'
[E 19:06:06.774 NotebookApp] {
      "Host": "localhost:8888",
      "Connection": "keep-alive",
      "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36",
      "Accept": "*/*",
      "Referer": "http://localhost:8888/sparkmonitor/",
      "Accept-Encoding": "gzip, deflate, br",
      "Accept-Language": "en-US,en;q=0.9",
      "Cookie": "_ga=GA1.1.939770584.1522777592; username-localhost-8889=\"2|1:0|10:1524844138|23:username-localhost-8889|44:YTM2ZTA5MTY5ODBjNGZmYTlkMjU5NmMyZDg2ODMxZmI=|2a48a65b005e9bdee1e24a4998f5a0ecfa4403d19cd979a3999cd4c1cfc6d1e6\"; username-localhost-9990=\"2|1:0|10:1527410921|23:username-localhost-9990|44:YTYwOGU1YzY4Yjk1NDhkM2JhMWM0YzYxZTU5NTk4ZDA=|8e91be00c8de10fbb098a1e228d324a1a5d3437138d4f76b7bba04db0b159929\"; _xsrf=2|a8530e06|b62af1e2bf7923224299195c03d70803|1527410923; username-localhost-8888=\"2|1:0|10:1527427803|23:username-localhost-8888|44:YjQ1ZWRkYTE0ZjEyNDNhZGI3NDVlMmEzYjhiYmQ3Zjg=|b31f12794744caf6a4d3bdf59c5e8c64cd22c6d6c2a2e1f636fdbf92c69531f0\""
    }
[E 19:06:06.775 NotebookApp] 500 GET /sparkmonitor/static/timeline-view.js (::1) 14.24ms referer=http://localhost:8888/sparkmonitor/
SPARKMONITOR_SERVER: Request_path static/log-view.js
 Replace_path:/sparkmonitor

The bytes/str issue seems like a classic Python 2/Python 3 issue. Has this been tested with Python 3?
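
For context, the failing line in handle_response (per the traceback) passes str arguments to replace on a bytes response body, which Python 3 rejects. A minimal sketch of a Python 3-safe version (illustrative, not the project's actual patch):

# Inside handle_response, where response.body is bytes under Python 3:
body = response.body.replace(
    b"location.origin",
    b"location.origin +'" + self.replace_path.encode() + b"' ")
self.write(body)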

Things To Do

Issues and things to fix

  • When there are more than about 200 tasks to show, the timeline lags while appearing and scrolling

    • This depends on the user's browser and machine resources
    • TODO: Beyond a certain threshold, hide individual tasks entirely.
      • This needs to be done in the backend listener itself for scalability.
  • Some jobs do not have names

    • For example, when reading a Parquet file, the job name is null
    • TODO: Use the first stage's name instead, as done in the Spark UI
  • Timeline annotations do not appear when the number of tasks is too large.

    • The timeline loads asynchronously...
    • TODO: Fix this, or add an option for the user to show annotations by toggling a checkbox
  • Cases where the Spark application is started and stopped multiple times in the same cell cause display conflicts, as job IDs and stage IDs are duplicated

    • This can happen if jobs are called from an imported Python script and the context is stopped and started multiple times.
    • TODO: Either clear the previous application's display or append the appId to each jobId/stageId to make it unique.
    • TODO: Handle cases where a stage is attempted again (never encountered this, though)
  • When running multiple cells and an intermediate cell fails, further executions detect the wrong cell

    • Restart and Run All doesn't work
    • The cell queue that is used to detect the current cell needs to be cleared in the frontend
    • Further execution requests are possibly discarded in the kernel.
    • TODO: How to detect this?
  • In some browsers, like Internet Explorer, when the frontend extension fails to load, Python throws a 'comm' error

    • TODO: Suppress the error
    • TODO: Replicate the issue and identify possible causes

Pending Features

  • Handle skipped stages' names and number of tasks properly in the progress bars
  • Show failed tasks in red
    • In the timeline
    • In the table of jobs
    • Also show the reason for the failure.
  • Dynamically update executors in the task graph
  • Aggregate the number of active tasks over a finite interval to make the graph smoother
  • Add annotations to the task graph for the start and end of jobs
    • Change the current charting library, as annotations are not properly implemented.
  • Popup with more details when clicking on an item in the timeline
  • Ability to cancel jobs - the cancel button
    • TODO: What is the right API to do this?
    • Using SparkContext (see the sketch after this list)
      • setJobGroup / cancelJobGroup
      • Currently there is no access to the SparkContext
      • The current communication mechanism prevents messages to the kernel while it is busy.
    • However, the Spark UI has an internal REST API to kill individual jobs
      • This is the (kill) link that appears in the UI
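
A sketch of the setJobGroup / cancelJobGroup approach mentioned above, assuming the extension could get a handle on the SparkContext (which, per the notes, it currently cannot); the group id is illustrative:

# Tag all jobs launched from a cell with a group id so they can be cancelled together.
sc.setJobGroup("cell-1", "Jobs started from cell 1")
result = sc.parallelize(range(10**6)).map(lambda x: x * x).count()  # runs under group "cell-1"

# From another thread (e.g. triggered by a cancel button), kill the whole group:
sc.cancelJobGroup("cell-1")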

Look and Feel

  • In Firefox, prevent the table CSS from expanding rows to fill the container.

  • jQuery UI dialog CSS styles conflict with matplotlib output (can be fixed).

  • Add scrollbars to the table when the number of jobs/stages is large.

  • Add a visual indicator that shows the overall status of a cell: running/completed.

  • Possibly show the number of active executors somewhere as a number.

  • Display the overall cell execution time somewhere.

New Features

  • Add an option to remove the display altogether from a cell

    • For trivial operations like a read, or viewing a count/take, the user may prefer to hide the display.
    • Maybe a global option to hide all displays
    • Respond to "Cell -> Clear All/Current Output" and toggle options in the menu
    • Too many displays in a notebook create clutter
  • When automatically creating a SparkConf in the user's namespace in a new notebook, create a cell which displays the conf so that the user does not recreate it by mistake.

Other Possible Future Things/Ideas

  • Include a configuration system for the user to configure things
    • Option to disable the extension altogether.
    • Configure other parameters such as the refresh interval, display themes, etc.
    • Jupyter nbextension configurator integration
  • Use a package manager for JavaScript dependencies instead of storing dependencies in the repo itself
  • Build and minify JavaScript for production
  • Upload the module to the PyPI registry
  • Write tests
  • Document code
  • Future integration/compatibility with JupyterLab?

Clarify license

Hello, can you clarify whether you intend for everything in this repo to now be under the Apache-2.0 license? In particular, are the files in the js folder now under Apache-2.0 or LGPL-2.1 (which is listed in the package.json)? Thank you.
