
Jupyter Docker Stacks


Jupyter Docker Stacks are a set of ready-to-run Docker images containing Jupyter applications and interactive computing tools. You can use a stack image to do any of the following (and more):

  • Start a personal Jupyter Server with the JupyterLab frontend (default)
  • Run JupyterLab for a team using JupyterHub
  • Start a personal Jupyter Server with the Jupyter Notebook frontend in a local Docker container
  • Write your own project Dockerfile

Quick Start

You can try a relatively recent build of the quay.io/jupyter/base-notebook image on mybinder.org. Otherwise, the examples below may help you get started if you have Docker installed, know which Docker image you want to use, and want to launch a single Jupyter Application in a container.

The User Guide on ReadTheDocs describes additional uses and features in detail.

Since `2023-10-20`, our images are only pushed to the `Quay.io` registry.
Older images are available on Docker Hub, but they will no longer be updated.

Example 1

This command pulls the jupyter/scipy-notebook image tagged 2024-03-14 from Quay.io if it is not already present on the local host. It then starts a container running a Jupyter Server with the JupyterLab frontend and exposes the container's internal port 8888 to port 10000 of the host machine:

docker run -p 10000:8888 quay.io/jupyter/scipy-notebook:2024-03-14

You can change the host port on which the container's internal port is exposed by modifying the value of the -p option, for example to -p 8888:8888.

Visiting http://<hostname>:10000/?token=<token> in a browser loads JupyterLab, where:

  • The hostname is the name of the computer running Docker.
  • The token is the secret token printed in the console.

The container remains intact after the Server exits, so it can be restarted later.
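If you started the container detached or have lost the token, you can recover it from the server logs and restart the stopped container later. A minimal sketch using standard Docker commands (the container ID is a placeholder):

    # List containers (including stopped ones) to find the ID or name
    docker ps -a

    # The startup logs include the URL with the ?token=<token> query parameter
    docker logs <container-id>

    # Restart the stopped container and reattach to its output
    docker start -a <container-id>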

Example 2

This command pulls the jupyter/datascience-notebook image tagged 2024-03-14 from Quay.io if it is not already present on the local host. It then starts an ephemeral container running a Jupyter Server with the JupyterLab frontend and exposes the server on host port 10000.

docker run -it --rm -p 10000:8888 -v "${PWD}":/home/jovyan/work quay.io/jupyter/datascience-notebook:2024-03-14

The use of the -v flag in the command mounts the current working directory on the host (${PWD} in the example command) as /home/jovyan/work in the container. The server logs appear in the terminal.

Visiting http://<hostname>:10000/?token=<token> in a browser loads JupyterLab.

Due to the usage of the --rm flag, Docker automatically cleans up the container and removes its file system when the container exits, but any changes made to the ~/work directory and its files in the container will remain intact on the host. The -i flag keeps the container's STDIN open and lets you send input to the container through standard input. The -t flag attaches a pseudo-TTY to the container.

By default, [jupyter's root_dir](https://jupyter-server.readthedocs.io/en/latest/other/full-config.html) is `/home/jovyan`.
So, new notebooks will be saved there, unless you change the directory in the file browser.

To change the default directory, you must specify `ServerApp.root_dir` by adding this line to the previous command: `start-notebook.py --ServerApp.root_dir=/home/jovyan/work`.
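Combined with the previous command, that might look like the following sketch:

    docker run -it --rm -p 10000:8888 -v "${PWD}":/home/jovyan/work \
        quay.io/jupyter/datascience-notebook:2024-03-14 \
        start-notebook.py --ServerApp.root_dir=/home/jovyan/work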

Choosing Jupyter frontend

JupyterLab is the default for all the Jupyter Docker Stacks images. It is still possible to switch back to Jupyter Notebook (or to launch a different startup command). You can achieve this by passing the environment variable DOCKER_STACKS_JUPYTER_CMD=notebook (or any other valid jupyter subcommand) at container startup; more information is available in the documentation.
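For example, the following sketch starts the Jupyter Notebook frontend instead of JupyterLab (any other valid jupyter subcommand can be substituted):

    docker run -it --rm -p 8888:8888 \
        -e DOCKER_STACKS_JUPYTER_CMD=notebook \
        quay.io/jupyter/base-notebook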

Resources

Acknowledgments

CPU Architectures

  • We publish containers for both x86_64 and aarch64 platforms.
  • Single-platform images have either an aarch64- or x86_64- tag prefix, for example, quay.io/jupyter/base-notebook:aarch64-python-3.11.6 (see the pull example below).
  • Starting from 2022-09-21, we create multi-platform images (except tensorflow-notebook).
  • Starting from 2023-06-01, we create a multi-platform tensorflow-notebook image as well.
  • Starting from 2024-03-14, we create a CUDA-enabled variant of the pytorch-notebook image for the x86_64 platform.
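For example, to pull the single-platform aarch64 build mentioned above (a sketch; choose the prefix matching your platform):

    docker pull quay.io/jupyter/base-notebook:aarch64-python-3.11.6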

Using old images

This project only builds one set of images at a time. If you want to use an older Ubuntu and/or Python version, you can use the following images (see the run example after the table):

| Build Date   | Ubuntu | Python | Tag            |
| ------------ | ------ | ------ | -------------- |
| 2022-10-09   | 20.04  | 3.7    | `1aac87eb7fa5` |
| 2022-10-09   | 20.04  | 3.8    | `a374cab4fcb6` |
| 2022-10-09   | 20.04  | 3.9    | `5ae537728c69` |
| 2022-10-09   | 20.04  | 3.10   | `f3079808ca8c` |
| 2022-10-09   | 22.04  | 3.7    | `b86753318aa1` |
| 2022-10-09   | 22.04  | 3.8    | `7285848c0a11` |
| 2022-10-09   | 22.04  | 3.9    | `ed2908bbb62e` |
| 2023-05-30   | 22.04  | 3.10   | `4d70cf8da953` |
| weekly build | 22.04  | 3.11   | `latest`       |
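To run one of these pinned historical builds, reference its tag directly (a sketch; per the note above, tags from before 2023-10-20 are served from Docker Hub rather than Quay.io):

    docker run -p 8888:8888 docker.io/jupyter/base-notebook:1aac87eb7fa5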

Contributing

Please see the Contributor Guide on ReadTheDocs for information about how to contribute recipes, features, tests, and community-maintained stacks.

Alternatives


docker-stacks's Issues

%matplotlib inline must be at the top

If %matplotlib inline is executed after imports that pull in matplotlib, the kernel issues an error about the missing libXrender.so.1.

Installing the missing library with
sudo apt-get install libxrender1
fixes the issue but grows the image.

Another fix would be to make %matplotlib inline implicit at the top of all notebooks. Then you would need a way to disable it if need be.

Importing matplotlib gives an error

I basically just have a number of import statements, with matplotlib.pyplot causing problems. I see the failure in the scipy notebook but not in the all-spark notebook.

import matplotlib.pyplot as plt
%matplotlib inline

ImportError: libXrender.so.1: cannot open shared object file: No such file or directory

Full error trace:
----> 1 import matplotlib.pyplot

/opt/conda/lib/python3.4/site-packages/matplotlib/pyplot.py in <module>()
    107
    108 from matplotlib.backends import pylab_setup
--> 109 _backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
    110
    111

/opt/conda/lib/python3.4/site-packages/matplotlib/backends/__init__.py in pylab_setup()
     30 # imports. 0 means only perform absolute imports.
     31 backend_mod = __import__(backend_name,
---> 32                          globals(),locals(),[backend_name],0)
     33
     34 # Things we pull in from all backends

/opt/conda/lib/python3.4/site-packages/matplotlib/backends/backend_qt4agg.py in <module>()
     15 from matplotlib.figure import Figure
     16
---> 17 from .backend_qt5agg import NavigationToolbar2QTAgg
     18 from .backend_qt5agg import FigureCanvasQTAggBase
     19

/opt/conda/lib/python3.4/site-packages/matplotlib/backends/backend_qt5agg.py in <module>()
     16
     17 from .backend_agg import FigureCanvasAgg
---> 18 from .backend_qt5 import QtCore
     19 from .backend_qt5 import QtGui
     20 from .backend_qt5 import FigureManagerQT

/opt/conda/lib/python3.4/site-packages/matplotlib/backends/backend_qt5.py in <module>()
     29 figureoptions = None
     30
---> 31 from .qt_compat import QtCore, QtGui, QtWidgets, _getSaveFileName, __version__
     32 from matplotlib.backends.qt_editor.formsubplottool import UiSubplotTool
     33

/opt/conda/lib/python3.4/site-packages/matplotlib/backends/qt_compat.py in <module>()
     89 if QT_API in [QT_API_PYQT, QT_API_PYQTv2]: # PyQt4 API
     90
---> 91     from PyQt4 import QtCore, QtGui
     92
     93 try:

ImportError: libXrender.so.1: cannot open shared object file: No such file or directory

Use "NB_" prefix for environment variables

I ran the jupyter/all-spark-notebook container on a Mesos cluster with the notebook container port (8888) mapped to a random port on the host. In this configuration, Mesos sets a "PORT" environment variable when running the container to the host port (e.g., 31674), and the notebook server ends up starting on http://localhost:31674 within the container.

To avoid potential conflicts with other environment variables, it may be a good idea to prefix the environment variables with "NB_" or similar, like we already do with NB_UID, NB_USER, etc.

Add live slideshow plugin inside the minimal notebook image

There is a great repo called RISE which allows you, via an extension, to create live slideshows of your notebooks with no conversion, using the Reveal.js JavaScript library.

I like it a lot, and find myself often adding this feature on top of your official images.
Since the plugin is great, works with Python 2 and Python 3, and is light and useful, you might consider adding it inside the base Dockerfile.

As a quick example of how you could do it, taken from my personal repo:

# Add Live slideshows with RISE
RUN wget https://github.com/pdonorio/RISE/archive/master.tar.gz \
    && tar xvzf *.gz && cd master && python3 setup.py install

Very simple.

In that snippet, the GitHub link points to my fork of RISE, where I changed keyboard shortcuts for my convenience.

edit: of course you should use the original repo, thanks to Damian Avila.

Paolo

Don't use supervisor as the init process

See jupyter/notebook#334 (comment) for background.

Possible fixes ...

  1. Switch to phusion/baseimage. Concern: size and change of OS.

    phusion/baseimage              latest              e9f50c1887ea        7 weeks ago         237.7 MB
    ubuntu                         14.04               d2a0ecffe6fa        8 weeks ago         188.4 MB
    debian                         wheezy              065218d54d7d        11 weeks ago        84.97 MB
    
  2. Use runit like phusion/baseimage does (http://www.sourcediver.org/blog/2014/11/17/using-runit-in-a-docker-container/). Concern: env var workaround described therein.

  3. Port phusion/baseimage to debian. Concern: yet another thing to maintain.

  4. Wait for docker to give reaping "for free" (moby/moby#11529).

SparkUI

Hi,

How do I get to the Spark UI? I tried starting the Docker image with the standard UI port open, but it didn't work. Any help appreciated; I would like to have a look at the Spark UI visualizations.

Regards, Dieter
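For anyone else hitting this: the Spark driver UI listens on port 4040 inside the container by default, so publishing that port alongside the notebook port is one thing to try (a sketch, not a confirmed fix for this report):

    docker run -p 8888:8888 -p 4040:4040 jupyter/all-spark-notebook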

jupyter_notebook_config.py fails to write PEM file

I tried running the jupyter/all-spark-notebook:latest (4.0.x tag). I specify the USE_HTTPS option without providing a PEM/cert file. The expected behavior is that a self-signed cert gets created at notebook server startup. However, I'm seeing the notebook server fail to load /home/jovyan/.jupyter/jupyter_notebook_config.py:

root@50c2766b0617:/# cat /var/log/supervisor/notebook.log 
Generating a 2048 bit RSA private key
..................................................+++
..........................+++
writing new private key to '/home/jovyan/.local/share/jupyter/notebook.pem'
/home/jovyan/.local/share/jupyter/notebook.pem: No such file or directory
139952694265488:error:02001002:system library:fopen:No such file or directory:bss_file.c:398:fopen('/home/jovyan/.local/share/jupyter/notebook.pem','w')
139952694265488:error:20074002:BIO routines:FILE_CTRL:system lib:bss_file.c:400:
[E 14:03:12.198 NotebookApp] Exception while loading config file /home/jovyan/.jupyter/jupyter_notebook_config.py
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.4/site-packages/traitlets/config/application.py", line 535, in _load_config_files
        config = loader.load_config()
      File "/opt/conda/lib/python3.4/site-packages/traitlets/config/loader.py", line 432, in load_config
        self._read_file_as_dict()
      File "/opt/conda/lib/python3.4/site-packages/traitlets/config/loader.py", line 464, in _read_file_as_dict
        py3compat.execfile(conf_filename, namespace)
      File "/opt/conda/lib/python3.4/site-packages/ipython_genutils/py3compat.py", line 185, in execfile
        exec(compiler(f.read(), fname, 'exec'), glob, loc)
      File "/home/jovyan/.jupyter/jupyter_notebook_config.py", line 20, in <module>
        '-keyout', PEM_FILE, '-out', PEM_FILE])
      File "/opt/conda/lib/python3.4/subprocess.py", line 561, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['openssl', 'req', '-new', '-newkey', 'rsa:2048', '-days', '365', '-nodes', '-x509', '-subj', '/C=XX/ST=XX/L=XX/O=generated/CN=generated', '-keyout', '/home/jovyan/.local/share/jupyter/notebook.pem', '-out', '/home/jovyan/.local/share/jupyter/notebook.pem']' returned non-zero exit status 1
[I 14:03:12.225 NotebookApp] Writing notebook server cookie secret to /home/jovyan/.local/share/jupyter/runtime/notebook_cookie_secret
[I 14:03:12.290 NotebookApp] Serving notebooks from local directory: /home/jovyan/work
[I 14:03:12.291 NotebookApp] 0 active kernels 
[I 14:03:12.291 NotebookApp] The IPython Notebook is running at: http://localhost:8888/
[I 14:03:12.291 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 14:03:12.291 NotebookApp] No web browser found: could not locate runnable browser.

It looks like the /home/jovyan/.local/share/jupyter directory structure does not exist, so the self-signed PEM cannot be written.

chown very slow

The Dockerfile for the minimal-notebook runs

chown -R $NB_USER:users $CONDA_DIR && \
chown -R $NB_USER:users /home/$NB_USER

I don't know why, but this is very slow. If I build a new Docker image and pass a different NB_USER value (so this step isn't cached), it takes ~10 minutes for the chown to finish. This may have to do with how aufs works behind the scenes (I'm running on Debian, using aufs for Docker storage). Since the step is cached on rebuilds, this isn't a big deal, but does the jovyan user need to be the owner of /opt/conda? If so, maybe you could create that user first and install conda as user jovyan?

Julia not available via IPython kernel magic - in datascience-notebook

I tried to run Fernando Perez's "mixed languages" notebook
(https://github.com/fperez/talk-1504-boulder/blob/master/Multiple%20languages%20from%20inside%20IPython.ipynb)

At the first cell trying to load julia (see below), an error message was reported:
%load_ext julia.magic
%julia @pyimport matplotlib.pyplot as plt
%julia @pyimport numpy as np
which gave the error:
ImportError: No module named 'julia'

I performed a
pip install julia

but was still not able to run the cell due to a `GLIBCXX_3.4.20` library error; see below.

%load_ext julia.magic
%julia @pyimport matplotlib.pyplot as plt
%julia @pyimport numpy as np
Initializing Julia interpreter. This may take some time...
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-8-2e5a2b66155b> in <module>()
----> 1 get_ipython().magic('load_ext julia.magic')
      2 get_ipython().magic('julia @pyimport matplotlib.pyplot as plt')
      3 get_ipython().magic('julia @pyimport numpy as np')

/opt/conda/lib/python3.4/site-packages/IPython/core/interactiveshell.py in magic(self, arg_s)
   2334         magic_name, _, magic_arg_s = arg_s.partition(' ')
   2335         magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2336         return self.run_line_magic(magic_name, magic_arg_s)
   2337 
   2338     #-------------------------------------------------------------------------

/opt/conda/lib/python3.4/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line)
   2255                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2256             with self.builtin_trap:
-> 2257                 result = fn(*args,**kwargs)
   2258             return result
   2259 

/opt/conda/lib/python3.4/site-packages/IPython/core/magics/extension.py in load_ext(self, module_str)

/opt/conda/lib/python3.4/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    191     # but it's overkill for just that one bit of state.
    192     def magic_deco(arg):
--> 193         call = lambda f, *a, **k: f(*a, **k)
    194 
    195         if callable(arg):

/opt/conda/lib/python3.4/site-packages/IPython/core/magics/extension.py in load_ext(self, module_str)
     64         if not module_str:
     65             raise UsageError('Missing module name.')
---> 66         res = self.shell.extension_manager.load_extension(module_str)
     67 
     68         if res == 'already loaded':

/opt/conda/lib/python3.4/site-packages/IPython/core/extensions.py in load_extension(self, module_str)
     89                     __import__(module_str)
     90             mod = sys.modules[module_str]
---> 91             if self._call_load_ipython_extension(mod):
     92                 self.loaded.add(module_str)
     93             else:

/opt/conda/lib/python3.4/site-packages/IPython/core/extensions.py in _call_load_ipython_extension(self, mod)
    136     def _call_load_ipython_extension(self, mod):
    137         if hasattr(mod, 'load_ipython_extension'):
--> 138             mod.load_ipython_extension(self.shell)
    139             return True
    140 

/opt/conda/lib/python3.4/site-packages/julia/magic.py in load_ipython_extension(ip)
     80 def load_ipython_extension(ip):
     81     """Load the extension in IPython."""
---> 82     ip.register_magics(JuliaMagics)

/opt/conda/lib/python3.4/site-packages/IPython/core/magic.py in register(self, *magic_objects)
    389             if type(m) in (type, MetaHasTraits):
    390                 # If we're given an uninstantiated class
--> 391                 m = m(shell=self.shell)
    392 
    393             # Now that we have an instance, we can register it and update the

/opt/conda/lib/python3.4/site-packages/julia/magic.py in __init__(self, shell)
     47         # Flush, otherwise the Julia startup will keep stdout buffered
     48         sys.stdout.flush()
---> 49         self.julia = Julia(init_julia=True)
     50         print()
     51 

/opt/conda/lib/python3.4/site-packages/julia/core.py in __init__(self, init_julia, jl_init_path)
    252                     raise JuliaError("Julia sysimage (\"sys.ji\") not found! {}".format(sysimg_relpath))
    253 
--> 254             self.api = ctypes.PyDLL(libjulia_path, ctypes.RTLD_GLOBAL)
    255             self.api.jl_init_with_image.arg_types = [char_p, char_p]
    256             self.api.jl_init_with_image(JULIA_HOME.encode("utf-8"),

/opt/conda/lib/python3.4/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
    349 
    350         if handle is None:
--> 351             self._handle = _dlopen(self._name, mode)
    352         else:
    353             self._handle = handle

OSError: /opt/conda/lib/python3.4/site-packages/zmq/backend/cython/../../../../.././libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /usr/bin/../lib/x86_64-linux-gnu/julia/libjulia.so)

Update Spark to version 1.5.0

Just passing by to say that, today, I tried to use all-spark-notebook with Spark 1.5.0 and got errors. I guess the jar repositories don't have all the packages yet.

I got some "not found" errors during the build, like:

[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  ::              FAILED DOWNLOADS            ::
[warn]  :: ^ see resolution messages for details  ^ ::
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  :: org.apache.spark#spark-core_2.10;1.5.0!spark-core_2.10.jar
[warn]  :: org.apache.spark#spark-launcher_2.10;1.5.0!spark-launcher_2.10.jar
[warn]  :: org.apache.spark#spark-network-common_2.10;1.5.0!spark-network-common_2.10.jar
[warn]  :: org.apache.spark#spark-network-shuffle_2.10;1.5.0!spark-network-shuffle_2.10.jar
[warn]  :: org.apache.spark#spark-unsafe_2.10;1.5.0!spark-unsafe_2.10.jar
[warn]  :: org.apache.spark#spark-streaming_2.10;1.5.0!spark-streaming_2.10.jar
[warn]  :: org.apache.spark#spark-sql_2.10;1.5.0!spark-sql_2.10.jar
[warn]  :: org.apache.spark#spark-catalyst_2.10;1.5.0!spark-catalyst_2.10.jar
[warn]  :: org.apache.spark#spark-mllib_2.10;1.5.0!spark-mllib_2.10.jar
[warn]  :: org.apache.spark#spark-graphx_2.10;1.5.0!spark-graphx_2.10.jar
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::

Let's watch the packages repo to see when they become available.

Remove version info from READMEs

Now that latest tag tracks master and the READMEs on GitHub are the doc of reference, we should remove the version specific information from them. Before, we were keeping master up to date with information about the options for all branches. Now, we've put a stake in the ground telling users to check the SHA / branch of interest to find out how to use the associated image.

all-spark-notebook does not work when using spark on mesos

Hi there,

I have tried this for hours, and I cannot use Spark on Mesos with the current configuration defined in the Dockerfile. Has anyone tried using this Docker image with Mesos?

I hacked around those configurations inside the container, and the only way to make this work is to use root instead of a user named "jov*".

--net=host prevents su to jovyan user

To overcome the problem in the title, I need to set --pid=host as a workaround for moby/moby#5899. But then, tini reports:

[WARN ] Tini is not running as PID 1 and isn't registered as a child subreaper.
       Zombie processes will not be re-parented to Tini, so zombie reaping won't work.
       To fix the problem, use -s or set the environment variable TINI_SUBREAPER to register Tini as a child subreaper, or run Tini as PID 1.

To overcome this problem, tini needs to be started with -s. Maybe tini should always be started with -s in minimal-notebook?
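A sketch of that workaround, using the environment-variable form of -s that the warning message itself suggests:

    docker run --net=host --pid=host -e TINI_SUBREAPER=1 jupyter/minimal-notebook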

Document command line arg passthrough

We need to document that all Jupyter command line args can be passed via the start command for the container. This obviates the need to add env vars for everything.
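For example, any Jupyter server option can already be appended to the container start command (a sketch reusing the base_url option):

    docker run -p 8888:8888 jupyter/minimal-notebook start-notebook.sh --NotebookApp.base_url=/jupyter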

We might also consider deprecating and/or eventually removing the env vars that can be passed through trivially: INTERFACE, PORT.

Ref issue #45, PR #33

UTF-8 Characters (non-ASCII) in plot titles and axes labels in r-notebook are mangled

In r-notebook (both in a local Docker container and on try.jupyter.org), umlauts in plot titles in R plots are mangled. Example:

Input:

plot(stats::rnorm(50)) +
title(main = "Übertragungszeiten")

The output will be a chart with the title "<9c>bertragungszeiten", i.e. the two-byte UTF-8 character Ü is printed as the ASCII character numbers of each byte. This makes r-notebook quite broken for German users.


Trying to install Java

Hi, I am trying to install Java, but I have a problem with root permissions. Any idea how to solve this?

Shorten path to notebook work directory?

Currently, the notebook working directory is /home/jovyan/work, which is borrowed from the setup in the original http://github.com/jupyter/docker-demo-images Dockerfile. It has been raised that this path is a bit long (#21 (comment)) and that it introduces an extra if-block check in the start-notebook.sh script to copy in skeleton files post jovyan-user creation in the case where the work directory is host mounted.

The first problem might be something we can change albeit at the cost of breaking existing users of the image. The second problem is not exclusive to the notebook directory location, however: if the user host mounts a Jupyter config or any other file in the home directory, the skeleton file copy logic is required.

Anything we can / should do?

start-notebook.sh should be split into config-notebook.sh and start-notebook.sh

Related to my comment in issue #25 it would be useful to allow derivation of the various docker-stacks.

The start-notebook.sh script actually has 2 distinct functions:

  • configuring the notebook (most of the script)
  • starting the notebook (currently via supervisord)

We should have a CMD in the Dockerfile such as config-start.sh (you can surely think of a better name!) which then calls

  • config-notebook.sh
  • start-notebook.sh

In this way anyone can cleanly derive from the docker-stacks image by modifying config-start.sh (see the sketch after this list) to call

  • config-notebook.sh
  • derived-config.sh
  • start-notebook.sh
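A minimal sketch of what such a wrapper could look like (all script names and paths here are the hypothetical ones proposed above):

    #!/bin/bash
    # config-start.sh -- hypothetical CMD entry point
    set -e
    # perform all configuration work (today: most of start-notebook.sh)
    . /usr/local/bin/config-notebook.sh
    # then hand off to the script that actually launches the notebook
    exec /usr/local/bin/start-notebook.sh "$@"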

Advice on running these images behind nginx?

Curious if you have any examples or advice on running one of these images behind an nginx proxy? I tried this nginx configuration, but it doesn't appear to work.

(Yes, I see the image already has a USE_HTTPS option, but I would like to map the port to a path on the server, e.g. SERVER/jupyter instead of SERVER:8888. Also, it seems sensible to have the image running behind the same nginx proxy that handles other apps on the server.)

Thanks for your help and apologies if this is out of scope for issues here.

jupyter_notebook_config.py not loaded in minimal-notebook image

This may be a problem with the other images too, but I'm only using the minimal image.

jupyter_notebook_config.py is not getting loaded by the notebook because it is being installed to /users/jovyan/.jupyter/ and the server is being launched using the root account. The server expects the config to be in /root/.jupyter/.

Side effects of the config not being loaded include broken password and encryption features. The caveman method I used to test this was to add a print statement to the config and use docker logs to look at the output of the container.

cc: @rgbkrk who I chatted with about this
and cc: @parente , who I think is the man in control of this

Builds on DockerHub

I'd like to get stack builds cranking on DockerHub as soon as there's a few more images worth building. (Note to self: hurry up with more images. :) To do so, though, we need to answer a few questions.

  1. How should builds from Docker image definitions in this git repo surface on Docker Hub? Take the minimal-notebook and r-notebook stacks as examples. Do they appear as jupyter/minimal-notebook and jupyter/r-notebook on DockerHub? The naming is consistent with everything else in the Jupyter project, but will users tell jupyter/minimal-notebook apart from jupyter/notebook (setup for dev / test) or jupyter/minimal-demo (base image for tmpnb.org)? Do we put them in a new Docker repository like docker-stacks/minimal-notebook, docker-stacks/r-notebook, etc.? Can we even use a different Docker repository name, or is it linked to the GitHub user name?
  2. When should builds kick off? Automatically? Manually? A single GitHub repo for multiple images is not how Docker Hub wants to work. Docker Hub builds are either automatic on push or manual. If we set all the images to automatic, every image rebuilds immediately when there's a push to the repo, even if unnecessary. Worse, if an image starts rebuilding before its parent image has built, it's possible to wind up with a failed build or an incorrect image. We can configure linked builds on Docker Hub to trigger child container rebuilds after the parent build succeeds, but since the child lives in the same git repo as the parent, there will still be a wasted build. (Maybe we don't care?)
  3. Does Docker tagging help us with anything? We could tag images by the version of the Jupyter project contained therein. But it's a myopic reflection when there's Python 2.x, Python 3.x, Scala 2.x, Spark 1.4.x, Mesos x.x. etc. all in a container and potentially varying from build to build. Do we just stick with latest?

Spark hanging on docker-machine 0.5.1

I'm having some trouble with both pyspark-notebook and all-spark-notebook on version 0.5.1 of docker-machine. It works fine for me on an earlier version, 0.4.1, and that's the only system difference.

When I run the test code to check that it's all up and running:

import pyspark
sc = pyspark.SparkContext('local[*]')

It hangs trying to load the context. This also eventually brings down the whole machine.

Anyone else seen this on 0.5.1?

Doc maintainer process

There's a mix of manual / automated steps. Write them down!

  • how to trigger docker build after a git merge (today)
  • how to backport to a versioned branch
  • how to move up to a new version of the main container process (e.g., notebook 4.1.x)

Reduce image sizes

The minimal-notebook installs texlive, which pulls in x11 for a hefty increase in image size. It would be nice to install a version that doesn't require those extra dependencies, or to manually remove them after install, if that won't break anything.

Docker Hub automated builds and undesirable rebuilds

I've been watching how the automated builds execute on Docker Hub as the number of branches and stacks have grown. With jupyter/minimal-notebook autobuild-on-push turned off, autobuilds for all other stacks turned on, and manual rebuilds of minimal triggering autobuild of all other stacks, there are numerous wasted rebuilds. For example:

  • If minimal-notebook changes on git branch master, DH rebuilds latest from master, 3.2 from branch 3.2.x, and 4.0 from branch 4.0.x. The latter two should have been left alone.
  • When the above builds complete successfully, each one triggers a rebuild of tag latest from master, tag 3.2 from branch 3.2.x, and tag 4.0 from branch 4.0.x for every other stack. That's (# of branches)^2 x (# of stacks) rebuilds when just one tagged image for each should have been touched.

This appears to be a fact of life on Docker Hub. We can cut down on the branches^2 noise by removing the automated triggers from minimal-notebook to all the others if we remember to trigger each one manually every time minimal changes. But even this results in wasted builds.

Worse, every rebuild potentially creates new image-layer SHAs from a different build machine, which means poor users who decide to pull and pull again will wind up downloading the entire image over again, even if the layers are binary equivalent. Worse still, since we don't pin OS packages, differences may sneak in during undesirable rebuilds and lead to two equivalently tagged images having slightly different contents.

This behavior appears to be a fact of life with automated builds on DH, which is unfortunate because they are more transparent than the build-and-push-yourself flavor. The only way I see around these problems completely is automating the builds more intelligently outside of DH, using a third-party service or rolling our own with a little GitHub listener.

Automated builds and testing

It would be nice to have a way to test whether builds are functional, not just completed and available on Docker Hub. Is there something we can do with Travis or another CI offering? Something custom?

Notebook path setting (probably doc understanding issue)

I get it to start up fine, but it is in its own world. I want to access the files on my local machine.

The instructions indicate starting it with the following to set a custom base URL:

docker run -d -p 8888:8888 jupyter/all-spark-notebook start-notebook.sh --NotebookApp.base_url=/some/path

However, I cannot figure out a valid path syntax.

I have tried /Home, ~/Home, /home, and ~/home, as well as longer versions. I keep getting a Jupyter notebook page saying 404.

Any ideas?

I am using Ubuntu 14.04.

A user on SO suggested that I need to mount a volume for the image, like:

If you're looking to access the files on your local machine, you'll need to mount a volume to the docker container. Your command for start-notebook.sh will have to reference a path that is relevant to your docker container, which would be the right-side of the volume argument:

docker run -d -p 8888:8888 -v /some/path/on/my/local:/some/path/on/my/container jupyter/all-spark-notebook start-notebook.sh --NotebookApp.base_url=/some/path/on/my/container

Posted on SO as http://stackoverflow.com/questions/34098610/docker-jupyter-notebook-setting-base-url

Please update to pandas 0.17

It would make my life easier to be able to use some of the new features in pandas 0.17 (current stacks are pegged at 0.16.2).

Proposal for versioning, immutable historical builds

Current Setup

  • Docker Hub (DH) automatically builds git master and tags it with latest
  • DH automatically builds version branches like 4.0.x and tags them with 4.0
  • We only ever roll the latest master branch and latest version branch forward with fixes.

Problems

  • We don't tag per git commit so that users can easily roll back to prior Docker image versions. This is important when major libraries change (e.g., Spark). Some users want the latest which should go into master, while others want to stay on the current version.
  • Changes on any branch in the git repo causes a build storm on Docker Hub. (See issue #15). All tags jump to the latest build even when nothing changed in the build definition.

Proposed Improvement

  • Stop relying on Docker Hub automated builds. (We need more control.)
  • Stop relying on the 3.2.x and 4.0.x branches and branches in general. (Too coarse grained.)
  • Adopt the mantra that when a PR wants to bump a library version in one of the stacks, we take it without question.
  • Setup a build VM that maintainers have access to easily pull from this repo and build images. (Manually for now. We can automate later.)
  • Beef up the Makefile so that a make latest (or some such) does the following (see the sketch after this list):
    • Builds the necessary stacks (change in minimal = build everything, change in scipy = just that stack, etc.) from master HEAD
    • Tags the latest built images (even if they were not rebuilt just now) with latest
    • Tags those same images with the current git commit SHA
    • Pushes all of those tags and new builds to Docker Hub (Note: The client will be smart about only sending deltas / tag metadata for images that did not change.)
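A rough shell sketch of what that make latest target might do (everything here is hypothetical; the stack names are placeholders):

    # build each stack from master HEAD, then tag with latest and the git SHA
    GIT_SHA=$(git rev-parse --short=12 HEAD)
    for stack in minimal-notebook r-notebook scipy-notebook; do
        docker build -t "jupyter/${stack}:latest" "./${stack}"
        docker tag "jupyter/${stack}:latest" "jupyter/${stack}:${GIT_SHA}"
        docker push "jupyter/${stack}:latest"
        docker push "jupyter/${stack}:${GIT_SHA}"
    done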

Net Result for End Users

For people that want to walk-up-and-use the latest and greatest:

docker run jupyter/some-stack-name

For people that want to depend on a specific container image configuration tied to some point in time in the docker-stacks git history:

docker run jupyter/some-stack-name:<some-git-sha>

where GitHub / git makes the contents of that particular tagged image visible to the user (i.e., find the SHA in git and look at the Dockerfiles).

add base_url config as an ENV option

I think it's common to run these behind a reverse proxy, so it would be great to be able to set the base_url through an environment variable.

No Magic or ! access from Scala kernel?

Just noticed that I get no magic commands from a Scala notebook, nor can I access the system using !. Both are pretty fundamental to Jupyter usage, and the lack of magics prevents interoperating with other kernels, e.g. R/Python.

gfortran is required [in all-spark-notebook] for some R package installation

The following triggers an error when executed in an R notebook:
install.packages("randomForest", lib='/opt/conda/lib/R/library', repo="http://cran.us.r-project.org")
The error says installation had non-zero exit status.

I had to run R directly in the container to find the issue, see attached file.


Installing gfortran with the following fixed the issue:

sudo apt-get install gfortran

I think gfortran should be installed in the image if R is to be supported.

libXrender.so.1 is required to display R plots inline

R plots such as the following only create an Rplots.pdf file in the working directory, then issue an error:

x <- 1:10
plot(x)

output is:
Error in file(con, "rb"): cannot open the connection

Error in file(con, "rb"): cannot open the connection

Error in file(con, "rb"): cannot open the connection

Error in file(con, "rb"): cannot open the connection


Installing the lib in the container with the following fixes the error

sudo apt-get install libxrender1

This issue is linked to #51

feature request: auto-tag or more frequent manual tagging of builds

There's a lot of activity on docker-stacks and our team is having a tough time building off minimal-notebook as a base image. We could fork and build our own, but I'd rather just be able to bump the tag pointing into jupyter/minimal-notebook.

The trouble is, there are only two pinnable tags right now: 3.2 and 4.0.
Right now, our container only works based off more recent containers,
but we don't want to build off latest as it's too unstable.
This would be resolved if you had automated tagging (a third digit auto-incrementing) or at least more frequent tagging.

thanks!

conda irkernel downgrades ipython-notebook to 3.2.1

Whelp, conda is downgrading the ipython package back to 3.2.1. Everything appeared to be working smoothly and so I missed the fact that the r-notebook and all-spark-notebook images were actually running notebook server v3.2.1 in the 4.x branch and builds. Doh!

Scrolling back through the build logs, I definitely see the message from conda stating that it was downgrading the package. I'll have to see if there's a flag to error on this case instead of proceeding unabated.

@takluyver Is the R kernel compatible with notebook 4.0.x? Is this just a conda packaging update problem?

GraphViz's executables not found

I've tried installing and removing pydot, graphviz, and pyparsing to get this to work, but haven't had any luck.

I'm trying to get through the demonstration on sklearn:
http://scikit-learn.org/stable/modules/tree.html

Thanks for any help!


InvocationException Traceback (most recent call last)
in <module>()
     10 tree.export_graphviz(clf, out_file=dot_data)
     11 graph = pydot.graph_from_dot_data(dot_data.getvalue())
---> 12 graph.write_pdf("iris.pdf")

/opt/conda/envs/python2/lib/python2.7/site-packages/pydot.pyc in <lambda>(path, f, prog)
   1807             self.__setattr__(
   1808                 'write_'+frmt,
-> 1809                 lambda path, f=frmt, prog=self.prog : self.write(path, format=f, prog=prog))
   1810
   1811             f = self.__dict__['write_'+frmt]

/opt/conda/envs/python2/lib/python2.7/site-packages/pydot.pyc in write(self, path, prog, format)
   1909                 dot_fd.write(data)
   1910             else:
-> 1911                 dot_fd.write(self.create(prog, format))
   1912             dot_fd.close()
   1913

/opt/conda/envs/python2/lib/python2.7/site-packages/pydot.pyc in create(self, prog, format)
   1951         if self.progs is None:
   1952             raise InvocationException(
-> 1953                 'GraphViz\'s executables not found')
   1954
   1955         if not self.progs.has_key(prog):

InvocationException: GraphViz's executables not found

add rpy2 in datascience-notebook

Really enjoying these images; thanks for the great work. Would you consider adding rpy2 to the datascience-notebook image? Given that it has R and Python engines already, it would be nice to be able to mix and match in a notebook.

Unable to automate installation of bash_kernel (on top of jupyter/minimal-notebook)

OK, so this is more of a question ... than an issue.

I wanted to add bash_kernel to jupyter/minimal-notebook.

Not being quite sure how start-notebook.sh is used, I decided to make a modified copy into which I added the following lines before the final exec line.

    CBIN=/opt/conda/bin                                                                                                                            

    conda install pip                                                                                                                              

    sudo -u $NB_USER $CBIN/pip install --user bash_kernel                                                                                          
    sudo -u $NB_USER $CBIN/python -m bash_kernel.install                                                                                           

However, these commands don't seem to be executed when invoked by the CMD of the Dockerfile.
Connecting to the Docker container and rerunning /start-notebook.sh installs the kernel as expected.

Why is this manual step necessary, and what should I do so that it's not needed?

Update notebook stacks for 4.0

  1. Get builds going with 3.2.1 first (Issue #2 ).
  2. Tag and/or branch as v3.2.1.
  3. Get 3.2.1 tag/branch build going on Docker Hub.
  4. Then get new container images on master and 4.0 branch working.

RuntimeError: Invalid DISPLAY variable

Can't get any plots to work!

RuntimeError Traceback (most recent call last)
in <module>()
----> 1 merged.loc[:,["GDPC1","INDPRO","UNRATE"]].plot()

/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/tools/plotting.pyc in plot_frame(data, x, y, kind, ax, subplots, sharex, sharey, layout, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, secondary_y, sort_columns, **kwds)
   2486                      yerr=yerr, xerr=xerr,
   2487                      secondary_y=secondary_y, sort_columns=sort_columns,
-> 2488                      **kwds)
   2489
   2490

/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/tools/plotting.pyc in _plot(data, x, y, subplots, ax, kind, **kwds)
   2322         plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
   2323
-> 2324         plot_obj.generate()
   2325         plot_obj.draw()
   2326         return plot_obj.result

/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/tools/plotting.pyc in generate(self)
    911         self._args_adjust()
    912         self._compute_plot_data()
--> 913         self._setup_subplots()
    914         self._make_plot()
    915         self._add_table()

/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/tools/plotting.pyc in _setup_subplots(self)
    958         else:
    959             if self.ax is None:
--> 960                 fig = self.plt.figure(figsize=self.figsize)
    961                 axes = fig.add_subplot(111)
    962             else:

/opt/conda/envs/python2/lib/python2.7/site-packages/matplotlib/pyplot.pyc in figure(num, figsize, dpi, facecolor, edgecolor, frameon, FigureClass, **kwargs)
    433                                         frameon=frameon,
    434                                         FigureClass=FigureClass,
--> 435                                         **kwargs)
    436
    437     if figLabel:

/opt/conda/envs/python2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.pyc in new_figure_manager(num, *args, **kwargs)
     45     FigureClass = kwargs.pop('FigureClass', Figure)
     46     thisFig = FigureClass(*args, **kwargs)
---> 47     return new_figure_manager_given_figure(num, thisFig)
     48
     49

/opt/conda/envs/python2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.pyc in new_figure_manager_given_figure(num, figure)
     52     Create a new figure manager instance for the given figure.
     53     """
---> 54     canvas = FigureCanvasQTAgg(figure)
     55     return FigureManagerQT(canvas, num)
     56

/opt/conda/envs/python2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.pyc in __init__(self, figure)
     70     if DEBUG:
     71         print('FigureCanvasQtAgg: ', figure)
---> 72     FigureCanvasQT.__init__(self, figure)
     73     FigureCanvasAgg.__init__(self, figure)
     74     self._drawRect = None

/opt/conda/envs/python2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4.pyc in __init__(self, figure)
     66     if DEBUG:
     67         print('FigureCanvasQt qt4: ', figure)
---> 68     _create_qApp()
     69
     70     # Note different super-calling style to backend_qt5

/opt/conda/envs/python2/lib/python2.7/site-packages/matplotlib/backends/backend_qt5.pyc in _create_qApp()
    136     display = os.environ.get('DISPLAY')
    137     if display is None or not re.search(':\d', display):
--> 138         raise RuntimeError('Invalid DISPLAY variable')
    139
    140     qApp = QtWidgets.QApplication([str(" ")])

RuntimeError: Invalid DISPLAY variable
