
dockernel's Introduction

Dockernel

Makes it possible to utilize arbitrary docker images as Jupyter kernels.

Installation, prerequisites

You will need Docker (obviously). For detailed instructions on how to install it, see the Get Docker page.

To install Dockernel, use pip:

pip install dockernel

Make sure that the Jupyter installation you wish to use with dockerized kernels is in the same environment as Dockernel. Keep in mind that kernels installed with one version of Dockernel may not necessarily work with a different one.

Usage

Note for Linux users: if you run into permission errors with docker or dockernel, either use sudo, or follow the steps outlined in the Manage Docker as a non-root user guide.

Creating a Dockernel image

First, create a Docker image that will host your kernel. This will require a proper Dockerfile. A full example for the IPython kernel can be seen here.

Most kernels take a path to a "connection file" (also called "control file" by some kernels) as a CLI argument. This file contains all of the information necessary to start up a kernel, including TCP ports to use, IP address, etc.
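For illustration, the snippet below builds a representative connection file. The field names match Jupyter's connection-file format; all of the values are made up, not taken from a real session:

```python
import json
import os
import tempfile

# Illustrative connection file contents. Real files are generated by Jupyter
# and contain session-specific TCP ports and a random HMAC key.
connection_info = {
    "shell_port": 53794,
    "iopub_port": 53795,
    "stdin_port": 53796,
    "control_port": 53797,
    "hb_port": 53798,
    "ip": "127.0.0.1",
    "key": "00000000-0000-0000-0000-000000000000",
    "transport": "tcp",
    "signature_scheme": "hmac-sha256",
    "kernel_name": "",
}

path = os.path.join(tempfile.mkdtemp(), "connection.json")
with open(path, "w") as f:
    json.dump(connection_info, f, indent=2)

# A kernel reads this file to learn which ports to bind and how to sign messages.
with open(path) as f:
    loaded = json.load(f)
print(loaded["shell_port"], loaded["transport"])
```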

When running your container, Dockernel will supply this file and put it into a predefined path in the container. This path will be given via an environment variable visible in the container as $DOCKERNEL_CONNECTION_FILE.

Therefore, in order for the kernel to know the connection settings it should use, you need to pass the contents of this variable in the CMD of the container. For example, for the IPython kernel:

CMD python -m ipykernel_launcher -f $DOCKERNEL_CONNECTION_FILE

Or for the Rust kernel (Evcxr, see the example Rust dockerfile):

CMD evcxr_jupyter --control_file $DOCKERNEL_CONNECTION_FILE

To build your image, use docker build. E.g. to build the example mentioned above:

docker build --tag my_kernel - < example_dockerfile

Installing your image as a Jupyter Kernel

After that, use Dockernel to install the docker image as a Jupyter kernel:

dockernel install my_kernel --name dockerized_kernel

... and you should be ready to go! Fire up jupyter notebook and you should see dockerized_kernel under the "New" menu, or in the "Notebook" section of the Launcher tab in jupyter lab.

Issues or questions?

Post a new issue in the Dockernel Issue Tracker on GitHub.

dockernel's People

Contributors

mrmino, vmenger


dockernel's Issues

Command 'None' in image 'my-docker-image' returned non-zero exit status when running on Windows

Hi,

Very nice project. I'm trying to run on Windows using the example I found here:

https://stackoverflow.com/questions/63702536/can-you-make-jupyter-start-up-a-kernel-in-a-docker-container/63715102#63715102

Dockerfile contents:

FROM python:3.7-slim-buster

RUN pip install --upgrade pip ipython ipykernel
CMD python -m ipykernel_launcher -f $DOCKERNEL_CONNECTION_FILE

Install command:

docker build --tag my-docker-image /path/to/the/dockerfile/dir
pip install dockernel
dockernel install my-docker-image

This part works fine, but when I try to create a jupyter notebook, it cannot connect to the kernel. From the jupyter notebook terminal:

[I 11:51:23.235 NotebookApp] Kernel started: dc7e2d74-9605-4ebf-a093-cacbce3035f2, name: c3a0179b8100
Traceback (most recent call last):
  File "C:\Users\VincentMenger\miniconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\VincentMenger\miniconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\VincentMenger\miniconda3\lib\site-packages\dockernel\__main__.py", line 10, in <module>
    sys.exit(main())
  File "C:\Users\VincentMenger\miniconda3\lib\site-packages\dockernel\__main__.py", line 6, in main
    return run(sys.argv)
  File "C:\Users\VincentMenger\miniconda3\lib\site-packages\dockernel\app.py", line 7, in run
    return run_subcommand(parsed_args)
  File "C:\Users\VincentMenger\miniconda3\lib\site-packages\dockernel\cli\main.py", line 18, in run_subcommand
    return parsed_args.func(parsed_args)
  File "C:\Users\VincentMenger\miniconda3\lib\site-packages\dockernel\cli\start.py", line 63, in start
    stderr=True
  File "C:\Users\VincentMenger\miniconda3\lib\site-packages\docker\models\containers.py", line 839, in run
    container, exit_status, command, image, out
docker.errors.ContainerError: Command 'None' in image 'my-docker-image' returned non-zero exit status 1
[I 11:51:26.230 NotebookApp] KernelRestarter: restarting kernel (1/5), new random ports

After which the same message is repeated 4 more times.

I also noticed a folder literally called {connection_file} created in my Windows home folder.

Is this a Windows compatibility issue? Or does the dockerfile not contain the appropriate commands? Happy to help solve the issue / debug some things on Windows but not sure where this issue arises. Any help is much appreciated.

Subcommand for adding a kernel on top of an already built image

To make truly arbitrary Docker images runnable as kernels, it would be nice to have a way of extending them with e.g. a bash kernel.

Create a subcommand that takes the name of an image and adds a kernel layer on top of it, creating another image in the process.

Images by Dockernel

One of the killer features would be to have some predefined images that can be pulled using dockernel install.

Things like a fast Haskell setup - dockernel install haskell?

Maybe introduce a subcommand for it - dockernel get haskell looks nice.

Languages that could use a builtin image:

  • Rust - this is a hot topic, so as soon as possible
  • Python (already in the example_dockerfile)
  • Haskell
  • Go
  • Julia
  • Ada (w/ SPARK if we'd be able to get around the proprietary licensing)
  • JavaScript (Node.js)
  • Kotlin?
  • Prolog
  • Lisp
  • Ruby

Bundles:

  • Pandas / SciPy stack
  • 99 Questions series - in all available languages, if possible to ship with a notebook (and if licensing permits)

Afterwards, maybe test all of the hard-to-install AI libraries that utilize a GPU, and, if GPU passthrough to Docker is possible, image them up too.

Stop docker container after closing kernel

Currently, docker containers keep running when a kernel/notebook is closed. This can start eating up a lot of resources over time.

To investigate:

  • How does jupyter arrange cleanup of kernels?
  • Can this be integrated so that the docker container is stopped as well?
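The cleanup itself boils down to a docker stop on the kernel's container. A minimal sketch of how such an integration might construct that call (the container name below is hypothetical, and the command is only built here, not executed):

```python
def build_stop_command(container_name):
    """Build the CLI invocation that would stop the kernel's container.

    In a real integration this would run when Jupyter shuts the kernel down;
    here we only construct the argument list so the sketch is self-contained.
    """
    return ["docker", "stop", container_name]


# Hypothetical container name derived from a kernel ID.
cmd = build_stop_command("dockernel-dc7e2d74")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```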

How to mount a local directory?

How can I mount a local directory to the kernel? Normally I'd use the "docker run -v" switch. But how to do this with dockernel? Thx.

Image pulling

Currently, when the image is not found by the docker daemon, dockernel install fails with a traceback, without even trying to search for the image on the web.

Make it convenient to download Dockernel images, without having to do any docker pull. Might be a separate subcommand - #2 proposed dockernel get. This would also make the subcommand issue the install() afterwards.

Dockernel images would get some priority on naming, so that people don't have to type dockernel get dockernelized/rust in order to get the Rust kernel; a simple dockernel get rust should suffice.

Some metadata might be stored with the image, so that things like --language can be populated automatically. It might be also a good idea to restrict pulling to images that contain some predefined "Dockernel-compatible" marker. With an override, of course.

Restart docker container when restarting kernel

Currently, when a kernel is restarted, the docker container keeps running and the kernel then reconnects to it. This is not consistent with normal restarting behaviour, in which all variables are lost.

Figure out:

  • How jupyter handles a kernel restart
  • If possible, integrate with dockernel so that docker container is restarted as well

Improve CI

I have little experience with GitHub Actions, so the current CI is just a publishing step, taken from another project of mine. This has to be improved.

In order for this CI to be any good, the following has to be added:

  • Linting step
  • Blackening step (black code formatting)
  • PyTest unit tests
  • Some E2E testing, incl. Docker API calls, if possible
  • Proper artifact management: sdist and whl files should show up as artifacts.

Using with Podman

Attempting to use with Podman on RHEL8 results in the following error:

(hub_env) [root@*******]# dockernel install localhost/pyspark:3.2 --name spark-env
Traceback (most recent call last):
  File "/data/mambaforge-pypy3/envs/hub_env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/data/mambaforge-pypy3/envs/hub_env/lib/python3.9/site-packages/urllib3/connectionpool.py", line 394, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/data/mambaforge-pypy3/envs/hub_env/lib/python3.9/http/client.py", line 1257, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/data/mambaforge-pypy3/envs/hub_env/lib/python3.9/http/client.py", line 1303, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/data/mambaforge-pypy3/envs/hub_env/lib/python3.9/http/client.py", line 1252, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/data/mambaforge-pypy3/envs/hub_env/lib/python3.9/http/client.py", line 1012, in _send_output
    self.send(msg)
  File "/data/mambaforge-pypy3/envs/hub_env/lib/python3.9/http/client.py", line 952, in send
    self.connect()
  File "/data/mambaforge-pypy3/envs/hub_env/lib/python3.9/site-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Any idea on this? Podman is a daemonless, non-root process, so I thought it would integrate well. I did alias docker=podman, to no avail.

Location of kernels directory

Thanks for your great work.

On Linux, the location of the kernels directory seems to be fixed at ~/.local/share/jupyter/kernels. It would be nice if there was an option to change it to any location.

.io page

  • Come up with a good layout for a static home page on GitHub pages
  • Create the page

Extensions installed via pip not visible in "host" Jupyter

I love the idea to simply use a process in a docker container as kernel.
I see an issue, though:
When you run pip within this kernel, you install stuff in the docker container, right?

Some pip packages come with jupyter extensions.
If you install them in the virtual environment within the container, the jupyter instance running "outside" has no access to those.

If I install stuff via ! pip in jupyterlab and run docker diff on the container running my kernel, I see that stuff was installed to e.g.

/usr/local/lib/python3.7/site-packages/
/usr/local/etc/jupyter/nbconfig/notebook.d/widgetsnbextension.json
/usr/local/share/jupyter/labextensions/@jupyter-widgets/jupyterlab-manager/schemas/@jupyter-widgets/jupyterlab-manager/plugin.json

(all in the container).

As far as I understand, I can supply additional directories, where jupyter lab searches for extensions.
The documentation mentions JUPYTER_CONFIG_PATH and JUPYTER_PATH.

The proposal would be:
If we mount a directory on the host as a volume in the container running the kernel, and then modify the environment variables (or modify the paths by some other configuration setting), then jupyter should get access to the extensions installed this way.
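That proposal can be sketched in terms of the keyword arguments one might pass to the Docker SDK's containers.run(). The host path and variable values below are illustrative; none of this is current dockernel behaviour:

```python
# Hypothetical host directory where container-installed extensions would land.
host_ext_dir = "/home/user/.dockernel/extensions"

# Keyword arguments in docker-py's format: `volumes` maps a host path to a
# bind target and mode, `environment` sets variables inside the container.
run_kwargs = {
    "volumes": {
        host_ext_dir: {"bind": "/usr/local/share/jupyter", "mode": "rw"},
    },
    "environment": {
        # Inside the container, extensions install into the bound directory;
        # the *host* Jupyter would then need JUPYTER_PATH to include host_ext_dir.
        "JUPYTER_PATH": "/usr/local/share/jupyter",
    },
}
print(sorted(run_kwargs))
```

The open question is the other direction: the host-side Jupyter process would also need its own JUPYTER_PATH (or JUPYTER_CONFIG_PATH) extended to cover the mounted directory.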

Document architecture

Dockernel uses lots of arcane details of how Jupyter / IPython works, and does so in a bit of a "magical" manner.

Document its architecture & how it works under the hood.

Ideas:

  • Maybe do it inside a Jupyter notebook, so that cells can call the actual Dockernel code?
  • Make it available from dockernel itself? E.g. dockernel help architecture?

Add instructions to readme

Current README.md contents lack:

  • A nice elevator pitch for the project
  • Installation instructions (links to Docker installation guide, pip install dockernel)
  • How to install a kernel into Jupyter using Dockernel (usage)
  • Fast installation of different kernels with Images by Dockernel (#2)
  • How to create a proper Docker image for use with Dockernel (detailed instructions)

Cannot install example dockerfile

I am unable to install any docker images as jupyter kernels. All attempts end with ValueError: kernelspec already exists: /Users/ben/Library/Jupyter/kernels.

Steps to reproduce:

  1. Create Dockerfile with contents from https://github.com/MrMino/dockernel/blob/master/example_dockerfile
  2. Run the following:
docker build --tag dockerneltest .
pip install dockernel
dockernel install dockerneltest

Results:

Traceback (most recent call last):
  File "/Users/ben/opt/anaconda3/bin/dockernel", line 8, in <module>
    sys.exit(main())
  File "/Users/ben/opt/anaconda3/lib/python3.7/site-packages/dockernel/__main__.py", line 6, in main
    return run(sys.argv)
  File "/Users/ben/opt/anaconda3/lib/python3.7/site-packages/dockernel/app.py", line 7, in run
    return run_subcommand(parsed_args)
  File "/Users/ben/opt/anaconda3/lib/python3.7/site-packages/dockernel/cli/main.py", line 23, in run_subcommand
    return parsed_args.func(parsed_args)
  File "/Users/ben/opt/anaconda3/lib/python3.7/site-packages/dockernel/cli/install.py", line 77, in install
    install_kernelspec(location, kernelspec)
  File "/Users/ben/opt/anaconda3/lib/python3.7/site-packages/dockernel/kernelspec.py", line 163, in install_kernelspec
    raise ValueError(f"kernelspec already exists: {kernelspec_dir}.")
ValueError: kernelspec already exists: /Users/ben/Library/Jupyter/kernels.
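Notably, the path in the error message ends at the kernels directory itself, with no per-image subdirectory. That hints that the kernel ID resolved to an empty string: joining an empty path component is a no-op, so the "new" kernelspec directory collides with the already-existing kernels directory. A minimal illustration, assuming pathlib-style joins:

```python
from pathlib import Path

kernels_dir = Path("/Users/ben/Library/Jupyter/kernels")
kernel_id = ""  # what an empty image attribute (e.g. hostname) would produce

# pathlib silently drops empty components, so the join changes nothing.
kernelspec_dir = kernels_dir / kernel_id
print(kernelspec_dir == kernels_dir)  # True: the directory "already exists"
```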

Consider alternative docker image attribute as kernel_id

First of all great work of yours! appreciate it.

Though I ran into some problems when trying to install a docker image as a KernelSpec (tensorflow/tensorflow:latest-gpu, to be precise).
I got an error:

ValueError: kernelspec already exists: /home/lubricy/.local/share/jupyter/kernels

It turns out that image.attrs['ContainerConfig']['Hostname'] is just an empty string.

I'm not quite familiar with docker image attrs, but perhaps a sanitized image tag (e.g. tensorflow_tensorflow_latest-gpu) would work better here? Or perhaps rather let the user decide the name/ID of their kernel?

Another, unrelated thing: I think it would be nice if dockernel start (and dockernel install) could accept docker cli flags (e.g. -v VOLUME, -p PORT, -u USER, etc.). A particular flag I'd like to use is --gpus all, but it seems like it's not natively supported by docker-py upstream. Maybe give python-on-whales a try?
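The sanitized-tag fallback suggested above could look like this. This is a sketch of the idea from the issue, not existing dockernel code:

```python
import re


def sanitize_image_tag(tag):
    """Turn a docker image reference into a filesystem-safe kernel ID.

    Replaces every character outside [A-Za-z0-9_.-] (notably '/' and ':')
    with an underscore, so the result is usable as a directory name.
    """
    return re.sub(r"[^A-Za-z0-9_.-]", "_", tag)


print(sanitize_image_tag("tensorflow/tensorflow:latest-gpu"))
# -> tensorflow_tensorflow_latest-gpu
```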

Can't install

I have a problem that:

Traceback (most recent call last):
  File "/data/yyf-qrn/conda/envs/qrn/bin/dockernel", line 8, in <module>
    sys.exit(main())
  File "/data/yyf-qrn/conda/envs/qrn/lib/python3.8/site-packages/dockernel/__main__.py", line 6, in main
    return run(sys.argv)
  File "/data/yyf-qrn/conda/envs/qrn/lib/python3.8/site-packages/dockernel/app.py", line 7, in run
    return run_subcommand(parsed_args)
  File "/data/yyf-qrn/conda/envs/qrn/lib/python3.8/site-packages/dockernel/cli/main.py", line 23, in run_subcommand
    return parsed_args.func(parsed_args)
  File "/data/yyf-qrn/conda/envs/qrn/lib/python3.8/site-packages/dockernel/cli/install.py", line 77, in install
    install_kernelspec(location, kernelspec)
  File "/data/yyf-qrn/conda/envs/qrn/lib/python3.8/site-packages/dockernel/kernelspec.py", line 163, in install_kernelspec
    raise ValueError(f"kernelspec already exists: {kernelspec_dir}.")
ValueError: kernelspec already exists: /home/jupyter-yyf-qrn/.local/share/jupyter/kernels.

Could you tell me how to solve it?

Make configurable file/directory bindings

It would be nice to be able to mount local directories to the docker image so that files can be used in the notebook.

I could see it work:

  • As a parameter to the dockernel install command, e.g. dockernel install image_1 --bind c:/data /opt/data/ ~/.aws ~/.aws
  • In an optional config file stored in ~/.dockernel (e.g. a json file with {"name": "image_1", "bindings" : {"C:/data": "/opt/data/", "~/.aws": "~/.aws"}})

The second one would be especially useful for mounting config files like .aws or pip.ini in some docker kernel. A nice addition, moreover, would be the option to apply a mount to all dockernel installs.
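The config-file variant could be sketched as follows. The file name and schema are this issue's proposal, not an existing feature; the output follows the Docker SDK's volumes format:

```python
import json

# Contents a user might put in ~/.dockernel (the issue's proposed schema).
config_text = json.dumps({
    "name": "image_1",
    "bindings": {"C:/data": "/opt/data/", "~/.aws": "~/.aws"},
})


def bindings_to_volumes(config_json):
    """Convert the proposed config schema into docker-py's volumes mapping."""
    config = json.loads(config_json)
    return {
        host: {"bind": container, "mode": "rw"}
        for host, container in config.get("bindings", {}).items()
    }


volumes = bindings_to_volumes(config_text)
print(volumes["C:/data"]["bind"])  # -> /opt/data/
```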

Remove / list dockernels

  • Come up with a way of marking a kernelspec as created by Dockernel, possibly with version specs - kernel.json has a metadata field
  • Add a subcommand that lists dockernels and their details
  • Add a convenience subcommand that removes a dockernel-spec from Jupyter.
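Marking a kernelspec through the metadata field could look like this. This is a sketch: the dockernel metadata key and the argv contents are illustrative, not what dockernel currently writes:

```python
import json
import os
import tempfile

# kernel.json allows a free-form "metadata" field; a marker placed there
# would let hypothetical `dockernel list` / `dockernel remove` subcommands
# find kernelspecs that Dockernel itself installed.
kernelspec = {
    "argv": ["dockernel", "start", "my_kernel", "{connection_file}"],
    "display_name": "dockerized_kernel",
    "language": "python",
    "metadata": {"dockernel": {"version": "0.1.0", "image": "my_kernel"}},
}

path = os.path.join(tempfile.mkdtemp(), "kernel.json")
with open(path, "w") as f:
    json.dump(kernelspec, f, indent=2)

# A listing subcommand would scan kernel.json files for the marker.
with open(path) as f:
    installed_by_dockernel = "dockernel" in json.load(f).get("metadata", {})
print(installed_by_dockernel)
```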

Improve error messages

Make the following errors show a user-friendly message instead of the traceback:

  • When installing same image twice
  • When installing image that cannot be found
  • When Jupyter kernel directory is not in the user's home directory (Jupyter is not installed?)
  • When docker module runs into "Permission Denied" (maybe add instructions on how to run docker without sudo?)

Output the traceback only when a --verbose or --debug flag is given
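The gating logic could be as simple as the sketch below; the mapping of exception types to messages is illustrative:

```python
import traceback


def report_error(exc, verbose):
    """Return a user-facing message; append the traceback only if verbose."""
    # Hypothetical friendly messages keyed by exception type.
    friendly = {
        ValueError: "That image is already installed as a kernel.",
        FileNotFoundError: "Could not reach the docker daemon. Is it running?",
    }
    message = friendly.get(type(exc), str(exc))
    if verbose:
        message += "\n" + "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
    return message


try:
    raise ValueError("kernelspec already exists")
except ValueError as e:
    msg = report_error(e, verbose=False)
print(msg)
```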

Kernel not found after selecting it

Hi, I just followed the instructions but after selecting the kernel it shows Kernel Not Found


The kernel I created with dockernel is:

bcfe8c993c05 /home/azureuser/.local/share/jupyter/kernels/bcfe8c993c05

Which shows up as "prueba_dockernel"

Could it have to do with the location where azureML is storing kernels?

Thanks!

Support for "--gpus all" and other possible container properties

Hi!

First of all, great project! Is it still alive? There has been no activity since April.

Continuing the discussion from gh-19.

I looked into possible ways of implementing support for --gpus all cli argument and other arguments of docker.containers.run.

There are a lot of arguments to docker.containers.run besides device_requests (through which --gpus all is implemented). Most of them are useful in one way or another. It seems to me there is no good way to specify them all as command-line arguments for dockernel.

What do you think about other ways?

  • A command-line argument to specify a custom runner script for the container, where the user can configure all required properties;
  • A command-line argument to specify a separate config for the container, from which dockernel will get arguments for docker.containers.run.

The config file may need to use Python, since docker.containers.run requires complex data structures, like docker.types.DeviceRequest. So the first approach feels easier to implement and use.
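The Python-config variant could be sketched as below. The config contents are illustrative (in a real config the device_requests entry would be built from docker.types.DeviceRequest; plain dicts stand in for it here so the sketch needs no docker import):

```python
# Contents a user might put in a hypothetical ~/.dockernel/container_config.py.
# The device_requests entry mirrors the shape docker-py uses for `--gpus all`
# (count=-1 means "all GPUs"), expressed here as plain dicts.
config_source = """
run_kwargs = {
    "device_requests": [{"count": -1, "capabilities": [["gpu"]]}],
    "shm_size": "1g",
}
"""


def load_run_kwargs(source):
    """Execute a config snippet and pull out its run_kwargs dict."""
    namespace = {}
    exec(source, namespace)
    return namespace["run_kwargs"]


kwargs = load_run_kwargs(config_source)
print(kwargs["shm_size"])  # -> 1g
```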
