
workbench-images's People

Contributors

guimou, r2dedios, tteofili


workbench-images's Issues

add Oracle 12 drivers to base images

DB connectivity to Postgres and SQL Server is working nicely; thank you very much for having done a lot of heavy lifting in 2023c.

Could you also add drivers for Oracle 12 and later (i.e. 19c, 21c), apparently a mix of python-oracledb and the Oracle Instant Client, to the base images, please?

"The first step is to install and configure the necessary tools to enable communication between Jupyter Notebook and Oracle Database. This typically involves installing the Oracle Instant Client, which provides the required libraries and drivers, and configuring the environment variables."

It looks like it can be installed via rpm or yum on Oracle Linux, but on CentOS one has to download the RPM manually.

https://www.oracle.com/in/database/technologies/instant-client/linux-x86-64-downloads.html#ic_x64_inst

https://python-oracledb.readthedocs.io/en/latest/user_guide/installation.html

Whether thin mode (python-oracledb alone) or thick mode (python-oracledb plus the Oracle Instant Client) is required, I still need to figure out. Maybe someone in your vicinity knows more. From the feature comparison matrix it looks as though thin mode, using just python-oracledb, is enough if backward compatibility with releases older than Oracle 12 is not needed.

https://python-oracledb.readthedocs.io/en/latest/user_guide/appendix_a.html#featuresummary

As mentioned, thick mode, which includes the Oracle Instant Client
https://www.oracle.com/database/technologies/instant-client.html
should not be necessary, meaning this part here is optional:
https://python-oracledb.readthedocs.io/en/latest/user_guide/initialization.html#enablingthick

The python-oracledb documentation itself is very extensive.
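If thin mode turns out to be sufficient, adding the driver to the base images could be as small as a pip install; thick mode would additionally need the Instant Client. A rough Containerfile sketch, with the Instant Client RPM URL deliberately left as a placeholder to be taken from the Oracle download page linked above:

```dockerfile
# Thin mode: the pure-Python python-oracledb driver needs no Oracle client libraries.
RUN pip install oracledb

# Thick mode only: additionally install libaio and the Instant Client RPM.
# The exact RPM URL must come from the Oracle download page (elided on purpose).
# RUN dnf install -y libaio && \
#     dnf install -y https://download.oracle.com/.../oracle-instantclient-basic-....rpm
```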

Best regards

Sven

python data path is where code lives, so pylint assumes all project files are system files and ignores them

2023-09-28 18:58:38.363 [info] [Warn  - 6:58:38 PM] Skipping standard library file: /opt/app-root/src/llamaindex-rag-example/react_example.py
sysconfig.get_paths()
{'stdlib': '/usr/lib64/python3.11', 'platstdlib': '/opt/app-root/lib64/python3.11', 'purelib': '/opt/app-root/lib/python3.11/site-packages', 'platlib': '/opt/app-root/lib64/python3.11/site-packages', 'include': '/usr/include/python3.11', 'platinclude': '/usr/include/python3.11', 'scripts': '/opt/app-root/bin', 'data': '/opt/app-root'}

microsoft/vscode-black-formatter#272
https://stackoverflow.com/questions/76187582/python-file-not-formating-in-vscode-due-to-it-being-skipped-by-formatter
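The mechanism is visible with the stdlib alone: anything under one of the sysconfig installation prefixes gets classified as system code, and on these images the "data" prefix (/opt/app-root) is also where user projects live. A small illustration (the src/example.py path is hypothetical):

```python
import sysconfig
from pathlib import Path

# On the workbench images, sysconfig's "data" prefix is /opt/app-root,
# which is also where user projects are checked out.
paths = sysconfig.get_paths()

# A hypothetical project file under the data prefix:
project_file = Path(paths["data"]) / "src" / "example.py"

# Tools that treat everything under an installation prefix as system
# code will therefore skip it, as in the "Skipping standard library
# file" log line above.
print(project_file.is_relative_to(paths["data"]))
```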

RStudio and VSCode try to connect to HTTP

In the connection flow, there is a redirect to an HTTP session (before being sent back to HTTPS) that makes things impossible on networks filtering HTTP traffic (the first call, normally answered with a redirect, never makes it to OpenShift).
Investigate NGINX configuration.

RStudio complaining about missing header files when installing specific packages

Hello.

Running install.packages("tidyverse") in RStudio in the image quay.io/opendatahub-contrib/workbench-images:rstudio-r-c9s-py311_2023c_latest generates:

Package fribidi was not found in the pkg-config search path.
Perhaps you should add the directory containing `fribidi.pc'
to the PKG_CONFIG_PATH environment variable
Package 'fribidi', required by 'virtual:world', not found
Using PKG_CFLAGS=
Using PKG_LIBS=-lfreetype -lharfbuzz -lfribidi -lpng
--------------------------- [ANTICONF] --------------------------------
Configuration failed to find the harfbuzz freetype2 fribidi library. Try installing:

  • deb: libharfbuzz-dev libfribidi-dev (Debian, Ubuntu, etc)
  • rpm: harfbuzz-devel fribidi-devel (Fedora, EPEL)
  • csw: libharfbuzz_dev libfribidi_dev (Solaris)
  • brew: harfbuzz fribidi (OSX)

If harfbuzz freetype2 fribidi is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a harfbuzz freetype2 fribidi.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
-------------------------- [ERROR MESSAGE] ---------------------------
:1:10: fatal error: hb-ft.h: No such file or directory
compilation terminated.

Elyra workbench with Airflow support

The current ODH/RHOAI workbenches come with Elyra-KFP for out-of-the-box integration with Data Science Pipelines.
Upstream Elyra supports Airflow as an additional backend. I'm proposing a new community workbench image for the explicit purpose of developing and submitting Airflow pipelines through Elyra, with the option of integrating GitHub- or GitLab-based git servers.

vscode default python interpreter is not the terminal python interpreter

The default available/selected interpreter for VS Code itself is /bin/python3.11.
The terminal default interpreter is /opt/app-root/bin/python.

Because of this, when you install packages with pip install in the terminal (which you have to do), they don't go where VS Code is looking for them. Pylint then uses the wrong Python, which can't find any of the modules you installed, so all the IntelliSense features break.
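One possible workaround, assuming the standard ms-python VS Code extension, is to pin the default interpreter to the virtualenv the terminal uses, e.g. in the workspace settings.json (a sketch, not necessarily how the image should ship):

```json
{
    "python.defaultInterpreterPath": "/opt/app-root/bin/python"
}
```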

data science snippet includes packages like nvidia* that blow up the image to 6 GB even when no Torch or CUDA / GPU is needed

I noticed that with the latest addition of packages on October 25 to the bundle 2-datascience snippet, the image size, when using the regular base image, has increased from 1.3 GB to 6 GB.

Is that intended?

https://github.com/opendatahub-io-contrib/workbench-images/blob/main/snippets/bundles/2-datascience/py39/requirements.txt

I noticed that the torch dependencies and the nvidia* packages are all added during a normal image build.

I am wondering whether, for example, the nvidia* packages could be added conditionally, only when Torch, TensorFlow, CUDA, or GPU support is needed / selected?

Issues with interactive-image-builder "typing"

When working with the interactive-image-builder, if you attempt to type a number before the script finishes "typing", the number you type will not appear in the UI, but it appears to still be registered. So when you then type the number, it throws an error because it thinks you provided an invalid option:

(screenshot of the resulting error)

Add pandoc to the base image

Pandoc is an OS package needed by nbconvert to export notebooks to various formats such as PDF.
As it is missing, nbconvert does not work for at least some formats.
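A minimal sketch of what the fix could look like in the base image Containerfile (assuming pandoc is available via EPEL on CentOS Stream; this is a guess at the packaging, not the repo's actual change):

```dockerfile
# nbconvert shells out to the pandoc OS package for several export formats.
RUN dnf install -y epel-release && \
    dnf install -y pandoc && \
    dnf clean all
```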

Spark Image Doesn't Start

The current latest spark image doesn't appear to start successfully.

quay.io/opendatahub-contrib/workbench-images:runtime-spark-c9s-py311_2023c_latest

The pod goes into a CrashLoopBackOff state, and the logs for the primary container only show the S2I base image's usage message:

This is a S2I python-3.11 centos base image:
There are multiple ways how to run the image, see documentation at:
https://github.com/sclorg/s2i-python-container/blob/master/3.11/README.md
To use it in Openshift, run:
oc new-app python:3.11~https://github.com/sclorg/s2i-python-container.git --context-dir=3.11/test/setup-test-app/
You can then run the resulting image via:
oc get pods
oc exec <pod> -- curl 127.0.0.1:8080

Reference predefined pipeline runtime images by sha256 hash

https://github.com/opendatahub-io-contrib/workbench-images/tree/main/snippets/ides/1-jupyter/files/runtime-images

Consider referencing the images by digest instead of by tag. On the one hand, this ensures truly immutable image references during packaging; on the other hand, the digest format is better for ImageContentSourcePolicy in disconnected environments.

@harshad16 @VaishnaviHire @Jooho this is relevant for the list of images to include in operator bundle packaging, too. It is best practice to use the digest format throughout.

The full tag including RELEASE can instead be added under tags rather than image_name; it is basically just a different place to put the tag, since image_name is what is used for pulling images. @harshad16 tags are shown in the Elyra GUI, just like display_name.

Also, since unique sha256 digests are used now, add pull_policy with the value IfNotPresent (which in pipeline runtime containers translates to imagePullPolicy).

i.e.

{
    "display_name": "CUDA Datascience with Python 3.11 (CentOS Stream 9)",
    "metadata": {
        "tags": ["cuda-runtime-datascience-c9s-py311_RELEASE_latest"],
        "display_name": "CUDA Datascience with Python 3.11 (CentOS Stream 9)",
        "image_name": "quay.io/opendatahub-contrib/workbench-images@sha256:bd565c0e8b4e71d7ae3fd556a2f874e6251d80db1c4c36bfe1d490f04df2bdb1",
        "pull_policy": "IfNotPresent"
    },
    "schema_name": "runtime-image"
}

See

typhoonzero/elyra@f369c96#diff-1a8d23b67da009a738fc0755eac24d28363fd900a989eaa32d58058c9c8771e5

Regarding automation: skopeo inspect can retrieve an image's digest:

skopeo inspect --format "{{ .Digest }}" docker://quay.io/opendatahub-contrib/workbench-images:cuda-runtime-datascience-c9s-py311_2023c_latest
sha256:bd565c0e8b4e71d7ae3fd556a2f874e6251d80db1c4c36bfe1d490f04df2bdb1

When the JSON files containing the runtime images are added dynamically, skopeo could be used as well, depending on user input. The question is whether the list of usable runtime images should even be static: is there, for example, a GitHub Actions workflow that keeps the runtime images list in sync with the Jupyter GUI IDE images? For contrib, I could even imagine assembling that list of JSONs on the fly with the image builder command-line utility; see the linked issue.
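Assembling such a JSON on the fly is straightforward once the digest is known; a rough Python sketch using the digest from the skopeo output above (function name and structure are illustrative, not existing repo code):

```python
import json

# Sketch of assembling an Elyra runtime-image definition pinned by digest.
# In an automated flow, the digest would be looked up per tag via skopeo.
def runtime_image_json(display_name, tag, digest,
                       repo="quay.io/opendatahub-contrib/workbench-images"):
    return {
        "display_name": display_name,
        "metadata": {
            "tags": [tag],
            "display_name": display_name,
            "image_name": f"{repo}@{digest}",
            "pull_policy": "IfNotPresent",
        },
        "schema_name": "runtime-image",
    }

doc = runtime_image_json(
    "CUDA Datascience with Python 3.11 (CentOS Stream 9)",
    "cuda-runtime-datascience-c9s-py311_2023c_latest",
    "sha256:bd565c0e8b4e71d7ae3fd556a2f874e6251d80db1c4c36bfe1d490f04df2bdb1",
)
print(json.dumps(doc, indent=4))
```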

The release search and replace can stay as it is at https://github.com/opendatahub-io-contrib/workbench-images/blob/main/snippets/ides/1-jupyter/jupyter.snippet#L62

issues with path when launching streamlit

I saw this when testing a streamlit app that was using a local csv file.

The command to launch streamlit looks like:

streamlit-launcher.sh -f streamlit-showcase/app.py -p 8503 -s https://jupyterhub-redhat-ods-applications.apps.cluster.openshiftapps.com/user/mysuer/proxy/8503/

The problem is that if I have the following structure:

(app-root) (app-root) ls -al streamlit-showcase/
-rw-r--r--.  1 1000970000 1000970000      0 Jun 23 18:16 app.py
-rw-r--r--.  1 1000970000 1000970000     66 Jun 23 18:15 file.csv

it would be safe to assume that app.py will refer to either file.csv or ./file.csv.

But because streamlit is launched from one folder up (streamlit-launcher.sh -f streamlit-showcase/app.py), the csv file is not found.

Therefore, if possible, I think it would be better if the launch sequence were:

cd streamlit-showcase/
streamlit-launcher.sh -f app.py -p 8503 -s https://jupyterhub-redhat-ods-applications.apps.cluster.openshiftapps.com/user/mysuer/proxy/8503/
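The effect can be reproduced with a few lines of Python: a relative path like file.csv resolves against the process working directory, not against app.py's location (the temp-directory layout below is hypothetical):

```python
import os
import tempfile
from pathlib import Path

# Relative paths inside app.py resolve against the working directory of
# the streamlit process, not against the directory containing app.py.
orig_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as root:
    app_dir = Path(root) / "streamlit-showcase"
    app_dir.mkdir()
    (app_dir / "file.csv").write_text("a,b\n1,2\n")

    os.chdir(root)  # current behaviour: launched from one folder up
    found_from_parent = Path("file.csv").exists()

    os.chdir(app_dir)  # proposed: cd streamlit-showcase/ first
    found_from_app_dir = Path("file.csv").exists()

    os.chdir(orig_cwd)  # restore before the tempdir is cleaned up

print(found_from_parent, found_from_app_dir)  # False True
```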

R Studio and VS Code Logout from Openshift Oauth possible?

With the OAuth proxy and redirects, there is usually also a logout URL that essentially terminates the "let me pass" session and forces a fresh display of the OpenShift OAuth proxy / login dialog.

I know that the ODH Dashboard allows for that. I have not tested whether a regular Jupyter Notebook logout lands back on the OpenShift ose-oauth proxy OAuth page and destroys all session cookies.

But regarding R Studio and VS Code:

Signing out of the ose-oauth container / OpenShift OAuth:

Is there a possibility to call that somehow? I guess it would need a change to RStudio itself: instead of hiding the menus in R, changing their target URL.

include assembly logic for list of Elyra pipeline runtime image refs in jsons in interactive-image-builder.sh

There is currently a list of release-specific (i.e. 2023c) Elyra pipeline runtime images that automatically gets added to the Jupyter IDE images' config (not the images themselves, merely the references in JSON format):

https://github.com/opendatahub-io-contrib/workbench-images/tree/main/snippets/ides/1-jupyter/files/runtime-images

Consider adding steps to the interactive image builder script to add description, image_name, and tags for the runtime images on the fly, instead of having this very static list of runtime images.

Related to #44

requested access to the resource is denied, interactive-image-builder.sh

I am seeing the following error when I try to build the image.

127.0.0.1 $ podman build -t workbench-images:jupyter-pytorch-c9s-py311_2023c_20240110 .
STEP 1/25: FROM workbench-images:base-c9s-py311_2023c_20240110
Resolving "workbench-images" using unqualified-search registries (/etc/containers/registries.conf.d/999-podman-machine.conf)
Trying to pull docker.io/library/workbench-images:base-c9s-py311_2023c_20240110...
Error: creating build container: initializing source docker://workbench-images:base-c9s-py311_2023c_20240110: reading manifest base-c9s-py311_2023c_20240110 in docker.io/library/workbench-images: requested access to the resource is denied

which looks like a valid error, because the base image doesn't exist. Here is the content of the produced Containerfile.

FROM workbench-images:base-c9s-py311_2023c_20240110

LABEL name="workbench-images:jupyter-pytorch-c9s-py311_2023c_20240110" \
    summary="jupyter-pytorch workbench image with Python py311 based on c9s" \
    description="jupyter-pytorch workbench image with Python py311 based on c9s" \
    io.k8s.description="jupyter-pytorch workbench image  with Python py311 based on c9s for ODH or RHODS" \
    io.k8s.display-name="jupyter-pytorch workbench image  with Python py311 based on c9s" \
    authoritative-source-url="https://github.com/opendatahub-contrib/workbench-images" \
    io.openshift.build.commit.ref="2023c" \
    io.openshift.build.source-location="https://github.com/opendatahub-contrib/workbench-images" \
    io.openshift.build.image="https://quay.io/opendatahub-contrib/workbench-images:jupyter-pytorch-c9s-py311_2023c_20240110"

...

I have been using the default selection while running interactive-image-builder.sh. My expectation is for the script to generate a runnable Containerfile. Is this the correct approach, or should I be using a different base image?
