Comments (21)

betolink avatar betolink commented on August 30, 2024 2

I think this issue can be closed. We ended up managing it at the custom base image level. Another option for future deployments would be for the configuration to allow multiple user images (JupyterHub profiles).

from docs.

damianavila avatar damianavila commented on August 30, 2024 1

Your Hub administrator should be able to set up a customized environment: https://pilot.2i2c.org/en/latest/admin/howto/environment.html. If the environment persistence is useful/needed in your use case, a custom Dockerfile adding the lines @consideRatio suggested should be enough to support it, IMHO.
Btw, we are in fact testing some new tooling to allow admins to self-serve the creation of the environment they are going to put in front of the users so they do not need to build it by themselves, just configure it.

damianavila avatar damianavila commented on August 30, 2024 1

@betolink, FYI, we are discussing the pros vs cons of shipping this by default.
In the meantime, I encourage you to ping your hub admin so they can customize the image with the snippet @consideRatio shared above. In that way, we decouple the current technical discussion about this change from the customization you may need (that could be done by your hub admin without us being a blocker for your use case).

sgibson91 avatar sgibson91 commented on August 30, 2024 1

Thanks @sgibson91! Is 783616723547.dkr.ecr.us-west-2.amazonaws.com/user-image coming from https://github.com/2i2c-org/openscapes-image/? Oh I have so many questions and I don't want to spam you all.

Yes, it does look like that repository is the source of the image.

I assume there is a good reason why the image is being pushed to AWS ECR instead of Dockerhub.

I am not sure, actually. Our default image registry is quay.io, as it doesn't have the same rate-limiting issues as Docker Hub.

betolink avatar betolink commented on August 30, 2024

Looks like this works in user space so no sudo required.

conda activate {environment}
ipython kernel install --name "{environment}" 

However, this doesn't seem to be a persistent operation: when my instance was restarted, the kernel was no longer there.
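
One possible explanation, offered as an assumption rather than a confirmed diagnosis: without `--user`, the kernelspec is written outside the home folder, so it disappears with the container. The sketch below registers the kernel under the home folder instead. `my-env` is a placeholder name and `user_kernel_dir` is a hypothetical helper. The environment itself must also survive the restart for the kernel to keep working.

```shell
env_name="my-env"   # placeholder environment name (assumption)

# Directory where `--user` kernelspecs are expected to land on Linux,
# unless JUPYTER_DATA_DIR or XDG_DATA_HOME point elsewhere.
user_kernel_dir() {
  echo "${JUPYTER_DATA_DIR:-${XDG_DATA_HOME:-$HOME/.local/share}/jupyter}/kernels/$1"
}

if command -v conda >/dev/null 2>&1; then
  # Register the environment's kernel for the current user only.
  conda run -n "$env_name" python -m ipykernel install --user --name "$env_name" \
    || echo "could not register a kernel for $env_name" >&2
fi

echo "kernelspec location: $(user_kernel_dir "$env_name")"
```

`jupyter kernelspec list` shows where each registered kernel actually lives, which is a quick way to confirm the location after a restart.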

consideRatio avatar consideRatio commented on August 30, 2024

I think that adding this section to the Dockerfile makes conda default to creating environments in the home folder, which is persistent.

# Configure conda/mamba to create new environments within the home folder by
# default. This allows the environments to remain in between restarts of the
# container if only the home folder is persisted.
RUN conda config --system --prepend envs_dirs '~/.conda/envs'

Example from the hub.jupytearth.org's Dockerfile for the user environment image.
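
A quick way to reason about which environments would survive a restart under this setting is to check whether their prefix lies inside the home folder. This is a minimal sketch, assuming only the home folder is persisted. `is_under_home` is a hypothetical helper and `/srv/conda/envs/demo` is an illustrative image-level prefix; neither is taken from the thread.

```shell
# Hypothetical helper: succeeds if the given path lies inside $HOME.
is_under_home() {
  case "$1" in
    "$HOME"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Two example prefixes: one baked into the image (illustrative path),
# one under the envs_dirs location configured by the Dockerfile line above.
for prefix in "/srv/conda/envs/demo" "$HOME/.conda/envs/demo"; do
  if is_under_home "$prefix"; then
    echo "$prefix: kept together with the home folder"
  else
    echo "$prefix: lost when the container is recreated"
  fi
done
```

`conda env list` prints the prefix of every environment, so its output can be run through the same check.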

@2i2c-org/2i2c-team is this perhaps sensible to add for the pilot hubs default Dockerfile?

damianavila avatar damianavila commented on August 30, 2024

@2i2c-org/2i2c-team is this perhaps sensible to add for the pilot hubs default Dockerfile?

IMHO, that is something users should decide.
There are several reproducibility/replicability workflows where starting from a fresh environment that is "codified" in some image/Dockerfile helps others do the same as you did...
In fact, I would be surprised to see environments persisted by default after restarting my pod 😉

betolink avatar betolink commented on August 30, 2024

I think some Pangeo deployments let you pick the user base image? (with Openscapes we only pick the EC2 instance type). Maybe something like that would be useful. A hub administrator could add different repos for different environments.

At the moment Openscapes is mainly working on https://github.com/NASA-Openscapes/earthdata-cloud-cookbook, which requires some initial prototyping. Environment persistence between restarts would be handy to have until we are in "production mode".

betolink avatar betolink commented on August 30, 2024

I guess I need to find out who our hub admin is and see if we can get the Dockerfile approach + persistence. One thing I noticed in the documentation is that you discourage the use of quay.io/my-user/my-image:latest, and for prototyping I was thinking of exactly that (so that if I modify the environment, a hub admin doesn't have to update the build tag).

damianavila avatar damianavila commented on August 30, 2024

One thing I noticed from the documentation is that you discourage the use of quay.io/my-user/my-image:latest

Yes, having specific references (tags) is important to really know which environment you are working with.

and for prototyping I was precisely thinking about having something like that (so if I modify the environment a hub admin doesn't have to update the build tag).

As I said before, we are currently testing some new tooling to prototype/test and eventually self-serve the environment customization. Currently, the process looks like this: https://github.com/2i2c-org/peddie-image

Would you be interested in having something like this for Openscapes?

betolink avatar betolink commented on August 30, 2024

I just read about this tooling, and it looks like step 4 is what I wanted to avoid, since it requires a hub admin.

Open the Configurator for the peddie hub (you need to be logged in as an admin).

The important part would be to have an agile way of altering the environment while we are prototyping. I think just persisting my home directory as @consideRatio suggested would be enough for now.

betolink avatar betolink commented on August 30, 2024

@damianavila, I just got admin credentials this morning and went to the "configurator" page. I see a box to enter a Docker image name for the users and the default interface (RStudio, Lab or classic notebooks), but I don't see which image the users are running now. I don't want to disrupt what other users are doing by just entering my customized image. Is there a way to find out which image users are running now, so that I can at least clone those dependencies and add the edit to persist the environment?

sgibson91 avatar sgibson91 commented on August 30, 2024

Hi @betolink - you can see the image reference in these lines of the config file

https://github.com/2i2c-org/pilot-hubs/blob/a6f2e354399cc08275c16f49b0d92f75e11e6030/config/hubs/openscapes.cluster.yaml#L63-L65

betolink avatar betolink commented on August 30, 2024

Thanks @sgibson91! Is 783616723547.dkr.ecr.us-west-2.amazonaws.com/user-image coming from https://github.com/2i2c-org/openscapes-image/? Oh I have so many questions and I don't want to spam you all.

I guess I could open an issue on the openscapes image repository to add what @consideRatio suggested. I assume there is a good reason why the image is being pushed to AWS ECR instead of Dockerhub.

betolink avatar betolink commented on August 30, 2024

One last thing (perhaps): I noticed a substantial performance hit when I installed a conda environment in my home directory. My guess is that this may be related to the home directory being mounted on EFS?

How to reproduce?

mamba env create -f environment.yml 

vs

mamba env create -f environment.yml -p /home/jovyan/{environment}
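
One way to probe the EFS guess without building a full environment is to time the creation of many small files in both locations, since that is roughly the kind of work an environment install does. This is a hypothetical micro-benchmark, not something from the thread. It assumes `/tmp` sits on the container's local disk and that `$HOME` is the network-mounted folder. The helper name `time_small_files` is invented for illustration.

```shell
# Hypothetical micro-benchmark: create COUNT small files under DIR and
# print the elapsed whole seconds. The scratch directory is removed afterwards.
time_small_files() {
  local dir count start i
  dir="$(mktemp -d "$1/smallfiles.XXXXXX")" || return 1
  count="${2:-2000}"
  start="$(date +%s)"
  i=1
  while [ "$i" -le "$count" ]; do
    echo "x" > "$dir/file_$i"
    i=$((i + 1))
  done
  echo "$(( $(date +%s) - start ))"
  rm -rf "$dir"
}

echo "local disk (/tmp): $(time_small_files /tmp)s"
echo "home folder:       $(time_small_files "$HOME")s"
```

A large gap between the two numbers would be consistent with the home mount being the bottleneck. Raising the file count makes the difference easier to see at one-second resolution.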

choldgraf avatar choldgraf commented on August 30, 2024

Hey all - just wanted to boost this comment as well, which might be an interesting option for managing different conda environments from within Jupyter: 2i2c-org/infrastructure#562 (comment)

betolink avatar betolink commented on August 30, 2024

nb_conda_kernels sounds like a good option. We would still need some form of persistence, right? Otherwise we'll have to install an environment every time we start our instance. I wonder, is there a way for JupyterHub to configure base images per user rather than hub-wide, a bit like Binder + user-space persistence?

damianavila avatar damianavila commented on August 30, 2024

My guess is that this may be related to the home directory being mounted on EFS?

Most likely that is the case; EFS is slow for this kind of conda operation. So you get persistence at the cost of performance...

I wonder, is there a way for Jupyter hubs to configure base images per user and not hub-wide?

Not a per-user option, but maybe using different profiles pointing to different images that you, as a specialized user, can customize?

https://zero-to-jupyterhub.readthedocs.io/en/latest/jupyterhub/customizing/user-environment.html#using-multiple-profiles-to-let-users-select-their-environment

I imagine your use case as follows:

One X profile in addition to the base one.
That X profile loads a Docker image whose Dockerfile creates the environments you may need (and maybe installs nb_conda_kernels to manage them). In addition, that Dockerfile could contain the customization that Erik proposed so your conda envs are saved in /home (and persisted).
A user who wants that experience would select the X profile and get all the environments predefined in the Dockerfile, plus any new ones created "live" by the user and persisted in /home.
If the user modifies one of the environments "coming" from the Dockerfile, they can "promote" the customization by modifying the Dockerfile, pushing it, and using the Configurator to update the reference (you could even use a latest reference, which would make the Configurator step unnecessary, although using latest is not recommended unless you have a really good reason 😜).
If the user works with one of their /home-backed environments, it is automatically persistent (at the cost of EFS slowness), and it can be "promoted" to the Dockerfile once the user is happy enough with it.
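
The profile idea can be sketched with the `singleuser.profileList` setting described on the Zero to JupyterHub page linked above. The snippet writes a small Helm values fragment with a base profile and an "X" profile. The display names, descriptions and the image reference (including its tag) are placeholders, and the thread does not show how a given 2i2c hub nests this fragment in its own config files.

```shell
# Write a hypothetical values fragment with two profiles to a temporary file.
values_file="$(mktemp)"
cat > "$values_file" <<'EOF'
singleuser:
  profileList:
    - display_name: "Base environment"
      description: "The hub's default image"
      default: true
    - display_name: "Prototyping environment (X)"
      description: "Custom image with extra conda environments"
      kubespawner_override:
        # Placeholder image reference; pin a specific tag rather than latest.
        image: quay.io/my-user/my-image:2021.09.01
EOF

echo "profiles defined: $(grep -c 'display_name:' "$values_file")"
```

Keeping the image reference in one place like this means that "promoting" an environment comes down to changing a single tag.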

sgibson91 avatar sgibson91 commented on August 30, 2024

I think this is another use case where bringing the JupyterHub and BinderHub helm charts closer together will provide a solution, as we will be able to provide workflows closer to what the persistent BinderHub helm chart (https://github.com/gesiscss/persistent_binderhub) does, i.e. a user can create an environment on the fly from a repo using repo2docker, and these environments are persisted.

betolink avatar betolink commented on August 30, 2024

I think having something like what you both described would simplify many workflows. A hub admin would be responsible for infrastructure (i.e. credentials, shared mounts, instance types). Researchers would build their environment from a GitHub repo (using repo2docker or similar) and select the instance type they want to run it on. I think just having the flexibility to bootstrap an environment like Binder does will reduce the need for persisting changes to the base image, since we can make those changes in the original repository, and presumably persistence will be used only for work in progress or sample data, not whole conda environments.

jules32 avatar jules32 commented on August 30, 2024

Hi 2i2c team, thanks for all the discussion here and in 2i2c-org/infrastructure#562. Does this sound like something 2i2c can support? @betolink and @amfriesz can start coordinating/preparing things on our end, but we wanted to first confirm whether this is something you'll be moving forward with, and whether you have a rough timeline. @choldgraf I'm happy to chat about it too if you'd like.
