Comments (21)
I think this issue can be closed. We ended up managing it at the custom base image level. Another option for future deployments would be for the configuration to allow multiple user images (JupyterHub profiles).
from docs.
Your Hub administrator should be able to set up a customized environment: https://pilot.2i2c.org/en/latest/admin/howto/environment.html. If the environment persistence is useful/needed in your use case, a custom Dockerfile adding the lines @consideRatio suggested should be enough to support it, IMHO.
Btw, we are in fact testing some new tooling to allow admins to self-serve the creation of the environment they are going to put in front of the users so they do not need to build it by themselves, just configure it.
from docs.
@betolink, FYI, we are discussing the pros vs cons of shipping this by default.
In the meantime, I encourage you to ping your hub admin so they can customize the image with the snippet @consideRatio shared above. In that way, we decouple the current technical discussion about this change from the customization you may need (that could be done by your hub admin without us being a blocker for your use case).
from docs.
Thanks @sgibson91! Is 783616723547.dkr.ecr.us-west-2.amazonaws.com/user-image coming from https://github.com/2i2c-org/openscapes-image/? Oh I have so many questions and I don't want to spam you all.
Yes, it does look like that repository is the source of the image.
I assume there is a good reason why the image is being pushed to AWS ECR instead of Dockerhub.
I am not sure, actually. Our default image repository is quay.io, as it doesn't have the same rate-limiting issues as DockerHub.
from docs.
Looks like this works in user space, so no sudo is required.
conda activate {environment}
ipython kernel install --name "{environment}"
However, this doesn't seem to be a persistent operation: when my instance was restarted, the kernel was not there.
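For a kernel to survive a restart, both the environment and its kernel spec need to live somewhere that is persisted. As a rough sketch, assuming only the home folder is persisted, the snippet below writes a minimal `kernel.json` into a directory you choose. The function name `write_user_kernelspec` and the example environment path are hypothetical; `~/.local/share/jupyter/kernels` is the usual per-user kernel location on Linux. Whether a missing kernel spec location explains the behaviour reported here is an assumption, not something confirmed in this thread.

```python
import json
import tempfile
from pathlib import Path


def write_user_kernelspec(env_prefix, name, kernels_dir):
    """Write a minimal kernel.json pointing at the Python of a conda environment.

    env_prefix  -- path of the conda environment (hypothetical example below)
    name        -- directory name for the kernel spec
    kernels_dir -- where kernel specs are stored, e.g. ~/.local/share/jupyter/kernels
    """
    spec_dir = Path(kernels_dir) / name
    spec_dir.mkdir(parents=True, exist_ok=True)
    spec = {
        "argv": [
            str(Path(env_prefix) / "bin" / "python"),
            "-m", "ipykernel_launcher",
            "-f", "{connection_file}",
        ],
        "display_name": f"Python ({name})",
        "language": "python",
    }
    spec_file = spec_dir / "kernel.json"
    spec_file.write_text(json.dumps(spec, indent=2))
    return spec_file


if __name__ == "__main__":
    # Demo against a throwaway directory so nothing in the real home is touched.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_user_kernelspec("/home/jovyan/.conda/envs/demo", "demo", tmp)
        print(path.read_text())
```

The environment still needs `ipykernel` installed for the kernel to start, and the environment itself has to be stored in a persisted location as well.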
from docs.
I think that by adding this section to the Dockerfile, conda will default to creating environments in the home folder, which is persistent.
# Configure conda/mamba to create new environments within the home folder by
# default. This allows the environments to remain in between restarts of the
# container if only the home folder is persisted.
RUN conda config --system --prepend envs_dirs '~/.conda/envs'
Example from the hub.jupytearth.org's Dockerfile for the user environment image.
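The reason prepending works is that conda places a new named environment in the first writable entry of `envs_dirs`. The snippet below is a simplified model of that selection, not conda's actual implementation; the function names are hypothetical and the only rule it encodes is "first entry whose nearest existing ancestor is a writable directory".

```python
import os
import tempfile
from pathlib import Path


def nearest_existing_ancestor(path):
    """Walk up from path until something that exists is found."""
    path = Path(path)
    while not path.exists():
        path = path.parent
    return path


def first_usable_envs_dir(envs_dirs):
    """Return the first entry that could be created or written to, else None."""
    for entry in envs_dirs:
        candidate = Path(os.path.expanduser(entry))
        ancestor = nearest_existing_ancestor(candidate)
        if ancestor.is_dir() and os.access(ancestor, os.W_OK):
            return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A regular file stands in for a location that cannot hold environments.
        blocker = Path(tmp) / "not-a-directory"
        blocker.write_text("")
        dirs = [str(blocker / "envs"), str(Path(tmp) / "home" / ".conda" / "envs")]
        print(first_usable_envs_dir(dirs))
```

With `~/.conda/envs` prepended, the home folder entry is considered first, so environments created by name end up under the persisted home folder.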
@2i2c-org/2i2c-team is this perhaps sensible to add for the pilot hubs default Dockerfile?
from docs.
@2i2c-org/2i2c-team is this perhaps sensible to add for the pilot hubs default Dockerfile?
IMHO, that should be something users should decide upon.
There are several reproducibility/replicability workflows where starting from a fresh environment that is "codified" in some image/Dockerfile helps others do the same as you did...
In fact, I would be surprised to see some environments persisted by default after restarting my pod 😉
from docs.
I think some Pangeo deployments let you pick the user base image? (with Openscapes we only pick the EC2 instance type). Maybe something like that would be useful. A hub administrator could add different repos for different environments.
At the moment OpenScapes is mainly working on https://github.com/NASA-Openscapes/earthdata-cloud-cookbook which requires some initial prototyping. Environment persistence between restarts would be handy to have until we are in "production mode"
from docs.
I guess I need to find out who is our hub admin and see if we can get the Dockerfile approach + persistence. One thing I noticed from the documentation is that you discourage the use of quay.io/my-user/my-image:latest and for prototyping I was precisely thinking about having something like that (so if I modify the environment a hub admin doesn't have to update the build tag).
from docs.
One thing I noticed from the documentation is that you discourage the use of quay.io/my-user/my-image:latest
Yes, having specific references (tags) is important to really know which environment you are working with.
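As an illustration of what "specific reference" means, here is a small sketch that flags image references that are not pinned. It treats a reference as pinned when it has a digest or a tag other than `latest`; a reference with no tag counts as unpinned because Docker resolves it to `latest`. The function names are hypothetical and the parsing is deliberately simplified.

```python
def split_image_reference(reference):
    """Split 'registry/name:tag' or 'registry/name@digest' into (name, tag, digest)."""
    name, _, digest = reference.partition("@")
    # Only look for a tag in the last path segment, so that a registry port
    # such as 'localhost:5000/image' is not mistaken for a tag.
    last_segment = name.rsplit("/", 1)[-1]
    tag = None
    if ":" in last_segment:
        name, _, tag = name.rpartition(":")
    return name, tag, (digest or None)


def is_pinned(reference):
    """True when the reference names one specific image build."""
    _, tag, digest = split_image_reference(reference)
    if digest:
        return True
    return tag is not None and tag != "latest"


if __name__ == "__main__":
    for ref in ("quay.io/my-user/my-image:latest", "quay.io/my-user/my-image:abc123"):
        print(ref, "pinned" if is_pinned(ref) else "not pinned")
```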
and for prototyping I was precisely thinking about having something like that (so if I modify the environment a hub admin doesn't have to update the build tag).
As I said before, we are currently testing some new tooling to prototype/test and eventually self-serve the environment customization. Currently, the process looks like this: https://github.com/2i2c-org/peddie-image
Would you be interested in having something like this for Openscapes?
from docs.
I just read about this tooling, and it looks like step 4 is what I wanted to avoid, since it requires a hub admin.
Open the Configurator for the peddie hub (you need to be logged in as an admin).
The important part would be to have an agile way of altering the environment while we are prototyping. I think just persisting my home directory as @consideRatio suggested would be enough for now.
from docs.
@damianavila, I just got admin credentials this morning and went to the "configurator" page. I see a box to enter a docker image name for the users and the default interface (RStudio, Lab or classic notebooks), but I don't see what image the users are running now. I don't want to disrupt what other users are doing by just entering my customized image. Is there a way to find out what image users are running now, so that I can at least clone those dependencies and add the edit to persist the environment?
from docs.
Hi @betolink - you can see the image reference in these lines of the config file
from docs.
Thanks @sgibson91! Is 783616723547.dkr.ecr.us-west-2.amazonaws.com/user-image coming from https://github.com/2i2c-org/openscapes-image/? Oh I have so many questions and I don't want to spam you all.
I guess I could open an issue on the openscapes image repository to add what @consideRatio suggested. I assume there is a good reason why the image is being pushed to AWS ECR instead of Dockerhub.
from docs.
One last thing (perhaps): I noticed a substantial performance hit when I installed a conda environment in my home directory. My guess is that this may be related to the home directory being mounted on EFS?
How to reproduce?
mamba env create -f environment.yml
vs
mamba env create -f environment.yml -p /home/jovyan/{environment}
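A conda environment consists of a large number of small files, so one rough way to compare the two locations without building a full environment is to time many small writes in each. This is only a sketch: the function name, file count and file size are arbitrary choices, and `/tmp` as the non-EFS location is an assumption.

```python
import shutil
import tempfile
import time
from pathlib import Path


def time_small_file_writes(directory, n_files=2000, size_bytes=1024):
    """Return the seconds taken to write n_files small files under directory."""
    target = Path(directory) / "small-file-benchmark"
    target.mkdir(parents=True, exist_ok=True)
    payload = b"x" * size_bytes
    start = time.perf_counter()
    for i in range(n_files):
        (target / f"file_{i:05d}.bin").write_bytes(payload)
    elapsed = time.perf_counter() - start
    # Clean up so the benchmark leaves nothing behind.
    shutil.rmtree(target)
    return elapsed


if __name__ == "__main__":
    # On a hub one might compare e.g. "/tmp" with "/home/jovyan"; the demo
    # uses a throwaway directory instead.
    with tempfile.TemporaryDirectory() as tmp:
        print(f"{time_small_file_writes(tmp, n_files=200):.3f} s")
```

A large gap between the two timings would be consistent with the file system, rather than mamba itself, being the bottleneck.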
from docs.
Hey all - just wanted to boost this comment as well, which might be an interesting option for managing different conda environments from within Jupyter: 2i2c-org/infrastructure#562 (comment)
from docs.
nb-conda-kernels sounds like a good option. We would still need some form of persistence, right? Otherwise we'll have to install an environment every time we start our instance. I wonder, is there a way for Jupyter hubs to configure base images per user and not hub-wide? A bit like Binder + user space persistence?
from docs.
My guess is that this may be related to the home directory being mounted on EFS?
Most likely that is the case; EFS is slow for this kind of conda operation. So you get persistence at the cost of performance...
I wonder, is there a way for Jupyter hubs to configure base images per user and not hub-wide?
Not a per-user option, but maybe using different profiles pointing to different images that you, as a specialized user, can customize?
I imagine your use case like this:
One X profile in addition to the base one.
That X profile loads a docker image that is actually creating the environments you may need in the Dockerfile (and maybe installing nb_conda_kernels to manage them). In addition, that Dockerfile could contain all the customizations that Erik proposed so your conda envs are saved in /home (and persisted).
The user who wants that experience would select that X profile and they will have all the environments predefined in the Dockerfile + all the new ones that are created "live" by the user and persisted at /home.
If the user modifies one of the environments "coming" from the Dockerfile, they can "promote" the customization by just modifying the Dockerfile, pushing it, and using the Configurator to update the reference (you could even think about using a latest reference, in which case the Configurator step would not be needed, although it is not recommended to use latest unless you have a really good reason for that 😜 ).
If the user works with one of their /home-backed environments, it would be automatically persistent (at the EFS slowness cost), but it could be "promoted" to the Dockerfile when the user is happy enough with it...
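The profile idea above could be sketched roughly as follows, assuming the hub uses KubeSpawner, whose `profile_list` option accepts a list of dictionaries with a `kubespawner_override` per profile. The image names, tags, slugs and the helper `build_profile_list` are hypothetical placeholders, not the configuration of any existing hub.

```python
def build_profile_list(base_image, prototyping_image):
    """Build two profiles: the hub's base image and an extra "X" profile."""
    return [
        {
            "display_name": "Base environment",
            "slug": "base",
            "description": "The image every user gets today.",
            "default": True,
            "kubespawner_override": {"image": base_image},
        },
        {
            "display_name": "Prototyping environment",
            "slug": "prototyping",
            "description": "Extra conda environments, stored under /home.",
            "kubespawner_override": {"image": prototyping_image},
        },
    ]


# Hypothetical image references; pinned tags rather than "latest".
profile_list = build_profile_list(
    "quay.io/example-org/base-image:v1",
    "quay.io/example-org/prototyping-image:v1",
)

# In a jupyterhub_config.py this would be assigned as:
#   c.KubeSpawner.profile_list = profile_list

if __name__ == "__main__":
    for profile in profile_list:
        print(profile["slug"], "->", profile["kubespawner_override"]["image"])
```

Users would then pick a profile on the spawn page, and only those who select the prototyping profile would get the image with the persisted environments.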
from docs.
I think this is another use case where bringing the JupyterHub and BinderHub helm charts closer together will provide a solution, as we will be able to provide workflows closer to what the persistent BinderHub helm chart does (https://github.com/gesiscss/persistent_binderhub), i.e. a user can create an environment on the fly from a repo using repo2docker, and these environments are persisted.
from docs.
I think having something like you both described would simplify many workflows. A Hub admin would be responsible for infrastructure (i.e. credentials, shared mounts, instance types). Researchers would build their environment from a GitHub repo (using repo2docker or similar) and select the instance type they want to run this environment on. I think just having the flexibility to bootstrap an environment like Binder will reduce the need for persisting changes to the base image, since we can make those changes in the original repository, and presumably persistence will be used for just work in progress or sample data but not whole conda environments.
from docs.
Hi 2i2c team, thanks for all the discussion here and in 2i2c-org/infrastructure#562. Does this sound like something 2i2c can support? @betolink and @amfriesz can start coordinating/preparing stuff on our end but we wanted to first confirm if this is something you'll be moving forward with, and if you know a rough timeline. @choldgraf I'm happy to chat about it too if you'd like
from docs.