2i2c-org / docs
Documentation for 2i2c community JupyterHubs.
Home Page: https://docs.2i2c.org
We would like to change the default interface from jupyter-tree to jupyter-lab
Once we have users on these hubs, we'll need a model for how they are expected to engage with technical support as well as engage with other users on hubs. In addition there are at least two different kinds of users: admin-level users and non-admin users.
What kinds of communication bottlenecks do we want? What kind of technical platforms should we use for this communication?
A few thoughts for tech platforms:
[email protected] that is forwarded to many of us
In addition, how shall we break down the difference between pedagogical questions and support vs. technical questions and support? Perhaps having a single location for support will help us blur the lines between these a bit?
cc
@ericvd-ucb
@sandeepsainath
@sideye
@jamesgspercy
@yuvipanda
At this moment (16:21 CEST) many students have frozen systems and cannot continue. The same thing happened around 16:05. It seems that some usage threshold is being crossed: many accounts freeze while others keep working.
A student Safelg (email: [email protected]) in the El Camino 2i2c site keeps getting a 500: Internal Server Error.
I tried stopping and restarting his server and had him log out of all his google accounts. I then accessed his server and received the same error message. I can't figure out what's wrong.
Thanks for reporting your issue! Sorry things aren't working as expected.
In order to help resolve things quickly, please provide the following information:
✔️ The URL of your hub:
https://elcamino.cloudbank.2i2c.cloud/user/[email protected]/notebooks/ecc-materials-sp2021/materials/x19/lab/1/lab01/lab01.ipynb
✔️ What you expected to happen:
notebook to open
✔️ What actually happened:
500: Internal Server Error
✔️ Any error messages you see:
above
✔️ Any specific packages or tools you were using:
N/A
Thanks for requesting an improvement to your hub. 👍
In order to help resolve your request, please provide the following information:
https://elcamino.cloudbank.2i2c.cloud/user/[email protected]
✔️ What would you like to see changed on the hub: Please give admins/teachers more memory on the hub
✔️ Any particular steps you think are needed to accomplish this: N/A
✔️ How quickly do you need this change to be made: ASAP, I'm teaching at 2:30 pm today and my kernel is crashing on large data sets from the Data8 course
✔️ Are there any alternative ways to accomplish what you're trying to do: Without this, the lecture will not be interactive.
We'll take this into consideration and try to get back to you quickly!
I am trying to upload one of my notebooks and the data to the pilot hub, but there seems to be no file transfer.
What is the purpose/distinction between the shared and shared-readwrite folder?
We would like to add users with non-@justiceinnovationlab.org domains and, ideally, change the login to GitHub authentication to make it easier to add future non-admin users.
Under our current funding model we have 5 nodes available to our user pool outside of hackathons. Is there a way to ensure that one of those nodes is available to a person from my research group at all times? That would help with onboarding some of our interns.
I imagine the question can be generalized to any partition of an integer n >1.
Once an organization is interested in the pilot and gets their own hub, what are the next pieces of information that they should get in order to become acquainted with the hub, learn how to do important things, get a good mental model for how things work, etc.
Some of this information is here: https://2i2c.org/pilot/use.html
we could either add new stuff there, or add to a new page
cc
I am using the farallon 2i2c cloud to run RStudio and would like the 'lwgeom' package to be installed. Thank you!
We're connected to some organizations that are generally interested in the JupyterHub / Data 8 / etc stack, but have not yet jumped on board. What is some information that would be of interest to them to decide whether they want to be involved in the pilot?
We can take this information and integrate it into a landing page for the 2i2c pilots, or perhaps an "about" page or something like this.
cc
We use an Azure blob storage and approve specific IP addresses to the blob storage. Is there an IP range that we should approve? I've approved the IP address for the hub when I log in, but it appears that other users have a different ip address?
We can customize the name, logo, and URL of a hub. We should document how people can modify these (or request that they be modified).
In order to facilitate collaboration on our hub, it would be useful if we could edit notebooks there, push them to GitHub, and have other collaborators pull them down to their own user space on the hub, make changes, and push them back to GitHub, etc. This way we would know we are sharing a common data + computational environment, and not have to worry about debugging issues that are tied to our local configuration / environment.
Cloning public GitHub repos to the hub works fine, but I had trouble pushing changes back to GitHub. I have 2-factor authentication set up on my GitHub account, and so https:// + username and password doesn't work. I've been authenticating with my SSH key, but that doesn't seem to be working here.
jovyan@jupyter-zane-2eselvans-40catalyst-2ecoop:~/.ssh$ ssh-add id_rsa
Could not open a connection to your authentication agent.
I was ultimately able to get https based authentication working by generating a personal access token, and telling git to store the credential locally for later re-use, but I'm still curious if I was just doing something wrong with respect to getting SSH authentication working, and whether ssh-key based authentication would be an easy thing to enable, so the access token isn't being stored as cleartext on the hub.
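For what it's worth, the "Could not open a connection to your authentication agent" message is what ssh-add prints when it cannot reach an ssh-agent from the current shell. A minimal sketch for checking this from a terminal or notebook on the hub is below. The helper names (`ssh_agent_available`, `suggested_commands`) are made up for illustration, and whether a running agent is the only missing piece on a given hub is an assumption.

```python
import os


def ssh_agent_available(env=None):
    """Return True if the environment points at an existing ssh-agent socket."""
    env = os.environ if env is None else env
    # ssh-add locates the agent through the SSH_AUTH_SOCK variable.
    sock = env.get("SSH_AUTH_SOCK", "")
    return bool(sock) and os.path.exists(sock)


def suggested_commands(key_path="~/.ssh/id_rsa"):
    """Shell commands that start an agent in the current shell and add a key."""
    return ['eval "$(ssh-agent -s)"', f"ssh-add {key_path}"]


if __name__ == "__main__":
    if ssh_agent_available():
        print("An ssh-agent looks reachable; ssh-add should be able to connect.")
    else:
        print("No ssh-agent is reachable from this shell. In a terminal, try:")
        for command in suggested_commands():
            print("  " + command)
```

An agent started this way only lives as long as that shell session, so it would need to be started again in each new terminal.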
Also wondering if there is a preferred solution for running git commands within JupyterLab, for collaborators with less terminal / CLI familiarity.
I noticed two folders in my accounts with the names shared and shared-readwrite. I would really like to share a dataset of approximately 50 MB among the students, that they will all open at the same time (30 students). Is this possible, or is it better to put the file separately in each account?
Among the people using the hub, some of the users should have access to one repo while another user should have access to a different repo. What's the best way for me to guarantee that if users clone repos that those repos don't end up shared with users that shouldn't have access to them? Is it as simple as they clone the repo NOT in the shared folder?
Is there a way to have multiple kernels in my session if I'm not a hub administrator? I tried to install some packages from the terminal and it seems I have no sudo either. Is there a particular reason why?
In some testing, the Wageningen hub was consistently hitting just under 1GB of RAM, so in a context with students it will likely just exceed this, so let's bump to 2GB as a precaution.
In a recent meeting @sideye, Alan made a good point that we'll need to make a plan for how to gather information about the experience of the organizations that we're serving with these hubs. Let's use this issue to brainstorm the kinds of things we'd want to learn from them, as well as a plan for when to do checkpoints and interviews.
On the Catalyst Cooperative pilot hub, we want to be able to have collaborators clone GitHub repositories and also push changes back to them. What's the easiest way to integrate GitHub authentication into the hub?
Getting someone's SSH key in there is a one-time setup which we can handle, but we're wondering if there's a simpler solution that already exists which we can use, because our spreadsheet-based collaborators will need assistance to get this kind of authentication set up, and start their journey into Python based analysis.
Relatedly, how do users typically deal with doing development work? Do folks just use the text editor that's built into Jupyter to edit files that are checked out onto the JupyterHub? Or do they use their normal desktop editor locally, and then push to a repository, which they pull from on the JupyterHub to access the new changes?
It looks like many of the things we typically do in Jupyter take at least 2.5 GB of RAM (based on running processes locally with the nbresuse plugin activated), and the kernel is predictably dying when we try to run those things on our pilot hub.
Would it be possible to bump the RAM to 4GB?
We can play around with configurations on the hub as it is, but to start sharing it with a couple of collaborators and exploring how we can use the JupyterHub to improve our workflow, we'll need more memory so... some time this coming week would be great.
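When sizing a request like this, it can help to measure peak memory from inside the kernel itself rather than relying only on a local run. The sketch below uses Python's standard `resource` module (Unix only). The `fits_within` helper and its 20% headroom are illustrative assumptions, not a hub policy.

```python
import resource
import sys


def peak_memory_mb():
    """Peak resident memory of the current process, in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def fits_within(limit_gb, peak_mb, headroom=0.2):
    """True if the measured peak plus some headroom fits under a memory limit."""
    return peak_mb * (1 + headroom) <= limit_gb * 1024


if __name__ == "__main__":
    measured = peak_memory_mb()
    print(f"Peak memory so far: {measured:.0f} MB")
    print("Fits in 2 GB:", fits_within(2, measured))
    print("Fits in 4 GB:", fits_within(4, measured))
```

Running this at the end of a typical notebook gives a rough number to quote when asking for a limit change. It only covers the kernel process, not any subprocesses it spawned.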
Obviously increasing the per-node resources increases overall costs. What can we do to ensure we aren't needlessly consuming resources? How long does the server keep running and holding on to resources after we go idle? Can we minimize impacts on the overall cost by shutting the server down as soon as we're done using it for a while?
As more courses use 2i2c Hubs for their teaching, they will need a place to speak with one another, with the CDSS teams, and with 2i2c people. There are roughly three kinds of communication that we should think about:
We should have a place for each type of communication, and ideally non-overlapping spaces.
If people want quick and informal chatter they use Slack. This is the "messiest" of all communication spaces. We'll invite them to talk on the dsep-pilot-hubs Slack room in 2i2c (perhaps this room should be shared with the DSEP slack?)
In general we recommend people ask questions and give feedback via GitHub Discussions. This is a Q&A-style forum that is attached to this repository.
If people have specific requests they'd like to make for the hub infrastructure, we use GitHub Issues for this. In addition, specific items that arise out of GitHub Discussions will become Issues as well.
A link to ephemeral-hubs is broken. Is this still a supported type of hub?
https://pilot.2i2c.org/en/latest/about/infrastructure.html#ephemeral-hubs
In the current pilot deployment, we configure the dask scheduler via dask.distributed and dask_gateway with the dask dashboard appearing as an external link:
from dask.distributed import Client
from dask_gateway import Gateway
gateway = Gateway()
cluster = gateway.new_cluster()
Would it be possible to have the dashboard embedded in the JupyterHub by default, for example by installing the dask-labextension package, so that we could configure the dask scheduler directly via the dashboard?
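Until something like that is installed, the dashboard address can at least be read off the cluster object. This is a minimal sketch that assumes dask-gateway is installed and configured on the hub, as in the snippet above. `start_cluster` and `dashboard_url` are hypothetical helper names, and the worker count is arbitrary.

```python
def start_cluster(n_workers=2):
    """Create a Dask Gateway cluster and a client connected to it."""
    # Imported here so the helpers below can be loaded without dask-gateway.
    from dask_gateway import Gateway

    gateway = Gateway()              # uses the hub's default gateway settings
    cluster = gateway.new_cluster()
    cluster.scale(n_workers)         # request a fixed number of workers
    client = cluster.get_client()    # a dask.distributed Client for this cluster
    return cluster, client


def dashboard_url(cluster):
    """Return the dashboard address that the cluster object reports."""
    return cluster.dashboard_link


if __name__ == "__main__":
    cluster, client = start_cluster()
    print("Dashboard:", dashboard_url(cluster))
```

As far as I understand, once dask-labextension is available, this is also the address one would give to its panel in JupyterLab.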
In #57 a user was unable to access their server because they had (unintentionally/intentionally) installed a package via --user. This had upgraded nbconvert and introduced a bug that meant servers couldn't start.
It may be common for users to use --user to install things (unless we explicitly allow it). We should provide lots of documentation for administrators that suggests to them this is a potential cause of inexplicable problems.
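One thing such documentation could include is a quick way to see what has been installed with --user. The sketch below lists distributions found in the per-user site-packages directory using only the standard library. The function name is made up for illustration, and it assumes packages were installed by pip in the usual `*.dist-info` layout.

```python
import os
import site


def user_installed_distributions(user_site=None):
    """List 'name-version' strings for packages in the user site directory."""
    # site.getusersitepackages() is where `pip install --user` puts packages.
    user_site = site.getusersitepackages() if user_site is None else user_site
    if not os.path.isdir(user_site):
        return []
    names = []
    for entry in sorted(os.listdir(user_site)):
        # pip records each installed distribution in a <name>-<version>.dist-info folder.
        if entry.endswith(".dist-info"):
            names.append(entry[: -len(".dist-info")])
    return names


if __name__ == "__main__":
    for name in user_installed_distributions():
        print(name)
```

An admin could run this in a terminal inside the affected user's server, when it starts at all, to check whether something like nbconvert is being shadowed by a user-level install.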
--user.
bug issue template
Some folks weren't sure if there were commas or spaces that separated multiple usernames
Some folks were asking how long their server will be active if they leave the computer and log back on. I believe this is configured somewhere but I'm not sure
As new instructors are onboarded to the hub infrastructure, we should try to learn a few things from them to understand how we can best serve them.
For instructors that have finished a cycle of use (e.g. after a semester or course ends), we should determine some things that we want to learn from them.
Many people may wish to provide introductory material to the Jupyter ecosystem, both for instructors and for students. We should curate some useful information for this group to help them get started.
As the hubs begin running in S21, we should plan how to synchronize our activity so that we can stay on the same page and coordinate action items.
I think that one of these should be an educational hub via the Berkeley DataHub model - perhaps @ericvd-ucb's teams can help put together a guide to pedagogy with JupyterHub that we can point people to? I am happy to help out there as well.
I think we've had at least two groups of people who have used 2i2c hubs for things. Let's make a testimonial page and collect them?
Since your webpage is a sales document not a research paper, put the benefit first, then the feature. I like the content, it is clear and detailed, but I think each section could benefit from a 'feature' being at the top. For example: at the top of the pilot page, I would put what a 2i2c lets you do or why you should use it. eg. "Advance science faster by removing barriers to accessing the power of cloud-based computing, collaborations, and reproducibility. 2i2c Hubs.... " I think going through high level pages and adding stuff like this would help people understand how Hubs can advance their research and help their team/group/project.
Thanks for reporting your issue! Sorry things aren't working as expected.
In order to help resolve things quickly, please provide the following information:
✔️ The URL of your hub:
https://sbcc.cloudbank.2i2c.cloud/
✔️ What you expected to happen:
✔️ What actually happened:
I was able to log in and launch the server, but when I tried to use it (just starting a notebook), it stopped working, tried to relaunch, but that also failed
✔️ Any error messages you see:
Spawn failed
The latest attempt to start your server has failed.
Tried relaunching and logging out, but that did not seem to help.
✔️ Any specific packages or tools you were using:
I am just trying to get started, but it seems very unstable... This happened 2 weeks ago as well.
Right now we are serving this documentation via GitHub Pages and GitHub Actions. We set this up because it was the easiest way to automatically get sites hosted at sub-directories of 2i2c.org (e.g., 2i2c.org/pilot).
For most of our other projects we use ReadTheDocs. What do folks think about using ReadTheDocs instead? This at the least would give us PR previews of the documentation, and would also just streamline our workflow with other repositories as well.
One question to resolve: I am not sure whether ReadTheDocs would let us use 2i2c.org/<repo-name> the way that GitHub Pages does. Their custom domains documentation suggests that we can point to our own URL, but I don't know if this would require us to use a subdomain like pilot.2i2c.org rather than 2i2c.org/pilot. @GeorgianaElena or @yuvipanda any chance you have an intuition there?
Within an Rstudio session I looked to add a number of cores and create a cluster for a calculation using the parallel package. Using that package, I used makeCluster() to add a number of cores to a cluster at which point the process just hung. I killed the process and started a new Rstudio session. This time, using benchmarkme I attempted to test the time for a matrix calculation using four cores at which point again the process hung. Is this behavior expected?
Within the terminal, I checked the running processes with ps aux, and the multiple R processes I initialized are listed, but are zombies. I attempted to kill them, but was unable to remove them. How might I remove these zombie processes? I didn't want to kill the Rstudio parent process as I was unsure if that would affect other things.
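Some background that may help here: a zombie process has already exited, so sending it a signal has no effect. The entry disappears once its parent collects the exit status, or once the parent itself exits. The sketch below lists zombies and their parent PIDs by reading Linux's /proc. The function names are illustrative, and it assumes a Linux server.

```python
import os


def parse_stat(stat_line):
    """Return (pid, command, state, parent_pid) from a /proc/<pid>/stat line."""
    # The command is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the first "(" and the last ")".
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    command = stat_line[open_paren + 1 : close_paren]
    fields = stat_line[close_paren + 1 :].split()
    return pid, command, fields[0], int(fields[1])


def find_zombies(proc_root="/proc"):
    """List (pid, command, parent_pid) for every process in state 'Z'."""
    if not os.path.isdir(proc_root):
        return []
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, command, state, parent = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # the process vanished or the line was not parseable
        if state == "Z":
            zombies.append((pid, command, parent))
    return zombies


if __name__ == "__main__":
    for pid, command, parent in find_zombies():
        print(f"zombie {pid} ({command}), parent {parent}")
```

The parent PID is the useful part: it shows which process would have to reap the zombies or be restarted for them to go away.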
In a recent meeting @sandeepsainath mentioned that he has put together an onboarding questionnaire for the CC pilot organizations (@sandeepsainath can you link it?). In addition, we've got this triage questionnaire that looks at many of the same questions.
We should either:
@ericvd-ucb I assume that for the CloudBank collaboration, we should direct users to the Berkeley DSEP team if they are interested in participating. Is there an email address you'd like me to point institutions to if they are interested? Or would you prefer to work with your pre-existing contacts?
What happens if I click the access server button from one of the student accounts in the hub/admin page? Can I take control of their notebook, or are we both editing in the same file? This seems like a useful option for remote help, but only if it comes without the risk of data loss?
The following error only occurs if I try to use VAST package plotting functions from saved model output if the server has timed out or R Studio crashes or my internet has lost connection for a bit - it doesn’t occur if I use the functions right after the model runs. I'm a bit confused in deciphering the error message and wondering if anyone might have a better idea if this is an issue with the VAST package or with using the server or both? Maybe this question is better directed toward the creator of the VAST package? Thanks in advance!
Error in .Call("TMBconfig", e, as.integer(1), PACKAGE = DLL) :
"TMBconfig" not available for .Call() for package "VAST_v13_0_0"
We'd like to install a few packages for all our users:
There are other R and Python packages we'd like to install system-wide, but I'm happy to do that myself if you can say how. The ones above have caused me headaches in the past when installing on Linux systems.
On this page: https://pilot.2i2c.org/en/latest/admin/howto/replicate.html
In this section:
A link to https://github.com/2i2c-org/pilot-hubs/blob/master/hubs.yaml is broken.
We should write up documentation for how others can replicate the service used by 2i2c. This would be a more in-depth guide to the migration guide that would include information about dev-ops, cloud infrastructure, etc.
This would have two main benefits:
I think the guide should include things like
I've spotted some broken links. A sensible way to mitigate this is to automate link checking, preferably running both on changes and as a cron job, since links can also break because of external changes.
Here is a PR that implements this for one of our repositories: 2i2c-org/infrastructure#649
It is based on the two code samples below
In Slack @yuvipanda described an interesting workflow. I didn't quite wrap my head around it so I'll paste his text here and maybe he can clarify a little bit so we can document.
all hubs now have a 'shared/' directory in $HOME, and admins have a 'shared-readwrite/' in home that does what you think it would do!
so we could potentially have folks get a git repo under 'shared-readwrite', and have others access it via 'shared'
so there's a 'shared-readwrite/homework' 'bare' git repo, and if you then use this nbgitpuller link: https://staging.pilot.2i2c.cloud/hub/user-redirect/git-pull?repo=shared%2Fhomework&urlpath=tree%2Fhomework%2F&branch=master it'll actually pull from that
separate from that, as an admin, you can (via terminal) do 'git clone shared/homework homework-rw', make changes into homework-rw, make git commits, and push
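To help with documenting this, the nbgitpuller link quoted above is just the hub address plus three URL-encoded query parameters. A small sketch of building it is below. `nbgitpuller_link` is a hypothetical helper name; the parameter values are the ones from the quoted link.

```python
from urllib.parse import urlencode


def nbgitpuller_link(hub_url, repo, urlpath, branch="master"):
    """Build a git-pull link for a hub from its repo, urlpath and branch parts."""
    # urlencode percent-encodes "/" as %2F, matching the quoted link.
    query = urlencode({"repo": repo, "urlpath": urlpath, "branch": branch})
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{query}"


link = nbgitpuller_link(
    "https://staging.pilot.2i2c.cloud",
    repo="shared/homework",      # path of the repo as seen through 'shared'
    urlpath="tree/homework/",    # where to land after the pull
)
print(link)
```

Here `repo` is a local path under 'shared' rather than a GitHub URL, which is what makes the workflow described above work without any external hosting.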