docs's Introduction

2i2c Managed Hub Service Documentation

This repository serves as the user-facing documentation and communication space for those who are using 2i2c Hubs.

Most of the infrastructure that we discuss in the documentation is deployed in the infrastructure/ repository.

See the service documentation for more information.

How to preview this documentation

To preview this documentation, use the Nox tool. First install it:

pip install nox

To build the documentation and place the HTML files in _build/html:

nox -s docs

To build the documentation with a server that watches for changes and auto-builds the documentation with a preview, run the following:

nox -s docs -- live
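As a rough illustration of how the two invocations above can share one session: Nox hands everything after `--` to the session as positional arguments, so a session can switch commands when it sees `live`. The sketch below models only that dispatch in plain Python. The use of `sphinx-build` and `sphinx-autobuild`, the `.` source directory, and the function name `docs_command` are assumptions for illustration, not taken from this repository's noxfile.

```python
def docs_command(posargs, source_dir=".", out_dir="_build/html"):
    """Return the command a docs session might run for the given posargs."""
    if "live" in posargs:
        # Assumed tool: sphinx-autobuild rebuilds on change and serves a preview.
        return ["sphinx-autobuild", source_dir, out_dir]
    # Assumed tool: a one-off HTML build with Sphinx.
    return ["sphinx-build", "-b", "html", source_dir, out_dir]


print(docs_command([]))        # corresponds to: nox -s docs
print(docs_command(["live"]))  # corresponds to: nox -s docs -- live
```

Inside a real noxfile, the returned list would typically be passed to `session.run(*command)`, with `session.posargs` supplying the arguments.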

docs's People

Contributors

aidea775, akhmerov, bramveen1, choldgraf, colliand, consideratio, don-jil, georgianaelena, ianabc, jameshowison, jbusecke, jmunroe, jnywong, pnasrat, rabernat, scottyhq, sgibson91, tacaswell, yuvipanda

docs's Issues

Student unable to log into any jupyter labs

A student Safelg (email: [email protected]) in the El Camino 2i2c site keeps getting a 500: Internal Server Error.

I tried stopping and restarting his server and had him log out of all his google accounts. I then accessed his server and received the same error message. I can't figure out what's wrong.


Thanks for reporting your issue! Sorry things aren't working as expected.

In order to help resolve things quickly, please provide the following information:

✔️ The URL of your hub:
https://elcamino.cloudbank.2i2c.cloud/user/[email protected]/notebooks/ecc-materials-sp2021/materials/x19/lab/1/lab01/lab01.ipynb
✔️ What you expected to happen:
Notebook to open
✔️ What actually happened:
500: Internal Server Error
✔️ Any error messages you see:
above
✔️ Any specific packages or tools you were using:
N/A

.Call error when trying to use saved output in R Studio

The following error only occurs if I try to use VAST package plotting functions from saved model output if the server has timed out or R Studio crashes or my internet has lost connection for a bit - it doesn’t occur if I use the functions right after the model runs. I'm a bit confused in deciphering the error message and wondering if anyone might have a better idea if this is an issue with the VAST package or with using the server or both? Maybe this question is better directed toward the creator of the VAST package? Thanks in advance!

Error in .Call("TMBconfig", e, as.integer(1), PACKAGE = DLL) :
"TMBconfig" not available for .Call() for package "VAST_v13_0_0"

Add a 'testimonials' page

I think we've had at least two groups of people who have used 2i2c hubs for things. Let's make a testimonial page and collect them?

Increase RAM to 4GB on Catalyst Cooperative pilot hub?

What would you like to see changed on the hub:

It looks like many of the things we typically do in Jupyter take at least 2.5 GB of RAM (based on running processes locally with the nbresuse plugin activated), and the kernel predictably dies when we try to run those things on our pilot hub.

Any particular steps you think are needed to accomplish this:

Would it be possible to bump the RAM to 4GB?

How quickly do you need this change to be made:

We can play around with configurations on the hub as it is, but to start sharing it with a couple of collaborators and exploring how we can use the JupyterHub to improve our workflow, we'll need more memory so... some time this coming week would be great.

How can we be good stewards of the server resources?

Obviously increasing the per-node resources increases overall costs. What can we do to ensure we aren't needlessly consuming resources? How long does the server keep running and holding on to resources after we go idle? Can we minimize impacts on the overall cost by shutting the server down as soon as we're done using it for a while?
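For context on the idle-shutdown question, here is a minimal sketch of how a JupyterHub deployment can register the `jupyterhub-idle-culler` service, which stops servers that have been inactive for longer than a timeout. Whether this hub uses the service, and with what timeout, is not stated here. The one-hour value and the helper name `idle_culler_service` are assumptions for illustration only.

```python
import sys


def idle_culler_service(timeout_seconds=3600):
    """Build a JupyterHub service entry that runs jupyterhub-idle-culler."""
    return {
        "name": "jupyterhub-idle-culler-service",
        "command": [
            sys.executable,
            "-m",
            "jupyterhub_idle_culler",
            f"--timeout={timeout_seconds}",  # assumed value: one hour of inactivity
        ],
    }


# In jupyterhub_config.py (where `c` is provided by JupyterHub) one would write:
# c.JupyterHub.services = [idle_culler_service(timeout_seconds=3600)]
print(idle_culler_service()["command"][-1])
```

Recent JupyterHub releases also expect a role granting the service permission to list users and stop servers. Independently of any culler, a user can release resources right away by stopping their server from the hub control panel.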

How should we authenticate with GitHub on the JupyterHub?

On the Catalyst Cooperative pilot hub, we want to be able to have collaborators clone GitHub repositories and also push changes back to them. What's the easiest way to integrate GitHub authentication into the hub?

  • If we were using GitHub authentication to allow users to log in, would that identity information be available for the user, or does it only work for login purposes?
  • Is there any way for the user to interact with git / authenticate without needing to drop into the terminal to (e.g.) clone / pull / push, edit an SSH key, or provide a username & password for https:// based GitHub authentication?

Getting someone's SSH key in there is a one-time setup which we can handle, but we're wondering if there's a simpler solution that already exists which we can use, because our spreadsheet-based collaborators will need assistance to get this kind of authentication set up, and start their journey into Python based analysis.

Relatedly, how do users typically deal with doing development work? Do folks just use the text editor that's built into Jupyter to edit files that are checked out onto the JupyterHub? Or do they use their normal desktop editor locally, and then push to a repository, which they pull from on the JupyterHub to access the new changes?

Sharing read-only data among students

I noticed two folders in my accounts with the names shared and shared-readwrite. I would really like to share a dataset of approximately 50 MB among the students, that they will all open at the same time (30 students). Is this possible, or is it better to put the file separately in each account?

Clarify that user-installed packages have unpredictable results

Background

In #57 a user was unable to access their server because they had (unintentionally/intentionally) installed a package via --user. This had upgraded nbconvert and introduced a bug that meant servers couldn't start.

It may be common for users to install things with --user (unless we explicitly disallow it). We should provide plenty of documentation for administrators that flags this as a potential cause of otherwise inexplicable problems.

To Do

  • Find places in the docs where we should include a warning about --user.
  • Consider adding this check to the bug issue template
  • Make a PR to make these changes
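One way an administrator might check for this situation is to list what is installed in the per-user site directory, since that is where `pip install --user` places packages. The following is a minimal sketch using only the standard library. The function name `user_installed_packages` is hypothetical, and it has to run with the same Python interpreter and user account as the affected server to be meaningful.

```python
import site
from importlib import metadata


def user_installed_packages():
    """Return sorted (name, version) pairs for packages in the user site directory."""
    user_site = site.getusersitepackages()
    found = set()
    # Restrict the search to the user site directory only.
    for dist in metadata.distributions(path=[user_site]):
        name = dist.metadata["Name"]
        if name:
            found.add((name, dist.version))
    return sorted(found)


for name, version in user_installed_packages():
    print(f"{name}=={version}")
```

An empty result means nothing was found in the user site directory. Any entry that shadows a package from the hub image (nbconvert in the case described above) is a candidate cause.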

server start up failures (spawn failed)

Thanks for reporting your issue! Sorry things aren't working as expected.

In order to help resolve things quickly, please provide the following information:

✔️ The URL of your hub:
https://sbcc.cloudbank.2i2c.cloud/
✔️ What you expected to happen:
✔️ What actually happened:
I was able to log in and launch the server, but when I tried to use it (just starting a notebook), it stopped working. I tried to relaunch it, but that also failed.
✔️ Any error messages you see:
Spawn failed
The latest attempt to start your server has failed.
Tried relaunching and logging out, but that did not seem to help.
✔️ Any specific packages or tools you were using:
I am just trying to get started, but it seems very unstable... This happened 2 weeks ago as well.

Risk of data loss with `Access server` button?

What happens if I click the access server button from one of the student accounts in the hub/admin page? Can I take control of their notebook, or are we both editing in the same file? This seems like a useful option for remote help, but only if it comes without the risk of data loss?

Document a local git repository `shared/` folder workflow

In Slack @yuvipanda described an interesting workflow. I didn't quite wrap my head around it so I'll paste his text here and maybe he can clarify a little bit so we can document.

all hubs now have a 'shared/' directory in $HOME, and admins have a 'shared-readwrite/' in home that does what you think it would do!

so we could potentially have folks get a git repo under 'shared-readwrite', and have others access it via 'shared'

so there's a 'shared-readwrite/homework' 'bare' git repo, and if you then use this nbgitpuller link: https://staging.pilot.2i2c.cloud/hub/user-redirect/git-pull?repo=shared%2Fhomework&urlpath=tree%2Fhomework%2F&branch=master it'll actually pull from that

separate from that, as an admin, you can (via terminal) do 'git clone shared/homework homework-rw', make changes in homework-rw, make git commits, and push
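To make the link format above easier to reuse for other shared repositories, here is a small sketch that assembles the same nbgitpuller URL from its parts. The helper name `nbgitpuller_link` is hypothetical. The hub URL, repo path, urlpath, and branch are the ones quoted in the message above.

```python
from urllib.parse import urlencode


def nbgitpuller_link(hub_url, repo, urlpath, branch="master"):
    """Build an nbgitpuller link; query values are percent-encoded."""
    query = urlencode({"repo": repo, "urlpath": urlpath, "branch": branch})
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{query}"


link = nbgitpuller_link(
    "https://staging.pilot.2i2c.cloud",
    repo="shared/homework",     # path as seen by regular users, via 'shared/'
    urlpath="tree/homework/",   # where the user lands after the pull
)
print(link)
```

Because `repo` is a filesystem path rather than a remote URL, the pull only works for users whose home directory exposes the same 'shared/' folder.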

Wageningen hub to 2GB ram

In some testing, the Wageningen hub was consistently hitting just under 1GB of RAM, so in a context with students it will likely just exceed this, so let's bump to 2GB as a precaution.

Create a questionnaire to use for the community college pilots

In a recent meeting @sandeepsainath mentioned that he has put together an onboarding questionnaire for the CC pilot organizations (@sandeepsainath can you link it?). In addition, we've got this triage questionnaire that looks at many of the same questions.

We should either:

  • bring in relevant questions from the triage questionnaire into @sandeepsainath's one that is focused on education
  • figure out if the generic triage questionnaire could be used for this instead, and add any extra questions as needed

Add more high-level description of the benefits/features of a hub rather than just technical stuff

Since your webpage is a sales document not a research paper, put the benefit first, then the feature. I like the content, it is clear and detailed, but I think each section could benefit from a 'feature' being at the top. For example: at the top of the pilot page, I would put what a 2i2c lets you do or why you should use it. eg. "Advance science faster by removing barriers to accessing the power of cloud-based computing, collaborations, and reproducibility. 2i2c Hubs.... " I think going through high level pages and adding stuff like this would help people understand how Hubs can advance their research and help their team/group/project.

Brainstorm an end-of-use questionnaire for instructors

Background

For instructors that have finished a cycle of use (e.g. after a semester or course ends), we should determine some things that we want to learn from them.

ToDo

  • Make a draft of questions to ask instructors (see this draft)
  • Turn these into a Google Survey
  • Send to first few people finishing their deployment

Reserving nodes for particular users/organizations

Under our current funding model we have 5 nodes available to our user pool outside of hackathons. Is there a way to ensure that one of those nodes is available to a person from my research group at all times? That would help spinning up some of our interns.

I imagine the question can be generalized to any partition of an integer n >1.

Properly installing some dependency packages

We'd like to install a few packages for all our users:

  • nodejs with the ability to manage node versions
  • gatsby
  • A C and a C++ compiler as mentioned in the requirements for this R library (https://igraph.org/r/)
  • igraph
  • devtools for R
  • Rcpp
  • roxygen2
  • rJava
  • davidsjoberg/ggsankey

There are other R and python packages we'd like to system wide install, but happy to do that myself if you can say how. The above are ones that have just caused me headaches in the past installing on linux systems.

Create an onboarding survey for new instructors

Background

As new instructors are onboarded to the hub infrastructure, we should try to learn a few things from them to understand how we can best-serve them.

ToDo

  • Agree on a set of questions to ask (see this draft)
  • Turn into a Google form
  • Test with a few new instructors
  • Iterate over time to improve the questions

Update otter-grader

Thanks for requesting an improvement to your hub. 👍

In order to help resolve your request, please provide the following information:

  • What would you like to see changed on the hub: Please, update otter-grader to the current version. Thank you!
  • Any particular steps you think are needed to accomplish this:
  • How quickly do you need this change to be made: As soon as possible. We can upgrade individually for the time being to practice using the Otter documentation tutorials.
  • Are there any alternative ways to accomplish what you're trying to do:

We'll take this into consideration and try to get back to you quickly!

Determine communication channels for the education hubs

Background

As more courses use 2i2c Hubs for their teaching, they will need a place to speak with one another, with the CDSS teams, and with 2i2c people. There are roughly three kinds of communication that we should think about:

  • Informal and quick communication
  • Unstructured but searchable / archivable information
  • Formal To-Do conversation items

We should have a place for each type of communication, and ideally non-overlapping spaces.

Proposed Workflow

If people want quick and informal chatter they use Slack. This is the "messiest" of all communication spaces. We'll invite them to talk on the dsep-pilot-hubs Slack room in 2i2c (perhaps this room should be shared with the DSEP slack?)

In general we recommend people ask questions and give feedback via GitHub Discussions. This is a Q&A-style forum that is attached to this repository.

If people have specific requests they'd like to make for the hub infrastructure, we use GitHub Issues for this. In addition, specific items that arise out of GitHub Discussions will become Issues as well.

ToDo

  • Agree on the proposed workflow above
  • Do whatever setup is needed to make this happen
    • Informal/quick
    • Unstructured but searchable
    • Formal To-Do

Embed dask dashboard into the Jupyterhub

In the current pilot deployment, we configure the dask scheduler via dask.distributed and dask_gateway with the dask dashboard appearing as an external link:

from dask.distributed import Client
from dask_gateway import Gateway

gateway = Gateway()
cluster = gateway.new_cluster()

Would it be possible to have the dashboard embedded in the JupyterHub by default, as it would be, for example, with the dask-labextension package installed, so that we could configure the dask scheduler directly via the dashboard?

Properly isolating project repos among users

Among the people using the hub, some of the users should have access to one repo while another user should have access to a different repo. What's the best way for me to guarantee that if users clone repos that those repos don't end up shared with users that shouldn't have access to them? Is it as simple as they clone the repo NOT in the shared folder?

Document how users could sign up for their hub

@ericvd-ucb I assume that for the CloudBank collaboration, we should direct users to the Berkeley DSEP team if they are interested in participating. Is there an email address you'd like me to point institutions to if they are interested? Or would you prefer to work with your pre-existing contacts?

Increasing cores and killing processes

Within an RStudio session I tried to add a number of cores and create a cluster for a calculation using the parallel package. I used makeCluster() to add a number of cores to a cluster, at which point the process just hung. I killed the process and started a new RStudio session. This time, using benchmarkme, I attempted to time a matrix calculation using four cores, at which point the process hung again. Is this behavior expected?

Within the terminal, I checked the running processes with ps aux. The multiple R processes I initialized are listed, but they are zombies. I attempted to kill them, but was unable to remove them. How might I remove these zombie processes? I didn't want to kill the RStudio parent process as I was unsure if that would affect other things.
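Some background that may help here: a zombie is a process that has already exited, and its entry stays in the process table only until its parent collects the exit status. Sending it a signal therefore has no effect, and the useful information is which parent it is waiting on. The sketch below parses output in the format of `ps -eo pid,ppid,stat,comm` and reports zombies with their parent PIDs. The sample text and the function name `find_zombies` are made up for illustration.

```python
def find_zombies(ps_output):
    """Parse `ps -eo pid,ppid,stat,comm` output; return zombie rows as dicts."""
    zombies = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        pid, ppid, stat, comm = parts
        if stat.startswith("Z"):  # state code Z marks a zombie (defunct) process
            zombies.append({"pid": int(pid), "ppid": int(ppid), "comm": comm})
    return zombies


# Hypothetical sample output, not taken from the hub in question.
sample = """\
  PID  PPID STAT COMMAND
  101     1 Ss   rsession
  202   101 Z    R <defunct>
  203   101 Z    R <defunct>
"""
for z in find_zombies(sample):
    print(f"zombie {z['pid']} waits on parent {z['ppid']}")
```

The entries disappear once the parent reaps them or the parent itself exits, at which point the init process adopts and reaps them.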

Come up with a user communication and support workflow

Once we have users on these hubs, we'll need a model for how they are expected to engage with technical support as well as engage with other users on hubs. In addition there are at least two different kinds of users: admin-level users and non-admin users.

What kinds of communication bottlenecks do we want? What kind of technical platforms should we use for this communication?

A few thoughts for tech platforms:

  1. Slack
  2. A Discourse site
  3. GitHub issues in these repositories
  4. An email address like [email protected] that is forwarded to many of us
  5. Pigeons with notes wrapped to their feet

In addition, how shall we break down the difference between pedagogical questions and support vs. technical questions and support? Perhaps having a single location for support will help us blur the lines between these a bit?

cc

@ericvd-ucb
@sandeepsainath
@sideye
@jamesgspercy
@yuvipanda

Write a guide for how other organizations could replicate the 2i2c service

Background

We should write up documentation for how others can replicate the service used by 2i2c. This would be a more in-depth companion to the migration guide, covering dev-ops, cloud infrastructure, etc.

This would have two main benefits:

  1. It gives people a pathway to doing this without 2i2c, which is part of our Right to Replicate pledge
  2. It also shows people how costly it might be to run this infrastructure totally on your own

I think the guide should include things like

  • What kind of JupyterHub configuration we use
  • Common issues that arise when operating this infrastructure over time
  • Where they could get support online when running it themselves
  • What personnel they'd need (e.g., 25% of a dev-ops FTE) to run the hub infra
  • What their cloud costs would likely be
  • What their total annual cost would likely be

To Do

  • Make an initial draft of this text (see https://pilot.2i2c.org/en/latest/admin/howto/replicate.html)
  • Improve explanation of the "setting up a JupyterHub part" a bit
  • Improve explanation of setting up the "bells and whistles" of 2i2c Hubs
  • Specific how-tos
    • How to download all user data in one go
    • How to download list of authorized users
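For the "list of authorized users" how-to, one possible starting point is the JupyterHub REST API, which exposes the user list at `/hub/api/users` for requests carrying an admin API token. The sketch below is only an outline. The hub URL and token are placeholders, and the function names are hypothetical. Newer JupyterHub versions may paginate this endpoint, which the sketch does not handle.

```python
import json
from urllib.request import Request, urlopen


def users_request(hub_url, api_token):
    """Build a GET request for the JupyterHub REST API user list."""
    request = Request(f"{hub_url.rstrip('/')}/hub/api/users")
    request.add_header("Authorization", f"token {api_token}")
    return request


def list_user_names(hub_url, api_token):
    """Fetch all users and return their names (performs a network call)."""
    with urlopen(users_request(hub_url, api_token)) as response:
        return [user["name"] for user in json.load(response)]


# Placeholder values; a real call needs an actual hub URL and admin token.
request = users_request("https://hub.example.org", "HYPOTHETICAL-TOKEN")
print(request.full_url)
```

Calling `list_user_names(...)` with real credentials would return the usernames known to the hub.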

Frozen systems

At this moment (16:21 CEST) many students have frozen systems and cannot continue. It also happened around 16:05. It seems that some usage threshold has been crossed: many accounts are frozen, while others keep working.

Brainstorm a plan for interviewing / collecting UX data from our users

In a recent meeting @sideye, Alan made a good point that we'll need to make a plan for how to gather information about the experience of the organizations that we're serving with these hubs. Let's use this issue to brainstorm the kinds of things we'd want to learn from them, as well as a plan for when to do checkpoints and interviews.

Resources

To Do

  • Create the initial onboarding questionnaire
  • Decide what questions we want to learn from people using the infrastructure
  • Create a mid-way checkpoint questionnaire

Create or point to "introduction to Jupyter" materials for educators

Background

Many people may wish to provide introductory material on the Jupyter ecosystem, both for instructors and for students. We should curate some useful information for this group to help them get started.

ToDo

  • Determine what kinds of information new instructors would want to know (or would want to share with their students)
  • Prepare a list of links etc that would be useful
  • Incorporate into the pilot hubs documentation (potentially via a standalone set of docs)

Plan for Berkeley education team syncs

Background

As the hubs begin running in S21, we should plan how to synchronize our activity so that we can stay on the same page and coordinate action items.

ToDo

  • Agree upon an initial team sync
  • Agree upon the format for team syncs
  • Define a schedule for team syncs

Swap these docs over to `readthedocs`

Background

Right now we are serving this documentation via github pages and github actions. We set this up because it was the easiest way to automatically get sites hosted at sub-directories of 2i2c.org (e.g., 2i2c.org/pilot).

For most of our other projects we use ReadTheDocs. What do folks think about using ReadTheDocs instead? This at the least would give us PR previews of the documentation, and would also just streamline our workflow with other repositories as well.

One question to resolve: I am not sure whether ReadTheDocs would let us use 2i2c.org/<repo-name> the way that GitHub Pages does. Their custom domains documentation suggests that we can point to our own URL, but I don't know if this would require us to use a subdomain like pilot.2i2c.org rather than 2i2c.org/pilot. @GeorgianaElena or @yuvipanda any chance you have an intuition there?

Give admins/teachers more memory

Thanks for requesting an improvement to your hub. 👍

In order to help resolve your request, please provide the following information:

https://elcamino.cloudbank.2i2c.cloud/user/[email protected]

✔️ What would you like to see changed on the hub: Please give admins/teachers more memory on the hub
✔️ Any particular steps you think are needed to accomplish this: N/A
✔️ How quickly do you need this change to be made: ASAP, I'm teaching at 2:30 pm today and my kernel is crashing on large data sets from the Data8 course
✔️ Are there any alternative ways to accomplish what you're trying to do: Without this, the lecture will not be interactive.

We'll take this into consideration and try to get back to you quickly!

Automate check for broken links in documentation and fix those currently broken

Description

I've spotted some broken links. A sensible way to mitigate this is to automate link checking, preferably running it both on changes and as a cron job, since links can also break because of external changes.

Implementation

Here is a PR that implements this for one of our repositories: 2i2c-org/infrastructure#649

It is based on the two code samples below

Tasks to complete

  • Add a linkcheck command to our Makefile
  • Add a job within an existing GitHub workflow, or add a dedicated GitHub workflow to run linkcheck on changes to docs
  • Fix currently broken links
  • Do the above for
    • pilot-hubs
    • pilot
    • team-compass

Unable to push to GitHub from 2i2c JupyterHub using ssh-key authentication

In order to facilitate collaboration on our hub, it would be useful if we could edit notebooks there, push them to GitHub, and have other collaborators pull them down to their own user space on the hub, make changes, and push them back to GitHub, etc. This way we would know we are sharing a common data + computational environment, and not have to worry about debugging issues that are tied to our local configuration / environment.

Cloning public GitHub repos to the hub works fine, but I had trouble pushing changes back to GitHub. I have 2-factor authentication set up on my GitHub account, and so https:// + username and password doesn't work. I've been authenticating with my SSH key, but that doesn't seem to be working here.

  • Hub URL: https://catalyst-cooperative.pilot.2i2c.cloud
  • What you expected to happen: I thought I would be able to add an SSH key and use it to authenticate with GitHub, allowing me to push changes.
  • What actually happened: I was unable to add my SSH key on the JupyterHub.
  • Any error messages you see:
jovyan@jupyter-zane-2eselvans-40catalyst-2ecoop:~/.ssh$ ssh-add id_rsa
Could not open a connection to your authentication agent.

I was ultimately able to get https based authentication working by generating a personal access token, and telling git to store the credential locally for later re-use, but I'm still curious if I was just doing something wrong with respect to getting SSH authentication working, and whether ssh-key based authentication would be an easy thing to enable, so the access token isn't being stored as cleartext on the hub.
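On the error itself: "Could not open a connection to your authentication agent" generally means that no ssh-agent is running for the current shell, or that `SSH_AUTH_SOCK` is not set. In a shell the usual remedy is `eval "$(ssh-agent -s)"` before `ssh-add`. The sketch below shows the same idea from Python by parsing the variables that `ssh-agent -s` prints. The sample output and the function name `parse_agent_env` are illustrative, and whether an agent can be run on this hub is an assumption.

```python
import re


def parse_agent_env(agent_output):
    """Extract SSH_AUTH_SOCK and SSH_AGENT_PID from `ssh-agent -s` output."""
    pairs = re.findall(r"(SSH_AUTH_SOCK|SSH_AGENT_PID)=([^;]+);", agent_output)
    return dict(pairs)


# Illustrative output in the format printed by `ssh-agent -s`.
sample = (
    "SSH_AUTH_SOCK=/tmp/ssh-XXXXabcd/agent.123; export SSH_AUTH_SOCK;\n"
    "SSH_AGENT_PID=124; export SSH_AGENT_PID;\n"
    "echo Agent pid 124;\n"
)
env = parse_agent_env(sample)
print(env)

# With a real agent, one would capture the output and reuse the variables:
# out = subprocess.run(["ssh-agent", "-s"], capture_output=True, text=True).stdout
# os.environ.update(parse_agent_env(out))
# subprocess.run(["ssh-add", os.path.expanduser("~/.ssh/id_rsa")])
```

An agent started this way lasts only as long as its process, so it would need to be started again after the user server restarts.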

Also wondering if there is a preferred solution for running git commands within JupyterLab, for collaborators with less terminal / CLI familiarity.

Brainstorm information to provide to get organizations interested

We're connected to some organizations that are generally interested in the JupyterHub / Data 8 / etc stack, but have not yet jumped on board. What is some information that would be of interest to them to decide whether they want to be involved in the pilot?

We can take this information and integrate it into a landing page for the 2i2c pilots, or perhaps an "about" page or something like this.

cc

@ericvd-ucb
@sandeepsainath
@sideye
