gesiscss / orc

This repository was superseded by https://github.com/gesiscss/orc2 - Open Research Computing

Home Page: https://notebooks.gesis.org/

License: MIT License

Python 34.79% CSS 7.99% JavaScript 0.17% HTML 56.51% TeX 0.31% Dockerfile 0.24%

jupyterhub binderhub binder gesis-notebooks gesis-binder gesis-hub persistent

orc's Introduction

Open Research Computing (ORC)

For more information about the ORC project, see https://notebooks.gesis.org/about/

Feel free to open an issue in this repository if there are any questions or contact us at [email protected].

Technical Details

This ORC instance is deployed on Kubernetes on bare-metal machines running Ubuntu 18.04. The Kubernetes cluster (ORC cluster) was created with kubeadm (v1.18.3). Calico is used as the network provider. Docker version 19.03.8 is installed on the servers.

All docker images of this project can be found in https://hub.docker.com/u/gesiscss/.

Load Balancer

Because we set up the Kubernetes cluster on bare metal, we use the deployment approach "Using a self-provisioned edge".

Nginx is used as the reverse proxy server and load balancer. It also handles SSL offloading/termination and serves static files. It sits outside the ORC cluster and is the public entry point to the cluster. All services in the cluster have type NodePort.

Storage

NFS Server Provisioner is the default storage provider in the ORC cluster.

GESIS Hub

Persistent BinderHub runs under https://notebooks.gesis.org/hub/.

It uses the Docker Hub registry (https://hub.docker.com/u/gesiscss/) to store built images.

GESIS Binder

BinderHub runs under https://notebooks.gesis.org/binder/.

GESIS Hub and GESIS Binder use the same Docker images (they use the same repo2docker version).

Gallery

Gallery of popular repos launched on GESIS Binder and featured projects: https://notebooks.gesis.org/gallery/

Funded by the German Research Foundation (DFG). FKZ/project number: 324867496.

orc's Issues

Potential CSS problem

Persistent [1] and regular [2] launches lead to different-looking Lab views. This may be a CSS issue.

[1] https://notebooks.gesis.org/services/binder/v2/gh/Galadirith/learn-github/master?urlpath=lab/tree/learn-github/1-introduction.ipynb

[2] https://notebooks.gesis.org/binder/v2/gh/Galadirith/learn-github/master?urlpath=lab/tree/learn-github/1-introduction.ipynb

Also add a delete step when copying new changes to nginx

orc/fabfile.py

Line 22 in b8efd04

c.run('echo "######## Copying config files"')

Fix update bot for the release changes to repo2docker (CalVer)

"Meta" Helm chart for ORC

We now have Helm charts for GESIS Binder and GESIS Hub.

Learn more about Helm releases and rollback
Make a Helm chart for the gallery
Make a Helm chart for orc-site -> here mount HTML templates as ConfigMaps too
Finally, make an ORC Helm chart for the whole project (similar to the mybinder.org-deploy chart), which requires all other charts/apps (including the storage, monitoring, GESIS Binder, Hub, and gallery charts...). So we can have 1 values.yaml for the whole project?
- For example, we could also have the GESIS templates (and statics?) in a ConfigMap that all apps use.

[Moved from GitLab]

Restarting from the hub

Hi guys,

Every once in a while, when the hub says a server is still running, I click on "My Server" but get the error message:

"503 : Service Unavailable. Your server appears to be down. Try restarting it from the hub"

Then I'm in a loop: I get back to the hub, but going back to the server does not work. I was caught in that loop for a whole day before I noticed! (NOT)

If I urgently need to work, I log out, log in again, and wait a few seconds. This shuts down the server. Then I start it again and it works.

Best and thanks!

Haiko

Documentation of ORC features

Add more details to the FAQs.
Create tutorials using various features of ORC.
Mention the /projects folder.
Explain how to use nbgitpuller and give other recommendations (documentation/tutorials).

Persist the urlPath/filePath when opening a file on GESIS hub.

Can this be written to the database, or does it need to be stored separately?

Allow FTP downloading

I recently started using GESIS Binder to showcase notebooks with geographical spatio-temporal analysis. I wondered whether some sort of firewall is blocking file downloads from FTP sources.

For instance, when I run wget on the following FTP source:
!wget ftp://ftp.zew.de/pub/zew-docs/dp/dp13046.pdf

it hangs at the connection step:

--2021-12-09 17:19:51--  ftp://ftp.zew.de/pub/zew-docs/dp/dp13046.pdf
           => ‘dp13046.pdf’
Resolving ftp.zew.de (ftp.zew.de)... 193.196.11.224
Connecting to ftp.zew.de (ftp.zew.de)|193.196.11.224|:21...

The same problem occurs with Python's urllib:
urllib.request.urlretrieve('ftp://ftp.zew.de/pub/zew-docs/dp/dp13046.pdf', 'dp13046.pdf')

I would appreciate guidance on how to fetch data from FTP sources within GESIS Binder environments.
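While the cause is being investigated, a minimal sketch like the following can make the failure explicit instead of hanging. It does not work around a blocked port; it only sets a connection timeout so the call fails fast. The helper names split_ftp_url and fetch_ftp and the 15-second timeout are assumptions made for illustration.

```python
from ftplib import FTP
from urllib.parse import urlparse


def split_ftp_url(url: str) -> tuple[str, int, str]:
    """Split an ftp:// URL into (host, port, remote path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp" or not parts.hostname:
        raise ValueError(f"not an FTP URL: {url!r}")
    return parts.hostname, parts.port or 21, parts.path


def fetch_ftp(url: str, dest: str, timeout: float = 15.0) -> None:
    """Download one file over anonymous FTP, failing fast on a blocked port.

    If the control connection on port 21 cannot be opened within
    `timeout` seconds, connect() raises an OSError subclass instead
    of blocking indefinitely.
    """
    host, port, path = split_ftp_url(url)
    ftp = FTP()
    ftp.connect(host, port, timeout=timeout)
    try:
        ftp.login()  # anonymous login
        with open(dest, "wb") as fh:
            ftp.retrbinary(f"RETR {path}", fh.write)
    finally:
        ftp.close()
```

Calling fetch_ftp('ftp://ftp.zew.de/pub/zew-docs/dp/dp13046.pdf', 'dp13046.pdf') inside the Binder session would then either download the file or raise a timeout error, which helps distinguish a blocked outbound connection from a slow server.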

Nice template for a launch badge with the ability to select the hub

Some of the NFDI partners made a nice widget:

https://tibhannover.gitlab.io/nfdi4ds/nfdi4ds-widget/

It could be a nice template for a launch badge with the ability to select the hub.

This could be relevant for 2i2c-org/infrastructure/issues/1382

Docker registry service failing

Binder health checks failing

Binder health checks have been failing since yesterday, 23:00.

// 20201128110613
// https://notebooks.gesis.org/binder/health

{
  "ok": false,
  "checks": [
    {
      "service": "Docker registry",
      "ok": false
    },
    {
      "service": "JupyterHub API",
      "ok": true
    },
    {
      "service": "Pod quota",
      "ok": true,
      "total_pods": 8,
      "build_pods": 0,
      "user_pods": 8,
      "quota": 200
    }
  ]
}
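For monitoring, the failing checks can be pulled out of such a response with a few lines of Python. This is a sketch that only assumes the JSON layout shown in the report above; the function name failing_services is made up for illustration.

```python
import json


def failing_services(health: dict) -> list[str]:
    """Return the names of all checks whose "ok" flag is not true."""
    return [
        check["service"]
        for check in health.get("checks", [])
        if not check.get("ok", False)
    ]


# Payload copied from the report above, compacted onto fewer lines.
sample = json.loads("""
{
  "ok": false,
  "checks": [
    {"service": "Docker registry", "ok": false},
    {"service": "JupyterHub API", "ok": true},
    {"service": "Pod quota", "ok": true, "total_pods": 8,
     "build_pods": 0, "user_pods": 8, "quota": 200}
  ]
}
""")

print(failing_services(sample))  # ['Docker registry']
```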

Downtime for network maintenance

Put up a banner 72 hours before scheduled downtime on notebooks.gesis.org as well as on the status page.
~~Create a script/handler for querying active users in the last 2 weeks to send out an email too (?)~~ 4eb351b

Different behaviour in public Binder and persistent Binder

As @faflo just reported for IWAAN/Topic_modelling_C++.ipynb, cell 10:

topics, _ = display_topics(model_dir, K, voca_pt, tokens_processed, lng, the_page)

the persistent BinderHub https://notebooks.gesis.org/services/binder/v2/gh/gesiscss/IWAAN/a8a28ab?filepath=Topic_modelling_C%2B%2B.ipynb throws an error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-10-ae57cbe4ce69> in <module>
      1 #print output
----> 2 topics, _ = display_topics(model_dir, K, voca_pt, tokens_processed, lng, the_page)

~/BTM/script/topicDisplay.py in display_topics(model_dir, K, voca_pt, tokens_processed, lng, the_page)
     43         wvs = sorted(wvs, key=lambda d:d[1], reverse=True)
     44         #tmps = ' '.join(['%s' % voca[w] for w,v in wvs[:10]])
---> 45         tmps = ' '.join(['%s:%f' % (voca[w],v) for w,v in wvs[:10]])
     46         topics_words.append([str(w) for w,v in wvs[:40]])
     47         rev = []

~/BTM/script/topicDisplay.py in <listcomp>(.0)
     43         wvs = sorted(wvs, key=lambda d:d[1], reverse=True)
     44         #tmps = ' '.join(['%s' % voca[w] for w,v in wvs[:10]])
---> 45         tmps = ' '.join(['%s:%f' % (voca[w],v) for w,v in wvs[:10]])
     46         topics_words.append([str(w) for w,v in wvs[:40]])
     47         rev = []

KeyError: 0

while the regular BinderHub https://notebooks.gesis.org/binder/v2/gh/gesiscss/IWAAN/a8a28ab?filepath=Topic_modelling_C%2B%2B.ipynb does not.

Add status page link to maintenance page

NodeNotReady failure on one of the nodes.

One of the nodes went down, and kubectl get nodes showed its status as NotReady.

To get a better overview of the error, use the describe subcommand.
kubectl describe nodes output:

  Ready                False   Sun, 06 Dec 2020 18:56:49 +0100   Sun, 06 Dec 2020 18:49:40 +0100   KubeletNotReady              PLEG is not healthy: pleg was last seen active 10m18.040734348s ago; threshold is 3m0s

The PLEG health checks depend on the robustness of Docker on the particular server (kubernetes/kubernetes#45419), but everything looked okay when checking the usual Docker commands (docker ps, docker images, etc.).

To solve this I restarted the Docker service with systemctl restart docker, which brought the node back up.

There is an underlying Docker bug/issue here that needs to be fixed. (Not sure if it's related to the recent changes to Docker Hub.)
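To catch this earlier, nodes that are not Ready can be listed from the output of kubectl get nodes -o json. The sketch below relies only on the standard Node object layout (status.conditions entries with type "Ready"); the function name and the node names in the sample are hypothetical.

```python
def not_ready_nodes(node_list: dict) -> list[str]:
    """Return names of nodes whose Ready condition is not "True".

    `node_list` is the parsed JSON printed by `kubectl get nodes -o json`.
    A node without any Ready condition is reported as not ready as well.
    """
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None or ready.get("status") != "True":
            names.append(node["metadata"]["name"])
    return names


# Hypothetical two-node sample mirroring the situation described above.
sample_nodes = {
    "items": [
        {
            "metadata": {"name": "node-a"},
            "status": {"conditions": [
                {"type": "Ready", "status": "True", "reason": "KubeletReady"},
            ]},
        },
        {
            "metadata": {"name": "node-b"},
            "status": {"conditions": [
                {"type": "Ready", "status": "False", "reason": "KubeletNotReady"},
            ]},
        },
    ]
}

print(not_ready_nodes(sample_nodes))  # ['node-b']
```

In practice the input could come from json.load(sys.stdin) with the kubectl output piped in.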

Named Servers for GESISHub

Currently users can spin up only one project at a time. We should let users run multiple projects.

Upgrade the cluster to work out the Calico issue

This will lead to downtime.

500 Internal Server Error on Admin page.

After the recent update to the new persistent BinderHub chart, the admin page is no longer rendered. Pagination was added to the JupyterHub admin page, and ORC should support it too.

Error log from the hub pod:

      File "/usr/local/lib/python3.8/dist-packages/tornado/web.py", line 1704, in _execute
        result = await result
      File "/etc/jupyterhub/extra_config.py", line 77, in get
        html = self.render_template(
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/handlers/base.py", line 1177, in render_template
        return template.render(**template_ns)
      File "/usr/local/lib/python3.8/dist-packages/jinja2/environment.py", line 1090, in render
        self.environment.handle_exception()
      File "/usr/local/lib/python3.8/dist-packages/jinja2/environment.py", line 832, in handle_exception
        reraise(*rewrite_traceback_stack(source=source))
      File "/usr/local/lib/python3.8/dist-packages/jinja2/_compat.py", line 28, in reraise
        raise value.with_traceback(tb)
      File "/etc/jupyterhub/orc_templates/admin.html", line 1, in top-level template code
        {% extends "templates/admin.html" %}
      File "/usr/local/share/jupyterhub/templates/admin.html", line 8, in top-level template code
        {%- elif sort.get(key) == 'desc' -%}
      File "/etc/jupyterhub/orc_templates/page.html", line 128, in top-level template code
        {% block body %}
      File "/etc/jupyterhub/orc_templates/page.html", line 146, in block "body"
        {% block main %}
      File "/usr/local/share/jupyterhub/templates/admin.html", line 107, in block "main"
        {% if pagination.links %}
      File "/usr/local/lib/python3.8/dist-packages/jinja2/environment.py", line 471, in getattr
        return getattr(obj, attribute)
    jinja2.exceptions.UndefinedError: 'pagination' is undefined

Time issue when using external library

Hi,

I am using an external library for inferring gender from an image. I have a notebook (link to Binder) in which one can upload an image (or a URL), and then the model predicts the gender. When testing this notebook on a local machine, it usually takes around 1-3 seconds per image (the highlighted part of the screenshot shows the computation time, as reported by the library):

In the Binder environment it takes 50-60 seconds per image:

The model is saved in the same GitHub repo and is downloaded once at the beginning. The delay occurs during the call to the function infer (this function), which reads the JSON data and predicts the results (using PyTorch).

It seems the problem might be related to the GPU, although the parameter use_cuda, which controls whether to run on a GPU, is set to False when initializing the M3inference class. Parallelization is also possible when multiple GPUs are available or through the num_workers parameter of the infer method, which is set to 0 in the notebook.

Would appreciate any help or suggestions on what might be causing the time delay.

Best regards,
Aleksandra

Documentation of the contribution of the DFG iLCM-Project to Mybinder

As part of his work in the DFG project iLCM - A virtual research infrastructure for large-scale qualitative data (FKZ/project number: 324867496), Kenan Erdogan (GitHub username @bitnik) contributed 120 individual commits to the BinderHub software. A detailed overview and documentation of the commits is available here.

Update CI and documentation for new NFS provisioner

We now use https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner to manage storage on the cluster.

Bug reporting

Hi all.

I've used GESIS Notebooks for 5 months now and have some issues to report.

The server sometimes crashed while I was editing a Jupyter notebook, and the message "No kernel!" appeared. This was not caused by more than 40 minutes of inactivity; it happened while I was actively working in the notebooks.
Sometimes it was not even possible to save my edits, and the error "Saving failed" appeared.

As I already discussed with Arnim, the problem is probably due to an unstable internet connection. Sometimes I have a bad connection and high ping. In these cases I could still use my browser and start the server, but I could not save my edits, and a few moments later the message "No kernel!" appeared again.

Long story short: a stable internet connection is probably necessary for using GESIS Notebooks.

Feature requests

Launch private repositories: Over the last month I've worked with a private repository. To use GESIS Notebooks, I first had to launch a public repository and then clone the private repository inside it. It would be nice if private repositories could be launched directly from GESIS Notebooks.

Refactoring: It can be really frustrating to change a variable's name and have to do it some 20 times manually. Adding refactoring support would therefore improve productivity.

Cheers,
Michelle

FAQ: documentation for persistent launch links

The FAQs don't seem to document how to "create" a launch link for the persistent BinderHub along the lines of

Let's create a question for this.

Enable dask-kubernetes on GESISHub

Docker will be deprecated as an underlying container runtime; we need to move to another CRI

https://kubernetes.io/blog/2020/12/02/dont-panic-kubernetes-and-docker/

Binder bot script needs to be fixed

Error log from the update-binder pod.

Cloning into 'orc_repo'...
WARNING: You are using pip version 19.3; however, version 21.1.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
2021-07-02 18:25:23,878 INFO {'binderhub': {'live': '0.2.0-n577.h14cc6c7', 'latest': '0.2.0-n604.h9bc978a'}, 'repo2docker': {'live': '2021.03.0-15.g73ab48a', 'latest': '2021.03.0-23.gea90ae2'}, 'jupyterhub': {'live': '0.11.1', 'latest': '1.0.1'}}
2021-07-02 18:25:23,878 INFO repo2docker:2021.03.0-15.g73ab48a-->2021.03.0-23.gea90ae2
Traceback (most recent call last):
File "orc_repo/gesisbinder/bot/bot.py", line 393, in <module>
b.update_repos(['repo2docker', 'binderhub'])
File "orc_repo/gesisbinder/bot/bot.py", line 254, in update_repos
self.set_gitlab_project_id(GL_REPO_NAME)
File "orc_repo/gesisbinder/bot/bot.py", line 75, in set_gitlab_project_id
if project['name'] == repo_name:
TypeError: string indices must be integers

This could be due to recent upgrades to the GESIS GitLab.
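One plausible explanation (an assumption, not confirmed by the log) is that the GitLab API returned a JSON object, such as an error message, instead of a list of projects. Iterating over a dict yields its string keys, and indexing a string with 'name' raises exactly this TypeError. A defensive lookup could be sketched as follows; find_project_id is a hypothetical helper, not the function in bot.py.

```python
from typing import Optional


def find_project_id(projects, repo_name: str) -> Optional[int]:
    """Return the id of the project named `repo_name`, or None if absent.

    Raises a descriptive error when the API response is not a list,
    instead of failing later with "string indices must be integers".
    """
    if not isinstance(projects, list):
        raise ValueError(
            f"expected a list of projects, got {type(projects).__name__}: {projects!r}"
        )
    for project in projects:
        if isinstance(project, dict) and project.get("name") == repo_name:
            return project["id"]
    return None


# A well-formed response is a list of project objects.
print(find_project_id([{"id": 7, "name": "orc"}], "orc"))  # 7

# A dict-shaped response (hypothetical error payload) is reported clearly.
try:
    find_project_id({"message": "401 Unauthorized"}, "orc")
except ValueError as err:
    print(err)
```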

Status page for the current health of the server

[Moved from GitLab]

Add CHANGELOG.md to ORC

[Moved from GitLab]

We should have a changelog to keep track of all the changes/updates/fixes on the website.

[Private Repos] Feature requests for Gesis Notebooks

Feature requests

Refactoring: It can be really frustrating to change a variable's name and have to do it some 20 times manually. Adding refactoring support would therefore improve productivity.

Cheers,
Michelle

CI only looks at the last commit to create "mode"

Currently the CI only looks at the last commit (git diff HEAD~ --name-only) to find all the changed files. If multiple commits are merged, the "mode" will not be set properly.
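One way to cover every commit in a merge is to diff against the commit that was deployed before the push rather than against HEAD~. The sketch below assumes the CI system can provide that base commit; how it is obtained, and how the "mode" is derived from the file list, is left open. Both function names are hypothetical.

```python
import subprocess


def diff_command(base: str, head: str = "HEAD") -> list[str]:
    """Build the git command listing files changed between base and head."""
    return ["git", "diff", "--name-only", base, head]


def changed_files(base: str, head: str = "HEAD") -> list[str]:
    """Run the diff in the current repository and return the changed paths."""
    result = subprocess.run(
        diff_command(base, head), check=True, capture_output=True, text=True
    )
    return [line for line in result.stdout.splitlines() if line]
```

With base set to the last deployed commit, changed_files(base) returns the union of files touched by all merged commits, not only by the most recent one.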

HTML anchor for each question in the FAQ

Can we have anchor markup so that each individual question (along with its expanded answer) can be linked to?
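A stable anchor id could be derived from the question text itself. The following is only a sketch of that idea; the function name faq_anchor and the sample question are made up, and how the id is wired into the FAQ template is not covered.

```python
import re


def faq_anchor(question: str) -> str:
    """Turn a question into a lowercase, hyphen-separated anchor id."""
    slug = re.sub(r"[^a-z0-9]+", "-", question.lower()).strip("-")
    return slug or "question"


print(faq_anchor("How do I launch a repository?"))
# how-do-i-launch-a-repository
```

The resulting id can be used as the id attribute of the question element, so that a URL ending in #how-do-i-launch-a-repository links straight to it.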

Add a link to the HackMD document that lists binders hosted by GESIS, with information on status and errors

The HackMD document can be edited at https://hackmd.io/@r8oqez7mTmio-rBeMPutIw/gesiscss-orc-test-notebooks/edit. Can we link that document somewhere on the gesiscss/orc page?

Add the sharing URL as part of the project form.

"Fill in the fields to see a URL for sharing your Binder"

Currently users need to construct the URL manually if they want to share it. We should do the same as mybinder and construct the links automatically (with a "launch with GESIS Notebooks" button).
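The URL pattern can be read off the GESIS Binder links quoted in the issues above. A minimal sketch of the construction, assuming GitHub repositories only and the function name share_url chosen for illustration:

```python
from typing import Optional
from urllib.parse import quote

BINDER_BASE = "https://notebooks.gesis.org/binder/v2/gh"


def share_url(org: str, repo: str, ref: str = "master",
              filepath: Optional[str] = None,
              urlpath: Optional[str] = None) -> str:
    """Build a GESIS Binder launch URL for a GitHub repository.

    At most one of `filepath` and `urlpath` is appended; special
    characters such as "+" are percent-encoded.
    """
    url = f"{BINDER_BASE}/{quote(org, safe='')}/{quote(repo, safe='')}/{quote(ref, safe='')}"
    if filepath:
        url += "?filepath=" + quote(filepath, safe="/")
    elif urlpath:
        url += "?urlpath=" + quote(urlpath, safe="/")
    return url


print(share_url("gesiscss", "IWAAN", "a8a28ab",
                filepath="Topic_modelling_C++.ipynb"))
# https://notebooks.gesis.org/binder/v2/gh/gesiscss/IWAAN/a8a28ab?filepath=Topic_modelling_C%2B%2B.ipynb
```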

Update the BinderBot auto-update script to point to quay.io for images.

The link to the data science image in "how it works" could point to a direct launch URL on GESIS Notebooks.

Use a rolling 14/21 day user backup strategy

The user data backup is getting bigger (~170GB), so instead of the 40-day backup we need to move to a 15-day one and update the backup script to use a rolling 14/21-day strategy, instead of waiting until the 10th of every month to delete the previous month's backup snapshots.
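The rolling part of such a strategy comes down to a cutoff date that moves forward every day. The sketch below shows only the selection logic; how snapshots are stored and named is not specified here, so dates stand in for snapshots, and the function name and the 14-day default are assumptions.

```python
from datetime import date, timedelta


def snapshots_to_delete(snapshot_dates: list[date], today: date,
                        keep_days: int = 14) -> list[date]:
    """Return the snapshot dates older than the rolling retention window.

    A snapshot taken exactly `keep_days` days ago is still kept.
    """
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in snapshot_dates if d < cutoff)


# Hypothetical snapshot dates for illustration.
snapshots = [date(2021, 3, 1), date(2021, 3, 6), date(2021, 3, 19)]
print(snapshots_to_delete(snapshots, today=date(2021, 3, 20)))
# [datetime.date(2021, 3, 1)]
```

Running this daily removes old snapshots continuously, so disk usage stays roughly constant instead of growing until a monthly cleanup.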

Username should be visible in the header.

Just like in JupyterHub.

Access to MongoDB on https://github.com/gesiscss/btw17_sample_scripts does not work.

When trying to access the database in the btw17_sample_scripts Binder, the following error appears (in Python):

ServerSelectionTimeoutError: 10.6.13.55:27017: timed out
Timeout: 30s
Topology Description: <TopologyDescription id: 63b840d25d5c7a7603d611d5, topology_type: Unknown, servers: [<ServerDescription ('10.6.13.55', 27017) server_type: Unknown
rtt: None, error=NetworkTimeout('10.6.13.55:27017: timed out')>]>

Support for persistence with non-jovyan users in Dockerfile-based repositories

Currently, persistence is not possible with non-jovyan users in Dockerfile-based repositories. We should enable this via persistent_binderhub (and potentially open an issue over there as well). Maybe we should also have a look at how tljh-repo2docker does it.

GESIS Hub fails with the new pbhub release

This is caused by the custom changes to the auth handlers.

https://notebooks-test.gesis.org/ -> gives 500
https://notebooks-test.gesis.org/oauth_login -> takes to the login page as expected
https://notebooks-test.gesis.org/hub/admin_orc -> gives 500
https://notebooks-test.gesis.org/hub/admin -> serves this as expected (coming in from jhub)

Service degradation due to docker toomanyrequests error

The service degradation (https://notebooks.gesis.org/grafana/d/nDQPwi7mk/node-activity?viewPanel=34&orgId=1&from=1610691341338&to=1610702604704) was due to a toomanyrequests error on docker pulls.

From the docker service log (sudo journalctl -fu docker.service):

Jan 15 10:09:58 spko-css-app03 dockerd[2445]: time="2021-01-15T10:09:58.873953109+01:00" level=error
msg="Handler for POST /v1.40/images/create returned error: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit"

This was fixed by running sudo docker login again on the server. We need to figure out a more permanent solution than forcing a login again and again (this has happened before too).

Server username should be a function parameter rather than hard-coded in the fabfile

orc/fabfile.py

Line 6 in b8efd04

c.user = 'iuser'

Encoding of email addresses with '+'/special characters in the name

From the DB:

 'name': '[email protected]',
 'admin': False,
 'groups': [],
 'server': '/user/something%[email protected]/',

Users are redirected to /user/[email protected] but the server is running on /user/something%[email protected]/.
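The mismatch is consistent with the username being percent-encoded in one place and used unescaped in another. The sketch below illustrates the effect with a made-up address; it assumes the server path is built by percent-encoding the username and does not reproduce JupyterHub's actual escaping code.

```python
from urllib.parse import quote, unquote


def server_path(username: str) -> str:
    """Build a /user/<name>/ path with special characters percent-encoded."""
    return "/user/" + quote(username, safe="@") + "/"


name = "first+tag@example.org"  # hypothetical address containing "+"
escaped = server_path(name)

print(escaped)           # /user/first%2Btag@example.org/
print(unquote(escaped))  # /user/first+tag@example.org/
```

A redirect that uses the raw name and a server registered under the escaped name only agree if both sides apply the same encoding.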

Demo session: ORC and persistent binderhub at Collaborations Workshop 2021

CfP: https://docs.google.com/forms/d/e/1FAIpQLSfT9RS7xrBmShWfB32MW5g_h8Cey-p8c9lt5wapvvi67HB77Q/viewform
deadline: 31st Jan 2021
event: https://www.software.ac.uk/cw21
