orc's Introduction

This repository was superseded by https://github.com/gesiscss/orc2

Open Research Computing (ORC)

For more information about the ORC project: https://notebooks.gesis.org/about/

Feel free to open an issue in this repository if there are any questions or contact us at [email protected].

Technical Details

This ORC instance is deployed on Kubernetes on bare-metal machines running Ubuntu 18.04. The Kubernetes cluster (ORC cluster) is created with kubeadm (v1.18.3), with Calico as the network provider. Docker version 19.03.8 is installed on the servers.

All Docker images of this project can be found at https://hub.docker.com/u/gesiscss/.

Because we set up the Kubernetes cluster on bare metal, we use the deployment approach "Using a self-provisioned edge".

Nginx is used as a reverse proxy server and load balancer. It also handles SSL offloading/termination and serves static files. It sits outside the ORC cluster and is the public entry point to the cluster. All services in the cluster have type NodePort.

NFS Server Provisioner is the default storage provider in the ORC cluster.

Persistent BinderHub runs under https://notebooks.gesis.org/hub/.

Uses Docker Hub Registry (https://hub.docker.com/u/gesiscss/) to store built images.

BinderHub runs under https://notebooks.gesis.org/binder/.

GESIS Hub and Binder use the same Docker images (they use the same repo2docker version).

Gallery of popular repos launched on GESIS Binder and featured projects: https://notebooks.gesis.org/gallery/


Funded by the German Research Foundation (DFG). FKZ/project number: 324867496.

orc's People

Contributors

arnim, arnimtest, bitnik, dependabot[bot], dhurim, gesisnotebooks, mriduls, rgaiacs

orc's Issues

"Meta" Helm chart for ORC

We now have Helm charts for GESIS Binder and GESIS Hub.

  • Learn more about helm releases and rollback
  • Make helm chart for gallery
  • Make helm chart for orc-site -> here mount html templates as configmaps too
  • Finally, make an ORC Helm chart for the whole project (similar to the mybinder.org-deploy chart) that requires all other charts/apps (including storage, monitoring, GESIS Binder, Hub, and gallery charts). Then we could have one values.yaml for the whole project?
    - For example we can also have GESIS templates (and statics?) in a configmap and all apps use that configmap.

[Moved from GitLab]

Restarting from the hub

Hi guys,

Every once in a while, when the hub says a server is still running, I click on "My Server" and get the error message:

"503 : Service Unavailable. Your server appears to be down. Try restarting it from the hub"

Then I'm stuck in a loop: I get back to the hub, but going back to the server does not work. I was caught in that loop for a whole day before I noticed! (NOT)

As a workaround when I urgently need to work, I log out, log in again, and wait a few seconds. This shuts down the server. Then I start it again and it works.

Best and thanks!

Haiko

Documentation of ORC features

  • Add more details to FAQs
  • Create tutorials using various features of ORC.
  • mention /projects folder
  • How to use nbgitpuller and other recommendations (documentation/tutorials).

Allow FTP downloading

I recently started using GESIS binder for showcasing notebooks with geographical spatio-temporal analysis. I wondered if there's a sort of firewall blocking downloading files from FTP sources.

For instance, when I run wget on the following FTP source:
!wget ftp://ftp.zew.de/pub/zew-docs/dp/dp13046.pdf

it stays unresponsive:

--2021-12-09 17:19:51--  ftp://ftp.zew.de/pub/zew-docs/dp/dp13046.pdf
           => ‘dp13046.pdf’
Resolving ftp.zew.de (ftp.zew.de)... 193.196.11.224
Connecting to ftp.zew.de (ftp.zew.de)|193.196.11.224|:21... 

The same problem occurs with Python's urllib:
urllib.request.urlretrieve('ftp://ftp.zew.de/pub/zew-docs/dp/dp13046.pdf', 'dp13046.pdf')

I would appreciate guidance on how to fetch data from FTP sources within GESIS binder environments.
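
One way to narrow this down is to check whether the environment can open an outbound TCP connection to the FTP control port at all. The sketch below is only a diagnostic aid under that assumption. The helper name can_connect is hypothetical, and a failed connection does not by itself prove that a firewall is the cause.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Calling can_connect("ftp.zew.de", 21) from a notebook cell returns quickly instead of hanging, which makes it easier to compare the Binder environment with a local machine.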

Docker registry service failing

Binder health checks failing

Binder health checks failing since yesterday 23:00

// 20201128110613
// https://notebooks.gesis.org/binder/health

{
  "ok": false,
  "checks": [
    {
      "service": "Docker registry",
      "ok": false
    },
    {
      "service": "JupyterHub API",
      "ok": true
    },
    {
      "service": "Pod quota",
      "ok": true,
      "total_pods": 8,
      "build_pods": 0,
      "user_pods": 8,
      "quota": 200
    }
  ]
}
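
To read such a health payload programmatically, a small helper can list the checks that are not ok. This is a minimal sketch: the function name failing_services is hypothetical, and the sample is the payload above with the two // comment lines removed, since they are not valid JSON.

```python
import json

# Sample payload copied from the health output above (without the // lines).
HEALTH_JSON = """
{
  "ok": false,
  "checks": [
    {"service": "Docker registry", "ok": false},
    {"service": "JupyterHub API", "ok": true},
    {"service": "Pod quota", "ok": true, "total_pods": 8,
     "build_pods": 0, "user_pods": 8, "quota": 200}
  ]
}
"""


def failing_services(payload: dict) -> list:
    """Return the names of all checks whose "ok" flag is not true."""
    return [c["service"] for c in payload.get("checks", []) if not c.get("ok")]


health = json.loads(HEALTH_JSON)
print(failing_services(health))  # ['Docker registry']
```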

Downtime for network maintenance

  • Put up a banner 72hr before scheduled downtime on notebooks.gesis.org as well as on status page.
  • Create a script/handler for querying active users in the last 2 weeks to send out an email too (?) 4eb351b
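
The second item could start from a filter like the one below. It is a sketch that assumes user records are available as dictionaries with a name and an ISO 8601 last_activity timestamp, similar to what the JupyterHub REST API is understood to return. The function name recently_active and the sample records are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def recently_active(users, now, days=14):
    """Return names of users whose last activity is within the last `days` days."""
    cutoff = now - timedelta(days=days)
    active = []
    for user in users:
        stamp = user.get("last_activity")
        if not stamp:
            continue  # no recorded activity
        # Map a trailing "Z" to an explicit UTC offset; older Pythons reject "Z".
        seen = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        if seen >= cutoff:
            active.append(user["name"])
    return active


# Made-up sample records for illustration only.
users = [
    {"name": "alice", "last_activity": "2021-01-10T08:00:00.000000Z"},
    {"name": "bob", "last_activity": "2020-11-01T08:00:00.000000Z"},
    {"name": "carol", "last_activity": None},
]
now = datetime(2021, 1, 15, tzinfo=timezone.utc)
print(recently_active(users, now))  # ['alice']
```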

Different behaviour in pub binder and persistent binder

As @faflo just reported for IWAAN/Topic_modelling_C++.ipynb cell 10

topics, _ = display_topics(model_dir, K, voca_pt, tokens_processed, lng, the_page)

persistent bHub https://notebooks.gesis.org/services/binder/v2/gh/gesiscss/IWAAN/a8a28ab?filepath=Topic_modelling_C%2B%2B.ipynb throws an error in

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-10-ae57cbe4ce69> in <module>
      1 #print output
----> 2 topics, _ = display_topics(model_dir, K, voca_pt, tokens_processed, lng, the_page)

~/BTM/script/topicDisplay.py in display_topics(model_dir, K, voca_pt, tokens_processed, lng, the_page)
     43         wvs = sorted(wvs, key=lambda d:d[1], reverse=True)
     44         #tmps = ' '.join(['%s' % voca[w] for w,v in wvs[:10]])
---> 45         tmps = ' '.join(['%s:%f' % (voca[w],v) for w,v in wvs[:10]])
     46         topics_words.append([str(w) for w,v in wvs[:40]])
     47         rev = []

~/BTM/script/topicDisplay.py in <listcomp>(.0)
     43         wvs = sorted(wvs, key=lambda d:d[1], reverse=True)
     44         #tmps = ' '.join(['%s' % voca[w] for w,v in wvs[:10]])
---> 45         tmps = ' '.join(['%s:%f' % (voca[w],v) for w,v in wvs[:10]])
     46         topics_words.append([str(w) for w,v in wvs[:40]])
     47         rev = []

KeyError: 0

while regular bHub https://notebooks.gesis.org/binder/v2/gh/gesiscss/IWAAN/a8a28ab?filepath=Topic_modelling_C%2B%2B.ipynb does not.

NodeNotReady failure on one of the nodes.

One of the nodes went down and the status of the node on kubectl get nodes was NotReady.

To get a better overview of the error, use the describe function.
kubectl describe nodes output:

  Ready                False   Sun, 06 Dec 2020 18:56:49 +0100   Sun, 06 Dec 2020 18:49:40 +0100   KubeletNotReady              PLEG is not healthy: pleg was last seen active 10m18.040734348s ago; threshold is 3m0s

The PLEG health checks depend on robustness of docker on the particular server (kubernetes/kubernetes#45419) but everything looked okay after checking the usual docker commands, docker ps, docker images, etc.

To solve this, I restarted the docker service with systemctl restart docker, which brought the node back up.

There is an underlying docker bug/issue here that needs to be fixed. (Not sure if it's related to the recent changes to DockerHub)

500 Internal Server Error on Admin page.

After the recent update to the new persistent BinderHub chart, the admin page is no longer rendered. Pagination was added to the JupyterHub admin page, so ORC should support it as well.

Error log from the hub pod:

      File "/usr/local/lib/python3.8/dist-packages/tornado/web.py", line 1704, in _execute
        result = await result
      File "/etc/jupyterhub/extra_config.py", line 77, in get
        html = self.render_template(
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/handlers/base.py", line 1177, in render_template
        return template.render(**template_ns)
      File "/usr/local/lib/python3.8/dist-packages/jinja2/environment.py", line 1090, in render
        self.environment.handle_exception()
      File "/usr/local/lib/python3.8/dist-packages/jinja2/environment.py", line 832, in handle_exception
        reraise(*rewrite_traceback_stack(source=source))
      File "/usr/local/lib/python3.8/dist-packages/jinja2/_compat.py", line 28, in reraise
        raise value.with_traceback(tb)
      File "/etc/jupyterhub/orc_templates/admin.html", line 1, in top-level template code
        {% extends "templates/admin.html" %}
      File "/usr/local/share/jupyterhub/templates/admin.html", line 8, in top-level template code
        {%- elif sort.get(key) == 'desc' -%}
      File "/etc/jupyterhub/orc_templates/page.html", line 128, in top-level template code
        {% block body %}
      File "/etc/jupyterhub/orc_templates/page.html", line 146, in block "body"
        {% block main %}
      File "/usr/local/share/jupyterhub/templates/admin.html", line 107, in block "main"
        {% if pagination.links %}
      File "/usr/local/lib/python3.8/dist-packages/jinja2/environment.py", line 471, in getattr
        return getattr(obj, attribute)
    jinja2.exceptions.UndefinedError: 'pagination' is undefined

Time issue when using external library

Hi,

I am using an external library for inferring gender from an image. I have a notebook (link to Binder) in which one can upload an image (or URL), and the model then predicts the gender. When testing this notebook on a local machine, it usually takes around 1-3 seconds per image (the highlighted part of the screenshot shows the computation time, as reported by the library):
Screenshot 2020-11-27 at 16 40 18

And 50-60 seconds per image in the Binder environment:
Screenshot 2020-11-27 at 16 48 13

The model is saved in the same GitHub repo and is downloaded once at the beginning. The delay occurs during the call to the function infer (this function), which reads the JSON data and predicts the results (using PyTorch).

It seems the problem might be related to the GPU, although when initializing the M3inference class the parameter use_cuda is set to False, so the model should not try to run on a GPU. Parallelization is also possible when multiple GPUs are available or via the num_workers parameter of the infer method, which is set to 0 in the notebook.

Would appreciate any help or suggestions on what might be causing the time delay.

Best regards,
Aleksandra

Bug reporting

Bug reporting

Hi all.

I've used Gesis Notebooks for 5 months now and have some issues to report.

The server sometimes crashed while I was editing a Jupyter notebook. The message "No kernel!" appeared. It didn't happen after more than 40 minutes of inactivity, but while I was actively working in the notebooks.
Sometimes it was not even possible to save my edits, and the error "Saving failed" appeared.

As I already discussed with Arnim, the problem is probably due to an unstable internet connection. Sometimes I have a bad connection and high ping. In these cases I could still use my browser and start the server, but I could not save my edits, and a few moments later the message "No kernel!" appeared again.

Long story short: a stable internet connection is probably necessary for using Gesis Notebooks.

Feature requests

Launch private repositories: For the last month I've worked with a private repository. To use Gesis Notebooks, I first had to launch a public repository and then clone the private repository inside it. It would be nice if private repositories could be launched directly from Gesis Notebooks.

Refactoring: It can be really frustrating when you change a variable's name and have to do it 20 times manually. Adding refactoring support would therefore improve productivity.

Cheers,
Michelle

FAQ: documentation for persistent launch links

The FAQs don't seem to contain documentation on how to "create" a launch link for the persistent BinderHub along the lines of

image

Let's create a Q for this.

Binder bot script needs to be fixed

Error log from the update-binder pod.

Cloning into 'orc_repo'...
WARNING: You are using pip version 19.3; however, version 21.1.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
2021-07-02 18:25:23,878 INFO {'binderhub': {'live': '0.2.0-n577.h14cc6c7', 'latest': '0.2.0-n604.h9bc978a'}, 'repo2docker': {'live': '2021.03.0-15.g73ab48a', 'latest': '2021.03.0-23.gea90ae2'}, 'jupyterhub': {'live': '0.11.1', 'latest': '1.0.1'}}
2021-07-02 18:25:23,878 INFO repo2docker:2021.03.0-15.g73ab48a-->2021.03.0-23.gea90ae2
Traceback (most recent call last):
  File "orc_repo/gesisbinder/bot/bot.py", line 393, in <module>
    b.update_repos(['repo2docker', 'binderhub'])
  File "orc_repo/gesisbinder/bot/bot.py", line 254, in update_repos
    self.set_gitlab_project_id(GL_REPO_NAME)
  File "orc_repo/gesisbinder/bot/bot.py", line 75, in set_gitlab_project_id
    if project['name'] == repo_name:
TypeError: string indices must be integers

Could be due to recent upgrades to gesis gitlab.
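
Python raises this TypeError when a string is indexed with ['name'], which happens for example when the code iterates over a dict (yielding its string keys) instead of a list of dicts. One possible cause, which the log does not confirm, is that the GitLab API returned an error object rather than a project list. The sketch below shows a defensive lookup under that assumption. The function name find_project_id and both sample responses are hypothetical and are not taken from bot.py.

```python
def find_project_id(projects, repo_name):
    """Return the id of the project named repo_name, or None if it is absent.

    Raises ValueError when `projects` is not a list, for example when an
    API error object was returned instead of a list of projects.
    """
    if not isinstance(projects, list):
        raise ValueError(f"expected a list of projects, got: {projects!r}")
    for project in projects:
        if isinstance(project, dict) and project.get("name") == repo_name:
            return project.get("id")
    return None


# Hypothetical responses for illustration only.
ok_response = [{"id": 7, "name": "orc"}, {"id": 9, "name": "other"}]
error_response = {"message": "401 Unauthorized"}

print(find_project_id(ok_response, "orc"))  # 7
try:
    find_project_id(error_response, "orc")
except ValueError as exc:
    print("unexpected response:", exc)
```

Failing with an explicit message makes the log show the unexpected response itself rather than only the indexing error.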

Add CHANGELOG.md to ORC

[Moved from GitLab]

We should have a changelog to keep track of all the changes/updates/fixes on the website.

[Private Repos] Feature requests for Gesis Notebooks

Feature requests

Launch private repositories: For the last month I've worked with a private repository. To use Gesis Notebooks, I first had to launch a public repository and then clone the private repository inside it. It would be nice if private repositories could be launched directly from Gesis Notebooks.

Refactoring: It can be really frustrating when you change a variable's name and have to do it 20 times manually. Adding refactoring support would therefore improve productivity.

Cheers,
Michelle

Add the sharing URL as part of the project form.

"Fill in the fields to see a URL for sharing your Binder"

Currently users need to construct the URLs manually if they want to share them. We should do the same as mybinder and construct the links automatically (with a "launch with GESIS notebooks" button).
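
Constructing such a link can be sketched as follows. The URL shape is inferred from the GitHub launch links that appear elsewhere in this document; the function name build_launch_url is hypothetical, and only the GitHub (gh) provider is covered.

```python
from urllib.parse import quote

# Base URL taken from the BinderHub address mentioned in this document.
BINDER_BASE = "https://notebooks.gesis.org/binder"


def build_launch_url(repo, ref, filepath=None, base=BINDER_BASE):
    """Build a GitHub launch link of the form <base>/v2/gh/<org>/<repo>/<ref>."""
    url = f"{base}/v2/gh/{repo}/{quote(ref, safe='')}"
    if filepath:
        # Percent-encode the notebook path, e.g. "+" becomes "%2B".
        url += f"?filepath={quote(filepath, safe='')}"
    return url


print(build_launch_url("gesiscss/IWAAN", "a8a28ab", "Topic_modelling_C++.ipynb"))
# https://notebooks.gesis.org/binder/v2/gh/gesiscss/IWAAN/a8a28ab?filepath=Topic_modelling_C%2B%2B.ipynb
```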

Use a rolling 14/21 day user backup strategy

The user data backup is growing (~170 GB), so instead of the 40-day backup we need to move to a 15-day one and update the backup script to use a rolling 14/21-day strategy, rather than waiting until the 10th of every month to delete the previous month's backup snapshots.
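
The rolling part of such a strategy amounts to deleting, on every run, any snapshot older than the retention window. The sketch below illustrates only that selection step. It assumes snapshots can be identified by their creation date; the function name snapshots_to_delete and the sample dates are made up, and it is not the existing backup script.

```python
from datetime import date, timedelta


def snapshots_to_delete(snapshot_dates, today, keep_days=14):
    """Return snapshot dates older than the retention window, oldest first."""
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in snapshot_dates if d < cutoff)


# Made-up daily snapshots for illustration: 1 to 20 January 2021.
snapshots = [date(2021, 1, day) for day in range(1, 21)]
expired = snapshots_to_delete(snapshots, today=date(2021, 1, 20), keep_days=14)
print(expired[0], expired[-1], len(expired))  # 2021-01-01 2021-01-05 5
```

Switching between the 14-day and 21-day variants is then a matter of changing keep_days.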

access to mongodb on https://github.com/gesiscss/btw17_sample_scripts does not work.

When trying to access the database in the btw17_sample_scripts Binder, the following error appears (in Python):

ServerSelectionTimeoutError: 10.6.13.55:27017: timed out
Timeout: 30s
Topology Description: <TopologyDescription id: 63b840d25d5c7a7603d611d5, topology_type: Unknown, servers: [<ServerDescription ('10.6.13.55', 27017) server_type: Unknown
rtt: None, error=NetworkTimeout('10.6.13.55:27017: timed out')>]>

Service degradation due to docker toomanyrequests error

The service degradation https://notebooks.gesis.org/grafana/d/nDQPwi7mk/node-activity?viewPanel=34&orgId=1&from=1610691341338&to=1610702604704 was due to a toomanyrequests error on docker pulls.

From the docker service log (sudo journalctl -fu docker.service):

Jan 15 10:09:58 spko-css-app03 dockerd[2445]: time="2021-01-15T10:09:58.873953109+01:00" level=error
msg="Handler for POST /v1.40/images/create returned error: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit"

This was fixed by running sudo docker login again on the server. We need to figure out a more permanent solution than forcing a login again and again (this has happened before too).
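
Until a permanent fix exists, the condition could at least be detected early by scanning the docker service log for the error string. The sketch below shows only the matching step. The function name count_rate_limit_errors is hypothetical, the first sample line is abridged from the log excerpt above, and the second is a made-up unrelated line.

```python
def count_rate_limit_errors(lines):
    """Count log lines that report a Docker Hub pull rate limit error."""
    return sum(1 for line in lines if "toomanyrequests" in line)


sample = [
    'level=error msg="Handler for POST /v1.40/images/create returned error: '
    'toomanyrequests: You have reached your pull rate limit."',
    'level=info msg="unrelated message"',
]
print(count_rate_limit_errors(sample))  # 1
```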
