multicloud-incident-response-navigator's Introduction

Skipper

Skipper is an interactive terminal tool for managing multiple Kubernetes clusters.

Installation and Setup

Before you start, make sure python3 and pip3 are installed. Then clone the repo and, from the project directory, run

chmod +x ./installer && ./installer

Running Skipper

In the project directory, run

./skipper

On startup, Skipper will locate your kube-config by looking at the KUBECONFIG environment variable. If this is empty, it will look at the default location (~/.kube/config).
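
The lookup order described above can be sketched as follows. This is a minimal illustration, not Skipper's actual code: the function name resolve_kubeconfig and the env parameter are hypothetical, and the sketch treats KUBECONFIG as a single path (it does not handle a list of several paths).

```python
import os
from pathlib import Path

# Default location used when KUBECONFIG is empty.
DEFAULT_KUBECONFIG = Path.home() / ".kube" / "config"


def resolve_kubeconfig(env=None):
    """Return the kube-config path: KUBECONFIG if non-empty, else the default.

    `env` is a hypothetical hook for testing; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    value = env.get("KUBECONFIG", "")
    if value:
        return Path(value).expanduser()
    return DEFAULT_KUBECONFIG
```

For example, resolve_kubeconfig({"KUBECONFIG": "/tmp/kc"}) returns Path("/tmp/kc"), while an empty or missing variable falls back to ~/.kube/config.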

Skipper loads your kube-config only to read information from your clusters. It will never create, modify, or delete any of your Kubernetes resources.

Logs

Logs are stored in the logs/ directory. flask.log contains output from the Flask webserver, crawler.log contains output from the crawler, and skipper.log contains output from the frontend curses application.

Usage

Skipper usage walkthrough video: using skipper

navigation keybinds

-> or ENTER: navigate into a resource

<- or BACKSPACE: navigate out of a resource

[left pane] up and down arrows: go up / down a list of resources

[right pane] up and down arrows: scroll

cluster mode

Cluster mode presents your Kubernetes resources in the following hierarchy:

cluster -> ns -> {deployment,daemonset,statefulset,service} -> pod

Note: standalone pods (those not managed by a deployment) do not currently show up in cluster mode.

app mode

App mode presents your Kubernetes resources in the following hierarchy:

Application -> Deployable -> {deployment,daemonset,statefulset,service} -> pod

Application and Deployable are custom resource definitions (CRDs) specified by IBM's Multicloud Manager. If none of your clusters have the Application or Deployable CRDs, app mode will be disabled.

anomaly mode

Anomaly mode presents a curated shortlist of pods across your clusters that are in an error state.

query mode

Query mode lets you search for resources across all your clusters. It currently supports keyword search and filters. The supported filters are app, cluster, ns, and kind. You can apply as many filters as you'd like in succession.

Example queries:

app:my-app

Will return all Kubernetes resources that belong to the Application named my-app.

kind:pod frontend

Will return all pods with 'frontend' in the name.

cluster:iks-extremeblue ns:default

Will return all Kubernetes resources that reside in the default namespace of the cluster named iks-extremeblue.
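
To make the query syntax concrete, here is a rough sketch of how such a query string could be split into filters and keywords. It is an illustration only and not Skipper's parser: the function name parse_query is hypothetical, and the handling of repeated filters (the last one wins) and of unknown prefixes (treated as plain keywords) are assumptions.

```python
# The four filters listed in the README.
SUPPORTED_FILTERS = {"app", "cluster", "ns", "kind"}


def parse_query(query):
    """Split a query string into (filters, keywords).

    Tokens of the form `filter:value` with a supported filter name go into
    the filters dict; every other token is kept as a keyword.
    """
    filters = {}
    keywords = []
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and key in SUPPORTED_FILTERS and value:
            filters[key] = value  # assumption: a repeated filter overrides the earlier one
        else:
            keywords.append(token)
    return filters, keywords
```

With this sketch, parse_query("kind:pod frontend") yields ({"kind": "pod"}, ["frontend"]).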

mode switching

Whenever you have a resource selected and switch to app / cluster mode, Skipper will bring you to the location of the selected resource within the app / cluster hierarchy.

Architecture Overview

A high level overview of how Skipper works under the hood can be found here: skipper.pdf

Future Work

  • Convert run script to a Python script (currently in progress on the run-py branch)
  • Package Skipper as a Python package and upload it to the Python Package Index
  • Have Skipper automatically adjust when terminal window is resized
  • Add additional search bar keybinds for highlighting, copy, paste, cut
  • Add infrastructure mode to see the Node -> pod hierarchy
  • Add ability to exec into containers within pods
  • Add ability to open yamls / logs in preferred editor
  • Script Skipper backend to allow users to run commands in terminal
  • Add related resources to summary panel

Credits

Extreme Blue 2019, RTP Lab

Sponsored by Dave Lindquist and the IBM Cloud Private team

Interns

| Name | School | Major | Role | Email |
| --- | --- | --- | --- | --- |
| Patricia Lu | MIT | EECS | EB Technical Intern | [email protected] |
| Jane Hsieh | Oberlin College | CS (CSCI) | EB Technical Intern | [email protected] |
| Tom Gong | UT Austin | CS / Marketing | EB Technical Intern | [email protected] |
| Caitlin Endyke | UMich | MBA / HCI | EB Offering Management Intern | [email protected] |

Mentors

  • Ross Grady (RTP EB Lab Manager)
  • Shikha Srivastava
  • Chris Waldon
  • Ethan Swartzentruber
  • Ryan Smith
  • Yu Cao
  • Jorge Padilla
  • Kevin Myers
  • Michael Elder
  • Charles Quincy
  • Sanjay Joshi

multicloud-incident-response-navigator's Issues

Deleting a resource while running Skipper causes crawler to crash in the background

Most likely cause is the way the crawler is inserting resources into the db. Right now it is doing BFS over the cluster and app hierarchies, which means it aggregates a list of resources and adds them at a later point. If one of those resources was deleted after being added to the list but before it was written to the db, that would likely cause an issue.
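
One possible mitigation, sketched under assumptions: instead of aggregating a list and writing it later, re-read each resource immediately before storing it and skip anything that has disappeared. Everything named here (crawl, ResourceGone, and the fetch, children_of, and store callables) is hypothetical and stands in for the crawler's real cluster and db access.

```python
from collections import deque


class ResourceGone(Exception):
    """Hypothetical error raised by fetch() when a resource no longer exists."""


def crawl(roots, children_of, fetch, store):
    """BFS over a resource hierarchy, tolerating resources deleted mid-crawl.

    Returns the ids that were skipped because they vanished before being stored.
    """
    queue = deque(roots)
    seen = set(roots)
    skipped = []
    while queue:
        uid = queue.popleft()
        try:
            record = fetch(uid)  # re-read right before writing
        except ResourceGone:
            skipped.append(uid)  # deleted in the meantime: note it and move on
            continue
        store(uid, record)
        for child in children_of(uid):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return skipped
```

The design choice is that a missing resource is treated as a normal outcome of the crawl rather than as a fatal error, so one deletion cannot take down the background process.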

Skipper crashes when going into cluster

After deleting the db and running ./skipper, I get to cluster mode and all clusters are greyed out.
When I try to right-arrow into a cluster, it throws the following exception:

127.0.0.1 - - [14/Aug/2019 10:17:01] "POST /edge/iks-extremeblue/iks-extremeblue_1080e46c-7bee-11e9-84bf-b6f504208771 HTTP/1.1" 200 -
[2019-08-14 10:17:01,137] ERROR in app: Exception on /mode/cluster/iks-extremeblue [GET]
Traceback (most recent call last):
  File "/Users/tgong/Desktop/skipper/venv/lib/python3.7/site-packages/flask/app.py", line 2311, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/tgong/Desktop/skipper/venv/lib/python3.7/site-packages/flask/app.py", line 1834, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/tgong/Desktop/skipper/venv/lib/python3.7/site-packages/flask/app.py", line 1737, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/tgong/Desktop/skipper/venv/lib/python3.7/site-packages/flask/_compat.py", line 36, in reraise
    raise value
  File "/Users/tgong/Desktop/skipper/venv/lib/python3.7/site-packages/flask/app.py", line 1832, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/tgong/Desktop/skipper/venv/lib/python3.7/site-packages/flask/app.py", line 1818, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/tgong/Desktop/skipper/controller/webserver/app/routes.py", line 245, in get_table_by_resource
    ns_uid = cname + "_" + ns.metadata.uid
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Probable cause:

  • /start/ only lazy loads if there are no clusters, crawler has probably inserted clusters before GET /start/ was called
  • get_resource_table looks for cluster_name inside object metadata, that metadata field isn’t being set in cluster_mode_backend
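
The TypeError in the traceback has None as the left operand of +, which matches a missing cluster name. A defensive way to build the id, sketched here with a hypothetical helper name (make_ns_uid is not from the codebase), is to fail early with a clear message rather than concatenate blindly:

```python
def make_ns_uid(cluster_name, namespace_uid):
    """Build a '<cluster>_<namespace uid>' identifier, rejecting missing parts.

    An explicit check is used because an f-string would silently turn a
    missing value into the text 'None'.
    """
    if cluster_name is None or namespace_uid is None:
        raise ValueError(
            f"cannot build namespace id: cluster_name={cluster_name!r}, "
            f"namespace_uid={namespace_uid!r}"
        )
    return f"{cluster_name}_{namespace_uid}"
```

This does not fix the underlying cause (the metadata field not being set), but it would turn the crash into an error that names the missing value.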

Skipper crashes on first start

When I started Skipper using ./skipper after deleting the db, it crashed and gave me the following traceback in logs/flask.log.

[2019-08-14 09:25:21,633] ERROR in app: Exception on /running [GET]
Traceback (most recent call last):
  File "/Users/tgong/Desktop/skipper/venv/lib/python3.7/site-packages/flask/app.py", line 2311, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/tgong/Desktop/skipper/venv/lib/python3.7/site-packages/flask/app.py", line 1834, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/tgong/Desktop/skipper/venv/lib/python3.7/site-packages/flask/app.py", line 1737, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/tgong/Desktop/skipper/venv/lib/python3.7/site-packages/flask/_compat.py", line 36, in reraise
    raise value
  File "/Users/tgong/Desktop/skipper/venv/lib/python3.7/site-packages/flask/app.py", line 1832, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/tgong/Desktop/skipper/venv/lib/python3.7/site-packages/flask/app.py", line 1818, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/tgong/Desktop/skipper/controller/webserver/app/routes.py", line 57, in status
    exc_str = str(type(e)) + "\n" + str(e)
NameError: name 'e' is not defined

I deleted the db and tried again and it worked, so it doesn't happen every time.
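
A note on the traceback: "name 'e' is not defined" is the message Python gives when e is looked up as a global name, which is what happens if the function never binds it, for example an except clause written without "as e". The routes.py source is not shown here, so this is a guess. A minimal sketch of the intended pattern, with a hypothetical function name and a hypothetical check callable:

```python
def describe_failure(check):
    """Run check(); return None on success, else a short description of the error."""
    try:
        check()
    except Exception as e:  # bind the exception so it can be formatted
        # Use e inside the handler: Python 3 unbinds it when the block ends.
        return str(type(e)) + "\n" + str(e)
    return None
```

That the crash is intermittent would be consistent with the faulty line sitting on an error path that is only reached when the status check itself fails.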

Left window doesn't erase properly when switching to anomaly mode

This is a curses rendering issue related to erasing/clearing/refreshing the window.

To reproduce:

  • Navigate to a resource that is at the bottom of a table that is longer than the length of the screen (must scroll down a lot)
  • Right arrow into the resource, and then left arrow back out of it
  • Then press 3 for anomaly mode

The left window will not erase correctly and the anomalous resources will not be displayed either.

One-time fix: enter query mode, press esc, and then go back to the mode you were originally in

Starting Skipper Hangs Indefinitely

When I run the following command

./skipper

I get the following error:

Starting Skipper..../skipper: line 12: venv/bin/activate: No such file or directory

Then it hangs indefinitely:

Waiting for flask webserver to start.  
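
The hang itself could be avoided by bounding the wait. The following is a sketch only, assuming the launcher polls some readiness check; wait_until and its parameters are hypothetical names, and the real run script is a shell script rather than Python.

```python
import time


def wait_until(ready, timeout=30.0, interval=0.5,
               sleep=time.sleep, clock=time.monotonic):
    """Poll ready() until it returns True or `timeout` seconds have passed.

    Returns True if ready() succeeded, False if the deadline was reached.
    """
    deadline = clock() + timeout
    while True:
        if ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A launcher built this way could print a pointer to logs/flask.log and exit when wait_until returns False, instead of waiting forever for a webserver that never started because venv/bin/activate was missing.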

Can't access GKE cluster

The following exception was thrown when trying to load a kube config with a GKE cluster.

127.0.0.1 - - [18/Jul/2019 16:04:56] "GET / HTTP/1.1" 200 -
/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/google/auth/_default.py:66: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a "quota exceeded" or "API not enabled" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/
  warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)
Process list namespaces:
Traceback (most recent call last):
  File "/usr/lib64/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib64/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "../../backend/k8s_config.py", line 25, in test_liveness
    new_client = config.new_client_from_config(context=context_name)
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 667, in new_client_from_config
    persist_config=persist_config)
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 652, in load_kube_config
    loader.load_and_set(client_configuration)
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 461, in load_and_set
    self._load_authentication()
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 203, in _load_authentication
    if self._load_auth_provider_token():
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 218, in _load_auth_provider_token
    return self._load_gcp_token(provider)
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 261, in _load_gcp_token
    self._refresh_gcp_token()
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 270, in _refresh_gcp_token
    credentials = self._get_google_credentials()
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 163, in _refresh_credentials
    credentials.refresh(request)
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/google/oauth2/credentials.py", line 136, in refresh
    self._client_secret))
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/google/oauth2/_client.py", line 237, in refresh_grant
    response_data = _token_endpoint_request(request, token_uri, body)
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/google/oauth2/_client.py", line 111, in _token_endpoint_request
    _handle_error_response(response_body)
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/google/oauth2/_client.py", line 61, in _handle_error_response
    error_details, response_body)
google.auth.exceptions.RefreshError: ('invalid_grant: Bad Request', '{\n  "error": "invalid_grant",\n  "error_description": "Bad Request"\n}')
/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/google/auth/_default.py:66: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a "quota exceeded" or "API not enabled" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/
  warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)
[2019-07-18 16:04:58,149] ERROR in app: Exception on /cluster_names [GET]
Traceback (most recent call last):
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/flask/app.py", line 2311, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/flask/app.py", line 1834, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/flask/app.py", line 1737, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/flask/_compat.py", line 36, in reraise
    raise value
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/flask/app.py", line 1832, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/flask/app.py", line 1818, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/username/Code/builds/skipper/controller/webserver/app/routes.py", line 29, in get_cluster_names
    k8s_config.update_available_clusters()
  File "../../backend/k8s_config.py", line 62, in update_available_clusters
    if liveness_test.is_alive() or test_liveness(context_name) == False:
  File "../../backend/k8s_config.py", line 25, in test_liveness
    new_client = config.new_client_from_config(context=context_name)
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 667, in new_client_from_config
    persist_config=persist_config)
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 652, in load_kube_config
    loader.load_and_set(client_configuration)
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 461, in load_and_set
    self._load_authentication()
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 203, in _load_authentication
    if self._load_auth_provider_token():
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 218, in _load_auth_provider_token
    return self._load_gcp_token(provider)
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 261, in _load_gcp_token
    self._refresh_gcp_token()
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 270, in _refresh_gcp_token
    credentials = self._get_google_credentials()
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 163, in _refresh_credentials
    credentials.refresh(request)
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/google/oauth2/credentials.py", line 136, in refresh
    self._client_secret))
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/google/oauth2/_client.py", line 237, in refresh_grant
    response_data = _token_endpoint_request(request, token_uri, body)
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/google/oauth2/_client.py", line 111, in _token_endpoint_request
    _handle_error_response(response_body)
  File "/home/username/Code/builds/skipper/venv/lib/python3.7/site-packages/google/oauth2/_client.py", line 61, in _handle_error_response
    error_details, response_body)
google.auth.exceptions.RefreshError: ('invalid_grant: Bad Request', '{\n  "error": "invalid_grant",\n  "error_description": "Bad Request"\n}')
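
In both tracebacks the exception escapes from test_liveness, so a single cluster with expired credentials takes down the whole request. One way to contain that, sketched with hypothetical names (is_reachable, reachable_contexts, and the probe callable are illustrative and not part of Skipper or the kubernetes client), is to treat any failure while building a client as "cluster not available":

```python
def is_reachable(context_name, probe):
    """Return True if probe(context_name) completes, False if it raises.

    `probe` is a stand-in for whatever creates a client for the context and
    makes a cheap call against the cluster.
    """
    try:
        probe(context_name)
    except Exception:
        # Assumption: any failure (auth, network, bad config) means "unavailable".
        return False
    return True


def reachable_contexts(context_names, probe):
    """Filter a list of context names down to those whose probe succeeds."""
    return [name for name in context_names if is_reachable(name, probe)]
```

Catching every exception is a blunt choice; a real implementation would probably log the error so that an expired GKE token is still visible to the user.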
