Gardener Dashboard Documentation
Copyright 2020 The Gardener Authors
Web-based GUI for Gardener installations.
License: Apache License 2.0
When the displayed clusters receive many events, rendering clogs the CPU. If possible (limited investment), let's find out whether we can defer rendering for e.g. 200 ms after the first new event hits the backend.
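Such a deferral could be sketched roughly like this (a minimal sketch; `createDeferredRenderer` and the `render` callback are illustrative names, not from the dashboard code):

```javascript
// Collect incoming events and flush them to the renderer once per
// `delay` milliseconds after the first buffered event, instead of
// triggering one re-render per event.
function createDeferredRenderer (render, delay = 200) {
  let buffer = []
  let timer = null
  return function onEvent (event) {
    buffer.push(event)
    if (timer === null) {
      timer = setTimeout(() => {
        const events = buffer
        buffer = []
        timer = null
        render(events) // one render call for the whole batch
      }, delay)
    }
  }
}
```

Events arriving within the window are then rendered in one batch, so a burst of cluster events costs one render instead of many.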
To be spec'ed:
By @vlerenc: We want to offer trial clusters, but we need quota/limits for those.
Per IaaS account (which we model today, maybe not ideally, as a secret):
Alternatively, for now, we could simplify that by a VMs quota, but then we would need to restrict the machine types (since they differ significantly in their resources and prices, especially when looking at the GPU-based machine types).
Per Gardener project (which we model as namespace):
Note: We may have/want to add more resource types over time. Some are infrastructure specific (if not abstracted) like e.g. the disk type.
This way we could achieve the following:
- … gp2 and 1 TB io1 Disk, 4 LB services and 200 GB gp2 and 50 GB io1 PVCs per cluster, cluster auto-termination after 28 days
- … gp2 Disk, cluster auto-termination after 7 days
- … io1 disks, or yet another to extend the lifetime of their clusters

Note: See also #101.
Note: Most of these quotas can be enforced by an admission controller in the garden cluster, but the LBs and PVC size quotas will require similar controllers in the respective shoot clusters themselves, which we can handle with lower priority/later. Actually, it would be nice if LBs and PVC size quotas could be specified on the IaaS account level for the entire IaaS account, but that would require the shoot cluster admission controller to call back to the garden cluster, which is currently not possible (not reachable), so let's go with the per-cluster simplification.
Note: We support worker count min/max, i.e. cluster auto-scaling. Check the max requirements against the quota, i.e. assume the worst case.
Note: If quotas are set for the IaaS account and the Gardener project, both must comply (simplest strategy for now).
Note: Make sure a cluster member can't change these quotas. Later we may allow that, e.g. in a non-trial use case a customer purchases a certain quantity of resources and allots it to its projects.
Note: Secrets are currently placed into the Gardener project and must be readable, but we don't want that. They should be "somewhere else" outside the namespace, so that their contents are protected (not directly visible in the project; however, when a cluster is created, they appear in the tf-vars of the infrastructure Terraform files, which is bad but acceptable for now). That would also ease setting the overall quota (per IaaS account). One possible solution to refer to them is described in #102, but it's not elegant, as it requires two different code paths in the operator and UI to handle these cases. It would be better if we find a solution where the secret feels the same, whether it's shared or not.
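The worst-case check from the notes above could be sketched as follows (the pool and quota shapes are illustrative assumptions, not the actual Gardener types):

```javascript
// Assume the worst case: every worker pool scales out to its
// autoscaler maximum.
function worstCaseVMs (workerPools) {
  return workerPools.reduce((sum, pool) => sum + pool.autoScalerMax, 0)
}

// Simplest strategy for now: both the IaaS-account quota and the
// Gardener-project quota must be satisfied.
function withinQuota (workerPools, accountQuota, projectQuota) {
  const vms = worstCaseVMs(workerPools)
  return vms <= accountQuota.maxVMs && vms <= projectQuota.maxVMs
}
```

An admission controller in the garden cluster would run such a check on shoot creation/update and reject the request when either quota would be exceeded.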
Implementation Proposal:
- Use ResourceQuotas, but in our API group (see 1 and 2), to describe quotas per IaaS account and Gardener project.
- Define individual ResourceQuotas and admission controllers (per quota) or combine them into one in the first step; the latter may make later extensions, especially when we open source, somewhat harder (still, if it speeds us up, let's go with a combined ResourceQuota and admission controller).

dashboardUrl should now point to /api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/
Background: several incompatible changes were put forth by the community:
a) change to proxy subresource (with v1.8.0)
b) change to https (with v1.7.1)
for ref, see history of https://github.com/kubernetes/dashboard/commits/master/src/deploy/recommended/kubernetes-dashboard.yaml
(we can probably expect further incompatible changes, which might require some kubernetes-dashboard version-dependent configmap)
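Such a version-dependent mapping could be sketched like this (the legacy URL forms follow the old apiserver proxy endpoint; treat the exact version boundaries and paths as assumptions to be checked against the dashboard history linked above):

```javascript
// Lexicographic semver comparison, sufficient for this sketch.
function atLeast (version, min) {
  const v = version.split('.').map(Number)
  const m = min.split('.').map(Number)
  for (let i = 0; i < 3; i++) {
    if ((v[i] || 0) !== (m[i] || 0)) return (v[i] || 0) > (m[i] || 0)
  }
  return true
}

// Pick the kubernetes-dashboard URL for the deployed version:
// (a) >= v1.8.0: proxy subresource + https
// (b) >= v1.7.1: https, legacy proxy endpoint
// otherwise: http, legacy proxy endpoint
function dashboardUrl (version) {
  if (atLeast(version, '1.8.0')) {
    return '/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/'
  }
  if (atLeast(version, '1.7.1')) {
    return '/api/v1/proxy/namespaces/kube-system/services/https:kubernetes-dashboard:/'
  }
  return '/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard/'
}
```

The mapping itself (boundaries and URL templates) would then live in the configmap rather than in code, so further incompatible upstream changes only require a configuration update.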
Let's please remove the username and password from the endpoint URLs as this isn't working anymore anyways with the new dashboard and is considered a security issue.
Help ease the work of the operator (of the week) to drop https://github.wdf.sap.corp/kubernetes/kube-docs/wiki/Operator-of-the-Week-Journal which is hard to maintain.
As long as we use the dashboard also as front-end to our self-service/canary users, we might want to hide the journal and labels from the users. Later, this distinction and double-use (self-service and ops frontend) won't be necessary anymore and the journal can be shown to the operators in general.
As a practical and quick solution, let's use GitHub for issues, journal (in the form of issue comments) and labels (in the form of issue labels). That way we do not have to implement much on our own. We only read the data conveniently from GitHub with a technical user and delegate to GitHub for editing with the true actual/human user.
See also #5 and #33. This resolves two thirds of #8 as well.
For the landscape deployment, Gardener packages itself as a Helm chart, see https://github.com/gardener/gardener/tree/master/charts. When this is stable, it would be great to also have a Helm chart for the dashboard (one or two, depending on whether we should package dex separately), so that a landscape can be set up using the same means.
Previously, closed journal issues were not visible in the Dashboard, but they seem to be shown now?
As evident from some internal Slack communication, users struggle to set up their Azure, GCP and OpenStack technical users and permissions. Let's contact @AndreasBurger and @dkistner (for Azure), @DockToFuture and @mliepold (for GCP) and @mandelsoft and @afritzler (for OpenStack) to improve the descriptions with their feedback.
However, it may well be that we should also give some internal advice/hints (how to apply for a technical user within our company, or any other for that matter), so maybe we have to pull that configuration out? If possible we should avoid it at first (given the other backlog), but the time will come when this may become necessary for the community and us.
With the advent of the machine controller manager and the machine API, we should rename:
- workers into machines
- WORKER into MACHINES
- Group Name into Pool Name
- cpu-worker (default) into cpu-pool
At present everybody at SAP has access to all our landscapes (SAP IdP, in contrast to LDAP, doesn't give us groups). However, that means that everybody has access to our dev, staging, and live landscapes.
Maybe we could help ourselves with a cluster role on dev, staging, and live that allows only a dedicated set of users? The basis of that user list could be the user list in landscape-dev-garden#5.
Prerequisite is gardener/gardener#84 that introduces maintenance time windows into the shoot cluster resources.
The dashboard no longer needs to create the Terraformer ClusterRoleBinding when creating a project/namespace (see dashboard/backend/lib/kubernetes/Client.js, lines 455 to 477 in 4a059ce). The Terraform jobs run in the Seed clusters, and the Gardener creates the required (Cluster)RoleBindings itself.
For performance reasons we have disabled "some seconds/minutes ago". Let's take a look whether we can enable it with a smarter time and where it would make sense to do so (in some cases the date may be sufficient, e.g. when it happened long ago).
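A smarter variant could be sketched like this (thresholds and the `displayTime` helper are illustrative assumptions): render a relative time only for recent events, let the refresh interval grow with the age, and fall back to the plain date for old events.

```javascript
// Choose a display string and a refresh interval based on the age of
// the timestamp; `refreshMs: null` means the row never needs updating.
function displayTime (timestampMs, nowMs = Date.now()) {
  const ageMin = (nowMs - timestampMs) / 60000
  if (ageMin < 1) return { text: 'just now', refreshMs: 10000 }
  if (ageMin < 60) return { text: `${Math.floor(ageMin)} minutes ago`, refreshMs: 60000 }
  if (ageMin < 60 * 24) return { text: `${Math.floor(ageMin / 60)} hours ago`, refreshMs: 3600000 }
  // older than a day: the date alone is sufficient
  return { text: new Date(timestampMs).toLocaleDateString(), refreshMs: null }
}
```

This keeps the cost bounded: only rows younger than a day ever schedule a refresh, and almost all of them at a one-minute or one-hour cadence.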
This is a mid-term mitigation for #23: Can we please allow the user in the creation dialog and later in the details view to create/edit (ideally in-place) the yaml definition of the shoot cluster (spec only ?).
As long as we use the dashboard also as front-end to our self-service/canary users, we might want to hide the journal from the users. Later, this distinction and double-use (self-service and ops frontend) won't be necessary anymore and the journal can be shown to the operators in general.
When I press the copy kubeconfig button in the cluster details view, nothing seems to get copied into my clipboard?
We as provider want to motivate our users to run the latest Kubernetes version, so that we have less issues with the clusters, offer the latest features and can faster drop support for older Kubernetes versions.
Currently, we grep the information out of the overall/global Gardener logs. We would like to improve the way we interact with and access the Gardener/cluster logs. As an operator (of the week) I frequently have to check what the Gardener says/logs for a certain cluster and operation. Now with the reconciler the logs have grown, and with all the other planned features they will grow even more. Usually, we need to know something specific about a particular cluster, for a particular operation, at a particular date/time (gardener/gardener#49). A central logging stack and UI (e.g. in the Garden cluster) will help me, but we can only expose it to project members if it supports multi-tenancy.
We could use the Kubify-deployed logging stack and instead of showing the logs for an operation directly in the dashboard, the dashboard could generate a query for said logging stack that shows the right logs in the logging UI. Of course, as long as the logging stack doesn't support multi-tenancy, this would mean that the feature would be only available to the core team/admins, not to project members. This would be acceptable. The primary goal is to help our operators (of the week) and that solution would fully serve that purpose.
Integrate tests into CI pipeline:
Clusters that are created with trial secrets will usually self-terminate (default is 7 days, but it depends). The user should know of this.
Issue by rfranzke
Monday Dec 04, 2017 at 07:03 GMT
Originally opened as https://git.removed/kubernetes-attic/gardener-ui/issues/151
When a Shoot cluster has reached state "Create Failed" and I try to delete it (clicking on the bin, entering its name and clicking on confirm), the Shoot cluster overview does not reload automatically (i.e. I don't see that my delete request succeeded).
When I refresh the page, I can see that the cluster has state "Delete processing".
Let's please prepare and make a real (CI pipeline) release of the Dashboard for the new landscapes:
@rfranzke and @mandelsoft implemented a cluster operation retry in the Gardener that can be initiated with an annotation at the shoot resource. To expose this functionality to our users, we can now add some retry button or similar means to the Dashboard for failed shoot clusters (operations).
With the advent of the automated cluster updates and the machine controller manager, we will be able to change an existing shoot cluster, so let's expose this functionality.
Replace cluster creation dialog with a page similar to cluster details. That page can then be used to create a cluster, see its configuration and change it. Also make the yaml view editable, so that users can e.g. tweak the network settings (expert mode) and more directly in yaml (and the dashboard doesn't need to implement these complex scenarios).
In the shoot list, when trying to search for a shoot, nothing happens.
This seems to be related to this issue: vuetifyjs/vuetify#3475
We probably have to filter in the vuex store.
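Filtering in the vuex store could be done with a getter, roughly like this (a sketch; the state shape and names are illustrative, not the dashboard's actual store):

```javascript
// A getter that filters the shoot list in the store itself, so the
// table component only ever receives the matching rows and the
// component search (which does not work here) is bypassed entirely.
const getters = {
  filteredShoots: state => {
    const term = (state.searchTerm || '').toLowerCase()
    if (!term) return state.shoots
    return state.shoots.filter(shoot =>
      shoot.metadata.name.toLowerCase().includes(term))
  }
}
```

The component would then bind its table rows to `filteredShoots` and commit the search term to the store instead of passing it to the table widget.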
When working with large projects, or when using the "all projects" view as administrator, we are currently facing frontend performance issues. A first analysis showed that this is caused by re-rendering every single row in the table whenever a shoot in the client store changes. The root cause is probably incorrect usage of computed properties.
We should
Other changes may be necessary in order to get this to work properly.
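One typical mitigation, assuming the root cause really is per-row dependency tracking (an illustrative sketch, not the dashboard's actual mutations): store the raw shoot objects frozen, so Vue 2.x skips making their fields reactive, and replace only the changed object so the other rows keep their identity.

```javascript
// Store raw shoot objects frozen: Vue (2.x) does not set up
// reactivity for frozen objects, so per-field watchers on every
// row disappear.
function storeShoots (state, shoots) {
  state.shoots = shoots.map(s => Object.freeze(s))
}

// Replace a single shoot immutably: only the affected row's object
// identity changes, so row components keyed by uid can skip
// re-rendering the untouched rows.
function updateShoot (state, updated) {
  state.shoots = state.shoots.map(s =>
    s.metadata.uid === updated.metadata.uid ? Object.freeze(updated) : s)
}
```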
As long as we use the dashboard also as front-end to our self-service/canary users, we might want to hide the labels from the users. Later, this distinction and double-use (self-service and ops frontend) won't be necessary anymore and the labels can be shown to the operators in general.
Let's please hide the monitoring endpoints for non-admins (like seed and journal), so that "normal" users can't see our monitoring.
Gardener shall not only become the technical means to operate Kubernetes clusters, but also a collaboration platform. Some of the above ideas were proposed by @vasu1124, while others were driven by others like the operators (of the week) (e.g. #5).
Replace hard-coded visualisations of clusters and inner entities (like pods and more) and lists of the same with a wiki engine and render templates we store as third party resources in the Garden cluster. Add a ticketing system.
Hi,
I've noticed that once you've created a cluster, you cannot find any information about the worker you've defined (apart from looking into the yaml config). For me it would be interesting to see the VM types (and maybe autoscaler and disk information) of my worker groups.
Any thoughts on that?
Best regards
Daniel
We have dropped the network sections from the cluster creation dialog (in the old UI, long before the self-service), but it would make sense to (re-)expose that functionality/flexibility again in an expert mode:
Issue by vlerenc
Saturday Aug 05, 2017 at 07:07 GMT
Originally opened as https://git.removed/kubernetes-attic/gardener-ui/issues/28
Newly hired colleagues, or contributors from outside the group of colleagues who primarily work on a repo/subject/topic, have a hard time contributing, especially when sitting in remote locations. But in order to grow sustainably, we need everybody to be happy and to help our cause.
- kubectl to the downloaded kubeconfig.
- kubectl to the downloaded kubeconfig and correct namespace.

Ops and debugging, constrained by security considerations:
- kubectl pointing to the project namespace in the garden cluster for which the web console was opened (scope: project)
- kubectl pointing to the shoot cluster control plane namespace in the seed cluster for which the web console was opened (scope: shoot cluster)
- kubectl pointing to the kube-system namespace in the shoot cluster for which the web console was opened (scope: shoot cluster)
- … (chrooting into the host with full host access) nodeSelector'ed by node/host name or same node that runs a particular garden pod
- … (chrooting into the host with full host access) nodeSelector'ed by node/host name or same node that runs a particular shoot cluster control plane pod (scope: shoot cluster)
- … (chrooting into the host with full host access) nodeSelector'ed by node/host name or same node that runs a particular shoot cluster pod (scope: shoot cluster)

By default (for the self-service) we manage the cluster domain automatically. In this case, we do not ask for a hosted zone ID and also not for a domain name (though the latter may be shown as read-only/FYI). In this fully managed case (default), the hosted zone is ours and the domain is created via the following pattern, e.g. <shoot_name>.<project_name>.shoot.example.com. The Gardener supports this and detects the case if no hosted zone ID is found in the shoot cluster resource and the target domain matches a pre-defined one (like above). That means we only lack the UI part now.
The previous DNS Settings (in the old UI, long since before the self-service) shall become something like the non-default expert mode that the user can choose in the cluster creation dialog. If he chooses the expert mode, he must supply a hosted zone ID (AWS) or zone name (GCP) and the domain name (like today, except that we should no longer give any proposal/default domain name) as well as a DNS secret that must include the Route53FullAccess policy for AWS and whatever is necessary for Google Cloud DNS (the user should be informed about that requirement, if possible).
Issue by kubernetes-jenkins
Wednesday Jul 19, 2017 at 18:44 GMT
Originally opened as https://git.removed/kubernetes-attic/gardener-ui/issues/12
Newly hired colleagues, or contributors from outside the group of colleagues who primarily work on a repo/subject/topic, have a hard time contributing, especially when sitting in remote locations. But in order to grow sustainably, we need everybody to be happy and to help our cause.
ℹ️ Migrated from Jira issue KUBE-148
Multiple users complained about a missing confirmation dialog and final confirmation button: some users do not want the cluster to be created right away when pressing create; they first want an overview of what they have configured and ideally (technically even more important) also of the IaaS resources that would be created (e.g. so that they can ensure proper quotas/limits and get a better feeling for what happens under the hood in their IaaS account).
Should the current cluster creation dialog become a wizard again, we should at least list:
The UI doesn't have this information and even the Gardener can only extract the information indirectly by inspecting and counting the Terraform resources that are rendered and would be created.
See gardener/gardener#50 in which the required Gardener functionality was implemented.
We can either do nothing (basically only do #23, which we want to do anyway) and leave it to the user to find out that scaling down all pools manually effectively hibernates a cluster, or we can offer a convenience function in the dashboard that does that across all pools, clearly labeled "Hibernate". The reverse operation, "Wake-Up", is harder, but the former pool sizes could be written as annotations into the shoot resource by the dashboard, in which case "Wake-Up" would know how to scale up again.
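The "remember the former pool sizes" idea could be sketched like this (the annotation key and the shoot/pool shapes are hypothetical assumptions, not an agreed API):

```javascript
// Hypothetical annotation key for recording pre-hibernation pool sizes.
const SIZES_ANNOTATION = 'dashboard.example.com/pre-hibernation-pool-sizes'

// Hibernate: record the current autoscaler bounds of every pool in an
// annotation, then scale all pools down to zero.
function hibernate (shoot) {
  const sizes = {}
  for (const pool of shoot.spec.workers) {
    sizes[pool.name] = { min: pool.autoScalerMin, max: pool.autoScalerMax }
    pool.autoScalerMin = 0
    pool.autoScalerMax = 0
  }
  shoot.metadata.annotations[SIZES_ANNOTATION] = JSON.stringify(sizes)
  return shoot
}

// Wake-up: restore the recorded bounds and drop the annotation.
function wakeUp (shoot) {
  const sizes = JSON.parse(shoot.metadata.annotations[SIZES_ANNOTATION])
  for (const pool of shoot.spec.workers) {
    pool.autoScalerMin = sizes[pool.name].min
    pool.autoScalerMax = sizes[pool.name].max
  }
  delete shoot.metadata.annotations[SIZES_ANNOTATION]
  return shoot
}
```

The dashboard would apply these as patches to the shoot resource; the annotation makes "Wake-Up" self-contained without any extra bookkeeping store.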
Companies may want to "brand" their supported infrastructures, e.g. internal OpenStack installations. Historically, the Dashboard used SAP Gold for "SAP Cloud Platform" (now "OpenStack"), but we should change this. The other infrastructures also do not use the right logos/colors, so it's time for a cleanup.
Let's make the following bits configurable at Garden cluster resource level (not hard-coded):
If we have this, it should be possible to support also multiple AWSes (e.g. general and government cloud), multiple OpenStacks or whatnot. It will also be possible to "brand" them, e.g. "OpenStack" could become "OpenStack (SAP Cloud Platform on Converged Cloud Industry Edition)" or we could integrate "special" discriminators/qualifiers like "Office Network" into the configurable name.
Let's also change the following:
As an administrator, I navigate from the "Shoots with issues" list to the details page of a shoot. If this shoot becomes healthy, it will be removed from the shoots-with-issues list and thus removed from the local vuex store. In this case, I am left with a broken page, as all shoot information is gone but I'm still on the details page.
We should join a dedicated room for this shoot while viewing it, so that it will not be removed from the store while viewing it. In addition, we should inform the user or navigate away in the unlikely event of an actual shoot deletion.
The Gardener allows tweaking the configurations of the API server, controller manager, scheduler, and kube-proxy, as well as allowing privileged containers or not, but we haven't exposed that yet (unless we are fine with #34).
To be defined in more detail. We need to investigate first (with Rafael F.), whether we can offer some pre-selection or how to actually allow the configuration (outcome can also be to leave it at #34, but then we need some documentation and it will not be very convenient).
See Gardener.
By @petersutter: When upgrading the vuetify version to >= 1.0.0, add the attach property to the projectFilter v-menu in the MainNavigation so that the menu does not scroll with the main page.
See also related vuetify github issue:
vuetifyjs/vuetify#2981 (comment)
Our current IdP SAML-based authentication solution works only from the dashboard. There are limitations in expanding that to technical users and programmatic access.
Note: Scripting solution is already available internally at https://github.wdf.sap.corp/kubernetes/garden-setup/tree/master/utils (by @RaphaelVogel).