awslabs / aws-virtual-kubelet

License: Apache License 2.0

Makefile 2.67% Dockerfile 0.75% Shell 0.32% Go 96.26%

aws-virtual-kubelet's People

saranbalaji90 paritosh16 muskanmahajan486 samkenxstream bminahan73 josh-bonello jguice dims wangrzneu

aws-virtual-kubelet's Issues

Move RunInstance parameters from PodSpec to ConfigMap

Today, when calling the EC2 RunInstances API, Virtual Kubelet (VK) fetches the parameters from the PodSpec and passes those values to EC2. VK should also support fetching them from a ConfigMap. This would let the cluster administrator control which parameters are used, and it would improve the UX for application developers.
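
A minimal sketch of how the two sources could be combined, assuming the parameters are plain string key/value pairs. The function name, the key names, and the choice that ConfigMap values win over pod values are assumptions made for illustration; none of them come from the repository.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeLaunchParams combines instance launch parameters from two sources.
// Assumption for illustration: values from the ConfigMap win over values
// taken from the pod, so the cluster administrator stays in control.
func mergeLaunchParams(fromPod, fromConfigMap map[string]string) map[string]string {
	merged := make(map[string]string, len(fromPod)+len(fromConfigMap))
	for k, v := range fromPod {
		merged[k] = v
	}
	for k, v := range fromConfigMap {
		merged[k] = v // administrator-controlled value overrides the pod's
	}
	return merged
}

func main() {
	// Hypothetical keys and values, used only to show the precedence.
	pod := map[string]string{"instanceType": "t3.micro", "subnetId": "subnet-from-pod"}
	cm := map[string]string{"instanceType": "m5.large", "imageId": "ami-from-configmap"}

	merged := mergeLaunchParams(pod, cm)
	keys := make([]string, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output order
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, merged[k])
	}
}
```

The opposite precedence (pod values override ConfigMap defaults) is the same loop order reversed; which one is right depends on how much control application developers should keep.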

Reduce scope of Kubernetes service account permissions

Is your feature request related to a problem? Please describe.

Permissions defined in https://github.com/awslabs/aws-virtual-kubelet/blob/main/deploy/vk-clusterrole_binding.yaml are most likely wider-scoped than necessary.

Describe the outcome you'd like

Reduce to the minimal set of permissions needed to run the code that is in main today. For example, remove these unless the system cannot run without them:

  - apiGroups: [""]
    resources: ["configmaps", "secrets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods", "pods/status"]
    verbs: ["update", "delete", "create"]

Describe alternatives you've considered

Keep running with the wide permission scope. This is not desirable from a security perspective.

Pod deletion should ignore errors stopping monitoring and the applications

Describe the outcome you'd like

https://github.com/awslabs/aws-virtual-kubelet/blob/main/internal/ec2provider/ec2provider.go#L216 should not return an error, but instead should continue with pod deletion.

Since we are about to terminate the EC2 instance anyway, we should just log a warning and proceed.

Describe alternatives you've considered

We could attempt a graceful shutdown initially with a (context-based) timeout that proceeds with forceful shutdown after a period of time. Currently no use-cases exist that require this level of "gentle" handling at the application level though.

Describe in detail the functionality that requires each permission specified in the VK service account role

Describe the issue with documentation

We need to understand why each permission is needed for aws-virtual-kubelet and what operation(s) the permission setting enables. Also document what exactly fails in aws-virtual-kubelet when the permission is removed. See example below.

Expectation

A comment similar to the following should be added for each permission:

# When our custom provider returns an error, the upstream virtual-kubelet library uses a k8s client to update Pod status
# directly.  e.g.
# https://github.com/virtual-kubelet/virtual-kubelet/blob/bf3a764409b5fa5ae8f613797e4c648e9c38c4eb/node/pod.go#L172
# This requires the `pod:update` permission in the service account role.
#
# When this permission is _not_ present, virtual-kubelet is unable to update k8s with
# pod status changes which causes k8s to become out-of-sync with the pod state vs. virtual-kubelet.

Try to identify at least one use-case from the upstream virtual-kubelet library (or our custom provider) for each permission.

NOTE that a permission is a combination of a resource and a verb (e.g. get pod). You may need to separate rules into smaller groups to facilitate documentation where a large number of verbs are allowed for multiple resources.
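
To enumerate what needs documenting, each rule can be expanded into its individual resource/verb pairs. The sketch below uses a simplified, hypothetical `rule` type rather than the real Kubernetes RBAC types, and takes its sample values from the rule quoted in the permissions-scope issue.

```go
package main

import "fmt"

// rule holds the two fields of an RBAC rule that matter for this example.
// It is a simplified stand-in, not the Kubernetes API type.
type rule struct {
	Resources []string
	Verbs     []string
}

// expand lists every "verb resource" permission granted by the rules.
func expand(rules []rule) []string {
	var perms []string
	for _, r := range rules {
		for _, res := range r.Resources {
			for _, verb := range r.Verbs {
				perms = append(perms, verb+" "+res)
			}
		}
	}
	return perms
}

func main() {
	rules := []rule{{
		Resources: []string{"pods", "pods/status"},
		Verbs:     []string{"update", "delete", "create"},
	}}
	// Each printed line is one permission that needs its own justification.
	for _, p := range expand(rules) {
		fmt.Println("# TODO document:", p)
	}
}
```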

Anything else we need to know?

PR #11 attempted to complete this exercise but did not have the full set of requirements.

Document Agent setup and usage

Describe the issue with documentation

Instructions for building and using the provided example agent are missing/incomplete.

Expectation

- Detailed steps to use the provided example agent to verify proper setup of all other components.
- Steps to create an AMI with the agent pre-installed.
- Information on extending the example agent to create a usable implementation.

Anything else we need to know?

There is a provision in the code to enable retrieval of the agent software via user data (e.g. from S3). This use-case should be configurable and documented.

Provider's vkvmaclient should not exit fatally when logging a connection attempt

What happened

https://github.com/awslabs/aws-virtual-kubelet/blob/main/internal/vkvmaclient/vkvmaclient.go#L144

What you expected to happen

The provider should not exit when emitting a status log line. log.Fatalf (https://pkg.go.dev/log#Fatalf) calls os.Exit(1) after writing the message.

How to reproduce it (as minimally and precisely as possible)

Introduce a >1s delay in the VKVMAgent dial connection.

Environment

aws-virtual-kubelet v0.5.2

GitHub Actions not working correctly for fork PRs

What happened

GitHub Actions workflows triggered from public forks only have read access. A step in the Format workflow updates the PR status with the list of failed files for easier resolution (vs. having to look at the workflow run log to find them). This step fails when run from a public fork with the error "Resource not accessible by integration".

What you expected to happen

PR Workflows should run correctly from both forks and branches in this repo.

How to reproduce it (as minimally and precisely as possible)

Create a PR from a fork, observe errors in the Format workflow's logs (other workflow jobs also fail for the same root reason).

Anything else we need to know?

https://securitylab.github.com/research/github-actions-preventing-pwn-requests/ proposes a solution with two workflows. The first reads / generates information from the user's PR code and produces a build artifact with the data. The second triggers off the first via the workflow_run event, which grants a write-capable token. The second workflow examines the first one's artifact data and POSTs statuses, comments, etc. to the PR. Because the second workflow doesn't build or run the user's code, security is sharply improved.

Document / Codify release process

Describe the issue with documentation

No documentation exists for the release process.

Expectation

The release process should be documented and/or automated.

Anything else we need to know?

The current flow / steps are:

- do work in short-lived branches off main, using common prefixes like fix, feat, etc. to organize branches by type
- require at least one reviewer approval from CODEOWNERS to merge
- branches generally should include updates to the CHANGELOG to simplify capturing changes between releases
- squash all branch commits into a single commit with a meaningful title and description
- releases follow semantic versioning and use -rc.1, -rc.2, etc. qualifiers for pre-releases
- versions are created using GitHub's Release capability (and main is tagged with the appropriate version identifier)
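
If the process is automated, the tag format could be checked before a release is cut. This is a sketch that assumes tags carry a leading "v" (as in v0.5.2) and that -rc.N is the only pre-release qualifier in use; full semantic versioning allows more forms than this pattern accepts. The `isReleaseTag` helper is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// releaseTag matches vMAJOR.MINOR.PATCH with an optional -rc.N suffix.
// Assumption: leading "v" and no pre-release qualifiers other than rc.
var releaseTag = regexp.MustCompile(
	`^v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-rc\.([1-9]\d*))?$`)

// isReleaseTag reports whether tag is well formed and whether it names a
// pre-release (release candidate).
func isReleaseTag(tag string) (valid, preRelease bool) {
	m := releaseTag.FindStringSubmatch(tag)
	if m == nil {
		return false, false
	}
	return true, m[4] != ""
}

func main() {
	for _, tag := range []string{"v0.5.2", "v1.0.0-rc.1", "1.0", "v1.0.0-beta"} {
		valid, pre := isReleaseTag(tag)
		fmt.Printf("%-12s valid=%v preRelease=%v\n", tag, valid, pre)
	}
}
```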

Add integration / runtime tests

Describe the outcome you'd like

In addition to the existing Unit Tests, there should be a suite of Integration or Runtime tests that verify behavior of the system while running. This is likely an implementation of the upstream virtual-kubelet library's importable e2e test framework.

Describe alternatives you've considered

Having no runtime tests is an option (and the current state). However, this leads to more failures being released to "production".

Anything else we need to know?

There are some specific use-cases already identified for these tests to cover. GitHub Actions integration for automated test runs is desirable, but having a way to run locally is sufficient.

Document steps to visualize metrics

Describe the issue with documentation

Documentation on how to view VK metrics via Prometheus or another mechanism is missing.

Expectation

Documented steps exist that tell users how to view the metrics.

Anything else we need to know?

Metrics are already exposed and viewable on individual nodes via the Prometheus endpoint.

Serverless applications?

I'm familiar with virtual kubelet as a technique for scheduling FaaS workloads in a k8s cluster.

Is that a potential use for this virtual kubelet?
