aws-virtual-kubelet's People

Contributors

amazon-auto, dependabot[bot], dims, nyalavarthi, saranbalaji90

aws-virtual-kubelet's Issues

Move RunInstances parameters from PodSpec to ConfigMap

Today, when calling the EC2 RunInstances API, Virtual Kubelet fetches the parameters from the PodSpec and passes them to EC2. VK should also support fetching these parameters from a ConfigMap. This would let cluster administrators control which parameters are used and would improve the UX for application developers.
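A minimal sketch of how the two sources could be combined, assuming both are reduced to plain string maps first. The function name, the parameter keys, and the rule that ConfigMap values win on conflict are all assumptions made for illustration; none of them come from the aws-virtual-kubelet code.

```go
package main

import (
	"fmt"
	"sort"
)

// resolveRunInstanceParams merges launch parameters from two sources.
// podParams stands in for values read from the PodSpec, configMapData for
// the Data field of a ConfigMap. Both names are hypothetical.
// Assumption: ConfigMap values take precedence, so that a cluster
// administrator can override what an application developer requested.
func resolveRunInstanceParams(podParams, configMapData map[string]string) map[string]string {
	merged := make(map[string]string, len(podParams)+len(configMapData))
	for k, v := range podParams {
		merged[k] = v
	}
	for k, v := range configMapData {
		merged[k] = v // overrides any PodSpec value with the same key
	}
	return merged
}

func main() {
	// Keys and values below are made up for the example.
	podParams := map[string]string{"instanceType": "t3.micro", "subnetID": "subnet-from-pod"}
	configMapData := map[string]string{"instanceType": "m5.large", "imageID": "ami-from-configmap"}

	merged := resolveRunInstanceParams(podParams, configMapData)

	// Sort keys so the printed order is stable.
	keys := make([]string, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, merged[k])
	}
}
```

If the opposite precedence is wanted (PodSpec overrides cluster defaults), swapping the two loops is enough.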

Reduce scope of Kubernetes service account permissions

Is your feature request related to a problem? Please describe.

The permissions defined in https://github.com/awslabs/aws-virtual-kubelet/blob/main/deploy/vk-clusterrole_binding.yaml are most likely broader than necessary.

Describe the outcome you'd like

Reduce the permissions to the minimal set needed to run the code that is in main today. For example, remove these unless the system cannot run without them:

  - apiGroups: [""]
    resources: ["configmaps", "secrets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods", "pods/status"]
    verbs: ["update", "delete", "create"]

Describe alternatives you've considered

Run with wide permission scope. Not desirable from a security perspective.

Pod deletion should ignore errors stopping monitoring and the applications

Describe the outcome you'd like

https://github.com/awslabs/aws-virtual-kubelet/blob/main/internal/ec2provider/ec2provider.go#L216 should not return an error, but instead should continue with pod deletion.

Since we are about to terminate the EC2 instance anyway, we should just log a warning and proceed.
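The behavior requested above could look roughly like the following sketch. The three callbacks and the `deletePod` name are hypothetical stand-ins for whatever the provider actually calls; the point is only that errors from the two "stop" steps are logged as warnings while termination still runs.

```go
package main

import (
	"errors"
	"log"
)

// errAgentUnreachable is a made-up error used only by this example.
var errAgentUnreachable = errors.New("agent unreachable")

// deletePod is a sketch, not the provider's real method. stopMonitoring,
// stopApp, and terminate are hypothetical hooks supplied by the caller.
// Failures in the first two are logged and ignored, because the instance
// is about to be terminated anyway. Only the terminate error is returned.
func deletePod(stopMonitoring, stopApp, terminate func() error) error {
	if err := stopMonitoring(); err != nil {
		log.Printf("warning: stopping monitoring failed, continuing with deletion: %v", err)
	}
	if err := stopApp(); err != nil {
		log.Printf("warning: stopping application failed, continuing with deletion: %v", err)
	}
	return terminate()
}

func main() {
	failing := func() error { return errAgentUnreachable }
	terminated := false
	err := deletePod(failing, failing, func() error {
		terminated = true
		return nil
	})
	log.Printf("terminated=%v err=%v", terminated, err)
}
```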

Describe alternatives you've considered

We could attempt a graceful shutdown initially with a (context-based) timeout that proceeds with forceful shutdown after a period of time. Currently no use-cases exist that require this level of "gentle" handling at the application level though.
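For reference, the alternative described above can be sketched with a context deadline. Everything here is an assumption for illustration: the function names, the callback signatures, and the choice to create the context internally (a real caller would more likely pass in its own parent context).

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// neverFinishes simulates an application that does not stop until its
// context is cancelled (hypothetical stand-in).
func neverFinishes(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// finishesAtOnce simulates an application that stops cleanly right away.
func finishesAtOnce(ctx context.Context) error { return nil }

// shutdownWithTimeout tries graceful first and waits up to timeout.
// If graceful fails or the deadline passes, it calls force.
// The returned bool reports whether force was used.
func shutdownWithTimeout(timeout time.Duration, graceful func(context.Context) error, force func() error) (bool, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	// Buffered so the goroutine can always send and exit, even after a timeout.
	done := make(chan error, 1)
	go func() { done <- graceful(ctx) }()

	select {
	case err := <-done:
		if err == nil {
			return false, nil // graceful shutdown succeeded
		}
		// Graceful shutdown failed: fall through to the forceful path.
	case <-ctx.Done():
		// Deadline passed: fall through to the forceful path.
	}
	return true, force()
}

func main() {
	force := func() error {
		fmt.Println("forcing termination")
		return nil
	}

	forced, err := shutdownWithTimeout(50*time.Millisecond, neverFinishes, force)
	fmt.Println("forced:", forced, "err:", err)

	forced, err = shutdownWithTimeout(50*time.Millisecond, finishesAtOnce, force)
	fmt.Println("forced:", forced, "err:", err)
}
```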

Describe in detail the functionality that requires each permission specified in the VK service account role

Describe the issue with documentation

We need to understand why each permission is needed for aws-virtual-kubelet and what operation(s) the permission setting enables. Also document what exactly fails in aws-virtual-kubelet when the permission is removed. See example below.

Expectation

A comment similar to the following should be added for each permission:

# When our custom provider returns an error, the upstream virtual-kubelet library uses a k8s client to update Pod status
# directly.  e.g.
# https://github.com/virtual-kubelet/virtual-kubelet/blob/bf3a764409b5fa5ae8f613797e4c648e9c38c4eb/node/pod.go#L172
# This requires the `pod:update` permission in the service account role.
#
# When this permission is _not_ present, virtual-kubelet cannot report pod status changes to k8s,
# so the pod state recorded in k8s drifts out of sync with the state virtual-kubelet holds.

Try to identify at least one use-case from the upstream virtual-kubelet library (or our custom provider) for each permission.

NOTE that a permission is a combination of a resource and a verb (e.g. get pod). You may need to separate rules into smaller groups to facilitate documentation where a large number of verbs are allowed for multiple resources.

Anything else we need to know?

PR #11 attempted to complete this exercise but did not have the full set of requirements.

Document Agent setup and usage

Describe the issue with documentation

Instructions for building and using the provided example agent are missing/incomplete.

Expectation

  • Detailed steps to use the provided example agent to verify proper setup of all other components.
  • Steps to create an AMI with the agent pre-installed.
  • Information on extending the example agent to create a usable implementation.

Anything else we need to know?

There is a provision in the code to enable retrieval of the agent software via user data (e.g. from S3). It should be possible to enable this use case via configuration, and it should be documented.

GitHub Actions not working correctly for fork PRs

What happened

GitHub Actions workflows running from public forks only have read access. A step in the Format workflow updates the PR status with the failed files for easier resolution (vs. having to look at the workflow run log to find the failed files). This step fails with "Resource not accessible by integration" when run from a public fork.

What you expected to happen

PR Workflows should run correctly from both forks and branches in this repo.

How to reproduce it (as minimally and precisely as possible)

Create a PR from a fork, then observe the errors in the Format workflow's logs (other workflow jobs also fail for the same root cause).

Anything else we need to know?

https://securitylab.github.com/research/github-actions-preventing-pwn-requests/ proposes a solution with two workflows. The first reads the user's PR code, generates the needed information, and stores it as a build artifact. The second is triggered by the first via the workflow_run event, which grants a write-capable token. The second workflow examines the first one's artifact data and POSTs statuses, comments, etc. to the PR. Because the second workflow never builds or runs the user's code, the write-capable token is not exposed to untrusted code.

Document / Codify release process

Describe the issue with documentation

No documentation exists for the release process.

Expectation

The release process should be documented and/or automated.

Anything else we need to know?

The current flow / steps are:

  • do work in short-lived branches off main using common prefixes like fix, feat, etc. to organize branches by type
  • require at least one reviewer approval from CODEOWNERS to merge
    • branches generally should include updates to the CHANGELOG to simplify capturing changes between releases
  • squash all branch commits into a single commit with a meaningful title and description
  • releases follow semantic versioning and use -rc.1, -rc.2 etc. qualifiers for pre-releases
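As one way to automate the last step in the list above, a release tag could be checked against the versioning rule before publishing. This is a sketch under stated assumptions: the `parseReleaseTag` name, the `release` type, and the optional leading "v" are hypothetical, and only the MAJOR.MINOR.PATCH form with an optional -rc.N qualifier is accepted.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// release holds the parsed parts of a tag. RC is 0 for a final release.
type release struct {
	Major, Minor, Patch int
	RC                  int
}

// tagPattern accepts MAJOR.MINOR.PATCH with an optional -rc.N suffix.
// Assumption: a leading "v" is allowed. Leading zeros are rejected,
// in line with semantic versioning.
var tagPattern = regexp.MustCompile(`^v?(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-rc\.([1-9]\d*))?$`)

// parseReleaseTag returns the parsed release and true when tag is valid.
func parseReleaseTag(tag string) (release, bool) {
	m := tagPattern.FindStringSubmatch(tag)
	if m == nil {
		return release{}, false
	}
	nums := make([]int, 4)
	for i, s := range m[1:] {
		if s == "" {
			continue // the -rc.N group is absent for final releases
		}
		n, err := strconv.Atoi(s)
		if err != nil {
			return release{}, false // e.g. a number too large for int
		}
		nums[i] = n
	}
	return release{Major: nums[0], Minor: nums[1], Patch: nums[2], RC: nums[3]}, true
}

func main() {
	for _, tag := range []string{"1.2.3", "v1.2.3-rc.1", "1.2.3-beta", "01.2.3"} {
		r, ok := parseReleaseTag(tag)
		fmt.Printf("%-12s valid=%v parsed=%+v\n", tag, ok, r)
	}
}
```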

Add integration / runtime tests

Describe the outcome you'd like

In addition to the existing Unit Tests, there should be a suite of Integration or Runtime tests that verify behavior of the system while running. This is likely an implementation of the upstream virtual-kubelet library's importable e2e test framework.

Describe alternatives you've considered

Having no runtime tests is an option (and the current state), but it increases the number of failures released to "production".

Anything else we need to know?

There are some specific use-cases already identified for these tests to cover. GitHub Actions integration for automated test runs is desirable, but having a way to run locally is sufficient.

Document steps to visualize metrics

Describe the issue with documentation

Documentation on how to view VK metrics via Prometheus or another mechanism is missing.

Expectation

Documented steps exist that tell users how to view the metrics.

Anything else we need to know?

The metrics already exist and are viewable on individual nodes via the Prometheus endpoint.

Serverless applications?

I'm familiar with virtual kubelet as a technique for scheduling FaaS workloads in a k8s cluster.

Is that a potential use for this virtual kubelet?
