
k8s-digester's Issues

digester should use failurePolicy: Fail in the example MutatingWebhookConfiguration to avoid a specific edge case

We use GKE with Binary Authorization, and for system components we like to use latest tags from our own Artifact Registry, so we use digester.

We have a daily Terraform run that deploys a variety of configs to our environment, including system components such as digester itself, as well as other deployments that use digester.

One of these components is a Kubernetes API proxy (we use Private Service Connect), so the proxy is a system component that is delivered daily.

Now, we should have had (but until now did not have) ignore_changes for the image field in the Terraform run. It didn't really matter to us: we knew that each daily deployment arrived without the @sha256 value and would be mutated by digester, and since the result matched the current ReplicaSet, nothing changed.

Yesterday I discovered about 18,000 ReplicaSets for this deployment, with new ones being created every 2-3 minutes, rotating the proxy out of service and breaking people's connections.

Inspection revealed that the API proxy Deployment had gotten past digester without being mutated and carried just the 'latest' tag, which now differed from the ReplicaSet. Then the fight began: the Deployment controller, seeing the difference, created a new ReplicaSet using just the tag it had; digester then mutated that ReplicaSet, pods started rotating from old to new, and the process began again a couple of minutes later.

Takeaways:

  1. For us: use a Terraform lifecycle policy to ignore changes to the image field so these runs stop touching the Deployment at all.
  2. A failurePolicy of Fail instead of Ignore would have stopped the Deployment from being updated unmutated when a request to the webhook timed out (see the sketch below).
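
For reference, a minimal sketch of what that change could look like in the example MutatingWebhookConfiguration (the webhook name, service details, and rules below are illustrative rather than copied from the shipped manifest; the only line that matters here is failurePolicy):

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: digester-webhook
webhooks:
  - name: digester-webhook.example.com
    # Fail closed: if the webhook times out or is unreachable, reject the
    # request instead of admitting an object that still carries only a tag.
    failurePolicy: Fail
    sideEffects: None
    admissionReviewVersions: ["v1"]
    clientConfig:
      service:
        name: digester-webhook-service
        namespace: digester-system
        path: /webhook
    rules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["deployments", "replicasets"]

The trade-off is that while the webhook is unavailable, admission is blocked for every namespace the webhook's namespaceSelector matches, which is exactly the behaviour this edge case argues for.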

We're implementing these changes ourselves now, but we submit this edge case for your consideration.

Feature Request: skip digester resolution for images from certain repo base paths

We have been experiencing an issue, supposedly resolved by a recent release of the Anthos Service Mesh mdp-controller, where mdp-controller would restart pods that it believed to be out of date relative to the currently deployed mesh version.

Having continued to experience the issue, we looked into it further.

It appears that mdp-controller is not aware of the @sha256 form of the image: it inspects running pods for the tag version, and if that does not match it deems the pod out of date, patches the deployment, and restarts the pod. Digester then patches the image back to a @sha256 form, and the process starts again, with the already up-to-date pod being restarted once more because mdp-controller does not yet support the resolved @sha256 form of the image.

The images concerned fall within the default Binary Authorization whitelist paths, under gcr.io/gke-release/asm/proxyv2, so skipping digester for these paths would let the sidecar containers start from those paths while digester continues to resolve our main container images.

I'll monitor this thread and may raise a pull request in a couple of days if nobody gets to it first. In the meantime, I believe the ASM product team is now aware of the issue and a fix may be in the works, but that could take longer to arrive. In any case, this seems like a sensible feature for images within the Binary Authorization whitelist.

The proposal is to allow a user-provided list of path prefixes that filterImage in resolve.go could use to selectively skip processing, as sketched below.
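
A minimal sketch of that check, assuming a plain prefix match (shouldSkip and skipPrefixes below are illustrative names, not part of digester's current resolve.go; how the prefixes are supplied, e.g. a flag or a field in the function config, is left open):

package main

import (
    "fmt"
    "strings"
)

// skipPrefixes would come from user-provided configuration (for example a
// flag or a field in the function config); the ASM path is the one from
// this issue.
var skipPrefixes = []string{"gcr.io/gke-release/asm/"}

// shouldSkip reports whether digest resolution should be skipped because
// the image matches one of the configured prefixes.
func shouldSkip(image string) bool {
    for _, p := range skipPrefixes {
        if strings.HasPrefix(image, p) {
            return true
        }
    }
    return false
}

func main() {
    fmt.Println(shouldSkip("gcr.io/gke-release/asm/proxyv2:1.12")) // true
    fmt.Println(shouldSkip("myorg/my-app:latest"))                 // false
}

filterImage could then return early for any image that matches, leaving the ASM sidecar image untouched while still resolving everything else.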

Use of unscoped labels to control injection

As the issuer of the label standard, you use a plain (unprefixed) matchLabel in the webhook configuration to determine whether the webhook should be enabled for a namespace.

According to the Kubernetes documentation at https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/, which uses the app.kubernetes.io prefix in its examples:

'Shared labels and annotations share a common prefix: app.kubernetes.io. Labels without a prefix are private to users. The shared prefix ensures that shared labels do not interfere with custom user labels'

It seems that the plain label used in the config should really be treated as a user label, and that an issuer defining a label standard should be expected to use a prefix of their choice.
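
For illustration, the difference in namespace labelling would look something like this (the unprefixed key reflects the label digester documents today, to the best of my knowledge, and the prefixed alternative is purely hypothetical):

apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
  labels:
    # current style: an unprefixed label, which the Kubernetes docs treat as private to users
    digest-resolution: enabled
    # hypothetical issuer-scoped alternative, using a prefix chosen by the project
    digester.google.com/digest-resolution: enabled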

We have to trust that the issuer (you) does this, since for us to simply choose a different prefix, or to add a label under a well-known Google prefix, would clearly go against that pattern.

Is there any reason that you don't use a prefix?

Improve error message on tag resolution failure

Digester currently uses HEAD requests to look up image digests.

If the image registry returns an error, there is no response body to provide additional information about the error.

As long as the HTTP status code is not 429, digester could fall back to a GET request to retrieve the error message and surface it to the user.
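
A minimal sketch of that fallback, assuming the go-containerregistry remote package (the resolveDigest function below is illustrative, not digester's actual resolver code):

package main

import (
    "errors"
    "fmt"
    "log"
    "net/http"

    "github.com/google/go-containerregistry/pkg/name"
    "github.com/google/go-containerregistry/pkg/v1/remote"
    "github.com/google/go-containerregistry/pkg/v1/remote/transport"
)

// resolveDigest tries a HEAD request first; if the registry rejects it with
// anything other than 429, it retries with GET so the response body (which
// carries the registry's error message) can be surfaced to the user.
func resolveDigest(image string) (string, error) {
    ref, err := name.ParseReference(image)
    if err != nil {
        return "", err
    }
    desc, headErr := remote.Head(ref)
    if headErr == nil {
        return desc.Digest.String(), nil
    }
    var terr *transport.Error
    if errors.As(headErr, &terr) && terr.StatusCode == http.StatusTooManyRequests {
        return "", headErr // rate limited: don't add more load with a GET
    }
    getDesc, getErr := remote.Get(ref) // GET errors include the registry's message body
    if getErr != nil {
        return "", fmt.Errorf("HEAD failed (%v); GET reported: %w", headErr, getErr)
    }
    return getDesc.Digest.String(), nil
}

func main() {
    digest, err := resolveDigest("gcr.io/google-containers/pause-amd64:3.2")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(digest)
}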

Offline authentication fails for public GCR and AR images when gcloud is installed but not initialized

Situation

Using offline authentication to resolve the digest for a public image in Google Container Registry or Artifact Registry, e.g., gcr.io/google-containers/pause-amd64:3.2.

This fails if all of the following are true:

  • Digester is not running on GCP (GKE, GCE, Cloud Run, etc.) with access to the metadata server.
  • The GOOGLE_APPLICATION_CREDENTIALS environment variable is not set or doesn't point to a valid Google service account key file.
  • There are no credentials available at the expected location ($HOME/.config/gcloud/application_default_credentials.json).
  • The gcloud command-line tool is installed.
  • The gcloud tool has not been set up with credentials (i.e., the user hasn't run gcloud init or gcloud auth login).

If any of the above are not true, this issue does not occur.

Behavior

The webhook fails to resolve the digest for the public GCR image with this error:

handler.go:100] "msg"="admission error" "error"="could not get digest for gcr.io/google-containers/pause-amd64:3.2: failed to create token source from env: google: could not find default credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information. or gcloud: error executing gcloud config config-helper : exit status 1"

Full details: https://github.com/google/k8s-digester/runs/2997015198#step:9:249

Investigation

authn.multiKeychain.Resolve() returns an error on the first attempt to resolve credentials.

This happens because google.Keychain.resolve() fails to create both the environment Authenticator and the gcloud Authenticator.
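
For context, a minimal sketch of the keychain lookup involved and one possible workaround, assuming the go-containerregistry authn and google packages (falling back to anonymous access for public images is only a suggestion here, not digester's current behaviour):

package main

import (
    "fmt"
    "log"

    "github.com/google/go-containerregistry/pkg/authn"
    "github.com/google/go-containerregistry/pkg/name"
    "github.com/google/go-containerregistry/pkg/v1/google"
    "github.com/google/go-containerregistry/pkg/v1/remote"
)

func main() {
    ref, err := name.ParseReference("gcr.io/google-containers/pause-amd64:3.2")
    if err != nil {
        log.Fatal(err)
    }
    // google.Keychain tries Application Default Credentials and then gcloud;
    // with gcloud installed but not initialized, both fail and Resolve
    // returns an error instead of an anonymous authenticator.
    keychain := authn.NewMultiKeychain(google.Keychain, authn.DefaultKeychain)
    auth, err := keychain.Resolve(ref.Context())
    if err != nil {
        // Possible workaround: treat the failure as anonymous access so
        // public images still resolve when no credentials can be found.
        auth = authn.Anonymous
    }
    desc, err := remote.Get(ref, remote.WithAuth(auth))
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(desc.Digest)
}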

Helm chart

Any plans to package this as a Helm chart so it can be deployed using Argo CD?

Does k8s-digester work with the CronJob kind?

Hey guys -- thanks for the amazing work here...

It seems that digester can't resolve the image digest from the registry for the CronJob kind. The manifest:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-cronjob
spec:
  schedule: '*/15 * * * *'
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: my-cronjob
              image: myorg/my-cronjob-image:latest

The command used to run digester with kpt:

kpt fn eval my-cronjob --exec ./scripts/digester
