We use GKE with Binary Authorization, and for system components we like to use `latest` tags from our own Artifact Registry, so we use digester.
We have a daily Terraform deployment run that applies a variety of configs to our environment, including some system components like digester itself, but also other Deployments that digester mutates.
One of these components is a Kubernetes API proxy (we use Private Service Connect), so this proxy is a system component that is delivered daily.
Now, we should have had (but until now didn't) `ignore_changes` on the image field in the Terraform run. It didn't really matter to us: we knew that when the deployment happened every day, digester would mutate the Deployment's image reference (which lacked the `@sha256` value), and since the result was in keeping with the current ReplicaSet, nothing changed.
Yesterday I discovered about 18,000 ReplicaSets for this Deployment, and new ones were being created every 2-3 minutes, rotating the proxy out of service and breaking people's connections.
Inspection revealed that the k8s API proxy Deployment had gotten past digester without being mutated and carried just the tag `latest`, which now differed from the ReplicaSet. Then the fight began: the Deployment controller, seeing the difference, created a new ReplicaSet using just the tag it had; digester would then mutate that ReplicaSet, pods would start rotating from the old ReplicaSet to the new, and the process would begin again a couple of minutes later.
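To make the mismatch concrete, here is a sketch of the two pod templates involved; the container name, image path, and digest are all illustrative, not taken from our actual config:

```yaml
# Pod template in the Deployment after the webhook was skipped
# (digester never mutated it, so it carries the bare tag):
spec:
  containers:
    - name: k8s-api-proxy  # illustrative name
      image: europe-docker.pkg.dev/example-project/registry/proxy:latest

# Pod template in the ReplicaSet the controller then creates,
# after digester successfully mutates it (illustrative digest):
spec:
  containers:
    - name: k8s-api-proxy
      image: europe-docker.pkg.dev/example-project/registry/proxy:latest@sha256:9f2a1c...
```

Because the mutated ReplicaSet template no longer matches the Deployment's template, the controller treats the ReplicaSet as out of date and creates yet another one, and the loop repeats.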
Takeaways:
- For us: use a Terraform `lifecycle` policy to ignore changes on the image field, so these runs stop touching the Deployment at all.
- A `failurePolicy` of `Fail` instead of `Ignore` would have stopped the Deployment being updated when there was a timeout reaching the webhook service.
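The first takeaway can be sketched in Terraform, assuming the proxy is managed with the `kubernetes` provider; the resource name is illustrative:

```hcl
resource "kubernetes_deployment" "k8s_api_proxy" {
  # ... metadata and spec as before ...

  lifecycle {
    # Leave the digester-pinned image alone on subsequent applies,
    # so the daily run never reverts it to the bare ':latest' tag.
    ignore_changes = [
      spec[0].template[0].spec[0].container[0].image,
    ]
  }
}
```

With this in place the daily apply no longer rewrites the image field, so there is no difference for digester to mutate and no rollout at all on routine runs.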
We're implementing these changes ourselves now, but we submit this edge case for your consideration.
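For the second takeaway, the change is a one-line edit to the webhook configuration; the object and webhook names below are illustrative and should be replaced with whatever your digester install actually uses:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: digester-mutating-webhook-configuration  # illustrative; check your install
webhooks:
  - name: digester-webhook.example.com           # illustrative
    # With Fail, a timeout calling the webhook service rejects the API
    # request instead of letting an unmutated ':latest' template through.
    failurePolicy: Fail
    # ... clientConfig, rules, sideEffects, etc. unchanged ...
```

The trade-off is that webhook outages then block updates to matching workloads, but for us that is preferable to silently admitting an unpinned image.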