Comments (13)

juliusl avatar juliusl commented on July 18, 2024 2

@awalford16 I'm able to reproduce this on my side, so I'll be working on a fix. The ETA for a fix is some time in the next month.

cc: @northtyphoon @ganeshkumarashok

from acr.

juliusl avatar juliusl commented on July 18, 2024

Thanks for reporting this. Does the nodepool have access to both registries?

awalford16 avatar awalford16 commented on July 18, 2024

Hi, yes, we have a service principal tied to the AKS cluster that has pull permissions for both ACRs.

juliusl avatar juliusl commented on July 18, 2024

Got it. Can you confirm a couple of things to help narrow down the root cause?

  • If artifact streaming is disabled in the cluster, are there no issues?

  • Are there any issues when only a single registry with streaming enabled is used?

Meanwhile, I will try to reproduce the issue on my end as well.

awalford16 avatar awalford16 commented on July 18, 2024

I can confirm there are no issues if artifact streaming is disabled in the cluster. We have only started seeing this since we enabled it, and we only see it on the one nodepool where we enabled it.

I was able to get the image to pull from the same ACR when it was not using streaming. However, the issue appears to be temperamental and hard to reproduce, as it affects random nodes (even though they have not interacted with our streaming-enabled ACR at any point).

Could you please share the command to disable artifact streaming on a nodepool? I could disable it on the pool where we are seeing issues and validate that the issue goes away.

juliusl avatar juliusl commented on July 18, 2024

@awalford16 so I did a bit more digging and I have a workaround you could try. It appears to happen when you have the same image-reference w/ different registries in the same pod.

For example,

Fails

apiVersion: v1
kind: Pod
metadata:
  name: &name mix-wordpress
spec:
  containers:
  - name: wordpress-streaming
    image: streaming.azurecr.io/wordpress:latest
  - name: wordpress-nonstreaming
    image: non-streaming.azurecr.io/wordpress:latest

Works

apiVersion: v1
kind: Pod
metadata:
  name: non-wordpress
spec:
  containers:
  - name: wordpress-nonstreaming
    image: non-streaming.azurecr.io/wordpress:latest

Works

apiVersion: v1
kind: Pod
metadata:
  name: wordpress
spec:
  containers:
  - name: wordpress-streaming
    image: streaming.azurecr.io/wordpress:latest

(I tested these all running on the same node pool w/ node-selectors)

I am working on figuring out the root cause and a fix, but just wanted to share a possible workaround you could try for your own evaluation.
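
To check existing manifests for the pattern in the failing example above, a minimal Python sketch follows. It is an illustration only, not part of any fix: the function name `find_mixed_registry_images` is hypothetical, the pod is assumed to be already parsed into a dict, and every image reference is assumed to be fully qualified (`registry/repository:tag`).

```python
from collections import defaultdict


def find_mixed_registry_images(pod):
    """Return {repository:tag -> sorted registries} for image references that
    appear under more than one registry within a single pod manifest."""
    registries = defaultdict(set)
    spec = pod.get("spec", {})
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    for container in containers:
        # Assumption: references are fully qualified, e.g. registry.azurecr.io/repo:tag,
        # so everything before the first "/" is the registry host.
        registry, _, reference = container["image"].partition("/")
        registries[reference].add(registry)
    return {ref: sorted(regs) for ref, regs in registries.items() if len(regs) > 1}


# The "Fails" pod from the example above, expressed as a dict.
failing_pod = {
    "spec": {
        "containers": [
            {"name": "wordpress-streaming",
             "image": "streaming.azurecr.io/wordpress:latest"},
            {"name": "wordpress-nonstreaming",
             "image": "non-streaming.azurecr.io/wordpress:latest"},
        ]
    }
}

print(find_mixed_registry_images(failing_pod))
```

A non-empty result flags a pod that pulls the same repository and tag from more than one registry, which is the case the workaround avoids by splitting the containers into separate pods.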

juliusl avatar juliusl commented on July 18, 2024

So it looks like this can affect any pod spec that has multiple containers. It looks like there's been a regression in AKS/containerd but I'm still trying to narrow it down.

juliusl avatar juliusl commented on July 18, 2024

@awalford16 Could you provide the value of this label from the nodepool that has this issue?

`kubernetes.azure.com/node-image-version`

For example, it should be a value that looks like this: AKSUbuntu-2204gen2containerd-202403.13.0

awalford16 avatar awalford16 commented on July 18, 2024

@juliusl thanks for looking into this. The label value is AKSUbuntu-2204gen2containerd-202401.17.1

juliusl avatar juliusl commented on July 18, 2024

@awalford16 so good news, I figured out the issue and I have a fix. I'm working on the release, so it should be about a week or two for it to make its way upstream.

juliusl avatar juliusl commented on July 18, 2024

@awalford16 Hey there, just to close the loop: the fix has been rolled out to all AKS regions for about a week or two now. Are you able to update your node images and give it a try?

awalford16 avatar awalford16 commented on July 18, 2024

Thanks @juliusl! Looks like it is working on my end now. For confirmation, these are the versions on my nodes: 5.15.0-1061-azure and containerd://1.7.15-1

tolga-hmcts avatar tolga-hmcts commented on July 18, 2024

Has the fix (#739) been distributed to UK South? And how can I roll back "az aks nodepool update --enable-artifact-streaming"?
