
aml-kubernetes's Introduction

Configure Kubernetes cluster for Azure Machine Learning

As part of Azure Machine Learning (AzureML) service capabilities, AzureML Kubernetes seamlessly extends the AzureML service from the Azure cloud to any infrastructure across on-premises, multicloud, and the edge. With a simple AzureML extension deployment, you can instantly onboard your teams of ML professionals with productivity tools for the full ML lifecycle, and give them access to both Azure managed compute and customer-managed Kubernetes anywhere. Your teams are free to build, train, and deploy models wherever and whenever the business requires, with a consistent ML experience across different infrastructures.

You can easily bring AzureML capabilities to your Kubernetes cluster, in the cloud or on-premises, by deploying the AzureML extension.

This repository is intended to serve as an information hub for customers and partners who are interested in Azure Machine Learning anywhere with Kubernetes. Use this repository for onboarding and testing instructions, as well as an avenue to provide feedback, file issues and enhancement requests, and stay up to date as the product evolves.

Why use Azure Machine Learning Kubernetes?

AzureML Kubernetes is compute for machine learning that is fully configured and managed by the customer. It can be used as both a training compute target and an inference compute target. It provides the following benefits:

  • Harness an existing heterogeneous or homogeneous Kubernetes cluster, with CPUs or GPUs.
  • Share the same Kubernetes cluster in multiple AzureML workspaces across regions.
  • Use the same Kubernetes cluster for different machine learning purposes, including model training, batch scoring, and real-time inference.
  • Secure network communication between the cluster and the cloud via Azure Private Link and Private Endpoint.
  • Isolate team projects and machine learning workloads with Kubernetes node selectors and namespaces.
  • Target certain types of compute nodes and CPU/memory/GPU resource allocations for training and inference workloads.
  • Connect custom data sources to machine learning workloads using Kubernetes PVs and PVCs (see the sketch after this list).
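
To make the last point concrete, here is a rough sketch (the namespace, resource names, and NFS server address are all hypothetical) of how a cluster admin might pre-create a namespace and an NFS-backed PV/PVC that training workloads can then mount:

# Create a namespace to isolate one team's workloads
kubectl create namespace ml-team-a

# Statically provision an NFS-backed PersistentVolume and a matching claim
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-training-data
spec:
  capacity:
    storage: 100Gi
  accessModes: ["ReadWriteMany"]
  nfs:
    server: nfs.example.internal   # hypothetical on-prem NFS server
    path: /exports/training-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-training-data
  namespace: ml-team-a
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: ""             # bind to the statically provisioned PV above
  resources:
    requests:
      storage: 100Gi
EOF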

Prerequisites

  1. A Kubernetes cluster up and running. We recommend a minimum of 4 vCPU cores and 8 GB of memory; around 2 vCPU cores and 3 GB of memory will be used by the Azure Arc agent and AzureML extension components.
  2. For any cluster other than an Azure Kubernetes Service (AKS) cluster in Azure, you must connect your Kubernetes cluster to Azure Arc. Please follow the instructions in connect existing Kubernetes cluster to Azure Arc.
    • If you have an Azure RedHat OpenShift Service (ARO) cluster or an OpenShift Container Platform (OCP) cluster, follow the additional prerequisite step here before AzureML extension deployment.
    • If you have an AKS cluster in Azure, the Azure Arc connection is not required.
  3. A cluster running behind an outbound proxy server or firewall needs additional network configuration. Fulfill the network requirements.
  4. Install or upgrade Azure CLI to version >= 2.16.0.
  5. Install the Azure CLI extension k8s-extension (version >= 1.2.2) by running az extension add --name k8s-extension (a combined command sketch follows this list).
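
A minimal command sketch of prerequisites 2, 4, and 5 (cluster and resource group names are placeholders):

# Upgrade Azure CLI to >= 2.16.0 (if already installed)
az upgrade

# Install the k8s-extension CLI extension (>= 1.2.2)
az extension add --name k8s-extension

# For non-AKS clusters only: install the connectedk8s extension and connect the cluster to Azure Arc
az extension add --name connectedk8s
az connectedk8s connect --name <cluster-name> --resource-group <resource-group>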

Getting started

Getting started with AzureML anywhere is easy with the following steps (a command sketch for step 1 follows the list):

  1. Deploy AzureML extension
  2. Attach Kubernetes cluster to AzureML workspace and create a compute target
  3. Train image classification model with AzureML CLI v2
  4. Deploy a trained image classification model
  5. Troubleshooting guide
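
As a rough sketch of step 1 (flags vary by scenario; this mirrors the training-only extension deployment command quoted in an issue further down this page):

# Deploy the AzureML extension to an Arc-connected cluster
az k8s-extension create --name azureml-extension \
  --extension-type Microsoft.AzureML.Kubernetes \
  --cluster-type connectedClusters \
  --cluster-name <cluster-name> \
  --resource-group <resource-group> \
  --config enableTraining=True

# For an AKS cluster (no Arc connection), use --cluster-type managedClusters instead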

Supported AzureML built-in features and unique features for Kubernetes

AzureML anywhere essentially brings a new compute target to the Azure Machine Learning service. With this new Kubernetes compute target, you can use existing AzureML tools and service capabilities to build, train, and deploy models on a Kubernetes cluster anywhere. This seamless integration supports many AzureML built-in features, and existing AzureML examples can run on a Kubernetes cluster with a simple change of the compute target name.

In addition to supporting many built-in AzureML features, AzureML anywhere also supports unique features that leverage native Kubernetes capabilities.

Supported Kubernetes distributions and locations

  • Azure Kubernetes Services (AKS)
  • AKS Engine
  • AKS on Azure Stack HCI
  • Azure RedHat OpenShift Service (ARO)
  • OpenShift Container Platform (OCP)
  • Google GKE
  • Canonical Kubernetes Distribution
  • Amazon EKS
  • Kind
  • K3s-Lightweight Kubernetes
  • Azure location availability: East US, East US 2, South Central US, West US 2, Australia East, Southeast Asia, North Europe, UK South, West Europe, West Central US, Central US, North Central US, West US, Korea Central, France Central

Supported Kubernetes version

Kubernetes clusters with the AzureML extension installed have a version support window of "N-2", aligned with the Azure Kubernetes Service (AKS) version support policy, where "N" is the latest GA minor version of AKS.

For example, if AKS introduces 1.20.a today, versions 1.20.a, 1.20.b, 1.19.c, 1.19.d, 1.18.e, and 1.18.f are supported.

If customers are running an unsupported Kubernetes version, they will be asked to upgrade when requesting support for the cluster. Clusters running unsupported Kubernetes releases are not covered by the AzureML extension support policies.
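
Before installing the extension, you can check whether a cluster falls inside this window (a minimal sketch; the az aks query assumes an AKS cluster):

# Kubernetes server version reported by the cluster
kubectl version

# For AKS, the same information via the Azure CLI
az aks show --name <cluster-name> --resource-group <resource-group> --query kubernetesVersion -o tsv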

Additional resources

Get more examples

All AzureML examples can be found at https://github.com/Azure/azureml-examples.git.

For any AzureML example, you only need to update the compute target name to your own Kubernetes compute target, and you are all set (see the sketch below).
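
For instance, with AzureML CLI v2 you can override the compute target of an example job definition at submission time (a sketch; workspace, resource group, and compute names are placeholders):

az ml job create --file job.yml \
  --resource-group <resource-group> --workspace-name <workspace-name> \
  --set compute=azureml:<your-k8s-compute>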

Support

We are always looking for feedback on our current experiences and what we should work on next. If there is anything you would like us to prioritize, please feel free to suggest it via our GitHub Issue Tracker. You can submit a bug report, a feature suggestion, or participate in discussions.

Or reach out to us at [email protected] if you have any questions or feedback.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.


Disclaimer

The lifecycle management (health, Kubernetes version upgrades, security updates to nodes, scaling, etc.) of the AKS or Arc-enabled Kubernetes cluster is the responsibility of the customer.

For AKS, read what is managed and what is shared responsibility here.

All preview features are available on a self-service, opt-in basis and are subject to breaking design and API changes. Previews are provided "as is" and "as available," and they're excluded from the service-level agreements and limited warranty.

aml-kubernetes's People

Contributors

adressel, adrosa, bozhong68, cloga, clomeli-ms, dans-msft, ernani, frcai, gkaleta, henry-zeng, iamlost127, jiaochenlu, joaocc, lisongshan007, martin-jia, microsoftopensource, mjaow, msftcoderdjw, muzisakura, penorouzi, richardzhaow, richeney, sauryadas, snowei, wenshangmin, yanrez, youhuatu-yh, yuyue9284, zetiaatgithub, zhong-j


aml-kubernetes's Issues

AML Designer sample pipeline failed with File Not Found error

Repro steps:

  1. Open Designer
  2. Open the "Image Classification using DenseNet" sample
  3. Set one of the attached AKS as compute target
  4. Submit pipeline

The node "Convert to Image Directory" will fail:

Error message:

2020-11-19 02:38:04 [job-b999f2db-bb56-432f-87fc-dc27f4fade34-kj7mt, unknown_ip] FileNotFoundError: [Errno 2] No such file or directory: '$AZ_CMAKS_JOB_MOUNT_ROOT/azureml_globaldatasets/Images/Animals_Images_Dataset'

Attach Cluster Fails on AKS with NVidia installed

When trying to attach an AKS cluster that has a GPU node pool and the NVIDIA DaemonSet installed as per the MSFT docs https://docs.microsoft.com/en-us/azure/aks/gpu-cluster#install-nvidia-device-plugin, we get the following error:

Error: rendered manifests contain a resource that already exists. Unable to continue with install: DaemonSet "nvidia-device-plugin-daemonset" in namespace "kube-system" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "azureml-connector"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "azureml"

Uninstalling the NVIDIA DaemonSet and reattaching the cluster works.
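
A sketch of that workaround, using the DaemonSet name and namespace from the error message above:

# Remove the pre-existing device plugin so the AzureML Helm release can install its own copy
kubectl delete daemonset nvidia-device-plugin-daemonset -n kube-system

# ...then retry the cluster attach / extension deployment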

Fluent bit error in the new extension installation

sauryadas@Sauryas-MacBook-Pro ~/Downloads $ k get pods -n azureml
NAME READY STATUS RESTARTS AGE
aml-mpi-operator-74966db5d8-gpdl8 1/1 Running 0 6m41s
aml-operator-fbf865974-tjhtf 1/1 Running 0 6m41s
aml-pytorch-operator-d9b8c4c75-74st2 1/1 Running 0 6m41s
aml-tf-job-operator-7c5848bd4c-clf4k 1/1 Running 0 6m41s
azureml-connector-admission-59bdc7489-xmxmt 1/1 Running 0 6m42s
azureml-connector-admission-init-qdmtn 0/1 Completed 0 6m42s
azureml-connector-controllers-7c6fc78b6b-rwjf9 1/1 Running 0 6m42s
azureml-connector-kube-state-metrics-75ccc8ddf6-c2wr6 1/1 Running 0 6m42s
azureml-connector-scheduler-7845fc897b-pxj94 1/1 Running 0 6m41s
fluent-bit-4x67b 0/1 CrashLoopBackOff 4 6m41s
fluent-bit-6z8f4 0/1 CrashLoopBackOff 5 6m41s
fluent-bit-spdth 0/1 CrashLoopBackOff 5 6m42s
job-exporter-bstv2 1/1 Running 0 6m41s
job-exporter-vgjd5 1/1 Running 0 6m42s
job-exporter-zwg7z 1/1 Running 0 6m41s
metrics-controller-manager-6bf8446d67-6llb6 2/2 Running 0 6m42s
prom-operator-5679ff7c7-cprf7 2/2 Running 0 6m42s
prometheus-prom-prometheus-0 3/3 Running 1 6m29s
relay-server-7b4d56cd56-kfsp2 1/1 Running 0 6m40s
rest-server-c5b987fcf-nsmlh 1/1 Running 0 6m40s
sauryadas@Sauryas-MacBook-Pro ~/Downloads $ k logs pod/fluent-bit-4x67b -n azureml
Fluent Bit v1.7.1

  • Copyright (C) 2019-2021 The Fluent Bit Authors
  • Copyright (C) 2015-2018 Treasure Data
  • Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
  • https://fluentbit.io

Error: Configuration file contains errors. Aborting

[2021/03/29 20:15:07] [ Error] File output-kusto-log-container.conf
[2021/03/29 20:15:07] [ Error] Error in line 8: Key has an empty value
sauryadas@Sauryas-MacBook-Pro ~/Downloads $ k describe pod/fluent-bit-4x67b -n azureml
Name: fluent-bit-4x67b
Namespace: azureml
Priority: 0
Node: k8s-agentpool1-37961520-vmss000000/10.240.0.34
Start Time: Mon, 29 Mar 2021 13:08:17 -0700
Labels: app=fluent-bit
controller-revision-hash=5678d68cf9
kubernetes.io/cluster-service=true
pod-template-generation=1
version=v1
Annotations: kopf.zalando.org/last-handled-configuration:
{"spec": {"volumes": [{"name": "varlog", "hostPath": {"path": "/var/log", "type": ""}}, {"name": "varlibdockercontainers", "hostPath": {"p...
kubernetes.io/psp: privileged
prometheus.io/path: /api/v1/metrics/prometheus
prometheus.io/port: 2020
prometheus.io/scrape: true
Status: Running
IP: 10.240.0.34
IPs:
IP: 10.240.0.34
Controlled By: DaemonSet/fluent-bit
Containers:
fluent-bit:
Container ID: docker://a28a68320b9cb7e130ccbca0ff79b3ef52f4d1a51d94ca91401da1d85835d866
Image: amlk8s.azurecr.io/public/azureml/amlk8s/docker/preview/fluent-bit-with-kusto:1.0.54
Image ID: docker-pullable://amlk8s.azurecr.io/public/azureml/amlk8s/docker/preview/fluent-bit-with-kusto@sha256:e6b3adaaa159f40e3cf0baff49eb85d1d008918b1b37b175bda9f2c5c59db6da
Port: 2020/TCP
Host Port: 2020/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 29 Mar 2021 13:15:07 -0700
Finished: Mon, 29 Mar 2021 13:15:07 -0700
Ready: False
Restart Count: 5
Requests:
cpu: 200m
memory: 1536Mi
Liveness: http-get http://127.0.0.1:2020/healthz delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
HOST: 0.0.0.0
PORT: 2020
FLUENT_AZURE_BLOB_ADAPTER_PORT: 6200
Mounts:
/data/lib/docker/containers from datalibdockercontainers (ro)
/fluent-bit/etc/ from fluent-bit-config (rw)
/var/lib/docker/containers from varlibdockercontainers (ro)
/var/log from varlog (rw)
/var/run/secrets/kubernetes.io/serviceaccount from fluent-bit-token-zbj8m (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
varlog:
Type: HostPath (bare host directory volume)
Path: /var/log
HostPathType:
varlibdockercontainers:
Type: HostPath (bare host directory volume)
Path: /var/lib/docker/containers
HostPathType:
datalibdockercontainers:
Type: HostPath (bare host directory volume)
Path: /data/lib/docker/containers
HostPathType:
fluent-bit-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: fluent-bit-config
Optional: false
fluent-bit-token-zbj8m:
Type: Secret (a volume populated by a Secret)
SecretName: fluent-bit-token-zbj8m
Optional: false
QoS Class: Burstable
Node-Selectors:
Tolerations: :NoExecute
:NoSchedule
guarantee/restart:NoExecute
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message


Normal Scheduled default-scheduler Successfully assigned azureml/fluent-bit-4x67b to k8s-agentpool1-37961520-vmss000000
Normal Pulling 2m46s (x4 over 7m27s) kubelet, k8s-agentpool1-37961520-vmss000000 Pulling image "amlk8s.azurecr.io/public/azureml/amlk8s/docker/preview/fluent-bit-with-kusto:1.0.54"
Normal Pulled 2m46s (x4 over 6m4s) kubelet, k8s-agentpool1-37961520-vmss000000 Successfully pulled image "amlk8s.azurecr.io/public/azureml/amlk8s/docker/preview/fluent-bit-with-kusto:1.0.54"
Normal Created 2m45s (x4 over 6m) kubelet, k8s-agentpool1-37961520-vmss000000 Created container fluent-bit
Normal Started 2m45s (x4 over 6m) kubelet, k8s-agentpool1-37961520-vmss000000 Started container fluent-bit
Warning BackOff 2m17s (x10 over 3m38s) kubelet, k8s-agentpool1-37961520-vmss000000 Back-off restarting failed container

On Prem data support

Sample PoC plan

  1. Create Azure ML Workspace
  2. Create Existing AKS Cluster for Compute
  3. Create HDFS Cluster or Simulate in AKS
  4. Connect AKS Cluster to Azure ML Workspace using Attach Instance
  5. Use Azure ML SDK to create the Model
  6. Use Azure ML SDK to Submit Job to AKS Compute
  7. Entrypoint Script Pulls data from HDFS Directly versus mounting to Blob Storage
  8. Output Gets Dumped to Blob Storage
  9. Use Azure ML Workspace to see Experiment and Run Results and Outputs

KubernetesCompute.attach fails on Arc local k8s

from azureml.contrib.core.compute.kubernetescompute import KubernetesCompute

k8s_config = {}

attach_config = KubernetesCompute.attach_configuration(
    resource_id="/subscriptions/5763fde3-4253-480c-928f-dfe1e8888a57/resourceGroups/rg-arc/providers/Microsoft.Kubernetes/connectedClusters/arc-kind",
    aml_k8s_config=k8s_config
)

compute_target = KubernetesCompute.attach(ws, "arccompute", attach_config)
compute_target.wait_for_completion(show_output=True)

error:

Creating...............Failed
Provisioning operation finished, operation "Failed"

ComputeTargetException Traceback (most recent call last)
in
10
11 compute_target = KubernetesCompute.attach(ws, "arccompute", attach_config)
---> 12 compute_target.wait_for_completion(show_output=True)

~/miniconda3/envs/k8s/lib/python3.7/site-packages/azureml/core/compute/compute.py in wait_for_completion(self, show_output, is_delete_operation)
573 'Current state is {}'.format(self.provisioning_state))
574 else:
--> 575 raise e
576
577 def _wait_for_completion(self, show_output):

~/miniconda3/envs/k8s/lib/python3.7/site-packages/azureml/core/compute/compute.py in wait_for_completion(self, show_output, is_delete_operation)
566 'state, current provisioning state: {}\n'
567 'Provisioning operation error:\n'
--> 568 '{}'.format(self.provisioning_state, error_response))
569 except ComputeTargetException as e:
570 if e.message == 'No operation endpoint':

ComputeTargetException: ComputeTargetException:
Message: Compute object provisioning polling reached non-successful terminal state, current provisioning state: Failed
Provisioning operation error:
{'code': 'InternalServerError', 'message': 'An internal server error occurred. If the problem persists, contact support.'}
InnerException None
ErrorResponse
{
"error": {
"message": "Compute object provisioning polling reached non-successful terminal state, current provisioning state: Failed\nProvisioning operation error:\n{'code': 'InternalServerError', 'message': 'An internal server error occurred. If the problem persists, contact support.'}"
}
}

my status of arc+aml k8s nodes:
(k8s) azeltov@DESKTOP-D59GBAK:/git/azure-arc-kubernetes-preview$ az k8s-extension create --sub 5763fde3-4253-480c-928f-dfe1e8888a57 -g rg-arc -c arc-kind --cluster-type connectedClusters --extension-type Microsoft.AzureML.Kubernetes -n azureml-kubernetes-connector --release-train preview --config enableTraining=True^C
(k8s) azeltov@DESKTOP-D59GBAK:~/git/azure-arc-kubernetes-preview$ kubectl get pods -n azureml
NAME READY STATUS RESTARTS AGE
aml-mpi-operator-847689c694-69z7x 1/1 Running 0 3m7s
aml-operator-55bb7d4784-jcqnx 1/1 Running 0 3m7s
aml-pytorch-operator-7fd644b97c-bkffn 1/1 Running 0 3m7s
aml-tf-job-operator-7c5848bd4c-ckgn9 1/1 Running 0 3m7s
azureml-kubernetes-connector-admission-78f9b654b6-g67t4 1/1 Running 0 3m7s
azureml-kubernetes-connector-admission-init-tv498 0/1 Completed 0 3m7s
azureml-kubernetes-connector-controllers-547945c795-shtnt 1/1 Running 0 3m7s
azureml-kubernetes-connector-scheduler-746448ff9d-khhzt 1/1 Running 0 3m7s
compute-exporter-d5866df74-zqq7c 1/1 Running 0 3m7s
job-exporter-lnbzf 1/1 Running 0 3m7s
metric-reporter-6756b7dbf8-6lvp6 1/1 Running 0 3m7s
prometheus-deployment-696b5c8bf8-dt554 1/1 Running 0 3m7s
relay-server-5f89db5967-54bdp 1/1 Running 0 3m7s
rest-server-6b76db489c-vl6mz 1/1 Running 0 3m6s

my extension versions:

(k8s) azeltov@DESKTOP-D59GBAK:~/git/azure-arc-kubernetes-preview$ az extension list -o table
Experimental ExtensionType Name Path Preview Version


False whl azure-cli-ml /home/azeltov/.azure/cliextensions/azure-cli-ml False 1.25.0
False whl connectedk8s /home/azeltov/.azure/cliextensions/connectedk8s False 0.3.8
False whl k8s-extension /home/azeltov/.azure/cliextensions/k8s-extension True 0.1PP.15

All AML system pods in a single namespace instead of 5 different ones

Currently we are deploying into kubeflow, default, volcano-system, gpu-resources, and mpi-operator. This might confuse customers, especially the components deployed into kubeflow, leading them to think we support Kubeflow. Maybe we should have one aml-system namespace and deploy all of the system pods in that namespace?

Sauryas-MacBook-Pro:kf-test sauryadas$ k get po --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default blobfuse-flexvol-installer-29mv4 1/1 Running 0 2m49s
default blobfuse-flexvol-installer-kfkvv 1/1 Running 0 2m49s
default blobfuse-flexvol-installer-rh2qf 1/1 Running 0 2m49s
default cmaks-init-job-lq8gk 1/1 Running 0 2m48s
default jobcontroller-7fd99f5fd8-cnwvp 1/1 Running 0 51s
gpu-resources nvidia-device-plugin-daemonset-bshpv 1/1 Running 0 2m49s
gpu-resources nvidia-device-plugin-daemonset-qrdmj 1/1 Running 0 2m49s
gpu-resources nvidia-device-plugin-daemonset-zn2zg 1/1 Running 0 2m49s
kube-system coredns-544d979687-26mwn 1/1 Running 0 9m13s
kube-system coredns-544d979687-5drvm 1/1 Running 0 7m56s
kube-system coredns-autoscaler-78959b4578-vfph4 1/1 Running 0 9m8s
kube-system dashboard-metrics-scraper-5f44bbb8b5-xrmq2 1/1 Running 0 9m10s
kube-system k8s-host-device-plugin-daemonset-c4hlv 1/1 Running 0 51s
kube-system k8s-host-device-plugin-daemonset-lblz7 1/1 Running 0 51s
kube-system k8s-host-device-plugin-daemonset-n5rf7 1/1 Running 0 51s
kube-system kube-proxy-dtfkb 1/1 Running 0 8m9s
kube-system kube-proxy-f96qx 1/1 Running 0 8m8s
kube-system kube-proxy-wn9vg 1/1 Running 0 8m8s
kube-system kubernetes-dashboard-785654f667-mqbpt 1/1 Running 1 9m10s
kube-system metrics-server-85c57978c6-2nq5s 1/1 Running 1 9m13s
kube-system tunnelfront-6d4945497f-2z97c 2/2 Running 0 9m8s
kubeflow application-controller-stateful-set-0 1/1 Running 0 92s
kubeflow pytorch-operator-6f978f8d8d-tz5x6 1/1 Running 0 92s
kubeflow tf-job-operator-649ff7bd99-6g9rx 1/1 Running 0 92s
mpi-operator mpi-operator-6d8ff5667f-bqv9q 1/1 Running 0 92s
volcano-system volcano-admission-76687cd8d9-x8gcg 1/1 Running 0 92s
volcano-system volcano-admission-init-mvbgd 0/1 Completed 0 92s
volcano-system volcano-controllers-846b85cf85-v6726 1/1 Running 0 92s
volcano-system volcano-scheduler-64696c5cd9-qwv8v 1/1 Running 0 92s

on-prem very low performance with dataset.as_mount() method

Using the sample notebook 002-SciKitLearn

Experienced very slow performance with .as_mount() method.

A possible workaround:
Using the .as_download() method instead, on the on-prem bare-metal cluster, the required time was drastically reduced from 34 mins to 3 mins!

Below the original snippet:

from azureml.core import ScriptRunConfig
args = ['--data-folder', mnist_file_dataset.as_mount(), '--regularization', 0.5]
src = ScriptRunConfig(source_directory=script_folder,
                      script='train.py', 
                      arguments=args,
                      compute_target=compute_target,
                      environment=env)

Below the code with as_download() method that solved the performance issue.

from azureml.core import ScriptRunConfig
args = ['--data-folder', mnist_file_dataset.as_download(), '--regularization', 0.5]
src = ScriptRunConfig(source_directory=script_folder,
                      script='train.py', 
                      arguments=args,
                      compute_target=compute_target,
                      environment=env)

Chainer estimator fails on CMAKS compute

Repro: run the notebook in notebooks/how-to-use-azureml/ml-frameworks/chainer/training/distributed-chainer, replacing the compute with a CMAKS compute cluster with two GPU nodes.

Run fails with this error:

UserError: User program failed with RuntimeError: NCCL is not available. Please confirm that NCCL is enabled in CuPy.

Here's the more detailed form from the logs:
2020-08-20 19:58:17 [job-chainer-distr-1597952630-a34666f9-launcher-h8c8l, unknown_ip] Traceback (most recent call last):
2020-08-20 19:58:17 [job-chainer-distr-1597952630-a34666f9-launcher-h8c8l, unknown_ip] File "train_mnist.py", line 125, in
2020-08-20 19:58:17 [job-chainer-distr-1597952630-a34666f9-launcher-h8c8l, unknown_ip] main()
2020-08-20 19:58:17 [job-chainer-distr-1597952630-a34666f9-launcher-h8c8l, unknown_ip] File "train_mnist.py", line 57, in main
2020-08-20 19:58:17 [job-chainer-distr-1597952630-a34666f9-launcher-h8c8l, unknown_ip] comm = chainermn.create_communicator(args.communicator)
2020-08-20 19:58:17 [job-chainer-distr-1597952630-a34666f9-launcher-h8c8l, unknown_ip] File "/azureml-envs/azureml_8e1fdf2e02bb65137f880870de828f5e/lib/python3.6/site-packages/chainermn/communicators/init.py", line 91, in create_communicator
2020-08-20 19:58:17 [job-chainer-distr-1597952630-a34666f9-launcher-h8c8l, unknown_ip] return NonCudaAwareCommunicator(mpi_comm=mpi_comm)
2020-08-20 19:58:17 [job-chainer-distr-1597952630-a34666f9-launcher-h8c8l, unknown_ip] File "/azureml-envs/azureml_8e1fdf2e02bb65137f880870de828f5e/lib/python3.6/site-packages/chainermn/communicators/non_cuda_aware_communicator.py", line 17, in init
2020-08-20 19:58:17 [job-chainer-distr-1597952630-a34666f9-launcher-h8c8l, unknown_ip] 'NCCL is not available. '
2020-08-20 19:58:17 [job-chainer-distr-1597952630-a34666f9-launcher-h8c8l, unknown_ip] RuntimeError: NCCL is not available. Please confirm that NCCL is enabled in CuPy.

Note that distributed Chainer runs always use MPI as a backend.

By comparison, this estimator runs just fine against a standard compute cluster with two NC6 nodes.

Fail to attach a new AKS cluster to AML

I am using the following scenario:
1). Create a totally fresh AKS cluster with all default settings using the portal (just changed the default VM size to Standard_D8s_v3).
2). Try to attach the cluster (default agentpool with 3 nodes, but any random pool, even a GPU one, has the same issue) using the Studio or SDK.

The process fails during the Kubeflow installation step with the following error message (in Studio):
2 mins ago [Pod] [cmaks-init-job-fxkn9] Scheduled: Successfully assigned azureml/cmaks-init-job-fxkn9 to aks-agentpool-33378424-vmss000002
2 mins ago [Job] [cmaks-init-job] SuccessfulCreate: Created pod: cmaks-init-job-fxkn9
2 mins ago [Pod] [cmaks-init-job-fxkn9] Pulling: Pulling image "mcr.microsoft.com/azureml/cmk8s/agent-setup:unify-azureml-setup-0922"
2 mins ago [Pod] [cmaks-init-job-fxkn9] Pulled: Successfully pulled image "mcr.microsoft.com/azureml/cmk8s/agent-setup:unify-azureml-setup-0922"
1 min ago [Pod] [cmaks-init-job-fxkn9] Created: Created container cmaks-init-job-container
1 min ago [Pod] [cmaks-init-job-fxkn9] Started: Started container cmaks-init-job-container
1 min ago [Job] [cmaks-init-job] BackoffLimitExceeded: Job has reached the specified backoff limit

Error message from AKS:
Agent installation failed due to exceptions: One or more errors occurred. (Kubernetes 'azureml: cmaks-init-job'. Failed to run job.);. Init job pod logs: Error: plugin "diff" exited with error Installation failed, failed module: kubeflow, line: 39 Modules installed/upgraded: Modules skipped:

Compute target throwing fqdn error with old SDK

With the old SDK, when I choose a compute target, I get the following error:

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.contrib.core.compute.cmakscompute import CmAksCompute

# choose a name for your Kubernetes compute
compute_name = 'testDemo'
compute_target = ComputeTarget(workspace=ws, name=compute_name)

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-124-4a5e650a48fc> in <module>
      5 # choose a name for your Kubernetes compute
      6 compute_name = 'testDemo'
----> 7 compute_target = ComputeTarget(workspace=ws, name=compute_name)

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/contrib/core/_hooks.py in __new__(cls, workspace, name)
    156                 elif compute_type == child._compute_type:
    157                     compute_target = super(ComputeTarget, cls).__new__(child)
--> 158                     compute_target._initialize(workspace, compute_payload)
    159                     return compute_target
    160         else:

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/contrib/core/compute/cmakscompute.py in _initialize(self, workspace, obj_dict)
     77             if obj_dict['properties']['properties'] else None
     78         cluster_fqdn = obj_dict['properties']['properties']['clusterFqdn'] \
---> 79             if obj_dict['properties']['properties'] else None
     80 
     81         super(CmAksCompute, self)._initialize(compute_resource_id, name, location, compute_type, tags, description,

KeyError: 'clusterFqdn'

Need clearer error message for stopped AKS clusters

Some background:
AKS has a Stop/Start feature for clusters. Customers can stop a cluster temporarily, which scales down the node pools to 0 and removes the control plane. This stops billing, and is useful for customers that do not use clusters 24/7.

Creating an AKS cluster and attaching it to an AML workspace works normally. However, if the cluster is stopped with the "az aks stop" CLI command and you then try to submit a job, you will get an error that does not explain the underlying issue.

With the cluster stopped, I got the following in the console on the experiment:
"ServiceError: Received 404 from a service request"

The notebook I was using showed:
{
"error": {
"code": "ServiceError",
"message": "Received 404 from a service request",
"target": "POST https://REDACTED.servicebus.windows.net/aml-k8s-jobs-connection/jobs",
"details": [
{
"code": "NotFound",
"message": "{"error":{"code":"EndpointNotFound","message":"There are no listeners connected for the endpoint. TrackingId:468fe46e-7ecb-4def-b983-87fd6b4b2381_G0, SystemTracker:REDACTED.servicebus.windows.net:aml-k8s-jobs-connection/jobs, Timestamp:2021-01-19T18:56:39"}}",
"messageParameters": {},
"details": []
}
]
},

This could be an issue if clusters are stopped to reduce billing when not in use, but someone then tries to run jobs without starting the cluster again. There should be a clearer error message, or some kind of validation to make sure the cluster is in a running state.
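
For reference, the stop/start commands involved, plus a possible pre-submission check of the cluster power state (cluster and resource group names are placeholders):

az aks stop --name <cluster-name> --resource-group <resource-group>
az aks start --name <cluster-name> --resource-group <resource-group>

# A possible validation before submitting jobs: expect "Running"
az aks show --name <cluster-name> --resource-group <resource-group> --query powerState.code -o tsv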

A problem with regex creating a cluster configuration

Steps:
1). I have created a cluster and used the "Properties" tab to copy its resource ID: /subscriptions/e0fb9168-280b-42ea-a2a4-e0afd7252a20/resourcegroups/sbaydachaks/providers/Microsoft.ContainerService/managedClusters/sbaydachakscl
2). I am trying to create a configuration using the following code:

attach_config = CmAksCompute.attach_configuration(
    resource_id="/subscriptions/e0fb9168-280b-42ea-a2a4-e0afd7252a20/resourcegroups/sbaydachaks/providers/Microsoft.ContainerService/managedClusters/sbaydachakscl",
)

It leads to an exception because the resource ID contains "resourcegroups" while the validation regex expects "resourceGroups".

ComputeTargetException: ComputeTargetException:
Message: Invalid resource_id provided /subscriptions/e0fb9168-280b-42ea-a2a4-e0afd7252a20/resourcegroups/sbaydachaks/providers/Microsoft.ContainerService/managedClusters/sbaydachakscl does not match for /subscriptions/[\w-.]+/resourceGroups/[\w-.]+/providers/Microsoft.ContainerService/managedClusters/[\w-.]+
InnerException None
ErrorResponse
{
"error": {
"message": "Invalid resource_id provided /subscriptions/e0fb9168-280b-42ea-a2a4-e0afd7252a20/resourcegroups/sbaydachaks/providers/Microsoft.ContainerService/managedClusters/sbaydachakscl does not match for /subscriptions/[\w\-
\.]+/resourceGroups/[\w\-\.]+/providers/Microsoft.ContainerService/managedClusters/[\w\-\.]+"
}
}

003-Distributed TensorFlow with parameter server not working

Does 003-Distributed TensorFlow with parameter server work? I initially tried it with gpu set to 0 so it would run on CPU instead of GPU, but it still failed looking for the libcuda library. I spun up a new GPU node pool in AKS based on Standard_NC6, which got past the errors and got everything into a running state, but it never completes. The parameter server starts up OK, and one of the workers starts up OK with no errors, but the 0-index worker keeps saying it is waiting for a response from a worker:

2021-01-28 14:33:25,918 [INFO] root: [/stdout-job-tf-distr-ps-aks-1611843855-90f5db13-worker-0.log] 2021-01-28 14:33:24.599056: I tensorflow/core/distributed_runtime/master.cc:267] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0

Cancelled TF parameter server job does not terminate scheduled jobs

Jobs stuck in 'Terminating' state forever

sauryadas@Sauryas-MacBook-Pro ~ $ kpo -o wide
NAME                                                   READY   STATUS        RESTARTS   AGE   IP            NODE                                NOMINATED NODE   READINESS GATES
job-pytorch-distr-aks-1-1611906139-10cea9b8-master-0   0/2     Pending       0          16m   <none>        <none>                              <none>           <none>
job-pytorch-distr-aks-1-1611906139-10cea9b8-worker-0   0/2     Pending       0          16m   <none>        <none>                              <none>           <none>
job-pytorch-distr-aks-1-1611906139-10cea9b8-worker-1   0/2     Pending       0          16m   <none>        <none>                              <none>           <none>
job-pytorch-distr-aks-1-1611906139-10cea9b8-worker-2   0/2     Pending       0          16m   <none>        <none>                              <none>           <none>
job-tf-distr-ps-aks-1-1611904536-35eaa97e-ps-0         0/2     Terminating   0          60m   10.244.2.12   aks-agentpool-18184150-vmss000000   <none>           <none>
job-tf-distr-ps-aks-1-1611904536-35eaa97e-worker-0     0/2     Terminating   0          60m   10.244.1.12   aks-agentpool-18184150-vmss000002   <none>           <none>
job-tf-distr-ps-aks-1-1611904536-35eaa97e-worker-1     0/2     Terminating   0          60m   10.244.0.14   aks-agentpool-18184150-vmss000001   <none>           <none>

Pytorch Estimator fails on CMAKS compute when using NCCL or GLOO backend

When using NCCL or GLOO as a backend for Pytorch, the Estimator launched in this sample notebook fails.

This is the notebook in notebooks/how-to-use-azureml/ml-frameworks/pytorch/training/distributed-pytorch-with-nccl-gloo
It fails with the error:

UserError: User program failed with SystemExit: 2

Here is the underlying cause of the error (from the logs):

Script type = None
usage: pytorch_mnist.py [-h] [--batch-size N] [--test-batch-size N]
[--epochs N] [--lr LR] [--momentum M] [--seed S]
[-j N] [--log-interval N] [--weight-decay W]
[--world-size WORLD_SIZE] [--dist-url DIST_URL]
[--dist-backend DIST_BACKEND] [--rank RANK]
[job-pytorch-distr-1597957737-1bd5202f-master-0, unknown_ip]

pytorch_mnist.py: error: argument --rank: invalid int value: ''

The lines of code that trigger this are here:

from azureml.train.dnn import PyTorch, Gloo

estimator = PyTorch(source_directory=project_folder,
                    script_params={"--dist-backend": "gloo",
                                   "--dist-url": "$AZ_BATCHAI_PYTORCH_INIT_METHOD",
                                   "--rank": "$AZ_BATCHAI_TASK_INDEX",
                                   "--world-size": 2},
                    compute_target=compute_target,
                    entry_script='pytorch_mnist.py',
                    node_count=2,
                    distributed_training=Gloo(),
                    use_gpu=True)

Notice that --rank is set equal to $AZ_BATCHAI_TASK_INDEX, which in this case is not set.

A similar problem will happen when using NCCL as the backend.

This runs correctly on a regular (Azure Batch-based) compute cluster. Likely the problem is that the variable above is not being set when this job is set up to run on Kubernetes through KubeFlow's PyTorch operator.

WARNING: to repro this you must use a cluster with at least three nodes (one more than the node_count you're requesting) (due to a separate bug).

Error installing new SDK version

pip install azureml-contrib-k8s --extra-index-url https://azuremlsdktestpypi.azureedge.net/CmAks-Compute-Test/D58E86006C6


Looking in indexes: https://pypi.org/simple, https://azuremlsdktestpypi.azureedge.net/CmAks-Compute-Test/D58E86006C6
Collecting azureml-contrib-k8s
  ERROR: Could not find a version that satisfies the requirement azureml-contrib-k8s (from versions: none)
ERROR: No matching distribution found for azureml-contrib-k8s
Note: you may need to restart the kernel to use updated packages.

Pipeline data process with PythonScriptStep not working with attached ARC compute

Steps to Reproduce:

I am using this notebook:

https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/nyc-taxi-data-regression-model-building.ipynb

  1. Attach an Azure Stack Hub deployed Kubernetes cluster to an AML workspace

  2. Replace the compute target in run_config and the PythonScriptSteps with the attached compute

  3. Submit the pipeline

Then you will have the following issues:

  1. The step runs for the data processing steps are always in the "Queued" state

  2. kubectl get pods on the cluster's master terminal always returns no pods at all.

Report inner exception message when attaching cmaks through SDK

Return the inner-exception message to the user when attaching CMAKS through the SDK. Today we return an internal error:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
~/anaconda3/envs/amlk8_env/lib/python3.7/site-packages/azureml/core/compute/compute.py in _attach(workspace, name, attach_payload, target_class)
    458         try:
--> 459             resp.raise_for_status()
    460         except requests.exceptions.HTTPError:

~/anaconda3/envs/amlk8_env/lib/python3.7/site-packages/requests/models.py in raise_for_status(self)
    940         if http_error_msg:
--> 941             raise HTTPError(http_error_msg, response=self)
    942 

HTTPError: 500 Server Error: InternalServerError for url: https://management.azure.com//subscriptions/fe375bc2-9f1a-4909-ad0d-9319806d5e97/resourceGroups/cm-we-rg/providers/Microsoft.MachineLearningServices/workspaces/cm-we/computes/amlk8s?api-version=2020-02-02

During handling of the above exception, another exception occurred:

ComputeTargetException                    Traceback (most recent call last)
<ipython-input-9-be80e2c28477> in <module>
      1 # attach compute
----> 2 cmaks_target = CmAksCompute.attach(ws, compute_name, attach_config)

~/anaconda3/envs/amlk8_env/lib/python3.7/site-packages/azureml/core/compute/compute.py in attach(workspace, name, attach_configuration)
    430         """
    431         compute_type = attach_configuration._compute_type
--> 432         return compute_type._attach(workspace, name, attach_configuration)
    433 
    434     @staticmethod

~/anaconda3/envs/amlk8_env/lib/python3.7/site-packages/azureml/contrib/core/compute/cmakscompute.py in _attach(workspace, name, config)
    109         """
    110         attach_payload = CmAksCompute._build_attach_payload(config, workspace)
--> 111         return ComputeTarget._attach(workspace, name, attach_payload, CmAksCompute)
    112 
    113     @staticmethod

~/anaconda3/envs/amlk8_env/lib/python3.7/site-packages/azureml/core/compute/compute.py in _attach(workspace, name, attach_payload, target_class)
    462                                          'Response Code: {}\n'
    463                                          'Headers: {}\n'
--> 464                                          'Content: {}'.format(resp.status_code, resp.headers, resp.content))
    465         if 'Azure-AsyncOperation' not in resp.headers:
    466             raise ComputeTargetException('Error, missing operation location from resp headers:\n'

ComputeTargetException: ComputeTargetException:
	Message: Received bad response from Resource Provider:
Response Code: 500
Headers: {'Cache-Control': 'no-cache', 'Pragma': 'no-cache', 'Content-Length': '552', 'Content-Type': 'application/json; charset=utf-8', 'Expires': '-1', 'x-ms-failure-cause': 'service', 'Request-Context': 'appId=cid-v1:6a27ce65-5555-41a3-85f7-b7a1ce31fd6b', 'x-ms-response-type': 'error', 'x-ms-client-request-id': 'c97c5a5c-0135-4ea6-ac55-9e71e98050c7', 'x-ms-client-session-id': 'c4087eef-4e79-4172-b3f9-e4bebfa134cc', 'x-request-time': '0.246', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains', 'x-ms-ratelimit-remaining-subscription-writes': '1199', 'x-ms-request-id': 'ad0307ca-5a3c-46b3-a773-32ca956a3a53', 'x-ms-correlation-request-id': 'ad0307ca-5a3c-46b3-a773-32ca956a3a53', 'x-ms-routing-request-id': 'WESTEUROPE:20200902T142254Z:ad0307ca-5a3c-46b3-a773-32ca956a3a53', 'X-Content-Type-Options': 'nosniff', 'Date': 'Wed, 02 Sep 2020 14:22:53 GMT', 'Connection': 'close'}
Content: b'{\n  "error": {\n    "code": "ServiceError",\n    "severity": null,\n    "message": "InternalServerError",\n    "messageFormat": null,\n    "messageParameters": null,\n    "referenceCode": null,\n    "detailsUri": null,\n    "target": null,\n    "details": [],\n    "innerError": null,\n    "debugInfo": null\n  },\n  "correlation": {\n    "operation": "2144718a81c40844b7584ded6ccef9cd",\n    "request": "cab10660ef341b40"\n  },\n  "environment": "westeurope",\n  "location": "westeurope",\n  "time": "2020-09-02T14:22:54.0686795+00:00",\n  "componentName": "account-rp"\n}'
	InnerException None
	ErrorResponse 
{
    "error": {
        "message": "Received bad response from Resource Provider:\nResponse Code: 500\nHeaders: {'Cache-Control': 'no-cache', 'Pragma': 'no-cache', 'Content-Length': '552', 'Content-Type': 'application/json; charset=utf-8', 'Expires': '-1', 'x-ms-failure-cause': 'service', 'Request-Context': 'appId=cid-v1:6a27ce65-5555-41a3-85f7-b7a1ce31fd6b', 'x-ms-response-type': 'error', 'x-ms-client-request-id': 'c97c5a5c-0135-4ea6-ac55-9e71e98050c7', 'x-ms-client-session-id': 'c4087eef-4e79-4172-b3f9-e4bebfa134cc', 'x-request-time': '0.246', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains', 'x-ms-ratelimit-remaining-subscription-writes': '1199', 'x-ms-request-id': 'ad0307ca-5a3c-46b3-a773-32ca956a3a53', 'x-ms-correlation-request-id': 'ad0307ca-5a3c-46b3-a773-32ca956a3a53', 'x-ms-routing-request-id': 'WESTEUROPE:20200902T142254Z:ad0307ca-5a3c-46b3-a773-32ca956a3a53', 'X-Content-Type-Options': 'nosniff', 'Date': 'Wed, 02 Sep 2020 14:22:53 GMT', 'Connection': 'close'}\nContent: b'{\\n  \"error\": {\\n    \"code\": \"ServiceError\",\\n    \"severity\": null,\\n    \"message\": \"InternalServerError\",\\n    \"messageFormat\": null,\\n    \"messageParameters\": null,\\n    \"referenceCode\": null,\\n    \"detailsUri\": null,\\n    \"target\": null,\\n    \"details\": [],\\n    \"innerError\": null,\\n    \"debugInfo\": null\\n  },\\n  \"correlation\": {\\n    \"operation\": \"2144718a81c40844b7584ded6ccef9cd\",\\n    \"request\": \"cab10660ef341b40\"\\n  },\\n  \"environment\": \"westeurope\",\\n  \"location\": \"westeurope\",\\n  \"time\": \"2020-09-02T14:22:54.0686795+00:00\",\\n  \"componentName\": \"account-rp\"\\n}'"
    }
}

When checking telemetry in Kusto:

The Resource 'Microsoft.ContainerService/managedClusters/myaks-' under resource group 'myaks_rg' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix

TensorFlow with ScriptRunConfig fails on loading dataset

Running Distributed TF Job with ScriptRunConfig fails loading MNIST dataset
https://github.com/lenisha/AML-Kubernetes/blob/master/docs/sample-notebooks/001-Tensorflow/train-tensorflow-resume-training.ipynb

tf_env = Environment(name='tf-gpu-pandas')
# Ensure the required packages are installed (we need pandas, fuse, and Azure ML defaults)
packages = CondaDependencies.create(pip_packages=['azureml-core==1.20.0',
                                                  'azureml-defaults==1.20.0',
                                                  'tensorflow-gpu==2.2.0',
                                                  'azureml-dataprep[pandas,fuse]',
                                                  'azureml-dataset-runtime[fuse,pandas]'])
tf_env.python.conda_dependencies = packages
#register environment for re-use
#tf_env.register(workspace=ws)
args = ['--data-folder', dataset.as_named_input('mnist').as_mount() ]
src = ScriptRunConfig(source_directory=script_folder,
                      compute_target=compute_target,
                      arguments=args,
                      script='tf_mnist_with_checkpoint.py',                     
                      environment=tf_env)

Error in eval

From a Teams message (Elena Neroslavskaya, amlk8s-training): "Trying to convert Estimator to ScriptRunConfig and run on AKS," using the same TensorFlow notebook and code as above.


It fails on processing the mapping for the dataset. Would it have to be declared differently from the estimator args?

Processing input mnist
2021-02-04 21:15:02 [job-akse-arc-tf-training1-1612473280-58095b4e-rmmfl, unknown_ip] NFS dataset mount failed: invalid syntax (<string>, line 1)
2021-02-04 21:15:02 [job-akse-arc-tf-training1-1612473280-58095b4e-rmmfl, unknown_ip] Traceback (most recent call last):
2021-02-04 21:15:02 [job-akse-arc-tf-training1-1612473280-58095b4e-rmmfl, unknown_ip]   File "/cmaks_mnt/workspaceblobstore/azureml/akse-arc-tf-training1_1612473280_58095b4e/azureml-setup/context_managers.py", line 315, in redirect_dataset_to_nfs
2021-02-04 21:15:02 [job-akse-arc-tf-training1-1612473280-58095b4e-rmmfl, unknown_ip]     dataset_source = eval(json.loads(repr(dataset))["source"][0])
2021-02-04 21:15:02 [job-akse-arc-tf-training1-1612473280-58095b4e-rmmfl, unknown_ip]   File "<string>", line 1
2021-02-04 21:15:02 [job-akse-arc-tf-training1-1612473280-58095b4e-rmmfl, unknown_ip]     http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
2021-02-04 21:15:02 [job-akse-arc-tf-training1-1612473280-58095b4e-rmmfl, unknown_ip]         ^
2021-02-04 21:15:02 [job-akse-arc-tf-training1-1612473280-58095b4e-rmmfl, unknown_ip] SyntaxError: invalid syntax
2021-02-04 21:15:02 [job-akse-arc-tf-training1-1612473280-58095b4e-rmmfl, unknown_ip] Initialize DatasetContextManager.



ACTION REQUIRED: Microsoft needs this private repository to complete compliance info

There are open compliance tasks that need to be reviewed for your AML-Kubernetes repo.

Action required: 4 compliance tasks

To bring this repository to the standard required for 2021, we require administrators of this and all Microsoft GitHub repositories to complete a small set of tasks within the next 60 days. This is critical work to ensure the compliance and security of your Azure GitHub organization.

Please take a few minutes to complete the tasks at: https://repos.opensource.microsoft.com/orgs/Azure/repos/AML-Kubernetes/compliance

  • The GitHub AE (GitHub inside Microsoft) migration survey has not been completed for this private repository
  • No Service Tree mapping has been set for this repo. If this team does not use Service Tree, they can also opt-out of providing Service Tree data in the Compliance tab.
  • No repository maintainers are set. The Open Source Maintainers are the decision-makers and actionable owners of the repository, irrespective of administrator permission grants on GitHub.
  • Classification of the repository as production/non-production is missing in the Compliance tab.

You can close this work item once you have completed the compliance tasks, or it will automatically close within a day of taking action.

If you no longer need this repository, it might be quickest to delete the repo, too.

GitHub inside Microsoft program information

More information about GitHub inside Microsoft and the new GitHub AE product can be found at https://aka.ms/gim or by contacting [email protected]

FYI: current admins at Microsoft include @cloga, @Zhong-J, @sauryadas, @yuyue9284, @cowtalker, @Martin-Jia, @wenshangmin, @henry-zeng

Pytorch Distributed with NCCL Gloo Fails

PyTorch distributed job fails while finalizing the script:
https://github.com/lenisha/AML-Kubernetes/tree/master/docs/sample-notebooks/005-Distributed-Pytorch-with-Nccl-Gloo

2021-02-05 05:32:39 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip] Done!
2021-02-05 05:32:39 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip] job-pytorch-distr-nccl-1612503130-4b919e08-worker-1:25:25 [0] NCCL INFO Bootstrap : Using [0]eth0:10.240.1.123<0>
2021-02-05 05:32:39 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip] job-pytorch-distr-nccl-1612503130-4b919e08-worker-1:25:25 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
2021-02-05 05:32:39 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip] job-pytorch-distr-nccl-1612503130-4b919e08-worker-1:25:25 [0] NCCL INFO NET/IB : No device found.
2021-02-05 05:32:39 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip] job-pytorch-distr-nccl-1612503130-4b919e08-worker-1:25:25 [0] NCCL INFO NET/Socket : Using [0]eth0:10.240.1.123<0>
2021-02-05 05:32:39 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip] 
2021-02-05 05:32:39 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip] job-pytorch-distr-nccl-1612503130-4b919e08-worker-1:25:25 [0] init.cc:981 NCCL WARN Invalid rank requested : 2/2
2021-02-05 05:32:39 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip] Starting the daemon thread to refresh tokens in background for process with pid = 25
2021-02-05 05:32:39 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip] 
2021-02-05 05:32:39 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip] 
2021-02-05 05:32:39 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip] The experiment failed. Finalizing run...
2021-02-05 05:32:39 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip] [2021-02-05T05:32:39.027701] TimeoutHandler __init__
2021-02-05 05:32:39 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip] [2021-02-05T05:32:39.027735] TimeoutHandler __enter__
2021-02-05 05:32:39 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip] Cleaning up all outstanding Run operations, waiting 300.0 seconds
2021-02-05 05:32:39 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip] 2 items cleaning up...
2021-02-05 05:32:42 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip] Cleanup took 0.14803171157836914 seconds
2021-02-05 05:32:42 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip] [2021-02-05T05:32:39.289728] TimeoutHandler __exit__
2021-02-05 05:32:42 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip] Traceback (most recent call last):
2021-02-05 05:32:42 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip]   File "pytorch_mnist.py", line 113, in <module>
2021-02-05 05:32:42 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip]     model = torch.nn.parallel.DistributedDataParallel(model)
2021-02-05 05:32:42 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip]   File "/azureml-envs/azureml_77ae6faafb422d20b955420f1d57d91e/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 333, in __init__
2021-02-05 05:32:42 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip]     self.broadcast_bucket_size)
2021-02-05 05:32:42 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip]   File "/azureml-envs/azureml_77ae6faafb422d20b955420f1d57d91e/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 549, in _distributed_broadcast_coalesced
2021-02-05 05:32:42 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip]     dist._broadcast_coalesced(self.process_group, tensors, buffer_size)
2021-02-05 05:32:42 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip] RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/../c10d/NCCLUtils.hpp:82, invalid argument, NCCL version 2.4.8
2021-02-05 05:32:42 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip] 
2021-02-05 05:32:42 [job-pytorch-distr-nccl-1612503130-4b919e08-worker-1, unknown_ip] + echo end

Similar to NVIDIA/nccl#352: communication is blocked. Either alter the firewall rules to allow using eth0, or configure NCCL_SOCKET_IFNAME to use another interface that is not blocked.
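
A minimal sketch of the second workaround (the interface name is a placeholder; in an AzureML run this would be set as an environment variable on the training environment):

# Point NCCL at a network interface that is not blocked
export NCCL_SOCKET_IFNAME=<interface-name>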

Pipeline AutoML step not working with attached Azure Stack Hub Kubernetes cluster

Steps to Reproduce:

I am using this notebook:

https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/nyc-taxi-data-regression-model-building.ipynb

  1. Attach an Azure Stack Hub deployed Kubernetes cluster to an AML workspace

  2. Replace the compute target in the AutoMLConfig for the AutoML step in the notebook with the attached Arc compute

  3. Submit the pipeline; you will get an error like:

~\Anaconda3\envs\pythonProject\lib\site-packages\azureml\pipeline\steps\automl_step.py in _get_automl_settings(self, context)
377 def _get_automl_settings(self, context):
378
--> 379 self._automl_config._validate_config_settings(context._workspace)
380 fit_params = self._automl_config._get_fit_params()
381 user_settings = {k: v for (k, v) in self._automl_config.user_settings.items() if k not in fit_params}

~\Anaconda3\envs\pythonProject\lib\site-packages\azureml\train\automl\automlconfig.py in _validate_config_settings(self, workspace)
1913 else:
1914 # ensure vm size is set
-> 1915 self.user_settings['vm_type'] = all_compute_targets[compute_target].vm_size
1916
1917 is_timeseries = self.user_settings['is_timeseries']

AttributeError: 'ArcKubernetesCompute' object has no attribute 'vm_size'

Attach AML via ARC Extension to AKS-E on Stack

When running the attach script, we are getting an error that namespace and nodeSelector are NULL.
It looks like the installation script is pointing at https://github.com/Azure/AML-Kubernetes/blob/89be538091ab34d3112cbdfe2e3ab9e06e2d8a33/docs/profile-config/profile-schema-v1.0.yaml
while it should be pointing at:
https://github.com/Azure/AML-Kubernetes/blob/89be538091ab34d3112cbdfe2e3ab9e06e2d8a33/docs/profile-config/profile-v1.0-sample-1.yaml

This can be viewed in the ML workspace.
Looking into the logs of the Arc agent shows a lot of info, and kind = empty as well.

CMAKS SDKv1 installation failure

Trying to run samples 002-SciKitLearn and 001-Tensorflow; both fail on a standard Azure ML Compute Instance when installing the SDK:

pip install --disable-pip-version-check --extra-index-url https://azuremlsdktestpypi.azureedge.net/azureml-contrib-k8s-preview/D58E86006C65 azureml-contrib-k8s

Requirement already satisfied: azure-core<2.0.0,>=1.0.0 in /anaconda/envs/azureml_py36/lib/python3.6/site-packages (from azure-identity<1.5.0,>=1.2.0->azureml-dataprep<2.9.0a,>=2.8.0a->azureml-dataset-runtime~=1.21.0->azureml-contrib-pipeline-steps->azureml-contrib-k8s) (1.9.0)
Requirement already satisfied: portalocker~=1.0; platform_system != "Windows" in /anaconda/envs/azureml_py36/lib/python3.6/site-packages (from msal-extensions~=0.2.2->azure-identity<1.5.0,>=1.2.0->azureml-dataprep<2.9.0a,>=2.8.0a->azureml-dataset-runtime~=1.21.0->azureml-contrib-pipeline-steps->azureml-contrib-k8s) (1.7.1)
ERROR: azureml-widgets 1.20.0 has requirement azureml-telemetry~=1.20.0, but you'll have azureml-telemetry 1.21.0 which is incompatible.
ERROR: azureml-train-core 1.21.0 has requirement azureml-core~=1.21.0, but you'll have azureml-core 1.20.0.post1 which is incompatible.
ERROR: azureml-train-automl 1.20.0 has requirement azureml-automl-core~=1.20.0, but you'll have azureml-automl-core 1.21.0 which is incompatible.
ERROR: azureml-train-automl 1.20.0 has requirement azureml-dataset-runtime[fuse,pandas]~=1.20.0, but you'll have azureml-dataset-runtime 1.21.0 which is incompatible.
ERROR: azureml-train-automl 1.20.0 has requirement azureml-train-automl-client~=1.20.0, but you'll have azureml-train-automl-client 1.21.0 which is incompatible.
ERROR: azureml-train-automl-runtime 1.20.0 has requirement azureml-automl-core~=1.20.0, but you'll have azureml-automl-core 1.21.0 which is incompatible.
ERROR: azureml-train-automl-runtime 1.20.0 has requirement azureml-dataset-runtime[fuse,pandas]~=1.20.0, but you'll have azureml-dataset-runtime 1.21.0 which is incompatible.
ERROR: azureml-train-automl-runtime 1.20.0 has requirement azureml-telemetry~=1.20.0, but you'll have azureml-telemetry 1.21.0 which is incompatible.
ERROR: azureml-train-automl-runtime 1.20.0 has requirement azureml-train-automl-client~=1.20.0, but you'll have azureml-train-automl-client 1.21.0 which is incompatible.
ERROR: azureml-train-automl-client 1.21.0 has requirement azureml-core~=1.21.0, but you'll have azureml-core 1.20.0.post1 which is incompatible.
ERROR: azureml-telemetry 1.21.0 has requirement azureml-core~=1.21.0, but you'll have azureml-core 1.20.0.post1 which is incompatible.
ERROR: azureml-sdk 1.20.0 has requirement azureml-dataset-runtime[fuse]~=1.20.0, but you'll have azureml-dataset-runtime 1.21.0 which is incompatible.
ERROR: azureml-sdk 1.20.0 has requirement azureml-train-automl-client~=1.20.0, but you'll have azureml-train-automl-client 1.21.0 which is incompatible.
ERROR: azureml-opendatasets 1.20.0 has requirement azureml-dataset-runtime[fuse,pandas]~=1.20.0, but you'll have azureml-dataset-runtime 1.21.0 which is incompatible.
ERROR: azureml-opendatasets 1.20.0 has requirement azureml-telemetry~=1.20.0, but you'll have azureml-telemetry 1.21.0 which is incompatible.
ERROR: azureml-opendatasets 1.20.0 has requirement scipy<=1.4.1,>=1.0.0, but you'll have scipy 1.5.2 which is incompatible.
ERROR: azureml-defaults 1.20.0 has requirement azureml-dataset-runtime[fuse]~=1.20.0, but you'll have azureml-dataset-runtime 1.21.0 which is incompatible.
ERROR: azureml-datadrift 1.20.0 has requirement azureml-dataset-runtime[fuse,pandas]~=1.20.0, but you'll have azureml-dataset-runtime 1.21.0 which is incompatible.
ERROR: azureml-datadrift 1.20.0 has requirement azureml-pipeline-core~=1.20.0, but you'll have azureml-pipeline-core 1.21.0 which is incompatible.
ERROR: azureml-datadrift 1.20.0 has requirement azureml-telemetry~=1.20.0, but you'll have azureml-telemetry 1.21.0 which is incompatible.
ERROR: azureml-contrib-reinforcementlearning 1.20.0 has requirement azureml-train-core~=1.20.0, but you'll have azureml-train-core 1.21.0 which is incompatible.
ERROR: azureml-contrib-pipeline-steps 1.21.0 has requirement azureml-core~=1.21.0, but you'll have azureml-core 1.20.0.post1 which is incompatible.
ERROR: azureml-contrib-gbdt 1.20.0 has requirement azureml-train-core~=1.20.0, but you'll have azureml-train-core 1.21.0 which is incompatible.
ERROR: azureml-contrib-dataset 1.20.0 has requirement azureml-dataset-runtime[fuse,pandas]~=1.20.0, but you'll have azureml-dataset-runtime 1.21.0 which is incompatible.
ERROR: azureml-cli-common 1.20.0 has requirement azureml-pipeline-core~=1.20.0, but you'll have azureml-pipeline-core 1.21.0 which is incompatible.
ERROR: azureml-cli-common 1.20.0 has requirement azureml-train-core~=1.20.0; python_version >= "3.5", but you'll have azureml-train-core 1.21.0 which is incompatible.
ERROR: azureml-automl-runtime 1.20.0 has requirement azureml-automl-core~=1.20.0, but you'll have azureml-automl-core 1.21.0 which is incompatible.
ERROR: azureml-automl-runtime 1.20.0 has requirement azureml-dataset-runtime[fuse,pandas]~=1.20.0, but you'll have azureml-dataset-runtime 1.21.0 which is incompatible.
Installing collected packages: azureml-telemetry, azureml-dataset-runtime, azureml-automl-core, azureml-train-automl-client

After this, the steps to retrieve the attached CMAKS compute fail:

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.contrib.core.compute.arckubernetescompute import ArcKubernetesCompute

# choose a name for your Kubernetes compute
compute_name = 'amlgpu'
compute_target = ComputeTarget(workspace=ws, name=compute_name)

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-7-06b32c94f471> in <module>
      1 from azureml.core.compute import ComputeTarget, AmlCompute
      2 from azureml.core.compute_target import ComputeTargetException
----> 3 from azureml.contrib.core.compute.arckubernetescompute import ArcKubernetesCompute
      4 
      5 # choose a name for your Kubernetes compute

ModuleNotFoundError: No module named 'azureml.contrib.core.compute.arckubernetescompute'

and

from azureml.contrib.core.compute.cmakscompute import CmAksCompute

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-3-902fa7fdc6bf> in <module>
----> 1 from azureml.contrib.core.compute.cmakscompute import CmAksCompute

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/contrib/core/__init__.py in <module>
      3 # ---------------------------------------------------------
      4 """This package include compute sub package."""
----> 5 from . import _hooks

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/contrib/core/_hooks.py in <module>
     24 from .compute.cmakscompute import CmAksCompute
     25 import re
---> 26 from azureml.pipeline.steps import MpiStep
     27 from azureml.pipeline.core._parallel_run_config_base import _ParallelRunConfigBase
     28 from azureml.pipeline.core import PipelineStep

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/steps/__init__.py in <module>
     24 """
     25 from .adla_step import AdlaStep
---> 26 from .automl_step import AutoMLStep, AutoMLStepRun
     27 from .databricks_step import DatabricksStep
     28 from .data_transfer_step import DataTransferStep

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/steps/automl_step.py in <module>
     24 from azureml.pipeline.core.pipeline_output_dataset import PipelineOutputTabularDataset
     25 from azureml.train.automl import constants
---> 26 from azureml.train.automl._experiment_drivers import driver_utilities
     27 from azureml.train.automl._azureautomlsettings import AzureAutoMLSettings
     28 from azureml.train.automl._environment_utilities import modify_run_configuration

ModuleNotFoundError: No module named 'azureml.train.automl._experiment_drivers'

# compute is attached
print("compute targets after attach:\n")
for targetName in ws.compute_targets:
    print(targetName)
compute targets after attach:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-10-00d550797dfc> in <module>
      1 # compute is attached
      2 print("compute targets after attach:\n")
----> 3 for targetName in ws.compute_targets:
      4     print(targetName)

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/core/workspace.py in compute_targets(self)
    996         """
    997         return {
--> 998             compute_target.name: compute_target for compute_target in ComputeTarget.list(self)}
    999 
   1000     def get_default_compute_target(self, type):

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/contrib/core/_hooks.py in list_hook(workspace)
    230                         pass
    231                     else:
--> 232                         env_obj = child.deserialize(workspace, env)
    233                     break
    234             if env_obj:

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/contrib/core/compute/cmakscompute.py in deserialize(workspace, object_dict)
    284         CmAksCompute._validate_get_payload(object_dict)
    285         target = CmAksCompute(None, None)
--> 286         target._initialize(workspace, object_dict)
    287         return target
    288 

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/contrib/core/compute/cmakscompute.py in _initialize(self, workspace, obj_dict)
     71             if 'isAttachedCompute' in obj_dict['properties'] else None
     72         node_pool = obj_dict['properties']['properties']['nodePool'] \
---> 73             if obj_dict['properties']['properties'] else None
     74         nodes_count = obj_dict['properties']['properties']['nodesCount'] \
     75             if obj_dict['properties']['properties'] else None

KeyError: 'nodePool'

Attached AKS Cluster config:
{
"namespace": "training",
"nodeSelector": {
"training": "schedule"
}
}

PyTorch estimator run gets stuck in Queued state forever

Running a PyTorch estimator (from one of the sample notebooks) against an attached AKS compute causes the job to stay Queued forever, even when sufficient resources to run the job are available on the cluster.

I used the sample notebook in notebooks/how-to-use-azureml/ml-frameworks/pytorch/training/distributed-pytorch-with-nccl-gloo, modifying the code to target an attached AKS compute. My cluster has two GPU nodes (both NC6) and this example requests a two-way distributed run.

The estimator run goes into Queued state and stays there forever.
Looking deep into logs in the Kubernetes cluster, I see that the Volcano scheduler is repeatedly trying to schedule the job, but failing.

Both estimator runs in the sample notebook trigger the same behavior (both the NCCL and Gloo backends).

Note: running a PyTorch job using the MPI backend (e.g. Horovod) works correctly (a sketch follows). This bug only affects NCCL- and Gloo-backed PyTorch jobs.
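
A minimal sketch of that working MPI path, assuming the SDK v1 PyTorch estimator API (the entry script name is a placeholder):

from azureml.train.dnn import PyTorch, Mpi

# compute_target is the attached AKS compute retrieved from the workspace
estimator = PyTorch(
    source_directory='.',
    entry_script='pytorch_horovod_mnist.py',  # placeholder script name
    compute_target=compute_target,
    node_count=2,
    distributed_training=Mpi(process_count_per_node=1),
    use_gpu=True)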

[Regression and Workaround] AML Jobs failing in East US

Currently, job submissions in the East US region are failing. It's a regression bug and will be fixed in the next rollout.

Workaround
1. Create the workspace in West Europe instead of East US (a sketch follows this list)
2. Enable the Arc agent on the Azure Stack Hub cluster in East US
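
A minimal sketch of workaround step 1, using the SDK v1 Workspace API (all resource names are placeholders):

from azureml.core import Workspace

ws = Workspace.create(
    name='aml-ws-weu',
    subscription_id='<subscription-id>',
    resource_group='aml-rg',
    location='westeurope',  # West Europe instead of East US
    create_resource_group=True)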

KeyError: 'nodePool' when trying to retrieve compute target for attached AKS

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.contrib.core.compute.cmakscompute import CmAksCompute

compute_name = 'test-aks'
compute_target = ComputeTarget(workspace=ws, name=compute_name)

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-10-54e5cbe80dc4> in <module>
      5 # choose a name for your CMAKS compute
      6 compute_name = 'test-aks'
----> 7 compute_target = ComputeTarget(workspace=ws, name=compute_name)

~\Miniconda3\envs\amlk8s\lib\site-packages\azureml\contrib\core\_hooks.py in __new__(cls, workspace, name)
    155                 elif compute_type == child._compute_type:
    156                     compute_target = super(ComputeTarget, cls).__new__(child)
--> 157                     compute_target._initialize(workspace, compute_payload)
    158                     return compute_target
    159         else:

~\Miniconda3\envs\amlk8s\lib\site-packages\azureml\contrib\core\compute\cmakscompute.py in _initialize(self, workspace, obj_dict)
     71             if 'isAttachedCompute' in obj_dict['properties'] else None
     72         node_pool = obj_dict['properties']['properties']['nodePool'] \
---> 73             if obj_dict['properties']['properties'] else None
     74         nodes_count = obj_dict['properties']['properties']['nodesCount'] \
     75             if obj_dict['properties']['properties'] else None

KeyError: 'nodePool'


Cannot attach Azure Arc Kubernetes cluster

We want to use the AML Kubernetes preview with an on-premises cluster (for the first proof of concept this is just a kind cluster) connected to Azure via Azure Arc.

If I try to connect the Azure Arc cluster to an AML workspace, I get the error message:

2021-01-22T14:36:03.7461476Z Creating...........
2021-01-22T14:36:04.0023562Z FailedProvisioning operation finished, operation "Failed"
2021-01-22T14:36:04.4579326Z Traceback (most recent call last):
2021-01-22T14:36:04.4580446Z   File "./attach.py", line 34, in <module>
2021-01-22T14:36:04.4581253Z     arc_target.wait_for_completion(show_output=True)
2021-01-22T14:36:04.4583166Z   File "/agent/_work/2/s/infrastructure/arcpoc/venv/lib/python3.6/site-packages/azureml/core/compute/compute.py", line 575, in wait_for_completion
2021-01-22T14:36:04.4583780Z     raise e
2021-01-22T14:36:04.4584891Z   File "/agent/_work/2/s/infrastructure/arcpoc/venv/lib/python3.6/site-packages/azureml/core/compute/compute.py", line 568, in wait_for_completion
2021-01-22T14:36:04.4586004Z     '{}'.format(self.provisioning_state, error_response))
2021-01-22T14:36:04.4590859Z azureml.exceptions._azureml_exception.ComputeTargetException: ComputeTargetException:
2021-01-22T14:36:04.4592493Z 	Message: Compute object provisioning polling reached non-successful terminal state, current provisioning state: Failed
2021-01-22T14:36:04.4593247Z Provisioning operation error:
2021-01-22T14:36:04.4594511Z {'code': 'InternalServerError', 'message': 'An internal server error occurred. If the problem persists, contact support.'}
2021-01-22T14:36:04.4595205Z 	InnerException None
2021-01-22T14:36:04.4595469Z 	ErrorResponse 
2021-01-22T14:36:04.4595645Z {
2021-01-22T14:36:04.4595805Z     "error": {
2021-01-22T14:36:04.4597011Z         "message": "Compute object provisioning polling reached non-successful terminal state, current provisioning state: Failed\nProvisioning operation error:\n{'code': 'InternalServerError', 'message': 'An internal server error occurred. If the problem persists, contact support.'}"
2021-01-22T14:36:04.4598033Z     }
2021-01-22T14:36:04.4598187Z }

However, I am not sure whether the problem actually lies with AML-Kubernetes or with the Azure Arc Kubernetes cluster. I checked kubectl get pods --namespace azure-arc:

NAME                                         READY   STATUS    RESTARTS   AGE
cluster-metadata-operator-57c6c76655-mnkxr   2/2     Running   0          114m
clusterconnect-agent-556cc7d5fd-m5p59        3/3     Running   16         114m
clusteridentityoperator-587f8cf48d-5dm84     3/3     Running   0          114m
config-agent-568877757c-tz6sk                2/3     Running   0          114m
controller-manager-7dbd7bb58-nqmqc           3/3     Running   0          114m
extension-manager-7b8779bd4c-x9mrc           3/3     Running   0          114m
flux-logs-agent-74489f7df9-dm599             2/2     Running   0          114m
metrics-agent-759f66665b-zvtjm               2/2     Running   0          114m
resource-sync-agent-6699d9b7cc-scqmp         3/3     Running   0          114m

and the many restarts of the clusterconnect-agent were worrying, so I checked kubectl get events --namespace=azure-arc:

LAST SEEN   TYPE      REASON      OBJECT                                      MESSAGE
2m7s        Warning   BackOff     pod/clusterconnect-agent-556cc7d5fd-m5p59   Back-off restarting failed container
7m5s        Normal    Synced      connectedcluster/clustermetadata            ConnectedCluster synced successfully
2m4s        Warning   Unhealthy   pod/config-agent-568877757c-tz6sk           Readiness probe failed: HTTP probe failed with statuscode: 500
  

The (redacted) log of the cluster connect agent (kubectl logs -p clusterconnect-agent-556cc7d5fd-m5p59 clusterconnect-agent --namespace azure-arc >> log.txt) is below:

log.txt
Running ConnectProxy Agent
MICROSOFT SOFTWARE LICENSE TERMS
MICROSOFT Azure Arc for Kubernetes
__________________________________
This software is licensed to you as part of your or your company's subscription license for Microsoft Azure Services. You may only use the software with Microsoft Azure Services and subject to the terms and conditions of the agreement under which you obtained Microsoft Azure Services. If you do not have an active subscription license for Microsoft Azure Services, you may not use the software. Microsoft Azure Legal Information: https://azure.microsoft.com/en-us/support/legal/

{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"RelayConfigFetchStarting ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 14:59:54"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ListProxyConnectionDetailStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 14:59:55"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 14:59:59"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 14:59:59"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 14:59:59"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:00:09"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:00:09"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:00:09"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:00:19"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:00:19"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:00:19"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:00:29"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:00:29"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:00:29"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:00:39"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:00:39"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:00:39"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:00:49"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:00:49"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:00:49"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:00:59"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:00:59"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:00:59"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:01:09"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:01:09"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:01:09"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:01:19"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:01:19"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:01:19"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:01:29"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:01:29"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:01:29"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:01:39"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:01:39"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:01:39"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:01:49"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:01:49"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:01:49"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:01:59"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:01:59"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:01:59"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:02:09"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:02:09"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:02:09"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:02:19"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:02:19"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:02:19"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:02:29"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:02:29"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:02:29"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:02:39"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:02:39"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:02:39"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:02:49"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:02:49"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:02:49"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:02:59"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:02:59"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:02:59"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:03:09"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:03:09"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:03:09"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:03:19"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:03:19"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:03:19"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:03:29"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:03:29"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:03:29"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:03:39"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:03:39"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:03:39"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:03:49"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:03:49"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:03:49"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:03:59"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:03:59"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:03:59"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:04:09"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:04:09"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:04:09"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:04:19"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:04:19"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:04:19"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:04:29"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:04:29"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:04:29"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:04:39"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:04:39"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:04:39"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:04:49"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Error","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchFailed { message = Retrying token fetch in 00:00:10 seconds }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:04:49"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 76\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.<GetToken>b__4_0() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 50\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider) }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:04:49"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Information","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ClusterIdentityCRDTokenFetchStarted ","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:04:59"}
{"ArmId":"subscriptions/***/resourceGroups/***/providers/Microsoft.Kubernetes/connectedClusters/pocworkstation-arc-dev","Location":"westeurope","AgentName":"ConnectProxyAgent","Role":"ClusterConfigAgent","LogLevel":"Warning","Environment":"prod","LogType":"ConnectAgentTrace","CorrelationId":"","Message":"ThrowingException { source = , exception = System.InvalidOperationException: Status not populated\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 61\n   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)\n   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)\n   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider)\n   at Polly.Retry.RetryPolicy.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken)\n   at Polly.Policy.Implementation(Action`2 action, Context context, CancellationToken cancellationToken)\n   at Polly.Policy.Execute(Action`2 action, Context context, CancellationToken cancellationToken)\n   at Polly.Policy.Execute(Action action)\n   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.GetToken() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 52\n   at ServiceCaller.Caller.NotificationServiceCaller.FetchConnectionString() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/ServiceCaller/NotificationServiceCaller.cs:line 104\n   at ServiceCaller.Caller.NotificationServiceCaller.FetchData() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/ServiceCaller/NotificationServiceCaller.cs:line 48\n   at ConnectedProxyAgent.Program.FetchAndDumpConfig(Object data) in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/ConnectedProxyAgent/Program.cs:line 178\n   at ConnectedProxyAgent.Program.Main(String[] args) in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/ConnectedProxyAgent/Program.cs:line 78 }","AgentVersion":"0.2.28","AgentTimestamp":"01/22/2021 15:04:59"}
Unhandled exception. System.InvalidOperationException: Status not populated
   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ClusterIdentityCRDClient.GetTokenFromStatus() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ClusterIdentityCRDClient.cs:line 61
   at Polly.Policy.<>c__DisplayClass108_0.<Execute>b__0(Context ctx, CancellationToken ct)
   at Polly.Policy.<>c__DisplayClass138_0.<Implementation>b__0(Context ctx, CancellationToken token)
   at Polly.Retry.RetryEngine.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken, ExceptionPredicates shouldRetryExceptionPredicates, ResultPredicates`1 shouldRetryResultPredicates, Action`4 onRetry, Int32 permittedRetryCount, IEnumerable`1 sleepDurationsEnumerable, Func`4 sleepDurationProvider)
   at Polly.Retry.RetryPolicy.Implementation[TResult](Func`3 action, Context context, CancellationToken cancellationToken)
   at Polly.Policy.Implementation(Action`2 action, Context context, CancellationToken cancellationToken)
   at Polly.Policy.Execute(Action`2 action, Context context, CancellationToken cancellationToken)
   at Polly.Policy.Execute(Action action)
   at Microsoft.Azure.Arc.Kubernetes.Agent.TokenProvider.ManagedIdentityTokenProvider.GetToken() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/DataPlaneInteraction/msi/ManagedIdentityTokenProvider.cs:line 52
   at ServiceCaller.Caller.NotificationServiceCaller.FetchConnectionString() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/ServiceCaller/NotificationServiceCaller.cs:line 104
   at ServiceCaller.Caller.NotificationServiceCaller.FetchData() in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/ServiceCaller/NotificationServiceCaller.cs:line 48
   at ConnectedProxyAgent.Program.FetchAndDumpConfig(Object data) in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/ConnectedProxyAgent/Program.cs:line 178
   at ConnectedProxyAgent.Program.Main(String[] args) in /home/vsts/work/1/s/gopath/src/github.com/Azure/ClusterConfigurationAgent/ConnectedProxyAgent/ConnectedProxyAgent/Program.cs:line 78
./entrypoint.sh: line 8:     8 Aborted                 (core dumped) dotnet ConnectedProxyAgent.dll

So I am not sure who to contact for this: the Azure Arc team, the people maintaining connectedk8s or k8s-extension, or you guys... Since it seems that @wmpauli already talked to you about this, I am going to start here and would appreciate it if you could forward me to the correct people if the above doesn't make sense to you.

@wmpauli: Sorry for the delay in reporting this; it took me a while to get back to where I was last year on this topic, and some other things got in the way.

ERROR: Could not find a version that satisfies the requirement

When running this command:

pip install --disable-pip-version-check --extra-index-url https://azuremlsdktestpypi.azureedge.net/azureml-contrib-k8s-preview/D58E86006C65 azureml-contrib-k8s

I get this error:
ERROR: Could not find a version that satisfies the requirement azureml-dataprep-native<27.0.0,>=26.0.0 (from azureml-dataprep<2.7.0a,>=2.6.0a->azureml-dataset-runtime~=1.19.0->azureml-contrib-pipeline-steps->azureml-contrib-k8s) (from versions: none)
ERROR: No matching distribution found for azureml-dataprep-native<27.0.0,>=26.0.0 (from azureml-dataprep<2.7.0a,>=2.6.0a->azureml-dataset-runtime~=1.19.0->azureml-contrib-pipeline-steps->azureml-contrib-k8s)

I got this error many times when running the command. I cloned the repo, ran the command from the root of the directory in PowerShell as admin, and got the error each time.

PipelineData output goes to a wrong folder

It looks like something is off with the blobfuse settings.

1). I am trying to mount a folder in a blob to preserve output from a pipeline step:

from azureml.pipeline.core import PipelineData
from azureml.pipeline.steps import PythonScriptStep

# ws, run_config and cmaks_target are defined earlier in the notebook.
# Intermediate output is written to the workspace's default blob datastore.
output_data = PipelineData("output_test", ws.get_default_datastore())

test_step = PythonScriptStep(
    name="test",
    script_name="hello.py",
    runconfig=run_config,
    compute_target=cmaks_target,
    arguments=["--output_data", output_data],
    outputs=[output_data],
    allow_reuse=False,
)

2). It works pretty well and allows me to create a hello.txt file in my step and store it in the blob. The only problem is that my file ends up at the following path: https://sbaydachaiwrks0355120996.blob.core.windows.net/azureml-blobstore-244ab81b-ef44-4e0a-bbb7-3bd2692fc30a/azureml/16fba9b5-0043-485a-be54-be6ef70604aa/$AZ_CMAKS_JOB_MOUNT_ROOT/workspaceblobstore/azureml/16fba9b5-0043-485a-be54-be6ef70604aa/output_test/hello.txt. (The segment $AZ_CMAKS_JOB_MOUNT_ROOT/workspaceblobstore/azureml/16fba9b5-0043-485a-be54-be6ef70604aa comes from my cluster and should not be here.)
At the same time, the log for the step still contains the desired path: {"Container":"azureml-blobstore-244ab81b-ef44-4e0a-bbb7-3bd2692fc30a","SasToken":null,"Uri":"wasbs://azureml-blobstore-244ab81b-ef44-4e0a-bbb7-3bd2692fc30a@sbaydachaiwrks0355120996.blob.core.windows.net/azureml/16fba9b5-0043-485a-be54-be6ef70604aa/output_test","Account":"sbaydachaiwrks0355120996","RelativePath":"azureml/16fba9b5-0043-485a-be54-be6ef70604aa/output_test","PathType":0,"AmlDataStoreName":"workspaceblobstore"}
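
For context, hello.py simply writes into the output path it is handed; a minimal sketch (the argument name matches the step definition above, everything else is illustrative):

import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("--output_data", type=str)
args = parser.parse_args()

# AzureML resolves the PipelineData location and passes it in as a path;
# create the directory before writing into it.
os.makedirs(args.output_data, exist_ok=True)
with open(os.path.join(args.output_data, "hello.txt"), "w") as f:
    f.write("hello world\n")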

Install.sh broken

  1. There is no step to create the azureml namespace:

     Sauryas-MacBook-Pro:installer sauryadas$ bash install.sh mcr.microsoft.com/azureml/cmk8s/agent-setup:replace-kubeconf-06-28
     daemonset.apps/blobfuse-flexvol-installer created
     daemonset.apps/nvidia-device-plugin-daemonset created
     Error: uninstall: Release not loaded: install-job: release: not found
     Error: unknown flag: --create-namespace

  2. helm install install-job --set kube_conf="$kube_conf" --set image=$installer_image ./install-job-chart -n $namespace --atomic --create-namespace

A subsequent run then fails because the azureml namespace does not exist:

Sauryas-MacBook-Pro:installer sauryadas$ bash install.sh mcr.microsoft.com/azureml/cmk8s/agent-setup:replace-kubeconf-06-28
daemonset.apps/blobfuse-flexvol-installer unchanged
daemonset.apps/nvidia-device-plugin-daemonset unchanged
Error: uninstall: Release not loaded: install-job: release: not found
Error: create: failed to create: namespaces "azureml" not found

TensorFlow estimator using MPI/Horovod fails with error

I triggered this bug on an AKS cluster consisting of 2 GPU nodes (both NC6).

I ran the example in the sample notebook at notebooks/how-to-use-azureml/ml-frameworks/tensorflow/training/distributed-tensorflow-with-horovod, slightly modified to use an attached AKS compute as the target.
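
The modification boiled down to pointing the estimator at the attached compute. A rough sketch of the relevant part (SDK v1 API; source_directory and entry_script are placeholders here, and cmaks_target is the attached AKS compute target):

from azureml.core.runconfig import MpiConfiguration
from azureml.train.dnn import TensorFlow

# Same distributed setup as the sample notebook, but targeting the
# attached AKS compute instead of AmlCompute.
estimator = TensorFlow(
    source_directory="./src",      # placeholder
    entry_script="train.py",       # placeholder
    compute_target=cmaks_target,   # attached AKS compute
    node_count=2,
    process_count_per_node=1,
    distributed_training=MpiConfiguration(),
    use_gpu=True,
)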

The run failed with the following error:

UserError: User program failed with ImportError: Traceback (most recent call last):
  File "/azureml-envs/azureml_43d6f48c7a03663fa8798a18b70267e6/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/azureml-envs/azureml_43d6f48c7a03663fa8798a18b70267e6/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/azureml-envs/azureml_43d6f48c7a03663fa8798a18b70267e6/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/azureml-envs/azureml_43d6f48c7a03663fa8798a18b70267e6/lib/python3.6/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/azureml-envs/azureml_43d6f48c7a03663fa8798a18b70267e6/lib/python3.6/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.

Key note: this error only occurs when running a distributed TensorFlow job using MPI (as this example does). Running a distributed TensorFlow job using the Parameter Server backend works correctly.
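
For comparison, the working Parameter Server variant is roughly a one-argument change to the same estimator (again a sketch under SDK v1 assumptions; the worker and parameter server counts are illustrative):

from azureml.core.runconfig import TensorflowConfiguration
from azureml.train.dnn import TensorFlow

# Same job, but using the (working) Parameter Server backend.
estimator = TensorFlow(
    source_directory="./src",      # placeholder
    entry_script="train.py",       # placeholder
    compute_target=cmaks_target,
    node_count=2,
    distributed_training=TensorflowConfiguration(worker_count=2, parameter_server_count=1),
    use_gpu=True,
)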

Attach Fails with Missing aml-volcano-admission-secret Secret and Missing "CertificateSigningRequest"

Created a new 1.19.3 AKS cluster and then tried to attach the compute via Azure ML Studio.

The azureml-connector-admission-... pod shows a missing CertificateSigningRequest in its logs:
k logs azureml-connector-admission-init-m7rdc
creating certs in tmpdir /tmp/tmp.IikkgL
Generating RSA private key, 2048 bit long modulus (2 primes)
.+++++
.............................................................+++++
e is 65537 (0x010001)
certificatesigningrequest.certificates.k8s.io "azureml-connector-admission-service.azureml" deleted
certificatesigningrequest.certificates.k8s.io/azureml-connector-admission-service.azureml created
NAME                                          AGE   SIGNERNAME                     REQUESTOR                                                    CONDITION
azureml-connector-admission-service.azureml   0s    kubernetes.io/legacy-unknown   system:serviceaccount:azureml:azureml-connector-admission   Pending
No resources found
error: no kind "CertificateSigningRequest" is registered for version "certificates.k8s.io/v1" in scheme "k8s.io/kubernetes/pkg/kubectl/scheme/scheme.go:28"

Describing the azureml-connector-admission pod shows the missing secret:
k describe po azureml-connector-admission-799865fc6-vv2gz
....
Events:
  Type     Reason       Age                  From               Message
  ----     ------       ----                 ----               -------
  Normal   Scheduled    2m50s                default-scheduler  Successfully assigned azureml/azureml-connector-admission-799865fc6-vv2gz to aks-nodepool1-71059522-vmss000001
  Warning  FailedMount  47s                  kubelet            Unable to attach or mount volumes: unmounted volumes=[admission-certs], unattached volumes=[admission-certs azureml-connector-admission-token-zj296]: timed out waiting for the condition
  Warning  FailedMount  42s (x9 over 2m50s)  kubelet            MountVolume.SetUp failed for volume "admission-certs" : secret "aml-volcano-admission-secret" not found
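
The FailedMount events point at the missing aml-volcano-admission-secret, which, judging from the init logs above, is only created once the CSR flow succeeds. A quick sketch for checking it programmatically, assuming the kubernetes Python client (names taken from the events above):

from kubernetes import client, config
from kubernetes.client.rest import ApiException

config.load_kube_config()
core = client.CoreV1Api()

# The admission pod mounts this secret; here it was never created because
# the CertificateSigningRequest step failed.
try:
    core.read_namespaced_secret("aml-volcano-admission-secret", "azureml")
    print("secret present")
except ApiException as e:
    print(f"secret missing (HTTP {e.status})")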

AML Operator installation fails on CPU nodes

Sauryas-MacBook-Pro:installer sauryadas$ bash install.sh mcr.microsoft.com/azureml/cmk8s/agent-setup:replace-kubeconf-06-28
daemonset.apps/blobfuse-flexvol-installer unchanged
daemonset.apps/nvidia-device-plugin-daemonset unchanged
Error: uninstall: Release not loaded: install-job: release: not found
namespace "azureml" deleted
namespace/azureml created
NAME: install-job
LAST DEPLOYED: Mon Jul  6 10:56:00 2020
NAMESPACE: azureml
STATUS: deployed
REVISION: 1
TEST SUITE: None
Sauryas-MacBook-Pro:installer sauryadas$ kubectl get pods --namespace azureml
NAME                   READY   STATUS             RESTARTS   AGE
cmaks-init-job-mdzhn   0/1     CrashLoopBackOff   1          10s
Sauryas-MacBook-Pro:installer sauryadas$ kubectl get pods --namespace azureml
NAME                   READY   STATUS             RESTARTS   AGE
cmaks-init-job-mdzhn   0/1     CrashLoopBackOff   1          12s
Sauryas-MacBook-Pro:installer sauryadas$ kubectl get pods --namespace azureml --watch
NAME                   READY   STATUS   RESTARTS   AGE
cmaks-init-job-mdzhn   0/1     Error    2          18s
cmaks-init-job-mdzhn   0/1     CrashLoopBackOff   2          29s
cmaks-init-job-mdzhn   0/1     Error              3          43s
cmaks-init-job-mdzhn   0/1     CrashLoopBackOff   3          55s
cmaks-init-job-mdzhn   0/1     Error              4          86s
cmaks-init-job-mdzhn   0/1     CrashLoopBackOff   4          98s
cmaks-init-job-mdzhn   0/1     Error              5          2m48s
cmaks-init-job-mdzhn   0/1     CrashLoopBackOff   5          3m2s
cmaks-init-job-mdzhn   0/1     Error              6          5m32s
cmaks-init-job-mdzhn   0/1     Terminating        6          5m32s
cmaks-init-job-mdzhn   0/1     Terminating        6          5m32s
cmaks-init-job-mdzhn   0/1     Terminating        6          5m34s
cmaks-init-job-mdzhn   0/1     Terminating        6          5m34s

Sauryas-MacBook-Pro:installer sauryadas$ kubectl logs pod/cmaks-init-job-w9zd7 --namespace azureml
required env var empty.
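
The message suggests the init container expects an environment variable that was never set. One way to see exactly which env vars the pod received, sketched with the kubernetes Python client (pod name taken from the output above):

from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# Dump the environment variables the init job's pod spec actually defines.
pod = core.read_namespaced_pod("cmaks-init-job-w9zd7", namespace="azureml")
for container in pod.spec.containers:
    for env in container.env or []:
        print(container.name, env.name, repr(env.value))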

AML operator installation fails on GPU nodepool

Sauryas-MacBook-Pro:installer sauryadas$ bash install.sh mcr.microsoft.com/azureml/cmk8s/agent-setup:replace-kubeconf-06-28
daemonset.apps/blobfuse-flexvol-installer unchanged
daemonset.apps/nvidia-device-plugin-daemonset unchanged
Error: uninstall: Release not loaded: install-job: release: not found
secret "cmaks-image-pull-secret" deleted
secret "cmaks-image-pull-secret" deleted
namespace "azureml" deleted
namespace/azureml created
NAME: install-job
LAST DEPLOYED: Mon Jul  6 14:33:01 2020
NAMESPACE: azureml
STATUS: deployed
REVISION: 1
TEST SUITE: None
Sauryas-MacBook-Pro:installer sauryadas$ kubectl get pods --namespace azureml
NAME                   READY   STATUS    RESTARTS   AGE
cmaks-init-job-sj9kl   1/1     Running   0          33s
Sauryas-MacBook-Pro:installer sauryadas$ kubectl get all --namespace azureml
NAME                       READY   STATUS   RESTARTS   AGE
pod/cmaks-init-job-sj9kl   0/1     Error    2          53s

NAME                       COMPLETIONS   DURATION   AGE
job.batch/cmaks-init-job   0/1           54s        54s
Sauryas-MacBook-Pro:installer sauryadas$ kubectl get all --namespace azureml
NAME                       READY   STATUS             RESTARTS   AGE
pod/cmaks-init-job-sj9kl   0/1     CrashLoopBackOff   2          67s

NAME                       COMPLETIONS   DURATION   AGE
job.batch/cmaks-init-job   0/1           68s        68s
Sauryas-MacBook-Pro:installer sauryadas$ kubectl get pods --namespace azureml
NAME                   READY   STATUS             RESTARTS   AGE
cmaks-init-job-sj9kl   0/1     CrashLoopBackOff   2          72s
Sauryas-MacBook-Pro:installer sauryadas$ clear
Sauryas-MacBook-Pro:installer sauryadas$ kubectl get crd
NAME                                    CREATED AT
healthstates.azmon.container.insights   2020-04-15T04:54:52Z
Sauryas-MacBook-Pro:installer sauryadas$ kubectl get pods --namespace azureml -o wide
NAME                   READY   STATUS             RESTARTS   AGE    IP           NODE                               NOMINATED NODE   READINESS GATES
cmaks-init-job-sj9kl   0/1     CrashLoopBackOff   3          115s   10.244.3.6   aks-gpupool1-31210500-vmss000001   <none>           <none>

Sauryas-MacBook-Pro:installer sauryadas$ kubectl logs pod/cmaks-init-job-sj9kl --namespace azureml
required env var empty.
