
cloud-provider-alibaba-cloud's Introduction

Kubernetes Cloud Controller Manager for Alibaba Cloud

Thank you for visiting the cloud-provider-alibaba-cloud repository!

cloud-provider-alibaba-cloud is the external Kubernetes cloud controller manager implementation for Alibaba Cloud. Running cloud-provider-alibaba-cloud allows you to build Kubernetes clusters that leverage many of the cloud services on Alibaba Cloud. You can read more about the Kubernetes cloud controller manager here.

Development

Test the project with the command make test, and build an image with the command make image.

QuickStart

Community, discussion, contribution, and support

Learn how to engage with the Kubernetes community on the community page.

You can reach the maintainers of this project at:

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.

Testing

See the Test page for more information.

cloud-provider-alibaba-cloud's People

Contributors

andrewsykim, aoxn, bswang, cheyang, crazykev, cuericlee, damdo, ddbmh, devkanro, fedosin, gujingit, jia-jerry, joelspeed, jovizhangwei, k8s-ci-robot, lyt99, mirake, mitingjin, openshift-ci-robot, openshift-ci[bot], openshift-merge-bot[bot], openshift-merge-robot, racheljpg, radekmanak, ringtail, sunyuan3, tengattack, xh4n3, xichengliudui, yuzhiquan


cloud-provider-alibaba-cloud's Issues

cloud-conf configmap expects accesskey and accesssecret to be base64

@elmiko @JoelSpeed
The installer for Alibaba is creating credentials and placing them in the cloud-conf configmap. The cloud-conf looks like this:

  cloud.conf: "{\n\t\"Global\": {\n\t\t\"kubernetesClusterTag\": \"\",\n\t\t\"nodeMonitorPeriod\":
    0,\n\t\t\"nodeAddrSyncPeriod\": 0,\n\t\t\"uid\": \"\",\n\t\t\"vpcid\": \"\",\n\t\t\"region\":
    \"us-east-1\",\n\t\t\"zoneid\": \"\",\n\t\t\"vswitchid\": \"\",\n\t\t\"clusterID\":
    \"test-jdbm2\",\n\t\t\"routeTableIDs\": \"\",\n\t\t\"serviceBackendType\": \"\",\n\t\t\"disablePublicSLB\":
    false,\n\t\t\"accessKeyID\": \"<redacted>\",\n\t\t\"accessKeySecret\":
    \"<redacted>\"\n\t}\n}\n"
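Once unescaped, the value above is a JSON document with a single Global object. As a rough sketch of how such a document could be read in Go, the example below decodes a few of the keys shown in the sample. The struct names and the parseCloudConfig helper are hypothetical and are not taken from the provider's code; only the JSON key names come from the sample.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// globalConfig mirrors a subset of the keys visible in the sample cloud.conf.
// The type itself is hypothetical; only the JSON key names come from the sample.
type globalConfig struct {
	Region          string `json:"region"`
	ClusterID       string `json:"clusterID"`
	AccessKeyID     string `json:"accessKeyID"`
	AccessKeySecret string `json:"accessKeySecret"`
}

// cloudConfig is the top-level document, which holds a single Global object.
type cloudConfig struct {
	Global globalConfig `json:"Global"`
}

// parseCloudConfig decodes the JSON text of cloud.conf.
// Keys that are not declared in the structs are ignored by encoding/json.
func parseCloudConfig(data []byte) (cloudConfig, error) {
	var cfg cloudConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	raw := []byte(`{"Global": {"region": "us-east-1", "clusterID": "test-jdbm2", "disablePublicSLB": false}}`)
	cfg, err := parseCloudConfig(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(cfg.Global.Region, cfg.Global.ClusterID)
}
```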

The CCM will read the cloud-conf configmap once openshift/cluster-cloud-controller-manager-operator/pull/94 is fixed with the correct configmap and the code changes are merged.

I am seeing this error when attempting to run the CCM with --v=8 set in the CCM environment:

I0903 01:13:48.956363       6 alicloud.go:132] Alicloud: Try Accesskey and AccessKeySecret from config file.
E0903 01:13:48.956453       6 cloudprovider-alibaba-cloud.go:49] Run CCM error: verify ccm config: cloud provider could not be initialized: could not init cloud provider "alicloud": illegal base64 data at input byte 28

I was able to track that code down to this location and it appears the expectation here is that the accessKeyID and the accessKeySecret are base64 encoded.
https://github.com/openshift/cloud-provider-alibaba-cloud/blob/master/cloud-controller-manager/alicloud.go#L125-L137

Either the installer ( https://github.com/openshift/installer/pull/5018/files#diff-3c9c46884bade6b4d4d801ef3b69fdf08628e35dacc56deb4f1f5d67d45f1c24R9-R31 ) needs to write the keys as base64, or this code needs to accept the values as they are. This is currently blocking the CCM and preventing the masters from reaching their Ready state.

Let me know what you think.

CCM uses "-mod readonly" when building

We have strict requirements on our build servers that do not allow any external library downloads. The -mod readonly setting here https://github.com/openshift/cloud-provider-alibaba-cloud/blob/master/Makefile#L32 forces dependencies to be downloaded during the build process. This behavior is strictly forbidden.

@gujingit Is this something we can change? Is there a reason why this is done in the upstream repository? We would like to ensure that all necessary files are in the vendor/ directory for the build process.

cc @elmiko

alicloud: unable to split instanceid and region from providerID

I am attempting to deploy openshift/cluster-cloud-controller-manager-operator#119, which deploys the Alibaba CCM. I have been adding debug statements in an attempt to figure out why the CCM is not registering the master nodes in an OpenShift cluster.

The call is getting the name correctly from the v1.Node here https://github.com/openshift/cloud-provider-alibaba-cloud/blob/master/cloud-controller-manager/controller/node/controller.go#L604 and attempting to get the providerID

The error message I'm seeing is:

failed to get provider ID for node test-8bv8v-master0 at cloudprovider: failed to get instance ID from cloud provider: alicloud: unable to split instanceid and region from providerID, error unexpected providerID=test-8bv8v-master0

This is occurring here:
https://github.com/openshift/cloud-provider-alibaba-cloud/blob/master/cloud-controller-manager/instances.go#L88

Alibaba's code expects the providerID to be in a specific alicloud:// prefix or be in <region>.<node name> format.
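The expectation described above can be sketched as a small parser. This is an illustration only, assuming the two accepted shapes are alicloud://<region>.<instance ID> and <region>.<instance ID>; splitProviderID is a hypothetical name and is not the function in instances.go.

```go
package main

import (
	"fmt"
	"strings"
)

// providerPrefix is the scheme mentioned in the issue text.
const providerPrefix = "alicloud://"

// splitProviderID is a hypothetical parser. It accepts
// "alicloud://<region>.<id>" or "<region>.<id>" and returns both parts.
func splitProviderID(providerID string) (region, id string, err error) {
	trimmed := strings.TrimPrefix(providerID, providerPrefix)
	parts := strings.SplitN(trimmed, ".", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("unexpected providerID=%s", providerID)
	}
	return parts[0], parts[1], nil
}

func main() {
	candidates := []string{
		"alicloud://us-east-1.i-0xii1bq9ydmbcpwxbfse",
		"us-east-1.i-0xii1bq9ydmbcpwxbfse",
		"test-8bv8v-master0", // a bare node name, as in the error above
	}
	for _, c := range candidates {
		region, id, err := splitProviderID(c)
		fmt.Printf("region=%q id=%q err=%v\n", region, id, err)
	}
}
```

Under these assumptions, a bare node name such as test-8bv8v-master0 has no region separator and is rejected, which matches the shape of the error reported above.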

We need some assistance in determining the expectation of the providerID variable in question.

@gujingit Would you mind providing some assistance on how we may proceed?

instanceid returned as providerID in incorrect format

While attempting to use the cloud controller manager I ran into some complications on what to use as the instance names. @gujingit was kind enough to show me the documentation on what is expected by the provider. For simplicity, I'll paste it here:

Update kubelet info with provider id info and restart kubelet: You should provide --hostname-override=${REGION_ID}.${INSTANCE_ID} --provider-id=${REGION_ID}.${INSTANCE_ID} arguments in all of your kubelet unit file. The format is ${REGION_ID}.${INSTANCE_ID}. See kubelet.service for more details.

When setting the instance name in this format, the code changes this providerID value to an instance.instanceID here:
https://github.com/openshift/cloud-provider-alibaba-cloud/blob/master/cloud-controller-manager/alicloud.go#L523



I'll attempt to explain the long call chain in detail.
nodeName=<region>.<instanceId>

The calls proceed as follows:
doAddCloudNode then calls
getProviderID then calls
cloudprovider.GetInstanceProviderID then calls
instances.InstanceID then calls
c.climgr.Instances().findInstancesByNodeName which calls
s.findInstanceByProviderID which takes the nodeName and converts it to a providerID. At this point providerID=<region>.<instanceID>. We then call nodeFromProviderID.
nodeFromProviderID, which expects the providerID to be in the format alicloud://<region>.<instanceID> or <region>.<instanceID>. It returns the region and the instance ID as separate values. These will be used to fetch the instance.

We then fetch the instance by using the region and the instanceID here:
s.getInstances
We then return the instance from findInstanceByProviderID to findInstanceByNodeName to
c.climgr.Instances().findInstancesByNodeName, which returns the instance.InstanceId

This instance.InstanceId is no longer in the expected providerID format of <region>.<instanceId>. The providerID is now set to the value of instance.InstanceId (e.g. i-0xii1bq9ydmbcpwxbfse). We then return to GetInstanceProviderID, which prepends alicloud:// to the providerID.

ProviderID is now set to alicloud://<instanceId>. Notice there is no <REGION> in the providerID because it was created from instance.InstanceID. Continuing on...

The next call made after returning to the original getProviderID is the call to getCloudInstance, with the providerID set to alicloud://<instanceID>.
This calls ins.ListInstances with the providerID set to alicloud://<instanceID>.
ListInstances now calls nodeFromProviderID with the providerID set to alicloud://<instanceID>. This function fails here because the providerID does not include the region.
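The round trip described above can be reproduced in a few lines. This is a simplified sketch with hypothetical names (instance, providerIDFromInstanceID, providerIDWithRegion, hasRegion), none of which are taken from the provider. It only shows why an ID built from the instance ID alone cannot be split again, and that carrying the region along would keep the value parseable.

```go
package main

import (
	"fmt"
	"strings"
)

// instance is a hypothetical stand-in for the cloud instance record.
type instance struct {
	RegionID   string
	InstanceID string
}

// providerIDFromInstanceID mimics the behavior described in the issue:
// only the instance ID is prefixed, so the region is lost.
func providerIDFromInstanceID(ins instance) string {
	return "alicloud://" + ins.InstanceID
}

// providerIDWithRegion is one possible alternative that keeps the region.
func providerIDWithRegion(ins instance) string {
	return "alicloud://" + ins.RegionID + "." + ins.InstanceID
}

// hasRegion reports whether the ID can be split into "<region>.<id>".
func hasRegion(providerID string) bool {
	rest := strings.TrimPrefix(providerID, "alicloud://")
	parts := strings.SplitN(rest, ".", 2)
	return len(parts) == 2 && parts[0] != "" && parts[1] != ""
}

func main() {
	ins := instance{RegionID: "us-east-1", InstanceID: "i-0xii1bq9ydmbcpwxbfse"}
	for _, id := range []string{providerIDFromInstanceID(ins), providerIDWithRegion(ins)} {
		fmt.Printf("%s splittable=%v\n", id, hasRegion(id))
	}
}
```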

--use-service-account-credentials flag does not appear to correctly populate kubeconfig

when starting the CCM with only the --use-service-account-credentials=true flag set for credentials it does not appear to accept the service account from the deployment for the kubeconfig.

for example, we are starting the CCM with this manifest:

kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: alibaba-cloud-controller-manager
  name: alibaba-cloud-controller-manager
  namespace: openshift-cloud-controller-manager
spec:
  replicas: 2
  selector:
    matchLabels:
      app: alibaba-cloud-controller-manager
  template:
    metadata:
      labels:
        app: alibaba-cloud-controller-manager
    spec:
      hostNetwork: true
      serviceAccountName: cloud-controller-manager
      priorityClassName: system-cluster-critical
      nodeSelector:
        node-role.kubernetes.io/master: ""
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: "kubernetes.io/hostname"
              labelSelector:
                matchLabels:
                  app: alibaba-cloud-controller-manager
      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
          operator: Exists
        - effect: NoExecute
          key: node.kubernetes.io/unreachable
          operator: Exists
          tolerationSeconds: 120
        - effect: NoExecute
          key: node.kubernetes.io/not-ready
          operator: Exists
          tolerationSeconds: 120
        - effect: NoSchedule
          key: node.cloudprovider.kubernetes.io/uninitialized
          operator: Exists
        - effect: NoSchedule
          key: node.kubernetes.io/not-ready
          operator: Exists
      containers:
        - command:
            - /bin/sh
            - -c
            - |
              #!/bin/sh
              set -o allexport
              if [[ -f /etc/kubernetes/apiserver-url.env ]]; then
                source /etc/kubernetes/apiserver-url.env
              fi
              exec /cloud-controller-manager \
              --address=127.0.0.1 \
              --allow-untagged-cloud=true \
              --leader-elect=true \
              --leader-elect-lease-duration=137s \
              --leader-elect-renew-deadline=107s \
              --leader-elect-retry-period=26s \
              --leader-elect-resource-namespace=openshift-cloud-controller-manager \
              --cloud-provider=alicloud \
              --use-service-account-credentials=true \
              --cloud-config=/etc/kubernetes/config/cloud-config.conf \
              --feature-gates=ServiceNodeExclusion=true \
              --configure-cloud-routes=false \
              --allocate-node-cidrs=false
          image: quay.io/openshift/origin-alibaba-cloud-controller-manager
          livenessProbe:
            failureThreshold: 8
            httpGet:
              host: 127.0.0.1
              path: /healthz
              port: 10258
              scheme: HTTP
            initialDelaySeconds: 15
            timeoutSeconds: 15
          name: cloud-controller-manager
          resources:
            requests:
              cpu: 200m
              memory: 50Mi
          volumeMounts:
            - mountPath: /etc/kubernetes/
              name: k8s
            - name: cloud-config
              mountPath: /etc/kubernetes/config
      volumes:
        - hostPath:
            path: /etc/kubernetes
          name: k8s
        - name: cloud-config
          configMap:
            name: cloud-conf
            items:
              - key: cloud.conf
                path: cloud-config.conf

of note are the following entries:

      serviceAccountName: cloud-controller-manager

and

              exec /cloud-controller-manager \
              --address=127.0.0.1 \
              --allow-untagged-cloud=true \
              --leader-elect=true \
              --leader-elect-lease-duration=137s \
              --leader-elect-renew-deadline=107s \
              --leader-elect-retry-period=26s \
              --leader-elect-resource-namespace=openshift-cloud-controller-manager \
              --cloud-provider=alicloud \
              --use-service-account-credentials=true \
              --cloud-config=/etc/kubernetes/config/cloud-config.conf \
              --feature-gates=ServiceNodeExclusion=true \
              --configure-cloud-routes=false \
              --allocate-node-cidrs=false

the service account is set up using this manifest.

when we run the controller though, we see this in the output

E0916 19:34:13.733843       1 reflector.go:178] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:125: Failed to list *v1.Endpoints: endpoints is forbidden: User "system:serviceaccount:kube-system:shared-informers" cannot list resource "endpoints" in API group "" at the cluster scope
E0916 19:34:18.256934       1 reflector.go:178] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:125: Failed to list *v1.Node: nodes is forbidden: User "system:serviceaccount:kube-system:shared-informers" cannot list resource "nodes" in API group "" at the cluster scope

which appears to indicate that system:serviceaccount:kube-system:shared-informers is being used as the kubeconfig user, instead of the expected cloud-controller-manager.

i looked at the code around building a client, https://github.com/openshift/cloud-provider-alibaba-cloud/blob/master/cmd/cloudprovider/app/ccm.go#L120 and it appears that the client function is using clientcmd.BuildConfigFromFlags(ccm.Master, ccm.Kubeconfig) to build the kubeconfig, but does not fall back to check if the service account credentials will be used if there is no kubeconfig or master specified.

the BuildConfigFromFlags function is used slightly differently in the upstream kubernetes/cloud-provider code (see https://github.com/kubernetes/cloud-provider/blob/master/options/options.go#L177), which specifically has a check to detect whether the service account credentials should be used (see https://github.com/kubernetes/cloud-provider/blob/master/options/options.go#L197).

i think this may indicate that the client function will need to also check this flag in cases where no kubeconfig or master is specified.
