
cluster-api-provider-gcp's Introduction




Kubernetes Cluster API Provider GCP

Kubernetes-native declarative infrastructure for GCP.

What is the Cluster API Provider GCP?

The Cluster API brings declarative, Kubernetes-style APIs to cluster creation, configuration, and management. The API itself is shared across multiple cloud providers, allowing for true Google Cloud hybrid deployments of Kubernetes.

Documentation

Please see our book for in-depth documentation.

Quick Start

Check out our Cluster API Quick Start to create your first Kubernetes cluster on Google Cloud Platform using Cluster API.
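
At a high level, the flow looks something like the sketch below; the exact environment variables, flags, and versions come from the Quick Start itself, so treat this as illustrative only:

    # Install the GCP infrastructure provider into an existing management cluster
    clusterctl init --infrastructure gcp

    # Render a workload cluster manifest and apply it (cluster name and version are placeholders)
    clusterctl generate cluster my-cluster --kubernetes-version v1.22.0 > my-cluster.yaml
    kubectl apply -f my-cluster.yaml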


Support Policy

This provider's versions are compatible with the following versions of Cluster API:

  • Google Cloud Provider v0.3.x: Cluster API v1alpha3 (v0.3.x)
  • Google Cloud Provider v0.4.x: Cluster API v1alpha4 (v0.4.x)
  • Google Cloud Provider v1.0.x: Cluster API v1beta1 (v1.0.x)

This provider's versions are able to install and manage the following versions of Kubernetes:

  • Kubernetes 1.15 through 1.22, with each provider release (Google Cloud Provider v0.3.x, v0.4.x, and v1.0.x) supporting a subset of this range; see the book for the exact compatibility matrix.

Each version of Cluster API for Google Cloud will attempt to support at least two versions of Kubernetes e.g., Cluster API for GCP v0.1 may support Kubernetes 1.13 and Kubernetes 1.14.

NOTE: As the versioning for this project is tied to the versioning of Cluster API, future modifications to this policy may be made to more closely align with other providers in the Cluster API ecosystem.


Getting Involved and Contributing

Are you interested in contributing to cluster-api-provider-gcp? We, the maintainers, and the community would love your suggestions, support, and contributions! The maintainers of the project can be contacted at any time to learn how to get involved.

Before starting with a contribution, please go through the prerequisites of the project.

To set up the development environment, check out the development guide.
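
If you want a head start before reading it, the first steps typically look roughly like this (a sketch assuming the repository's standard Makefile targets; the development guide is authoritative):

    # Clone the provider and run the unit tests (assumes Go and make are installed)
    git clone https://github.com/kubernetes-sigs/cluster-api-provider-gcp.git
    cd cluster-api-provider-gcp
    make test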

In the interest of getting new people involved, we have issues marked as good first issue. These issues have a smaller scope but are very helpful for getting acquainted with the codebase. For more, see the issue tracker. If you're unsure where to start, feel free to reach out to discuss.

See also: Our own contributor guide and the Kubernetes community page.

We also encourage ALL active community participants to act as if they were maintainers, even if you don't have 'official' write permissions. This is a community effort and we are here to serve the Kubernetes community. If you have an active interest and you want to get involved, you have real power!

Office hours

  • Join the SIG Cluster Lifecycle Google Group for access to documents and calendars.
  • Participate in the conversations on Kubernetes Discuss
  • Provider implementers office hours (CAPI)
    • Weekly on Wednesdays @ 10:00 am PT (Pacific Time) on Zoom
    • Previous meetings: [ notes | recordings ]
  • Cluster API Provider GCP office hours (CAPG)
    • Monthly on first Thursday @ 09:00 am PT (Pacific Time) on Zoom
    • Previous meetings: [ notes | recordings ]

Other ways to communicate with the contributors

Please check in with us in the #cluster-api-gcp channel on Slack.

GitHub Issues

Bugs

If you think you have found a bug, please follow the instructions below.

  • First, spend a small amount of time doing due diligence on the issue tracker; your issue might be a duplicate.
  • Get the logs from the custom controllers and paste them into the issue (see the sketch after this list).
  • Open a bug report.
  • Remember that users might be searching for the issue in the future, so please give it a meaningful title to help others.
  • Feel free to reach out to the community on Slack.
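
A minimal sketch of grabbing those logs, assuming the provider was installed into its default capg-system namespace (adjust the namespace and deployment name if your installation differs):

    # Capture recent logs from the CAPG controller manager to attach to the issue
    kubectl logs -n capg-system deployment/capg-controller-manager --since=1h > capg-controller.log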

Tracking new features

We also use the issue tracker to track feature requests. If you have a feature idea that could make Cluster API Provider GCP even more awesome, follow these steps.

  • Open a feature request.
  • Remember that users might be searching for the issue in the future, so please give it a meaningful title to help others.
  • Clearly define the use case with concrete examples. Example: type this and cluster-api-provider-gcp does that.
  • Some of our larger features will require a design. If you would like to include a technical design in your feature request, please go ahead.
  • After the new feature is well understood and the design is agreed upon, we can start coding it. We would love for you to do the coding, so please open up a WIP (work in progress) PR. Happy coding!

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.


cluster-api-provider-gcp's Issues

[Failing Test] (capg-conformance-main-ci-artifacts) No Control Plane machines came into existence

Duplicate of kubernetes/kubernetes#120481


Which jobs are failing?

master-informing:

Which tests are failing?

capg-e2e.[It] Conformance Tests Should run conformance tests


Since when has it been failing?

The last 2 runs failed consecutively; first failure: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/periodic-cluster-api-provider-gcp-make-conformance-main-ci-artifacts/1698303295132536832

Testgrid link

https://testgrid.k8s.io/sig-release-master-informing#capg-conformance-main-ci-artifacts

Reason for failure (if possible)

[FAILED] Timed out after 1800.000s.
No Control Plane machines came into existence.
Expected
    <bool>: false
to be true
In [It] at: /home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/[email protected]/framework/controlplane_helpers.go:153 @ 09/03/23 12:42:02.614

Allow for local-ssd

Given that local SSDs are very performant and, with a 3-year commitment, less expensive than pd-standard (I do not believe you can get commitment discounts on other storage), I would like to use them for my /var (with a small pd-standard boot disk). The fact that they are ephemeral and that I cannot shut down machines where they are present is not a concern for OpenShift nodes.

Unfortunately, the Cluster API provider does not support them. When I create a machine config with a disk of type local-ssd, I get the following error message:

error launching instance: googleapi: Error 400: Invalid value for field 'resource.disks[1].type': 'PERSISTENT'. Cannot create local SSD as persistent disk., invalid

Here is the spec/providerSpec/disk[1] definition I used:

            - autoDelete: true
              image: blank
              sizeGb: 375
              type: local-ssd

When I look at the allowed configuration values and the reconciler, I don't see any way to specify the disk type as required by the Google API.

Additionally, I could not find a way to keep the sourceImage and sizeGb from being specified, or to set the interface.
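
For reference, the raw Compute Engine API (google.golang.org/api/compute/v1 in Go) models a local SSD as a SCRATCH disk with a local-ssd disk type and an NVMe or SCSI interface, with no source image and a size fixed at 375 GB by GCP. A sketch of what the provider would need to emit, not actual provider code:

    import (
        "fmt"

        compute "google.golang.org/api/compute/v1"
    )

    // buildLocalSSD describes a local SSD attachment in the given zone.
    // Local SSDs are SCRATCH disks: no SourceImage, and GCP fixes the size at 375 GB.
    func buildLocalSSD(zone string) *compute.AttachedDisk {
        return &compute.AttachedDisk{
            Type:       "SCRATCH", // the request above sent "PERSISTENT", hence the 400 error
            AutoDelete: true,
            Interface:  "NVME", // or "SCSI"
            InitializeParams: &compute.AttachedDiskInitializeParams{
                DiskType: fmt.Sprintf("zones/%s/diskTypes/local-ssd", zone),
            },
        }
    }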

I am happy to work on a PR but have two major obstacles.

Any help or advice is greatly appreciated.

Unable to install OpenShift on GCP with quick start.

Team, we are trying to set up OpenShift Container Platform on GCP project "XXXX" using the steps mentioned in this document:

https://docs.openshift.com/container-platform/4.2/installing/installing_gcp/installing-gcp-default.html

Our GCP installation is failing with this error:

time="2021-04-27T07:09:05Z" level=fatal msg="failed to initialize the cluster: Working towards 4.5.35: 86% complete, waiting on authentication, console, image-registry, ingress, kube-storage-version-migrator, monitoring"

Attaching the installation log file.

When I debugged further, I found that worker nodes were not getting created because the machine-controller was failing with this error:

oc logs -f machine-api-controllers-76c494db9-qgmmz -c machine-controller -n openshift-machine-api | grep failed

outputs:

E0427 10:21:34.152815 1 controller.go:258] controller-runtime/controller "msg"="Reconciler error" "error"="ocp-sda-cluster-445cl-master-0: failed to create scope for machine: error getting credentials secret "gcp-cloud-credentials" in namespace "openshift-machine-api": Secret "gcp-cloud-credentials" not found" "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"ocp-sda-cluster-445cl-master-0"}

It was not finding gcp-cloud-credentials, so I created it manually by copying from gcp-credentials in kube-system.
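
(For anyone else hitting this, the manual copy can be done roughly as follows; a workaround sketch assuming jq is available, not the supported fix:)

    # Copy the kube-system credentials to the name/namespace the machine controller expects
    oc get secret gcp-credentials -n kube-system -o json \
      | jq '.metadata = {name: "gcp-cloud-credentials", namespace: "openshift-machine-api"}' \
      | oc apply -f -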

Later it created worker nodes in GCP Compute Engine, but they failed to register with the OpenShift cluster, with these errors:

W0427 10:38:13.369046 1 controller.go:315] ocp-sda-cluster-445cl-worker-c-d25cj: failed to create machine: requeue in: 20s
W0427 10:38:15.345235 1 controller.go:315] ocp-sda-cluster-445cl-worker-a-pr7br: failed to create machine: requeue in: 20s
W0427 10:38:17.318970 1 controller.go:315] ocp-sda-cluster-445cl-worker-a-bxpwn: failed to create machine: requeue in: 20s

I also started a thread on the OpenShift Google Group about it:

https://groups.google.com/g/openshift/c/xhORse_wE9I/m/8PZYZP__AgAJ

Currently, with oc get nodes I can see the master nodes, but no worker nodes are being registered:

ubuntu@openshift-install:~/openshift-sda$ oc get nodes

NAME                                                         STATUS   ROLES    AGE   VERSION
ocp-sda-cluster-445cl-master-0.c.flow-on-k8s-test.internal   Ready    master   30h   v1.18.3+cdb0358
ocp-sda-cluster-445cl-master-1.c.flow-on-k8s-test.internal   Ready    master   30h   v1.18.3+cdb0358
ocp-sda-cluster-445cl-master-2.c.flow-on-k8s-test.internal   Ready    master   30h   v1.18.3+cdb0358

On the GCP console I can see the worker nodes, but they are not getting initialized.

The worker node bootstrap has these logs:

"resource": {
"type": "gce_instance",
"labels": {
"instance_id": "2636235209896160329",
"project_id": "XXXXX",
"zone": "us-central1-a"
}
},
"timestamp": "2021-05-04T13:59:01.459139363Z",
"severity": "ERROR",
"logName": "projects/XXXXX/logs/compute.googleapis.com%2Fshielded_vm_integrity",
"receiveTimestamp": "2021-05-04T13:59:03.468291729Z"

Termination handler may be slow to detect instance preemption while running inside the instance

Some doubts about the overall GCP termination handler capabilities. Reading over the docs, https://cloud.google.com/compute/docs/instances/preemptible#testing-preemption-settings describes how to test that your apps handle preemption well. Our app, in this case the termination handler, does its job perfectly if the event is just simulated. But I don't think a simulated event reproduces the real thing, where the instance is removed immediately. In a real environment, I imagine a spot instance will be transferred to other demand as soon as possible, so the actual time from the preemption event to the instance reaching the TERMINATING state could be quite small, possibly less than the default 5s polling interval we have now. This means our termination handler pod could be killed on the node before it knows it is about to die. If that happens, we leave the node and machine running, the instance is terminated, and worst of all the pod still appears to be running. This is the behavior I observed while manually stopping an instance with gcloud compute instances stop, which is described as one of the test scenarios in the doc above. Even with the guaranteed 30s margin before the hard stop described in the preemption process, our termination handler could be stopped as one of the first processes on the instance, in much less than 5s.

This judgment comes from observing the recommended way to handle the preemption procedure described there: a dedicated runnable, executed by the cloud during the shutdown procedure, that waits for some main process to exit as the result of a soft shutdown. Our termination handler does not run as a shutdown script, so we should follow the other recommended way of handling this from the outside, https://cloud.google.com/compute/docs/instances/create-start-preemptible-instance#gcloud_2, i.e. a hanging HTTP request made by appending ?wait_for_change=true to the metadata URL.
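
A minimal sketch of that hanging watch, run from inside the instance against the metadata server:

    # Blocks until the instance's "preempted" value changes (FALSE -> TRUE on preemption)
    curl -s -H "Metadata-Flavor: Google" \
      "http://metadata.google.internal/computeMetadata/v1/instance/preempted?wait_for_change=true"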
