
baremetal-operator's People

Contributors

andfasano ardaguclu asalkeld bfournie demoncoder95 dependabot[bot] dhellmann dtantsur dukov fmuyassarov furkatgofurov7 hellcatlk honza hroyrh kashifest lentzi90 longkb maelk maxrantil metal3-io-bot mquhuy n1r1 namnx228 russellb s3rj1k stbenjam tuminoid zainubw zaneb zhouhao3


baremetal-operator's Issues

using secrets for user data is inconvenient

The BMO expects the user data values to come from a Secret, because that is how OpenShift stores the data a host running an RHCOS image needs to become a node. That isn't the most convenient form for other types of images, though, so it would be nice to add an easier alternative. We could expand the BareMetalHost CRD to accept base64-encoded data directly, or to give the data a more defined structure.

See https://github.com/metal3-io/metal3-dev-env/blob/master/provision_host.sh for an example of a script that uses the existing Secret mechanism.
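
As a rough sketch of the inline option (the type and field names here are hypothetical, not part of the current CRD), the spec could carry the data directly:

package v1alpha1

// InlineUserData is an illustrative sketch only; the field name and shape
// are hypothetical, not part of the current CRD.
type InlineUserData struct {
	// Base64 carries the user data directly as a base64-encoded string,
	// avoiding the need for a separate Secret.
	Base64 string `json:"base64,omitempty"`
}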

Automatically determine IP address on the provisioning network

At the moment, pkg/provisioner/ironic/ironic.go includes a hard coded IP address for the baremetal-operator to use on its provisioning network:

		// FIXME(dhellmann): We need to get our IP on the
		// provisioning network from somewhere.
		driverInfo["deploy_kernel"] = "http://172.22.0.1/images/ironic-python-agent.kernel"
		driverInfo["deploy_ramdisk"] = "http://172.22.0.1/images/ironic-python-agent.initramfs"

As the FIXME comment notes, we need a way to determine this IP automatically so that it is easier to use the operator in different environments.
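
One possible direction, shown here only as a sketch (the environment variable names are assumptions, not existing code), is to read the deploy image URLs from the environment instead of hard-coding the IP:

package main

import (
	"fmt"
	"os"
)

// getDeployImageURLs reads the deploy kernel/ramdisk URLs from the
// environment, falling back to the current hard-coded defaults.
// The variable names DEPLOY_KERNEL_URL and DEPLOY_RAMDISK_URL are
// hypothetical.
func getDeployImageURLs() (kernel, ramdisk string) {
	kernel = os.Getenv("DEPLOY_KERNEL_URL")
	if kernel == "" {
		kernel = "http://172.22.0.1/images/ironic-python-agent.kernel"
	}
	ramdisk = os.Getenv("DEPLOY_RAMDISK_URL")
	if ramdisk == "" {
		ramdisk = "http://172.22.0.1/images/ironic-python-agent.initramfs"
	}
	return kernel, ramdisk
}

func main() {
	kernel, ramdisk := getDeployImageURLs()
	fmt.Println(kernel, ramdisk)
}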

Introspection gives every NIC a network name of "pod networking"

Right now we hardcode a network name of "Pod networking" for every NIC. This should be either removed, or made dynamic somehow.

https://github.com/metal3-io/baremetal-operator/blob/master/pkg/provisioner/ironic/ironic.go#L378

Example:

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"metal3.io/v1alpha1","kind":"BareMetalHost","metadata":{"annotations":{},"name":"kube-worker-0","namespace":"metal3"},"spec":{"bmc":{"address":"ipmi://192.168.111.1:6233","credentialsName":"kube-worker-0-bmc-secret"},"bootMACAddress":
"00:93:1e:b1:74:87","online":true}}
  creationTimestamp: "2019-05-03T18:24:45Z"
  finalizers:
  - baremetalhost.metal3.io
  generation: 2
  name: kube-worker-0
  namespace: metal3
  resourceVersion: "1382"
  selfLink: /apis/metal3.io/v1alpha1/namespaces/metal3/baremetalhosts/kube-worker-0
  uid: ba1d5285-6dd0-11e9-86cf-4c9a6490472b
spec:
  bmc:
    address: ipmi://192.168.111.1:6233
    credentialsName: kube-worker-0-bmc-secret
  bootMACAddress: 00:93:1e:b1:74:87
  hardwareProfile: ""
  online: true
status:
  errorMessage: ""
  goodCredentials:
    credentials:
      name: kube-worker-0-bmc-secret
      namespace: metal3
    credentialsVersion: "807"
  hardware:
    cpu:
      count: 2
      model: Intel(R) Core(TM) i7-7567U CPU @ 3.50GHz
      speedGHz: 3.50401
      type: x86_64
    nics:
    - ip: 172.22.0.54
      mac: 00:93:1e:b1:74:87
      model: 0x1af4 0x0001
      name: eth0
      network: Pod Networking
      speedGbps: 0
    - ip: 192.168.111.23
      mac: 00:93:1e:b1:74:89
      model: 0x1af4 0x0001
      name: eth1
      network: Pod Networking
      speedGbps: 0
    ramGiB: 4
    storage:
    - model: QEMU QEMU HARDDISK
      name: /dev/sda
      sizeGiB: 50
      type: HDD
    - model: '0x1af4 '
      name: /dev/vda
      sizeGiB: 8
      type: HDD
  hardwareProfile: unknown
  lastUpdated: "2019-05-03T18:29:58Z"
  operationalStatus: OK
  poweredOn: true
  provisioning:
    ID: c718759b-518e-446b-afd2-010374971f81
    image:
      checksum: ""
      url: ""
    state: ready

Ensure Ironic is configured to erase metadata

Issue #32 discusses the option of configuring Ironic to completely wipe disks in the future.

In the meantime, we should default to at least erasing metadata during deprovisioning to ensure that powering the node back on will not boot back into the OS previously provisioned.

According to Julia Kreger, the Ironic configuration we want is:

[conductor]
automated_clean = True 

[deploy]
erase_devices_priority = 0 
erase_devices_metadata_priority = 10

This ensures we always drive a node through cleaning when we set it up, skip the general full-disk wipe, and enable metadata-only erasure.

Integrate Ironic discovery workflow

Ironic is capable of doing discovery of new hosts. This should be an optional capability of the baremetal-operator. When we discover a new host, it should result in automatically creating a corresponding BareMetalHost object.

actively monitor the power status of each host

The baremetal operator should actively monitor the power state of each host to ensure it matches the desired state.

We should show the power status of each host as one of:

  • powering on
  • powered on
  • powering off
  • powered off

And we should adjust the power state when it does not match what the CRD shows the user wanted.
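
A minimal sketch of the reconciliation decision (the PowerStatus values and helper below are illustrative, not the operator's actual types):

package main

import "fmt"

// PowerStatus is an illustrative enumeration of the states we would report.
type PowerStatus string

const (
	PoweringOn  PowerStatus = "powering on"
	PoweredOn   PowerStatus = "powered on"
	PoweringOff PowerStatus = "powering off"
	PoweredOff  PowerStatus = "powered off"
)

// reconcilePower compares the desired state from the spec with the
// observed state from the BMC and returns the status to report and the
// action to take.
func reconcilePower(desiredOn, actualOn bool) (PowerStatus, string) {
	switch {
	case desiredOn && !actualOn:
		return PoweringOn, "power on the host"
	case !desiredOn && actualOn:
		return PoweringOff, "power off the host"
	case desiredOn:
		return PoweredOn, "no action"
	default:
		return PoweredOff, "no action"
	}
}

func main() {
	status, action := reconcilePower(true, false)
	fmt.Println(status, action)
}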

Support RAID setup for baremetal server

We'd like to deploy bare metal servers with a RAID configuration. Setting up or unsetting RAID requires a vendor driver. This issue proposes a new YAML attribute for configuring RAID on Fujitsu PRIMERGY servers using the iRMC driver.
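
Purely as an illustration (the type and field names are hypothetical, not the proposed schema), such an attribute could correspond to a spec type like:

package v1alpha1

// RAIDConfig is an illustrative sketch of a spec field for requesting a
// RAID layout; the real proposal may look different.
type RAIDConfig struct {
	// Level is the RAID level to configure, e.g. "0", "1", "5".
	Level string `json:"level"`
	// PhysicalDisks lists the disks to include; empty means "use all".
	PhysicalDisks []string `json:"physicalDisks,omitempty"`
	// SizeGiB is the size of the logical disk; 0 means "maximum".
	SizeGiB int `json:"sizeGiB,omitempty"`
}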

Support BIOS setup for baremetal server

Setting up or resetting the BIOS also requires a vendor driver. This issue proposes a new YAML attribute for configuring BIOS settings on Fujitsu PRIMERGY servers using the iRMC driver.
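
Again as a hypothetical sketch only, a per-setting entry in the spec might look like:

package v1alpha1

// BIOSSetting is an illustrative sketch of a spec entry for a single
// BIOS option; the actual proposal may choose a different shape.
type BIOSSetting struct {
	// Name of the BIOS option, e.g. "hyper_threading_enabled".
	Name string `json:"name"`
	// Value to apply for the option.
	Value string `json:"value"`
}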

introspection data shows nic speed as 0

I just checked the resulting hardware details from introspection on the BareMetalHost objects via metal3-io/metal3-dev-env and noticed that the NICs are listed with a speed of 0.

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"metal3.io/v1alpha1","kind":"BareMetalHost","metadata":{"annotations":{},"name":"kube-worker-0","namespace":"metal3"},"spec":{"bmc":{"address":"ipmi://192.168.111.1:6233","credentialsName":"kube-worker-0-bmc-secret"},"bootMACAddress":
"00:93:1e:b1:74:87","online":true}}
  creationTimestamp: "2019-05-03T18:24:45Z"
  finalizers:
  - baremetalhost.metal3.io
  generation: 2
  name: kube-worker-0
  namespace: metal3
  resourceVersion: "1382"
  selfLink: /apis/metal3.io/v1alpha1/namespaces/metal3/baremetalhosts/kube-worker-0
  uid: ba1d5285-6dd0-11e9-86cf-4c9a6490472b
spec:
  bmc:
    address: ipmi://192.168.111.1:6233
    credentialsName: kube-worker-0-bmc-secret
  bootMACAddress: 00:93:1e:b1:74:87
  hardwareProfile: ""
  online: true
status:
  errorMessage: ""
  goodCredentials:
    credentials:
      name: kube-worker-0-bmc-secret
      namespace: metal3
    credentialsVersion: "807"
  hardware:
    cpu:
      count: 2
      model: Intel(R) Core(TM) i7-7567U CPU @ 3.50GHz
      speedGHz: 3.50401
      type: x86_64
    nics:
    - ip: 172.22.0.54
      mac: 00:93:1e:b1:74:87
      model: 0x1af4 0x0001
      name: eth0
      network: Pod Networking
      speedGbps: 0
    - ip: 192.168.111.23
      mac: 00:93:1e:b1:74:89
      model: 0x1af4 0x0001
      name: eth1
      network: Pod Networking
      speedGbps: 0
    ramGiB: 4
    storage:
    - model: QEMU QEMU HARDDISK
      name: /dev/sda
      sizeGiB: 50
      type: HDD
    - model: '0x1af4 '
      name: /dev/vda
      sizeGiB: 8
      type: HDD
  hardwareProfile: unknown
  lastUpdated: "2019-05-03T18:29:58Z"
  operationalStatus: OK
  poweredOn: true
  provisioning:
    ID: c718759b-518e-446b-afd2-010374971f81
    image:
      checksum: ""
      url: ""
    state: ready

consider recording provisioning history somewhere besides events

We use Event objects to record history for operations associated with a given host. However, events have a finite lifetime that is shorter than that of the host object, and so when they are cleaned up we will lose some of the history.

To address that, we should consider adding more details to the status block of the host object itself. We probably don't want to keep the full history of the host indefinitely, but we may want to keep more information than we have today.
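
One possible shape for that, sketched here with hypothetical names rather than an agreed design:

package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// OperationRecord is an illustrative sketch of how a single operation
// could be summarized in the host status instead of relying on Events.
type OperationRecord struct {
	// Start and End bound the operation; End is zero while it is running.
	Start metav1.Time `json:"start,omitempty"`
	End   metav1.Time `json:"end,omitempty"`
	// Outcome records whether the operation succeeded or failed.
	Outcome string `json:"outcome,omitempty"`
	// Error holds the last error message, if any.
	Error string `json:"error,omitempty"`
}

// OperationHistory groups one record per operation type.
type OperationHistory struct {
	Register    OperationRecord `json:"register,omitempty"`
	Inspect     OperationRecord `json:"inspect,omitempty"`
	Provision   OperationRecord `json:"provision,omitempty"`
	Deprovision OperationRecord `json:"deprovision,omitempty"`
}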

References:

add OpenAPI validation to CRD

The operator-sdk supports generating OpenAPI validation parameters for the CRD automatically ("operator-sdk generate openapi"). We need to

  • add the current generated data, without losing any of the metadata in that file like printer columns and short names (can that also be auto-generated?)
  • add a Travis test to ensure the data never grows stale

See https://github.com/operator-framework/operator-sdk/blob/master/doc/sdk-cli-reference.md#openapi and https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions

Expose hostname provided by DHCP in introspection data

Currently, the network info we have on a BareMetalHost looks like this:

      nics:
      - ip: 172.22.0.86
        mac: 00:5a:10:3f:c2:3d
        model: 0x1af4 0x0001
        name: eth0
        network: Pod Networking
        speedGbps: 0
      - ip: 192.168.111.21
        mac: 00:5a:10:3f:c2:3f
        model: 0x1af4 0x0001
        name: eth1
        network: Pod Networking
        speedGbps: 0

If a hostname is provided by DHCP, I would like to see it as a new field in here.

This is needed by metal3-io/cluster-api-provider-baremetal#49

The issue is that we eventually need all of the addresses that show up on a Node to also be in the Machine status. Right now that includes both IP and hostname. The info we have so far will let us populate the Machine status field with the expected IPs, but not hostname.
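
A sketch of where that could live in the status types (the field placement and names are assumptions):

package v1alpha1

// HardwareDetails is a trimmed, illustrative version of the introspection
// data, with a hypothetical Hostname field added alongside the NIC list.
type HardwareDetails struct {
	// Hostname would carry the name provided by DHCP during inspection.
	Hostname string `json:"hostname,omitempty"`
	NICs     []NIC  `json:"nics,omitempty"`
}

// NIC mirrors the fields already shown above.
type NIC struct {
	Name      string `json:"name"`
	MAC       string `json:"mac"`
	IP        string `json:"ip"`
	SpeedGbps int    `json:"speedGbps"`
}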

replace machine reference with more generic "consumer ID" field

The current machine reference field holds the name of a Machine object, which ties the host objects closely to the Cluster API. Let's rename that and make it a simple string to hold a "consumer ID" so we can still track that something is using the host, but not require that the something be a Machine.
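
Sketched as a spec fragment with hypothetical naming:

package v1alpha1

// ConsumerInfo is an illustrative sketch only: the Machine reference is
// replaced by an opaque consumer identifier so any controller, not just
// the Cluster API machine controller, can mark the host as in use.
type ConsumerInfo struct {
	// ConsumerID is a free-form string naming whatever is using the host.
	ConsumerID string `json:"consumerID,omitempty"`
}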

Ironic Provisioning Error not Published Reliably

When a BareMetalHost is provisioned, if there is an issue during the deployment, and the image hasn't changed, we will try again. In fact, we will try forever.

I expect that between each failure, we will bubble up the error message from Ironic per https://github.com/metal3-io/baremetal-operator/blob/master/pkg/provisioner/ironic/ironic.go#L809-L828

However, I do not see that playing out consistently. Take the following example, where the md5sum for ub16-password-is-ubuntu.qcow2 was intentionally corrupted and then eventually fixed, allowing the provisioning to complete. I do not see any error history. When we try the same for ub-16.04-test.img and, in the middle of that process, switch the image to ub16-password-is-ubuntu.qcow2 (this time with a correct checksum), we do see the error bubble up for the ub-16.04-test.img attempt.

I am still struggling to find the exact cause of this and would be happy to fix the bug if I could find it. I suspect (but have not confirmed) that as long as the image doesn't change, https://github.com/metal3-io/baremetal-operator/blob/master/pkg/provisioner/ironic/ironic.go#L823-L824 may not be doing what we anticipate, but this is only conjecture.

Events:
  Type    Reason                  Age   From                         Message
  ----    ------                  ----  ----                         -------
  Normal  DeprovisioningStarted   100m  metal3-baremetal-controller  Image deprovisioning started
  Normal  PowerOn                 98m   metal3-baremetal-controller  Host powered on
  Normal  DeprovisioningComplete  98m   metal3-baremetal-controller  Image deprovisioning completed
  Normal  ProvisioningStarted     88m   metal3-baremetal-controller  Image provisioning started for http://172.22.0.1/images/ub16-password-is-ubuntu.qcow2
  Normal  ProvisioningStarted     84m   metal3-baremetal-controller  Image provisioning started for http://172.22.0.1/images/ub16-password-is-ubuntu.qcow2
  Normal  ProvisioningStarted     83m   metal3-baremetal-controller  Image provisioning started for http://172.22.0.1/images/ub16-password-is-ubuntu.qcow2
  Normal  ProvisioningStarted     81m   metal3-baremetal-controller  Image provisioning started for http://172.22.0.1/images/ub16-password-is-ubuntu.qcow2
  Normal  ProvisioningStarted     80m   metal3-baremetal-controller  Image provisioning started for http://172.22.0.1/images/ub16-password-is-ubuntu.qcow2
  Normal  ProvisioningStarted     78m   metal3-baremetal-controller  Image provisioning started for http://172.22.0.1/images/ub16-password-is-ubuntu.qcow2
  Normal  ProvisioningComplete    75m   metal3-baremetal-controller  Image provisioning completed for http://172.22.0.1/images/ub16-password-is-ubuntu.qcow2
  Normal  DeprovisioningStarted   69m   metal3-baremetal-controller  Image deprovisioning started
  Normal  DeprovisioningComplete  67m   metal3-baremetal-controller  Image deprovisioning completed
  Normal  ProvisioningStarted     67m   metal3-baremetal-controller  Image provisioning started for http://172.22.0.1/images/ub-16.04-test.img
  Normal  ProvisioningError       64m   metal3-baremetal-controller  Image provisioning failed: node be43421d-c2dd-4684-9eee-c49108520e5c command status errored: {u'message': u'Error verifying image checksum: Image failed to verify against checksum. location: ub-16.04-test.img; image ID: /tmp/ub-16.04-test.img; image checksum: bad md5sum; verification checksum: fd7659f1fb028049596608f4659d5923', u'code': 500, u'type': u'ImageChecksumError', u'details': u'Image failed to verify against checksum. location: ub-16.04-test.img; image ID: /tmp/ub-16.04-test.img; image checksum: bad md5sum; verification checksum: fd7659f1fb028049596608f4659d5923'}
  Normal  ProvisioningStarted     61m   metal3-baremetal-controller  Image provisioning started for http://172.22.0.1/images/ub16-password-is-ubuntu.qcow2
  Normal  ProvisioningComplete    58m   metal3-baremetal-controller  Image provisioning completed for http://172.22.0.1/images/ub16-password-is-ubuntu.qcow2

introspection data logged as encoded value

PR #147 adds introspection. The logging it does of the collected data isn't very useful, because the data comes out as one big encoded string.

2019-04-22T10:11:02.682-0400    INFO    baremetalhost_ironic    Received introspection data     {"host": "openshift-worker-0", "data": "eyJhbGxfaW50ZXJmYWNlcyI6eyJldGgwIjp7ImNsaWVudF9pZCI6bnVsbCwiaXAiOiIxNzIuM\
jIuMC4xMiIsIm1hYyI6IjAwOjdlOmUxOjAzOjMwOjNlIiwicHhlIjp0cnVlfSwiZXRoMSI6eyJjbGllbnRfaWQiOm51bGwsImlwIjoiMTkyLjE2OC4xMTEuMjUiLCJtYWMiOiIwMDo3ZTplMTowMzozMDo0MCIsInB4ZSI6ZmFsc2V9fSwiYm9vdF9pbnRlcmZhY2UiOiIwMDo3ZT\
plMTowMzozMDozZSIsImNwdV9hcmNoIjoieDg2XzY0IiwiY3B1cyI6NCwiZGF0YSI6W1siZGlzayIsImxvZ2ljYWwiLCJjb3VudCIsIjEiXSxbImRpc2siLCJ2ZGEiLCJzaXplIiwiNTMiXSxbImRpc2siLCJ2ZGEiLCJ2ZW5kb3IiLCIweDFhZjQiXSxbImRpc2siLCJ2ZGEiLCJ\
vcHRpbWFsX2lvX3NpemUiLCIwIl0sWyJkaXNrIiwidmRhIiwicGh5c2ljYWxfYmxvY2tfc2l6ZSIsIjUxMiJdLFsiZGlzayIsInZkYSIsInJvdGF0aW9uYWwiLCIxIl0sWyJkaXNrIiwidmRhIiwibnJfcmVxdWVzdHMiLCIyNTYiXSxbImRpc2siLCJ2ZGEiLCJzY2hlZHVsZXIi\
LCJtcS1kZWFkbGluZSJdLFsic3lzdGVtIiwicHJvZHVjdCIsIm5hbWUiLCJLVk0iXSxbInN5c3RlbSIsInByb2R1Y3QiLCJ2ZW5kb3IiLCJSZWQgSGF0Il0sWyJzeXN0ZW0iLCJwcm9kdWN0IiwidmVyc2lvbiIsIlJIRUwgNy42LjAgUEMgKGk0NDBGWCArIFBJSVgsIDE5OTYpI\
l0sWyJzeXN0ZW0iLCJwcm9kdWN0IiwidXVpZCIsIjg3MTRiMDE3LWE3NTAtNGJkOS1hNjg1LWQ3NWU4MGYxODg1MCJdLFsiZmlybXdhcmUiLCJiaW9zIiwidmVyc2lvbiIsIjEuMTEuMC0yLmVsNyJdLFsiZmlybXdhcmUiLCJiaW9zIiwiZGF0ZSIsIjA0LzAxLzIwMTQiXSxbIm\
Zpcm13YXJlIiwiYmlvcyIsInZlbmRvciIsIlNlYUJJT1MiXSxbIm1lbW9yeSIsInRvdGFsIiwic2l6ZSIsIjE3MTc5ODY5MTg0Il0sWyJuZXR3b3JrIiwiZXRoMCIsImJ1c2luZm8iLCJ2aXJ0aW9AMCJdLFsibmV0d29yayIsImV0aDAiLCJpcHY0IiwiMTcyLjIyLjAuMTIiXSx\
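
A minimal sketch of decoding the payload before logging it, assuming the data is base64-encoded JSON as the log output suggests:

package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"log"
)

// decodeIntrospectionData turns the base64 blob into a generic map so it
// can be logged (or summarized) as structured data instead of one long
// encoded string.
func decodeIntrospectionData(encoded string) (map[string]interface{}, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return nil, fmt.Errorf("decoding introspection data: %w", err)
	}
	var data map[string]interface{}
	if err := json.Unmarshal(raw, &data); err != nil {
		return nil, fmt.Errorf("parsing introspection data: %w", err)
	}
	return data, nil
}

func main() {
	// "eyJjcHVzIjo0fQ==" is {"cpus":4} encoded, just for illustration.
	data, err := decodeIntrospectionData("eyJjcHVzIjo0fQ==")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(data)
}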

BMO doesn't retry on a host with wrong bmc credentials

If a BMH CR has wrong credentials, the host is added but is not retried, even after the credentials are fixed.

---
apiVersion: v1
kind: Secret
metadata:
  name: openshift-node-4-bmc-secret
type: Opaque
data:
  username: YmFkdXNlcgo=
  password: YmFkcGFzcwo=

---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: openshift-node-4
spec:
  online: true
  bmc:
    address: ipmi://192.168.122.1:6234
    credentialsName: openshift-node-4-bmc-secret
  bootMACAddress: 52:54:00:32:78:1a
  image:
    url: "http://172.22.0.1/images/rhcos-ootpa-latest.qcow2"
    checksum: "http://172.22.0.1/images/rhcos-ootpa-latest.qcow2.md5sum"

BMH ends up in error state:

Events:
  Type    Reason             Age   From                         Message
  ----    ------             ----  ----                         -------
  Normal  Registered         15m   metal3-baremetal-controller  Registered new host
  Normal  RegistrationError  15m   metal3-baremetal-controller  Failed to get power state for node 65e845dc-7a5c-4e13-96f7-5f29639e4a28. Error: IPMI call failed: power status.

and is not retried even after the credentials are fixed

{"level":"info","ts":1558441798.9009247,"logger":"baremetalhost","msg":"Reconciling BareMetalHost","Request.Namespace":"openshift-machine-api","Request.Name":"openshift-node-4"}
{"level":"info","ts":1558441798.9010067,"logger":"baremetalhost_ironic","msg":"ironic settings","endpoint":"http://localhost:6385/v1/","inspectorEndpoint":"http://localhost:5050/v1/","deployKernelURL":"http://172.22.0.1/images/ironic-python-agent.kernel","deployRamdiskURL":"http://172.22.0.1/images/ironic-python-agent.initramfs"}
{"level":"info","ts":1558441798.9010217,"logger":"baremetalhost","msg":"registering and validating access to management controller","Request.Namespace":"openshift-machine-api","Request.Name":"openshift-
node-4","provisioningState":"registering"}
{"level":"info","ts":1558441798.9010265,"logger":"baremetalhost_ironic","msg":"validating management access","host":"openshift-node-4"}
{"level":"info","ts":1558441798.92309,"logger":"baremetalhost_ironic","msg":"found existing node by ID","host":"openshift-node-4"}
{"level":"info","ts":1558441798.9481359,"logger":"baremetalhost","msg":"stopping on host error","Request.Namespace":"openshift-machine-api","Request.Name":"openshift-node-4","provisioningSta
te":"registering","message":"Failed to get power state for node 65e845dc-7a5c-4e13-96f7-5f29639e4a28. Error: IPMI call failed: power status."}
{
    "apiVersion": "v1",
    "data": {
        "password": "Z29vZHBhc3N3b3JkCg==",
        "username": "Z29vZHVzZXIK"
    },
    "kind": "Secret",
    "metadata": {
        "annotations": {
            "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"data\":{\"password\":\"Z29vZHBhc3N3b3JkCg==\",\"username\":\"Z29vZHVzZXIK\"},\"kind\":\"Secret\",\"metadata\":{\"annotations\":{},\"name\":\"openshift-node-4-bmc-secret\",\"namespace\":\"openshift-machine-api\"},\"type\":\"Opaque\"}\n"
        },
        "creationTimestamp": "2019-05-21T12:28:59Z",
        "name": "openshift-node-4-bmc-secret",
        "namespace": "openshift-machine-api",
        "ownerReferences": [
            {
                "apiVersion": "metal3.io/v1alpha1",
                "blockOwnerDeletion": true,
                "controller": true,
                "kind": "BareMetalHost",
                "name": "openshift-node-4",
                "uid": "02aae0a7-7bc4-11e9-abd0-525400f8c71d"
            }
        ],
        "resourceVersion": "287714",
        "selfLink": "/api/v1/namespaces/openshift-machine-api/secrets/openshift-node-4-bmc-secret",
        "uid": "02a8cb40-7bc4-11e9-abd0-525400f8c71d"
    },
    "type": "Opaque"
}

Consider how to DHCP on all interfaces with cloud-init based images

The default network configuration applied by cloud-init is to DHCP on the first interface only. In bare metal environments, we will often want more than that. For example, in metal3-io/metal3-dev-env, the hosts are created with two network interfaces: one for provisioning and one as the primary, external network. The provisioning network is the first interface. When we provision these hosts with a cloud-init based image, we need to provide additional configuration to get the second interface to come up.

Right now this is done manually via extra cloud-init configuration passed through user-data. See: https://github.com/metal3-io/metal3-dev-env/blob/master/provision_host.sh#L15-L45

It would be nice to consider how the baremetal-operator could help make this easier.

BaremetalHost can't be deleted from enroll/verifying state

If you register a baremetal host with incorrect IPMI credentials, it becomes undeletable, since we only attempt deletion from the nodes.Manageable state in pkg/provisioner/ironic/ironic.go.

We should probably handle this case and allow the deletion, since deleting a node from this state works fine via the Ironic CLI/API.

document launching the operator within the cluster

The dev instructions talk about launching the operator outside of the cluster for development. We should add instructions for launching it inside the cluster, like one would do for production systems.

add real hardware profile matching

Now that we have basic introspection working, we should add proper hardware profile matching to detect the type of hardware being used on a host.

change the name of BareMetalHost.BMC.IP to something more generic

Some BMCs require a URL rather than just an IP address, so we want to make the field name more generic. Some discussion within the team came up with "Address" and "Location". "Address" does not make it sufficiently clear that the value might be a URL. Are there other options?

add coverage info to unit test output

Go's test framework has built-in support for coverage reporting, but we don't use it. It would be good to have that information available as part of the output when unit tests are run.

need upgrade story

When we upgrade the baremetal operator we will exercise the design principle that the ironic database is ephemeral. We need to verify that the operator will re-register the host objects so that ironic can manage them, but they won't be reprovisioned. We deal with adopting control plane nodes already, but we don't want previously provisioned hosts to show up as "externally provisioned" so we may need a different workflow for this case. Perhaps ironic's "adopt" feature?

expand image checksum type support

The BareMetalHost.Spec currently supports providing a checksum for the image to be provisioned. The checksum can be provided directly as a string, or as a URL which will be fetched.

These questions need to be answered:

  • Is the checksum assumed to be md5sum?
  • Should we allow the checksum type to be specified somehow, either as a separate field, or as part of the same field? example of the latter: "sha256:abcdef..."

Whatever answers emerge, the documentation for the Image struct should be improved with more detail.
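
A sketch of the prefix-based option (the recognized types and the md5 default are assumptions, not decided behavior):

package main

import (
	"fmt"
	"strings"
)

// splitChecksum interprets an optional "type:" prefix on the checksum
// value, defaulting to md5 when no prefix is present. The set of
// recognized types here is illustrative.
func splitChecksum(value string) (checksumType, checksum string) {
	for _, t := range []string{"md5", "sha256", "sha512"} {
		if strings.HasPrefix(value, t+":") {
			return t, strings.TrimPrefix(value, t+":")
		}
	}
	return "md5", value
}

func main() {
	t, c := splitChecksum("sha256:abcdef0123456789")
	fmt.Println(t, c) // sha256 abcdef0123456789
}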

Operational history for BMH is not displayed

Problem description

Events about BMH state transitions are not displayed when describing the resource:

oc describe baremetalhost discovered-node-0 -n openshift-machine-api
Name:         discovered-node-0
Namespace:    openshift-machine-api
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"metalkube.org/v1alpha1","kind":"BareMetalHost","metadata":{"annotations":{},"name":"discovered-node-0","namespace":"openshi...
API Version:  metalkube.org/v1alpha1
Kind:         BareMetalHost
Metadata:
  Creation Timestamp:  2019-05-03T09:19:31Z
  Finalizers:
    baremetalhost.metalkube.org
  Generation:        2
  Resource Version:  350682
  Self Link:         /apis/metalkube.org/v1alpha1/namespaces/openshift-machine-api/baremetalhosts/discovered-node-0
  UID:               8f2d0fbb-6d84-11e9-b472-525400a4453e
Spec:
  Bmc:
    Address:           
    Credentials Name:  discovered-node-0-bmc-secret
  Boot MAC Address:    52:54:00:b7:e8:e8
  Hardware Profile:    
  Online:              true
Status:
  Error Message:  Empty BMC address Missing BMC connection detail 'Address'
  Good Credentials:
  Hardware Profile:    
  Last Updated:        2019-05-03T09:19:31Z
  Operational Status:  discovered
  Powered On:          false
  Provisioning:
    ID:  
    Image:
      Checksum:  
      URL:       
    State:       
Events:          <none>

While in baremetal-operator logs:

{"level":"info","ts":1556875171.6777682,"logger":"baremetalhost","msg":"Reconciling BareMetalHost","Request.Namespace":"openshift-machine-api","Request.Name":"discovered-node-0"}
{"level":"info","ts":1556875171.6778715,"logger":"baremetalhost","msg":"adding finalizer","Request.Namespace":"openshift-machine-api","Request.Name":"discovered-node-0","existingFinalizers":[],"newValue":"bareme
talhost.metalkube.org"}
{"level":"info","ts":1556875171.6889548,"logger":"baremetalhost","msg":"Reconciling BareMetalHost","Request.Namespace":"openshift-machine-api","Request.Name":"discovered-node-0"}
{"level":"info","ts":1556875171.6891785,"logger":"baremetalhost","msg":"updating owner of secret","Request.Namespace":"openshift-machine-api","Request.Name":"discovered-node-0"}
{"level":"info","ts":1556875171.7072365,"logger":"baremetalhost","msg":"publishing event","reason":"Discovered","message":"Discovered host with unusable BMC details: Empty BMC address Missing BMC connection deta
il 'Address'"}
{"level":"info","ts":1556875171.726009,"logger":"baremetalhost","msg":"Reconciling BareMetalHost","Request.Namespace":"openshift-machine-api","Request.Name":"discovered-node-0"}

Steps to reproduce

  1. Create a BMH CR with missing BMC details, for example:
---
apiVersion: v1
kind: Secret
metadata:
  name: discovered-node-0-bmc-secret
type: Opaque
data:
  username: YWRtaW4=
  password: cGFzc3dvcmQ=

# BMC address intentionally left empty to trigger transition
# to 'Discovered' state
---
apiVersion: metalkube.org/v1alpha1
kind: BareMetalHost
metadata:
  name: discovered-node-0
spec:
  online: true
  bmc:
    address:
    credentialsName: discovered-node-0-bmc-secret
  bootMACAddress: 52:54:00:b7:e8:e8
  2. Apply those resources:
oc apply -f discovered_node_cr.yaml -n openshift-machine-api
  3. Check the resource's definition:
oc describe baremetalhost discovered-node-0 -n openshift-machine-api

Add option to wipe disks when deprovisioning hosts

Ironic has the ability to wipe disks as part of the deprovisioning workflow. This is time-consuming, so we assume it will not be the default behavior. We still may want to provide it as optional behavior in the future.

add firmware details to host CRD

The firmware details from inspection are available in gophercloud, but not exposed in the host CRD.

On a libvirt/kvm VM I get:

    "firmware": {
      "bios": {
        "date": "04/01/2014",
        "version": "1.11.0-2.el7",
        "vendor": "SeaBIOS"
      }
    },

It would be useful to be able to display that information about a host.
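
A sketch of the corresponding status fields (the type names are assumed to mirror the inspection data, not taken from the CRD):

package v1alpha1

// Firmware is an illustrative sketch of how the BIOS details from
// inspection could be surfaced under status.hardware.
type Firmware struct {
	BIOS BIOS `json:"bios,omitempty"`
}

// BIOS mirrors the fields reported by inspection.
type BIOS struct {
	Date    string `json:"date,omitempty"`
	Vendor  string `json:"vendor,omitempty"`
	Version string `json:"version,omitempty"`
}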

Some OpenShift-isms to cleanup

See:

$ git grep -n '[^a-zA-Z]oc '
docs/dev-setup.md:126:The output can be passed directly to `oc apply` like this:
docs/dev-setup.md:129:$ go run cmd/make-virt-host/main.go openshift_worker_1 | oc apply -f -
docs/publishing-images.md:69:       oc apply -f deploy/dev-operator.yaml
docs/publishing-images.md:71:To monitor the operator, use `oc get pods` to find the pod name for
docs/publishing-images.md:73:`oc log -f $podname` to see the console log output.

Log seems overly verbose when BMC secret not found

When the BMC secret is not found, I get a pretty verbose error in the log. This seems like a condition we can expect, so if possible we should not log a full backtrace for it.

{"level":"info","ts":1554830713.4522338,"logger":"controller_baremetalhost","msg":"Reconciling BareMetalHost","Request.Namespace":"openshift-machine-api","Request.Name":"openshift-worker-2"}
{"level":"error","ts":1554830713.4526749,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"metalkube-baremetalhost-controller","request":"openshift-machine-api/openshift-worker-2","error":"BMC credentials are invalid: failed to fetch BMC credentials from secret reference: Secret \"openshift-worker-2-bmc-secret\" not found","errorVerbose":"Secret \"openshift-worker-2-bmc-secret\" not found\nfailed to fetch BMC credentials from secret reference\ngithub.com/metalkube/baremetal-operator/pkg/controller/baremetalhost.(*ReconcileBareMetalHost).getValidBMCCredentials\n\t/go/src/github.com/metalkube/baremetal-opera
tor/pkg/controller/baremetalhost/baremetalhost_controller.go:638\ngithub.com/metalkube/baremetal-operator/pkg/controller/baremetalhost.(*ReconcileBareMetalHost).Reconcile\n\t/go/src/github.com/metalkube/baremetal
-operator/pkg/controller/baremetalhost/baremetalhost_controller.go:243\ngithub.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:213\ngithub.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/metalkube/baremetal-opera
tor/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/metalkube/baremetal-operator/vend
or/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:2361\nBMC credentials are invalid\ngithub.com/metalkube/baremetal-operator/pkg/controller/baremetalhost.(*ReconcileBareMetalHost).Reconcile\n\t/go/src/github.com/metalkube/baremetal-operator/pkg/controller/baremetalhost/baremetalhost_c
ontroller.go:245\ngithub.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/
sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:213\ngithub.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/g
o/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/met
alkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:2361","stacktrace":"github.com/metalkube/baremetal-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215\ngithub.com/metalkube/baremet
al-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller
/controller.go:158\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
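
A minimal sketch of treating the missing Secret as an expected condition rather than a reconciler error (how this would be wired into the reconcile loop is an assumption):

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// classifySecretError decides whether a failure to read the BMC secret
// should be reported as a short, expected message (and simply retried)
// or surfaced as a real reconciler error with a full backtrace.
func classifySecretError(err error) (expected bool, msg string) {
	if errors.IsNotFound(err) {
		return true, "BMC credentials secret not found yet; will retry"
	}
	return false, fmt.Sprintf("failed to fetch BMC credentials: %v", err)
}

func main() {
	// Build a synthetic NotFound error just to exercise the helper.
	notFound := errors.NewNotFound(
		schema.GroupResource{Resource: "secrets"}, "openshift-worker-2-bmc-secret")
	expected, msg := classifySecretError(notFound)
	fmt.Println(expected, msg)
}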
