

alanmeadows commented on September 7, 2024

I think this may be implied above but just want to be sure.

Inserting hosts already marked as provisioned so they aren't touched but can be represented the same as net-new hosts is probably not a large ask. Perhaps as simple as some Machine and BareMetalNode object fabrication so everything is... "success."

I'd urge us to also consider the scenario where we don't just want to adopt them as objects in Kubernetes without provisioning them, but also want to offer operators a chance to inventory them. In other words, support a reliable way of asking the system to perform an Ironic inspection on them (and get the results back into the CRD as normal), for parity with net-new hosts, while ensuring that no other Ironic process (e.g. clean, provision) is engaged and that they remain marked as already deployed.

I feel like this could be accomplished with something as simple as a "force-reinventory" hook (I haven't thought this through clearly). When we're talking about existing machines that might be running workloads, you want operators to be able to insert them with zero impact at any time of their choosing, and then, for each machine, to schedule the potentially impactful reboot that lets Ironic inspection run, again at a time of their choosing.

from baremetal-operator.
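To make the opt-in "force-reinventory" idea above concrete, here is a minimal sketch of the decision it implies: an adopted (externally provisioned) host never gets cleaned or provisioned, and is inspected only when an operator explicitly asks for it. Everything here is hypothetical: the `Host` fields, the `planned_actions` helper, and the simplified action list are illustrative assumptions, not the baremetal-operator's or Ironic's actual API or state machine.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Host:
    name: str
    # Hypothetical flag: the host was deployed by something other than metal3.
    externally_provisioned: bool
    # Hypothetical opt-in "force-reinventory" request, set by an operator
    # when a reboot is acceptable (e.g. during a maintenance window).
    reinventory_requested: bool = False
    # Image requested for a metal3-driven deployment, if any.
    image: Optional[str] = None
    # Inspection data already recorded for the host (empty if none yet).
    inventory: dict = field(default_factory=dict)


def planned_actions(host: Host) -> List[str]:
    """Return the (simplified, hypothetical) Ironic steps this sketch would run."""
    if host.externally_provisioned:
        # Adopted host: never clean or provision it. Inspect only on an
        # explicit request, because inspection reboots a live machine.
        return ["inspect"] if host.reinventory_requested else []
    # Net-new host: inspect if we have no inventory yet, then deploy on request.
    actions = [] if host.inventory else ["inspect"]
    if host.image:
        actions += ["clean", "provision"]
    return actions
```

For example, an adopted host plans no actions until `reinventory_requested` is set, after which it plans only `["inspect"]`. A net-new host with an image plans `["inspect", "clean", "provision"]`. Keeping the adoption check first means a stray image request can never trigger a reprovision of a brownfield machine.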

hardys commented on September 7, 2024

@alanmeadows that's an interesting idea - do you have a specific external provisioning method in mind, where it would not already have the data?

One scenario we've been testing is to deploy the initial controlplane (typically 3 masters) via Ironic, then you have the opportunity to collect the introspection data during deployment (which is fairly inexpensive if we can use the fast-track feature enabled in metal3-io/ironic-image#21), so we'd just need a way to pass that data to the CR via the resource definition, which would avoid the disruption and risk of a post-deploy introspection?

Sounds like you have a different use-case in mind, so it'd be good to further understand it :)


alanmeadows commented on September 7, 2024

Ah, I think the use case you are thinking of for retaining introspection data is Metal3-IO cluster to Metal3-IO cluster, like a clusterctl-style pivot. Assuming I understand correctly, the challenge there is how that first ephemeral k8s cluster provisions the first node (or N nodes) and then pivots their deployed state, along with their introspection data, into the target cluster, together with any additional machines the target cluster needs to deploy itself (depending on your flow).

I agree that also needs to be solved for...

The case I am making here is one where I have a 200-node cluster I want to bring into the metal3-io fold. I don't really want to reprovision all 200 of the machines that are doing work right now. I'd like to represent them as deployed and be given some way to have them introspect (at an opportune time, perhaps during a maintenance window), so that all my BareMetalNode objects look the same, whether they are new hosts I've added since I started to leverage metal3-io at this site or nodes already deployed by something other than metal3-io. When I wish to redeploy any of those hosts, I would of course follow the standard metal3-io flow. The value here is that one can adopt metal3-io for brownfield sites as well as start using it in greenfield sites, and all Machine and BareMetalNode objects look alike. Otherwise it gets weird fast: brownfield CRDs would be bereft of any data other than a "don't touch me" marker, whereas greenfield ones are populated with rich introspection data.


hardys commented on September 7, 2024

Thanks @alanmeadows for clarifying. I can see the benefit of the brownfield use case you're describing; I guess I've been mostly focused on the greenfield case so far :)

I still feel a little nervous about rebooting existing production nodes into the introspection/deployment ramdisk, but provided it's an opt-in feature it certainly sounds like it's reasonable to look into.


dhellmann commented on September 7, 2024

Yes, @alanmeadows, this ticket is describing allowing the operator to "manage" hosts that it did not provision, including monitoring power and inventory. The power status updates are easy, because those can happen out of band. The inventory collection is trickier, since we typically do that with the discovery image. One way we have discussed solving that is to have a different version of the discovery agent running in a DaemonSet in the cluster and reporting inventory information from live systems. That will allow us to handle the pivot case (we just wait for that DaemonSet to update the hardware details for the host) and also handle hot-swap hardware replacement.


dhellmann commented on September 7, 2024

There is more discussion about that design in metal3-io/metal3-docs#26.


juliakreger commented on September 7, 2024

Background: https://docs.openstack.org/ironic/latest/admin/adoption.html

I'm unsure if this is an enhancement as much as a defensive bug fix to align the operator's use of the API.

