I was deploying DeepSea in a SLE12_SP2 image and when running stage 0 it stopped and r

I think <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-

stage 0: kernel update requires "kernel-default" package to be installed, about suse/deepsea

Comments (64)

l-mb commented on May 25, 2024

stage0 should make sure SLES is fully up to date, not just the kernel. Then it doesn't matter what the kernel package name is.

from deepsea.

rjfd commented on May 25, 2024

@l-mb how do we check that "SLES is fully up to date"?

swiftgist commented on May 25, 2024

@rjfd yeah, I ran into the same issue. I have a fix in eric-sle12sp2.

@l-mb the kernel seems to be a special case using zypper directly. I am concerned about how to convert these to the Salt equivalents using the zypper salt module. (As in, I don't think there's an equivalent currently.)

smithfarm commented on May 25, 2024

I think @rjfd is using a JeOS image. This is good because it exposes implicit dependencies in the code.

rjfd commented on May 25, 2024

@swiftgist @l-mb so regarding my question, does the kernel package influence anything in DeepSea, i.e., do we really need to have the up-to-date version of the kernel-default package, or can we have the up-to-date version of the kernel-default-base package?

swiftgist commented on May 25, 2024

@rjfd In general, we should have an up-to-date kernel. In SES3, we had specific features for iSCSI that were not available in the base kernel. I can see that happening again and it's a somewhat complicated issue to make behave gracefully for the customer in all cases.

@smithfarm we're both using the JeOS which comes with kernel-default-base. I ran into the same issue, but had only reached the point of making a branch.

rjfd commented on May 25, 2024

@swiftgist does iSCSI in SES4 need something not available in the kernel-default-base package in SLE12_SP2?

smithfarm commented on May 25, 2024

It would be nice to have an option to get along with just kernel-default-base, if possible, to avoid unnecessarily loading up the test VM.

swiftgist commented on May 25, 2024

@rjfd I don't know. I haven't looked at the differences between kernel-default-base and kernel-default.

swiftgist commented on May 25, 2024

@smithfarm Always possible.... everything can be overridden and disabled. So, if you wanted to skip the whole update call (or do it with your own base image), you can. I just suspect that a "zypper up" is normally desirable.

smithfarm commented on May 25, 2024

I don't understand how running "zypper up" causes kernel-default to be installed, though.

smithfarm commented on May 25, 2024

Ah, it's because deepsea does "zypper --non-interactive --no-gpg-checks up kernel-default"

Which is the same as saying Ceph requires kernel-default and will not run properly with kernel-default-base?

smithfarm commented on May 25, 2024

We don't want to be disabling everything, though - that will have a deleterious effect on test coverage.

swiftgist commented on May 25, 2024

Is that not true? At least, specifically for SLES.

rjfd commented on May 25, 2024

@smithfarm DeepSea was assuming that kernel-default was installed, so it only tries to update it. But on a system that has kernel-default-base installed instead, "zypper --non-interactive --no-gpg-checks up kernel-default" has no effect.

swiftgist commented on May 25, 2024

@rjfd Are you okay with this change for now? This particular file needs to change to handle Ubuntu and others. While I think pkg.install, etc. will get us further, I think we will need to spend some time going through the failure conditions of the salt zypper module.

rjfd commented on May 25, 2024

@swiftgist I'm reviewing it.

rjfd commented on May 25, 2024

@swiftgist I still have mixed feelings about enforcing the installation of the kernel-default package.
I think we should only enforce it if it is really needed by any of the services deployed by DeepSea.

IMO, unless you check that iSCSI really needs the kernel-default package, I would prefer that DeepSea just cope with whatever kernel is installed.

What is the opinion of others about this?

jecluis commented on May 25, 2024

I'm with @rjfd on this, if I understand it correctly. Unless there's a strong dependency on some kernel, and such dependency is unavoidable, we should always honor whatever the users are using.

swiftgist commented on May 25, 2024

Does ceph.ko count? :)

swiftgist commented on May 25, 2024

Comparing the two, all the iscsi related modules are absent from the kernel-default-base. These are target_* and tcm_*.
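
A comparison like this can be sketched in Python as a set difference over module names. This is only a sketch: `missing_iscsi_modules` is a hypothetical helper, and the module lists in the example are small illustrative samples, not the real package contents. On a real system the lists would have to come from the package file lists.

```python
# Hypothetical helper: report iSCSI target modules present in one kernel
# package but absent from another. Only the target_* and tcm_* prefixes
# mentioned in the comment above are considered.
ISCSI_PREFIXES = ("target_", "tcm_")


def missing_iscsi_modules(full_modules, base_modules):
    """Return the sorted iSCSI-related module names missing from base_modules."""
    missing = set(full_modules) - set(base_modules)
    return sorted(name for name in missing if name.startswith(ISCSI_PREFIXES))


if __name__ == "__main__":
    # Illustrative sample data only, not real package manifests.
    kernel_default = ["ext4", "target_core_mod", "tcm_loop", "xfs"]
    kernel_default_base = ["ext4", "xfs"]
    print(missing_iscsi_modules(kernel_default, kernel_default_base))
```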

smithfarm commented on May 25, 2024

There are various kernel-* packages - take a look: https://build.suse.de/project/show/Devel:Kernel:SLE12-SP2

At least "kernel-debug" and "kernel-vanilla" look like they might need investigation. Maybe we should ask @ddiss and/or @jan--f to weigh in on whether the kernel package for Ceph will always be called "kernel-default" and nothing else?

swiftgist commented on May 25, 2024

Just as a side note: The only reason we are encountering this issue is that I didn't change the kernel in the vagrant box since JeOS is defaulting to kernel-default-base. Had I changed that, we would still be using kernel-default.

Ulterior motive: The base JeOS is extremely minimal and so is that vagrant box. I wish to ask permission to share the vagrant box since it's not terribly useful without a SLES subscription. Once Ubuntu is supported by DeepSea, it will technically be easier for someone to run DeepSea on Ubuntu (since Ubuntu vagrant boxes are available from Hashicorp) than it would be on SUSE. So, I have not asked yet. It's on the todo list.

smithfarm commented on May 25, 2024

@swiftgist The vagrant box is not the only reason. The OpenStack images we intend to use for the Jenkins CI are JeOS, too.

That aside, though, we should not assume that kernel-default is the only SUSE kernel package anyone will ever use, just because it's the only one we ever used. We should first find out what the set of possible package names is and then modify the code so it blindly updates all of those packages on the assumption that the kernel that is actually running will be from one of them.

We have to do it this way, because we don't have a good way of knowing which kernel package holds the kernel that is actually being used (unless we want to get into parsing GRUB2 configuration?). Someone might have kernel-default installed, but be running a kernel from a different package, like kernel-debug (which, if I'm not mistaken, would be a legitimate supported scenario). In such a case, the current code would be updating the wrong kernel, right?

jan--f commented on May 25, 2024

I think @smithfarm has it right. We can't infer that a certain kernel is booted (or will be booted) just by the fact that a certain kernel package is installed.

So we might as well just go with a best-effort variant that updates all packages and maybe reboots if kernel packages were updated. If we rely on certain kernel versions or features, a check might be more suitable in the validation step. Then we could at least warn the admin. It seems unfeasible to cover all scenarios in an automated fashion.
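
For a validation-step check, one possible way to find the package behind the booted kernel is to ask rpm which package owns the kernel image of the running release. This is a sketch under assumptions, not something DeepSea does: the function names are hypothetical, and it assumes the image lives at `/boot/vmlinuz-<release>`, which may not hold on every architecture or distro.

```python
import platform
import subprocess


def vmlinuz_path(release):
    """Build the assumed path of the kernel image for a release string."""
    return "/boot/vmlinuz-" + release


def owning_package(path):
    """Ask rpm which package owns a file and return only the package name."""
    result = subprocess.run(
        ["rpm", "-qf", "--queryformat", "%{NAME}\n", path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()[0]


def running_kernel_package():
    """Name of the package that owns the currently booted kernel image."""
    # platform.release() reports the same release string as "uname -r".
    return owning_package(vmlinuz_path(platform.release()))
```

Because the lookup starts from the running release instead of the list of installed packages, it does not matter how many other kernel packages are installed alongside it.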

smithfarm commented on May 25, 2024

It would be OK to cover 95% of scenarios :-) My proposal, which I think achieves this, is (for any particular distro):

- enumerate the package names of distro kernel packages known to work with any legitimate Ceph deployment scenario (this step has to be done before writing any code, and kernel-default-base would be included)
- blindly update all of those packages in stage 0 - updating does not cause any packages to be installed

We would not need to check for kernel versions or indeed check if any kernel package is installed at all. We would document that running a non-standard kernel (i.e. any kernel not in this list of packages) means DeepSea will not ensure that your kernel is up-to-date.
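
The two steps can be sketched as follows. The package list is a placeholder for whatever the enumeration step would produce, and `update_commands` is a hypothetical helper; the zypper flags are the ones DeepSea already uses for kernel-default.

```python
# Placeholder list: the real names would come from the enumeration step.
KNOWN_KERNEL_PACKAGES = ["kernel-default", "kernel-default-base"]


def update_commands(packages=None):
    """Build one zypper "up" command (as an argv list) per package name."""
    if packages is None:
        packages = KNOWN_KERNEL_PACKAGES
    base = ["zypper", "--non-interactive", "--no-gpg-checks", "up"]
    return [base + [name] for name in packages]


if __name__ == "__main__":
    for argv in update_commands():
        print(" ".join(argv))
```

The sketch only builds the commands. Running them blindly relies on the point made above, that updating a package that is not installed does not install it.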

swiftgist commented on May 25, 2024

@smithfarm From what I wrote above, kernel-default-base will not work for Ceph. It's missing cephfs and iscsi related kernel modules. Also, with the zypper command, this is strictly related to SUSE and I only know of one person that had been running on Leap. I believe that the default behavior of DeepSea should support all features of Ceph (at least what we've implemented.) I do not know anybody that wants to get error messages about cephfs and iSCSI failing only to go back and switch kernels on several systems.

I can change the "zypper up kernel-default" to "zypper up {{ running kernel }}", but that does not solve my problem. I can keep my change personally or add it as an alternate default. The end result is that many/most would have to select it if using JeOS based images. I believe the default behavior should match what most users need.

As for other distros, the same logic may not apply and will likely be buried in a Salt module specific to that distribution.

Also, remember that Salt is not interactive. The best we can do is run a check and fail.

Lastly, if we want to remove updating packages in general from Stage 0, I can change the default. However, the whole point of Stage 0 is to make up the difference between the bare metal provisioning and what our minimum bar is. Having multiple minimum bars does not help with testing.

smithfarm commented on May 25, 2024

@swiftgist Forget kernel-default-base. I'm running kernel-debug which supports all the ceph features. Does stage 0 fail for me?

swiftgist commented on May 25, 2024

@smithfarm It has not been working for you as intended. I'm assuming 'zypper up kernel-default' has been failing gracefully for you?

smithfarm commented on May 25, 2024

So you're saying that failing gracefully is the right thing to do in that use case? That is where we do not agree, I guess.

swiftgist commented on May 25, 2024

No, I'm expecting that Stage 0 hasn't worked for you and I don't know why I didn't hear about it.

swiftgist commented on May 25, 2024

In other words, you could have shortened your argument to "your change would switch my kernel from kernel-debug to kernel-default".

smithfarm commented on May 25, 2024

"I" in the comment above is the hypothetical user who has kernel-debug installed. @smithfarm is just imagining a valid use case that might not be served well by the current behavior, and is asking the DeepSea experts if his suspicion is correct.

(sorry, kernel-debug - fixed)

swiftgist commented on May 25, 2024

My point is that the current behavior made it all the way through and shipped with SES4. I have been working from the presumption that SES relies on kernel-default and is the acceptable default. In the current situation, I would be requiring the hypothetical you to override the behavior (i.e. copy the default.sls to custom.sls, modify as desired, tell DeepSea to use it).

Now, if my presumption is incorrect and we generically support multiple kernels, that's fine. I can modify this change to be an alternate default and remove the hard coded 'kernel-default' to always update the currently running kernel whatever it may be.

My question is "Is that what we want?". Do we intend to test Ceph with all imagined kernels? Honestly, I do not believe we have the bandwidth to verify them. I also think it adds another avenue for the non-developer to make an unintentional choice.

I suppose this boils down to making a choice for the admin for the default behavior. My belief is that many admins will be new to both Ceph and Salt. Making enough default choices so that DeepSea "just works" out of the box is the best option. Customization and overrides are available at every level. Also, we have alternate defaults to help with common, but not necessarily the most common choices.

I could create a default-kernel-debug.sls (and probably should since you pointed it out.) Looking at the kernel-*, I believe we are comparing kernel-default-base, kernel-default, kernel-debug, kernel-vanilla and kernel-source. Selecting kernel-default and providing a tested example of kernel-debug will likely match the 95% you mentioned.

What are your thoughts?

smithfarm commented on May 25, 2024

There is no way to support both kernel-default and kernel-debug in a single sls file?

swiftgist commented on May 25, 2024

Without resorting to Jinja or salt modules, not really. The closest conditionals you can use are the unless and onlyif. Those are most suitable for making something idempotent. I have avoided Jinja conditionals where I can. I have heard enough warnings from the SaltStack people. There's another type of conditional in Salt that runs depending on the behavior of the previous stanza. I have avoided those as well.

I can make the default behavior always update the current kernel and create an alternate default for those like me that have an unsuitable image and want the kernel changed. That still raises the question: what do we support? If the first reaction to any issue is "What kernel are you running? Try kernel-default and see if that resolves your issue", then I think not selecting kernel-default is just being mean. Stage 0 effectively saying "whatever kernel you choose" while Support says "no, not so much" does not endear us to admins. If Stage 0 says "use kernel-default" and the admin says "but I want this one", then it's not a terrible surprise when Support asks "does the problem exist with kernel-default?".

smithfarm commented on May 25, 2024

Things I've been reading:

- relevant SLE-12 release note: https://www.suse.com/releasenotes/x86_64/SUSE-SLES/12/#fate-317738
- kernel-debug package description: "This kernel has several debug facilities enabled that hurt performance. Only use this kernel when investigating problems."

Based on the above it does look like kernel-default might satisfy the "95%".

rjfd commented on May 25, 2024

Let's try to summarize the strengths and weaknesses of the solutions discussed so far, so that we can reach a consensus on how we should proceed.

Solution A -- enforce the installation of the `kernel-default` package

Strengths:

- All Ceph services work with this kernel

Weaknesses:

- The user/admin might be using another kernel and not want it changed due to "reasons"
- The user/admin wants to use neither iSCSI nor CephFS, so kernel-default-base would be enough, but DeepSea swaps it for kernel-default nonetheless

Solution B -- upgrade all kernel packages that are currently installed

Strengths:

- The current kernel flavor is not changed, which is important when that flavor is a requirement for the user/admin
- It also works if the current kernel is from the kernel-default package
- Can be a more generic solution that applies to other distros

Weaknesses:

- Some Ceph services might not work when deploying them in stage 4
- Difficult to ensure 100% kernel coverage (which kernel supports which Ceph services)

If someone has more points to add to the strengths/weaknesses list, please add them as comments and I'll edit this comment to keep it up to date.

jan--f commented on May 25, 2024

Wouldn't Solution B also enable us to use the same logic for all distros? Provided we can at some point use the pkg module...

That's a strength, I'd say.

smithfarm commented on May 25, 2024

As the medical maxim says, "Primum non nocere" ("First, do no harm"), and blowing away someone's kernel would seem to violate that principle.

Blowing away someone's kernel is simply too heavy-handed. Especially when the only people who will hit this are those who have gone out of their way to install a non-default kernel and are not expecting some tool to come along and revert it without warning. Not even systemd does that. . .

Instead of forcing our idea of the "correct" kernel, it would make more sense for Stage 0 to fail if kernel-default is (not installed) or (installed, but not running). The failure message could point the user to the relevant section of the documentation explaining the issue and how to work around it.

This logic could be extended to other distros, assuming they have a single kernel that is considered "correct".

rjfd commented on May 25, 2024

> Wouldn't Solution B also enable us to use the same logic for all distros? Provided we can at some point use the pkg module...

@jan--f this is probably true, I'll add it to my comment.

I also agree with @smithfarm's last comment: we should not change the kernel without user intervention. But I think the best approach would be to make sure that the system is up to date regardless of which kernel is installed, and output a warning (with a link to documentation) in the case that kernel-default is not being used.

@swiftgist I also found the Salt function pkg.list_upgrades, which lists all the installed packages that have upgrades available. We can leverage this to decide whether an upgrade is needed.
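
As a rough sketch of how that result could be used from Python code (for example in a custom module or runner, not directly in an sls file), the helper below filters it down to kernel packages. It assumes pkg.list_upgrades returns a mapping of package name to available version; `kernel_upgrades` is a hypothetical name and the version strings in the example are made up.

```python
def kernel_upgrades(upgrades):
    """Keep only the entries whose package name starts with "kernel-"."""
    return {
        name: version
        for name, version in upgrades.items()
        if name.startswith("kernel-")
    }


if __name__ == "__main__":
    # Made-up sample of what a list of pending upgrades might look like.
    pending = {"kernel-default-base": "x.y.z", "vim": "a.b"}
    print(kernel_upgrades(pending))
```

An empty result would mean no kernel package has a pending upgrade, so no reboot would be needed on that account.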

rjfd commented on May 25, 2024

Just to reinforce my previous comment.

> But I think the best approach would be to make sure that the system is up to date regardless of which kernel is installed, and output a warning (with a link to documentation) in the case that kernel-default is not being used.

Since users will use the kernel-default package 95% of the time, no warning will be issued in most cases, and DeepSea will run smoothly.

swiftgist commented on May 25, 2024

@rjfd And are you using kernel-default in your image? That is why you opened the bug because DeepSea currently expected kernel-default. Also, we cannot use pkg.list_upgrades in an sls file. The sls file must do the right thing and it's not interactive.

While I do not understand why the experienced admin is favored over the newcomer that will more than likely make an unintentional choice by using an image such as kernel-default-base, I think it's a case of schadenfreude and I will relent. Also, I wanted this to be working for others that use vagrant. I suspect this will be true of a few cloud environments too. I will resubmit this as an alternate default but leave it to someone else to add the check and produce a suitable warning. I think the exercise using Salt would be educational.

As far as the Salt calls, I have higher expectations. It must fail well and currently the calls do not which is why I am calling zypper directly. I plan to address that separately.

rjfd commented on May 25, 2024

I believe the best approach would be to:

- update the currently installed kernel package (whatever that is)
- if the kernel package is not kernel-default, issue a warning saying that only kernel-default is guaranteed to fully work with all the Ceph services

smithfarm commented on May 25, 2024

+1 although we should avoid using the word "guarantee". The message could be: "kernel packages other than kernel-default are not tested and may not fully work with all Ceph daemons."
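
A minimal sketch of that check, using the wording suggested above. `kernel_warning` is a hypothetical helper, and how the name of the installed kernel package is obtained is left out on purpose.

```python
WARNING = (
    "kernel packages other than kernel-default are not tested "
    "and may not fully work with all Ceph daemons."
)


def kernel_warning(package_name):
    """Return the warning text for non-default kernels, otherwise None."""
    if package_name == "kernel-default":
        return None
    return "{}: {}".format(package_name, WARNING)


if __name__ == "__main__":
    for name in ("kernel-default", "kernel-default-base"):
        print(name, "->", kernel_warning(name))
```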

swiftgist commented on May 25, 2024

Are either of you interested in doing this? Or are you waiting on me?

smithfarm commented on May 25, 2024

Shouldn't we reach a consensus, first?

swiftgist commented on May 25, 2024

Simply, I do not agree. We are moving the problem from one group of people to another. I favor disappointing somebody by giving them a working cluster and do not favor allowing the creation of a partially functioning (i.e. broken from a new user's perspective) cluster because they did not see a warning in the noise of the Salt output. In other words, an experienced admin is much more capable of overriding the undesired behavior. The newcomer to Ceph and Salt may walk away with the impression that DeepSea makes poor decisions. I joke that putting the self-destruct button next to the "on" button on the coffee maker is bad. I've known too many coffee drinkers.

However, I am in the minority and I need to move on. A few of you have said this is the best answer and I said, I will relent. If either of you have the time and desire to implement this, I would appreciate it. If not, I will do it, but I would prefer to finish current branches. One of those is #61 to allow me to share my vagrant box with others and instructions on how to override the default behavior.

smithfarm commented on May 25, 2024

Isn't #61 blowing away the existing kernel if it isn't kernel-default?

swiftgist commented on May 25, 2024

Only if that is selected. The "default" from init.sls uses default.sls. The name in #61 is srv/salt/ceph/updates/default-zypper-kernel-default.sls. I (and any others) will have to select it. I named these alternate defaults and a few components have them. These match "default-*.sls".
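
To illustrate the naming convention, here is a small sketch that picks the alternate defaults out of a list of file names. `alternate_defaults` is a hypothetical helper for illustration only and is not part of DeepSea.

```python
from fnmatch import fnmatch


def alternate_defaults(filenames):
    """Return the file names that match the "default-*.sls" pattern."""
    return sorted(name for name in filenames if fnmatch(name, "default-*.sls"))


if __name__ == "__main__":
    files = ["init.sls", "default.sls", "default-zypper-kernel-default.sls"]
    print(alternate_defaults(files))
```

Note that plain default.sls does not match the pattern, so the regular default is never mistaken for an alternate one.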

rjfd commented on May 25, 2024

@swiftgist why don't you look at the problem from the opposite perspective? By installing a different kernel, you are guaranteeing that DeepSea will do the job correctly, but you may also be breaking other existing software that depended on the previously installed kernel.

None of us wants to break anything for any kind of user, and for that matter, neither of the two approaches discussed so far is perfect.

Since no consensus has been reached yet, I'm going to propose a third solution (not a perfect one either).
If stage 0 finds a kernel package other than kernel-default, it fails hard, stops executing, and tells the user how to proceed: to keep the current kernel, the user reruns stage 0 with the keep_kernel=True parameter; to let DeepSea change the kernel package, the user reruns stage 0 with the change_kernel=True parameter.

The drawback of this approach is that it requires user intervention, but that is already the case when we update the kernel and DeepSea reboots the machine.
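
The decision logic of this third solution could look roughly like the sketch below. The function name and the returned action strings are hypothetical; only the keep_kernel and change_kernel parameter names come from the proposal.

```python
def stage0_kernel_action(package_name, keep_kernel=False, change_kernel=False):
    """Decide what stage 0 does about the kernel, or fail if the user must choose."""
    if keep_kernel and change_kernel:
        raise ValueError("keep_kernel and change_kernel are mutually exclusive")
    if package_name == "kernel-default" or keep_kernel:
        # Update whatever kernel package is installed, without swapping it.
        return "update"
    if change_kernel:
        # The user explicitly accepted that the kernel package is replaced.
        return "install-kernel-default"
    # Fail hard: a different kernel was found and the user gave no instruction.
    raise RuntimeError(
        "{} is installed instead of kernel-default; rerun stage 0 with "
        "keep_kernel=True or change_kernel=True".format(package_name)
    )
```

With neither flag set, only a system that already uses kernel-default passes, which matches the "fail hard and ask" behavior described above.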

What do you think?

rjfd commented on May 25, 2024

The third approach (described in my previous comment) has the benefit of making the user aware of the problem and requires an explicit solution from the user.

swiftgist commented on May 25, 2024

@rjfd You said it right there "guaranteeing that DeepSea will do the job correctly". I am having difficulty imagining this customer that pays for support of SLES and yet builds their own kernel. However, if such a customer does exist, I am not saying that they cannot do it, but I select them as the ones that must do the manual intervention. After all, this customer seems to have enough ability to build a Ceph compatible kernel and include additional modules and settings not provided by kernel-default.

In the second solution, this customer with their custom kernel has to do no work, but the users that want to give DeepSea and Ceph a try are likely not aware of the dependencies, especially kernel dependencies. Depending on the image they start with, these newcomers might luck out. Some will not. I believe this unlucky group is larger than the custom kernel group.

With the third solution, we actually have a default that fails for both groups. The custom kernel customer must update the pillar and rerun; however, the newcomer needs to understand that they can select an alternate default as in #61. This customer must update the pillar and rerun as well.

Remember that the purpose of Stage 0 is to finish provisioning, whatever that may mean. Some sites have their own vetted solutions and may disable Stage 0 entirely. So, Stage 0 is not really for them. Some may use SUSE Manager and disable several parts of Stage 0. Stage 0 defaults are not really for them either. We are left with the group of admins and customers that rely on these complex steps working. I have the disposition to imitate an appliance for the default path, that is, the fewest initial decisions.

If the general concern is that the customer with the custom kernel is in for a surprise, does making that user aware resolve things? When the kernel-default-base is running, we cannot tell intent but we do know that much of the reason to use Ceph will not work. That is the wrong default. For those headstrong admins that willfully select combinations expected not to work, let them make an active selection.

I believe we must provide both sls files, one that enforces kernel-default and one that upgrades the current kernel. The master branch works for neither of these at the moment. Deciding which is the default and which should get a warning/documentation is what remains.

jan--f commented on May 25, 2024

Imho #61 is a good solution to this.

rjfd commented on May 25, 2024

@swiftgist will you consider any solution where the default behavior is NOT to install the kernel-default package without the user's authorization?

rjfd commented on May 25, 2024

@swiftgist I want to clarify one thing about the third solution I proposed.

> The custom kernel customer must update the pillar and rerun; however, the newcomer needs to understand that they can select an alternate default as in #61. This customer must update the pillar and rerun as well.

None of the users need to update any pillar file. The users just need to rerun stage 0 like this:

salt-run state.orch ceph.stage.0 keep_kernel=True

or,

salt-run state.orch ceph.stage.0 change_kernel=True

This approach will benefit automation scripts that need to decide what to do by just passing the correct parameter.

swiftgist commented on May 25, 2024

@rjfd I believe what the collective you wanted is to remove the kernel-default or possibly the zypper update from default.sls. What solution does not demand action from the admin that picked a JeOS image that defaults to kernel-default-base? None that I'm aware of except the one I originally intended. All other solutions are effectively the same: force the least capable admin to deal with a broken cluster.

As far as adding parameters to salt-run, there are two separate issues. One is that state.orch only accepts a couple of parameters currently and those do not get passed down. The other is that there are no "automation scripts". That would be DeepSea. The configuration for DeepSea is the Salt pillar. If an admin passes it on the command line, then all admins must always do that.

Jan made a particular point. If a customer has a custom kernel that is needed for that particular hardware, forcing a switch of the kernel could brick the system. While I do not believe it likely (I have a decent bit of faith in kernel-default), I will concede that possibility would be catastrophic.

My only other suggestion is that rather than address any kernel, that I be allowed to address kernel-default-base specifically. If that's the running kernel, then change it. I am against software that tells me "I know something is broken, but you have to push the button anyways". However, I do not think I would like the sls to make that happen. And my enthusiasm is waning.

jecluis commented on May 25, 2024

> I am having difficulty imagining this customer that pays for support of SLES and yet builds their own kernel.

And I'm having difficulty understanding why this is all about SLES when DeepSea is an open-source project that's meant to be used, even if at some point in the future, with other distributions and by a wider audience than simply SUSE customers.

One thing is to tailor some things to our internal use cases. Another completely different thing is assuming that the wider audience that may benefit from DeepSea will have to play by the same rules our customers have.

And, if you do decide that the kernel is going to be changed without consulting the user first, I would suggest you write that in big bold letters on top of the README file. At least then, newcomers will have a chance to make an informed decision prior to using DeepSea.

swiftgist commented on May 25, 2024

@jecluis With respect to other distributions, these particular sls files will not matter since they are SUSE specific. So, this particular issue is related specifically to the kernel selection available from SLES repos. I expect more than this topic to come back for other distributions as well. Does DeepSea carry repo examples for each distribution? It sounds nice if we at least verify these things. I'm sure we could go through the list of everything in Stage 0 and imagine how Ubuntu users might answer differently than CentOS users. Personally, I would go with "whatever" that group wanted to some degree. If Arch Linux uses a different time service in their documentation and default setup, why not assume to use that default for DeepSea? It could be wrong. I think that's okay, though, since it can be changed and likely fits what 4 out of 5 Arch users prefer.

As far as the wider audience, I think this would be admins and not developers. I do not think that group has a terribly different agenda than the internal customers.

As much as I would like to, I feel I have been shouted down enough. If I thought the group would accept the big bold letters at the top of the README that DeepSea is quite opinionated and may make default choices that you do not want, I would do it. I would also try to find any other choices that may not sit well and list them (like our pharmaceutical commercials here in the US) as side effects. :)

Regardless of the distro, not everybody wants an open source project to be an adventure when they try to use it for the first time. In my experience, many would like things to just work.

jecluis commented on May 25, 2024

> With respect to other distributions, these particular sls files will not matter since they are SUSE specific. So, this particular issue is related specifically to the kernel selection available from SLES repos. I expect more than this topic to come back for other distributions as well.

While this issue may be specific to SLES and its sls files, from a technical point-of-view, what has been discussed goes far beyond the technical specificity of any distribution: it's framing the philosophy of DeepSea when it comes to modifying an existing environment, overriding (or not) whatever kernel is being used.

This is a conversation that we only need to have once. All the remaining discussions will be on what package names are to be used and distro-specific technical aspects.

> As far as the wider audience, I think this would be admins and not developers. I do not think that group has a terribly different agenda than the internal customers.

Different admins have different needs. While our customers may be interested in deploying ceph with all the knobs turned on, some admins may not, and may be okay with using a smaller kernel for their endeavors. Heck, they may be consuming packages from upstream directly; I know of a few that actually compile their kernels. So, presuming that we know best is not the nicest thing to do.

This does not mean that we can't presume we know best, but we should first get the user to allow us to do that. I don't personally care how you do that. For all I care, you can have two different Stage 0 - one that will let DeepSea do whatever it wants to do, and another that will do everything it can without changing the environment. Or you can force the user to set some flag on the CLI, or on a config file, if they want to allow DeepSea to do whatever DeepSea wants. But having the user check the metaphorical "I know what I'm doing, and I don't care what DeepSea does"-box before we change the environment is the least we can do.

And as a last remark,

As much as I would like to, I feel I have been shouted down enough.

I may have read too much into this, but you should not feel like you're being shouted down.

People are discussing. Opinions diverge. If tomorrow I wake up to a wall of rants against my opinion, I'll clarify, or address whatever arguments people have against it. It may just be that I'm missing something, I may not have thought this all through, or maybe I'm suffering from tunnel vision and I can't see past my initial assumptions. It's okay for me to change my opinion. Or maybe I'm just flat out wrong. That's also okay. Sometimes that happens. Eventually everyone will reach a consensus and we can move on. None of this means I've been shouted down - well, unless someone calls me on my phone and literally shouts me down, in which case I'd appreciate it if you were to do that after 9am.

All this to say, I'm sorry if you feel like you're being shouted down. Believe me, and I think I speak for everyone in this thread, that's not anyone's intention.

from deepsea.

rjfd commented on May 25, 2024

Although PR #61 was already merged, it did not completely fix the problem described in the first comment of this issue, because DeepSea's current default state file for updating the system didn't change.

I decided to try the third solution that I proposed in some of my previous comments, implemented it, and submitted a PR ( #63 ). To show how that solution would work, I made a demo and uploaded it to GitHub; you can check it here:

https://github.com/rjfd/DeepSea/blob/wip-deepsea-kernel-default-demo/doc/demos/deepsea-kernel-default.gif

The basic idea is that when the user runs stage 0, DeepSea checks whether any minion is not using the kernel provided by the kernel-default package. If that is the case, it does not execute any more steps and shows a message explaining what the user can do about it.
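
As a rough illustration of that check (a sketch only, not the code submitted in #63): assume we already have, for each minion, the list of kernel packages installed on it. The function names, the input mapping, and the message wording below are all hypothetical, and the sketch simplifies "using the kernel provided by kernel-default" to "has kernel-default installed".

```python
def minions_missing_kernel_default(installed):
    """Return the sorted minion IDs that do not have kernel-default installed.

    `installed` is a hypothetical mapping of minion ID -> iterable of
    installed kernel package names; collecting it is out of scope here.
    """
    return sorted(
        minion
        for minion, packages in installed.items()
        if "kernel-default" not in set(packages)
    )


def check_stage0(installed):
    """Return (ok, message); ok is False when stage 0 should stop."""
    missing = minions_missing_kernel_default(installed)
    if not missing:
        return True, ""
    # Hypothetical wording; the real message is whatever the PR prints.
    message = (
        "Stage 0 stopped: kernel-default is not installed on: "
        + ", ".join(missing)
        + ". Install kernel-default or customize the update step, then rerun."
    )
    return False, message


if __name__ == "__main__":
    ok, message = check_stage0(
        {"node1": ["kernel-default"], "node2": ["kernel-default-base"]}
    )
    print(ok)
    print(message)
```

The point of returning a flag plus a message, rather than raising, is that the caller can stop before any further step runs and still tell the user what to do next.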

from deepsea.

jan--f commented on May 25, 2024

To keep the discussion here:

I think we should rather go towards a less complex default case. Ideally our default.sls would even be distribution agnostic. Imho this is only sensible by basically running [zypper up, apt-get upgrade, pacman -Suy] once Salt is able to do that reliably.

This I think has several advantages:

it's easy to maintain
it won't break anything
it'll simply work in many cases
it's obvious in its functionality and can be customized easily
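
To make the "distribution agnostic default" idea concrete, here is a minimal sketch that only maps a package manager name to the full-update command listed above. The table and function names are hypothetical, the commands are taken verbatim from this comment, and non-interactive flags are deliberately left out since they differ per tool.

```python
# Full-system update command per package manager, as listed in the comment.
UPDATE_COMMANDS = {
    "zypper": ["zypper", "up"],
    "apt-get": ["apt-get", "upgrade"],
    "pacman": ["pacman", "-Suy"],
}


def update_command(package_manager):
    """Return the argument list for a full update with `package_manager`.

    Raises ValueError for package managers this sketch does not know about.
    """
    try:
        # Copy the list so callers cannot mutate the table by accident.
        return list(UPDATE_COMMANDS[package_manager])
    except KeyError:
        raise ValueError("unsupported package manager: " + package_manager)


if __name__ == "__main__":
    print(" ".join(update_command("zypper")))
```

Nothing here is kernel specific, which is the appeal: the default stays the same no matter which kernel package a system happens to use.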

Until then I'm happy with what was committed in #61, with two addendums:

remove the explicit update of kernel-default in default.sls as iiuc this will error out if anything else is installed
add @swiftgist's suggestion of dealing with kernel-default-base explicitly (i.e. install kernel-default), since we know it won't produce a fully working ceph environment and we can safely assume it won't break anything
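
Taken together, the two addendums boil down to a small decision rule, sketched below under the assumption that the set of installed package names is already known. The function name is hypothetical and this is not the content of default.sls.

```python
def kernel_packages_to_install(installed):
    """Return the kernel packages to install explicitly for this system.

    Only the known-problematic case is handled: kernel-default-base is
    present and kernel-default is not. Any other kernel is left alone and
    is simply covered by the regular system update.
    """
    installed = set(installed)
    if "kernel-default-base" in installed and "kernel-default" not in installed:
        return ["kernel-default"]
    return []


if __name__ == "__main__":
    print(kernel_packages_to_install(["kernel-default-base"]))
    print(kernel_packages_to_install(["kernel-default"]))
```

Note that there is no branch that updates kernel-default by name, so a system running some other kernel never hits an error from this step.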

This is an easy to understand and maintain default that won't change much functionality-wise once we can move to a distro-agnostic default. Adding customized behaviour is easy enough since this functionality was added by @swiftgist for exactly this case.

Regarding #63, as user friendly as it is, imho it goes in the wrong direction complexity-wise and when it comes to being distro-agnostic. It also won't work too well with the long term goal of automating the deployment and automating cluster changes.

from deepsea.

rjfd commented on May 25, 2024

@jan--f thanks for your analysis. I agree with your point of view. I submitted a new PR #64 that addresses the problem as you suggested and fixes the bug described in this issue.

from deepsea.

jan--f commented on May 25, 2024

Closing since we seem to have reached some sort of compromise.

from deepsea.
