Hi, According to the page <a href="https://microsoft.github.io/coyot

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Thx <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url=

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Thx <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url=

Cold State Liveness checking about coyote HOT 9 CLOSED

microsoft commented on July 26, 2024

Cold State Liveness checking

from coyote.

Comments (9)

akashlal commented on July 26, 2024

When a test iteration stops with a Monitor in a hot state, then the tester will report this as a bug. No flag is needed to enable this check and the LivenessTemperatureThreshold has no role to play here. However, it must be the case that the test iteration stopped on its own (i.e., all Actors went idle). It should not be that the max-steps bound was hit (in which case, the temperature of the monitor must exceed LivenessTemperatureThreshold for a violation to be reported by the tester). So, two questions:

Are you using Tasks or Actors/StateMachine?
Can you confirm that the test iteration stops on its own (with monitor in hot state)? Or does it hit max-steps?

from coyote.

yunpengxiao commented on July 26, 2024

I see. Yes, we are using Tasks and because it has a big loop in the code, it actually hit max-steps.

So is there any way to trigger the cold state check after hitting max-steps? or Can I call the check explicitly?

from coyote.

akashlal commented on July 26, 2024

@yunpengxiao I create a very simple liveness test using Tasks to make the discussion more concrete. The test creates two tasks that go on forever. There is a monitor that only remains in the hot state. When I run this test with --iterations 1 --max-steps 10 what happens is the following. The tester will run the iteration for 100 steps maximum (it goes up to 10 times the supplied max-steps). The temperature threshold is set to 50. Each time the monitor stays in a hot state (which is every step for this test), the temperature of the monitor goes up by 1. As expected, the tester reports a liveness violation after 50 steps.

So in some sense, if you have a monitor stuck in a hot state, then you are likely to hit the liveness violation before max-steps is reached. What must be happening in your case is that the monitor is flipping between hot and cold states. The temperature is reset to 0 on moving to a cold state. (The temperature remains same if the monitor state is neither hot nor cold.) If the temperature does not go up sufficiently during an iteration, then the tester does not surface the liveness violation (because it worries about raising a violation prematurely).

We currently don't have a knob to assert the monitor state when max-steps is hit. One can supply a TestIterationDispose method that is called at the end of a test iteration. Maybe you can use that to assert something at the end of a test iteration, but I have never tried adding an assert in that method, so unsure what the tester will do.

from coyote.

yunpengxiao commented on July 26, 2024

Thx for your explanation. However, in our case, we don't care about how high the temperature goes. We only care about if the temperature goes to 0 at the end of execution.

We can take the 3 replica problem for example. At the beginning of the execution, we only have 1 replica. Our code is an infinity loop that checks the replica status and makes the replica. If we define less-than-three replicas as hot state. The cold state is we have more than 3 replicas. And at the end of the execution, we need to check if we are in the cold state. However, we are not sure how many steps we need here so we have to do some experiments about the steps. In this case, it seems that no matter how many steps are given, there will be no liveness violation if the max steps are hit. But it seems better to provide a liveness check here to tell that the replica is not enough at the end of execution. Or provide a switch to check the liveness at the end of execution if max step is hit.

Thanks,

from coyote.

lovettchris commented on July 26, 2024

@yunpengxiao how about you use a LivenessMonitor that has a 3rd message telling it when the test is complete like this:

    internal class LivenessMonitor : Monitor
    {
        public bool _hot;

        public class BusyEvent : Event { }

        public class IdleEvent : Event { }

        public class CompletedEvent : Event { }

        [Start]
        [Cold]
        [OnEventGotoState(typeof(BusyEvent), typeof(Busy))]
        [OnEventDoAction(typeof(CompletedEvent), nameof(OnCompleted))]
        [IgnoreEvents(typeof(IdleEvent))]
        [OnEntry(nameof(OnIdleEnter))]
        private class Idle : State { }

        private void OnIdleEnter()
        {
            _hot = false;
        }

        [Hot]
        [OnEventGotoState(typeof(IdleEvent), typeof(Idle))]
        [OnEventDoAction(typeof(CompletedEvent), nameof(OnCompleted))]
        [IgnoreEvents(typeof(BusyEvent))]
        [OnEntry(nameof(OnHotEnter))]
        private class Busy : State { }

        private void OnHotEnter()
        {
            _hot = true;
        }

        private void OnCompleted()
        {            
            Specification.Assert(!_hot, "Monitor found system is still in the hot state");
        }
    }

So your test then can trigger this final check by sending that completion event like this:

        [Test]
        public async static CoyoteTask Execute(IActorRuntime runtime)
        {
            runtime.RegisterMonitor<LivenessMonitor>();

            SimpleTest test = new SimpleTest();
            for (int i = 0; i < 2; i++)
            {
                await test.CheckStatus();
            }

            Specification.Monitor<LivenessMonitor>(new LivenessMonitor.CompletedEvent());
        }

from coyote.

akashlal commented on July 26, 2024

@lovettchris I think the issue here is that the test is infinite. It relies on max-steps to kill the iteration.

@yunpengxiao I will think about how to add a custom check when a test iteration ends (specifically, when it is killed via the max-steps bound). An example where this is needed, however, would really help because I'm not convinced that it will solve your problem, or that you should express liveness properties this way. Can you come up with a simple test case?

For instance, I modified my example to include the replicas liveness monitor. Intuitively, the monitor states the property "eventually there are three replicas alive, even under failures". The program is written so that it does something every 100 steps. If there are less than three replicas, then it will bring one up, otherwise it will fail one. The count of replicas starts at 0. Therefore, the program will take 301 steps before it can bring up three replicas. After that the count will alternate between 2 and 3 every 100 steps.

Running this program with max-steps set to 60 or less results in a liveness violation. With max-steps set to 60, the temperature threshold is 300 and 300 steps is not enough to bring up three replicas. Such a violation is a false violation because the system perfectly well satisfies the liveness specification. Choosing any value for max-steps that is larger than 60 will not result in a violation, which is the behavior that we want for this example.

If you were to assert that the monitor should not be hot at the end of the test iteration, then the test will almost surely fail all the time because the monitor turns cold briefly before going hot again.

In general, I would not recommend getting into thinking about the "number of steps required". The programmer just supplies the eventuality liveness properties. Think of max-steps flag only as a knob/heuristic. Turning it up will reduce false positives at the cost of slower test performance.

Of course, if I'm completely missing the point, then please do share an example :)

from coyote.

yunpengxiao commented on July 26, 2024

Thx @akashlal for the detailed explanation and example.
Just as you mention, our system has two important features:
* It's infinite and relies on max-steps to kill it.
* It potentially will switch between hot and cold states.
I think your replica liveness example models my test pretty well. I now understand why it's like that by design. But doesn't it mean I have to set a proper temperature threshold if I'd like a liveness check for an infinite test? This is not easy for a large system like us. The state-space can go very large because of the concurrency.

@lovettchris 's method seems like an acceptable workaround for us:)

from coyote.

akashlal commented on July 26, 2024

@yunpengxiao I've never had to set a custom temperature threshold. I just increase max-steps when in doubt, which also increases the threshold and makes false positives less likely. Note that the threshold is only one heuristic, and there are other search algorithms that we may employ in the future that will not require the threshold. For now, best to tune just the max-steps.

from coyote.

yunpengxiao commented on July 26, 2024

Thx @akashlal for the clarification and thx @lovettchris for the help. I will close this issue.

from coyote.

Cold State Liveness checking about coyote HOT 9 CLOSED

Comments (9)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent