
Comments (6)

zcin commented on September 28, 2024

@kyle-v6x Could you describe what behavior you would expect? Setting a small downscaling factor could block downscaling altogether, even when traffic is at 10% of what the current N running replicas are expected to handle, as defined by your own autoscaling config.

The purpose of a small downscaling factor is to prevent the serve controller from making drastic decisions, e.g. dropping from 100 replicas to 10 replicas. However, it should not prevent the number of replicas from dropping from 2 to 1 if the current traffic is way below what 2 replicas can handle.
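For reference, here is a rough sketch of the arithmetic being discussed (assumed for illustration, not the exact Serve controller code): the downscaling factor scales how far a single decision moves toward the raw desired replica count, and at small replica counts the rounding can cancel the step entirely.

```python
import math

# Rough sketch (not the exact Serve implementation): the downscaling factor
# damps how far one autoscaling decision moves toward the desired count.
def smoothed_decision(current_replicas: int, desired_replicas: int,
                      downscaling_factor: float) -> int:
    error_ratio = desired_replicas / current_replicas
    smoothed = current_replicas * (1 + (error_ratio - 1) * downscaling_factor)
    return math.ceil(smoothed)

# Drastic drops are damped: traffic says 10 replicas are enough for 100.
print(smoothed_decision(100, 10, 0.3))  # 73 -- a gentle step instead of 100 -> 10
# But at small counts the ceil() rounds the step away entirely:
print(smoothed_decision(2, 1, 0.3))     # 2 -- downscaling from 2 to 1 is blocked
```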


kyle-v6x commented on September 28, 2024

@zcin

So there are a few issues with the merged solution:

  1. It changes the way the scaling factor operates for all values. Say you have a scaling factor of 0.6: the new code will drastically change the behaviour of the autoscaler, because it will always remove one replica regardless of the smoothed value. For a cluster with 5 workers during a low-traffic period, this could cause a 20% mismatch in capacity. There is no easy way to explain this behaviour, and it contradicts the config parameter.
  2. Previously, low scaling factors could actually be a benefit by providing a 'soft-min' for replica counts. After this point, the cluster would only scale down if the traffic was 0.

I would argue that the previous behaviour was not an issue, as it followed the expected behaviour of a low multiplicative factor. The new behaviour breaks smoothing for any cluster with a low number of replicas. It would be nice to revert the changes at least until another solution is found that doesn't also break large scaling factors (>= 0.6).

Regarding potential long-term solutions after reverting the current solution:

  • Compute a momentum value based on previous scaling decisions, so each decision is not made independently
  • Apply a log factor to compress the range of possible inputs
  • Apply conditional logic that increases the smoothing factor for low replica counts, e.g. if smoothing_factor < 0.55: smoothing_factor = min(0.55, smoothing_factor + 1/num_running_replicas)

From my own testing, the third option seems the most promising.
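A hypothetical sketch of that third option (names assumed, not merged code): bump the effective factor as the replica count shrinks, so a downscale step survives the rounding at low counts while larger clusters are left essentially untouched.

```python
def effective_smoothing_factor(smoothing_factor: float,
                               num_running_replicas: int) -> float:
    # Hypothetical adjustment: only small factors are boosted, and only by
    # an amount proportional to how few replicas are running.
    if smoothing_factor < 0.55:
        return min(0.55, smoothing_factor + 1 / num_running_replicas)
    return smoothing_factor

print(effective_smoothing_factor(0.3, 100))  # ~0.31 -- essentially unchanged
print(effective_smoothing_factor(0.3, 3))    # 0.55 -- boosted at low replica counts
```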


zcin commented on September 28, 2024

@kyle-v6x Could you help me better understand your issue?

For a cluster with 5 workers during a low-traffic period, this could cause a 20% mismatch in capacity.

In this case, Serve will decrease the number of replicas only if traffic is consistently at less than 80% of what your 5 workers are expected to handle. If you don't want the number of replicas to decrease in this case, you can tune target_ongoing_requests (if you'd rather have each worker handle less traffic), min_replicas (if you want each worker to handle the same amount of traffic but don't want the number of workers to drop below 5), or downscale_delay_s (if you want to decrease the frequency of downscale decisions to prevent oscillation).
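For concreteness, a sketch of where these knobs live in a deployment's autoscaling config (parameter names here follow recent Ray releases and may differ in older versions; the values are placeholders):

```python
from ray import serve

@serve.deployment(
    autoscaling_config={
        "min_replicas": 5,             # floor: never drop below 5 replicas
        "max_replicas": 20,
        "target_ongoing_requests": 8,  # lower this to give each replica more headroom
        "downscale_delay_s": 600,      # wait longer before acting on downscale signals
        "downscaling_factor": 0.6,     # damp the size of each downscale step
    },
)
class MyModel:
    async def __call__(self, request):
        ...
```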

Previously, low scaling factors could actually be a benefit by providing a 'soft-min' for replica counts. After this point, the cluster would only scale down if the traffic was 0.

Although this was a side effect of low scaling factors before, I think this is not the original intended purpose. Our motivation for introducing the scaling factor was to prevent Serve from introducing too big of a "delta" in the number of replicas running in the cluster, to prevent oscillation in the replica graph. If a "soft min" is your goal (i.e. only decrease replicas if traffic is zero, but otherwise keep some positive minimum of replicas), in my opinion this is more of a feature request. Do you want to file a GitHub issue for this and assign it to the Serve team?


kyle-v6x commented on September 28, 2024

Of course.

In this case, Serve will decrease the number of replicas only if traffic is consistently at less than 80% of what your 5 workers are expected to handle.

With the changes made, it will decrease the workers by one on the first downscale_delay_s check, completely bypassing the scaling factor. As the warning previously mentioned, the scaling factor would only cause issues below ~0.6, yet these changes apply to all ranges of scaling factors, even those above 0.6. So the new logic will always result in one fewer worker than before the change. With 100 workers that's marginal, but at something like 5 workers it is a 20% error. This change fundamentally breaks the downscaling_factor for clusters with a low number of replicas, even when the scaling factor is greater than 0.6.

or downscale_delay_s (if you want to decrease the frequency of downscale decisions to prevent oscillation)

In this case, downscale_delay_s is not a solution. As per the logic here, the decision to downscale is only enforced if it has occurred consistently; however, decision_num_replicas is only calculated from the last decision. For a deployment with spikes in traffic, the downscale delay therefore only results in huge, delayed oscillations. This is exactly why the downscaling factor has been so useful. Alternatively, you could store an average or minimum decision_num_replicas over the downscale_delay_s window, but that is beyond the scope of this issue.
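A hypothetical sketch of that alternative (out of scope for this issue, names assumed): keep the per-check downscale targets produced during the downscale_delay_s window and act on an aggregate of them rather than on the most recent value alone.

```python
from collections import deque
from statistics import mean

class DownscaleDecisionWindow:
    """Track the downscale targets computed during the delay window."""

    def __init__(self, window_size: int):
        self._decisions = deque(maxlen=window_size)

    def record(self, decision_num_replicas: int) -> None:
        self._decisions.append(decision_num_replicas)

    def aggregate(self) -> int:
        # Mean shown here; using min or max instead shifts the bias toward
        # more aggressive or more conservative downscaling respectively.
        return round(mean(self._decisions))
```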

Although this was a side effect of low scaling factors before, I think this is not the original intended purpose.

I understand this was not the intended purpose, but it is a natural result of applying a scaler to the decision. If you would like to remove this behaviour, something like what I suggested above does so without breaking scaling at low replica counts.

I notice this issue was created internally, so perhaps it was causing problems with your own deployments?


kyle-v6x commented on September 28, 2024

One more note:

Since the new changes only apply a maximum of -1 replica per scaling decision, I can try tuning a larger downscale_delay_s to handle low-replica scaling, but this would sacrifice downscaling agility during peak time, and it seems like a workaround rather than a solution.


kyle-v6x commented on September 28, 2024

Although I still think this change should be reverted, the feature suggestion here would provide a valid solution/workaround as well.

