Comments (3)
Could you try to remove
env:
- name: RAY_enable_autoscaler_v2 # Pass env var for the autoscaler v2.
  value: "1"
This enables the experimental autoscaler v2 which might have some bugs. Removing this will use the default autoscaler v1.
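For anyone who edits the manifest programmatically rather than by hand, here is a minimal sketch of the same removal. It assumes the container spec has already been loaded into a plain Python dict. The `drop_env_var` helper and the `ray-head` container name are hypothetical, made up for illustration, and are not part of Ray.

```python
from copy import deepcopy

# Env var named in the comment above; setting it to "1" opts in to autoscaler v2.
FLAG = "RAY_enable_autoscaler_v2"


def drop_env_var(container: dict, name: str = FLAG) -> dict:
    """Return a copy of a container spec without the named env entry.

    Hypothetical helper: `container` is assumed to be a dict shaped like a
    Kubernetes container spec, with an optional "env" list of name/value dicts.
    """
    cleaned = deepcopy(container)
    remaining = [e for e in cleaned.get("env", []) if e.get("name") != name]
    if remaining:
        cleaned["env"] = remaining
    else:
        # Drop the key entirely instead of leaving an empty env list behind.
        cleaned.pop("env", None)
    return cleaned


# Assumed example input; the container name is illustrative only.
head = {"name": "ray-head", "env": [{"name": FLAG, "value": "1"}]}
print(drop_env_var(head))
```

The input dict is left untouched, so the original spec can still be compared against the cleaned one before re-applying it.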
That worked, thanks. I had copied the example from the repo and missed that it was using an alpha autoscaler. I thought v2 was the standard since it was in the example.
Thanks :)
from ray.
As an additional note: it seems that if you try to launch new pods while the previous ones are still in the Terminating state, the new ones hang as well and never spawn.