Comments (9)
Can reproduce with a slow runtime env agent plus task pressure.
@rynewang Hi, could you tell us which Ray version this problem started in? We recently wanted to upgrade the Ray version used in our system (currently 2.9.0), but ran into some new problems with Ray 2.20.0 that we reported in GitHub issues (autoscaler instability, RuntimeEnvSetupError, etc.). So I'm looking for a more stable version to upgrade to. Thank you!
@yx367563 could you elaborate more on the issues you are facing. Have you created github issues for them?
The P0 label already reflects it, but the issue is quite severe: it causes intermittent failures in any production job in which workers access a runtime environment.
Nice!
"task pressure" certainly describes our use-case
(we schedule many tasks per workload :) )
@jjyao @rynewang The problem I'm facing is described in this issue (#45311). What I mean is: do you know which Ray version introduced this problem? I'll avoid upgrading to that version for now.
@yx367563 we don't have an exact "culprit" commit. You can stay on your current version until this issue is resolved (and then upgrade to the next release), or run some stress tests with your workload to confirm.
PR under review