Comments (14)
cc @jerqi
from incubator-uniffle.
@smallzhongfeng What do you think?
from incubator-uniffle.
I think this is a matter of parameter settings. If `retry interval * retry times` is greater than the `heartbeat` interval at which the server reports to the coordinator, the check of `AccessCandidatesChecker` is meaningful. If `retry interval * retry times` is less than the server's `heartbeat` interval, can this be solved by setting the number of retries to 0? @zuston
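The comparison in this comment can be sketched as a small check: does the total retry window outlast one server heartbeat interval? This is only an illustration. The class name, method name, and parameters below are hypothetical and are not taken from the Uniffle code base or its configuration keys.

```java
import java.time.Duration;

public class RetryWindowSketch {
  // Hypothetical helper: returns true when retryInterval * retryTimes is
  // strictly greater than the heartbeat interval, i.e. at least one retry
  // can happen after the coordinator has received a fresh heartbeat.
  public static boolean retryOutlastsHeartbeat(
      Duration retryInterval, int retryTimes, Duration heartbeatInterval) {
    Duration retryWindow = retryInterval.multipliedBy(retryTimes);
    return retryWindow.compareTo(heartbeatInterval) > 0;
  }

  public static void main(String[] args) {
    Duration heartbeat = Duration.ofSeconds(10); // assumed value, for illustration only
    // 20s * 3 = 60s, longer than the 10s heartbeat
    System.out.println(retryOutlastsHeartbeat(Duration.ofSeconds(20), 3, heartbeat));
    // 1s * 3 = 3s, shorter than the 10s heartbeat
    System.out.println(retryOutlastsHeartbeat(Duration.ofSeconds(1), 3, heartbeat));
  }
}
```

With zero retries the window is zero, so the check returns false, which matches the suggestion of disabling retries in that case.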
from incubator-uniffle.
First, I think your PR is meaningful for the cluster load access checker.
But for other access checkers, a retry is sometimes unnecessary. So I will introduce a special access result, `NON_TRANSIENT_ACCESS_DENIED`, to avoid retrying in some checkers.
For example, if we enable two checkers:
- when the cluster load checker denies but the candidates checker passes, we need to retry
- when the cluster load checker passes but the candidates checker denies, no retry is needed
- when both checkers deny, no retry is needed
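The three cases above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: only the name `NON_TRANSIENT_ACCESS_DENIED` comes from this thread, while the `AccessResult` enum, the `TRANSIENT_ACCESS_DENIED` and `PASS` values, and `shouldRetry` are hypothetical. It also assumes the cluster load checker reports a transient denial and the candidates checker reports a non-transient one.

```java
import java.util.Arrays;
import java.util.List;

public class AccessRetrySketch {
  // Hypothetical result type; only NON_TRANSIENT_ACCESS_DENIED is named in the thread.
  public enum AccessResult { PASS, TRANSIENT_ACCESS_DENIED, NON_TRANSIENT_ACCESS_DENIED }

  // Retry only if at least one checker denied and no checker denied non-transiently.
  public static boolean shouldRetry(List<AccessResult> results) {
    boolean transientDenied = false;
    for (AccessResult result : results) {
      if (result == AccessResult.NON_TRANSIENT_ACCESS_DENIED) {
        return false; // a non-transient denial makes retrying pointless
      }
      if (result == AccessResult.TRANSIENT_ACCESS_DENIED) {
        transientDenied = true;
      }
    }
    return transientDenied;
  }

  public static void main(String[] args) {
    // Order in each list: cluster load checker, candidates checker (assumed).
    System.out.println(shouldRetry(Arrays.asList(
        AccessResult.TRANSIENT_ACCESS_DENIED, AccessResult.PASS)));
    System.out.println(shouldRetry(Arrays.asList(
        AccessResult.PASS, AccessResult.NON_TRANSIENT_ACCESS_DENIED)));
    System.out.println(shouldRetry(Arrays.asList(
        AccessResult.TRANSIENT_ACCESS_DENIED, AccessResult.NON_TRANSIENT_ACCESS_DENIED)));
  }
}
```

Under these assumptions, only the first case (cluster load denies, candidates passes) leads to a retry. When every checker passes, the sketch also returns false because there is nothing to retry.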
from incubator-uniffle.
I got your point. If you raise a PR, I'm glad to review. @zuston WDYT? @jerqi
from incubator-uniffle.
This is a little complex. Are there similar mechanisms in other systems? In my opinion, we shouldn't retry when we use the candidates checker. When we only use the health checker, we can retry, and we can scale out our RSS at the same time. I doubt whether we need this mechanism.
from incubator-uniffle.
Without this mechanism, how do we handle retries with multiple checkers?
In our internal environment, we will use multiple checkers, including a health checker (needs retry) and a customized checker (no retry needed).
from incubator-uniffle.
> Without this mechanism, how do we handle retries with multiple checkers?
> In our internal environment, we will use multiple checkers, including a health checker (needs retry) and a customized checker (no retry needed).

You can choose not to retry.
from incubator-uniffle.
> > Without this mechanism, how do we handle retries with multiple checkers?
> > In our internal environment, we will use multiple checkers, including a health checker (needs retry) and a customized checker (no retry needed).
>
> You can choose not to retry.

No retry is OK. But this will not solve the problem described in issue #127.
from incubator-uniffle.
Maybe you could choose not to retry when you use multiple checkers. In your description, the multiple-checker scenario seems to depend more on the result of the candidates checker, so retrying is not very meaningful there. But the default checker is only `AccessClusterloadChecker`, and issue #127, which I proposed, is mainly aimed at this checker. When you use this checker, you can choose to retry.
from incubator-uniffle.
But with multiple checkers, apps that are in the candidates list still need to retry.
from incubator-uniffle.
So do we need this feature? cc @jerqi
from incubator-uniffle.
> So do we need this feature? cc @jerqi

I don't think we need such a complex retry mechanism ...
from incubator-uniffle.
OK. Close it.
from incubator-uniffle.