Comments (18)

grondo commented on August 11, 2024

Let's make sure we all have the same big picture. Here's the set of building blocks I have in mind, though I may not have all the information so this is just a starting point for a discussion:

  • jobinfo db (flux-core): stores data for inactive jobs so they can be purged from memory. Enables out-of-band sql queries against completed job information, etc.
  • utilization reports (external): should be able to query jobinfo db directly
  • bank/accounting db (flux-accounting): stores users, banks, accounts, and "associations", uses jobinfo db to do necessary updates of current user/bank usage.
  • priority plugin (flux-core): a plugin in the job-manager used to adjust or supplement the primary job priority. A plugin may be a worker or set of workers, similar to the implementation of the job-ingest validator.
  • multi-factor priority plugin (flux-accounting): a job-manager priority plugin/script which calculates a multi-factor priority for jobs including fairshare priority

The flow of data for jobs might look like:

  1. inactive jobs are sucked into the jobinfo db, optionally purged from memory
  2. utilization reports are generated directly from this database when required
  3. accounting information is generated/derived from the jobinfo db and fed into the accounting/fairshare db on a periodic interval
  4. accounting/fairshare db is used to fetch or push fair-tree factor into multi-factor priority plugin/script
  5. priority plugin in job-manager runs multi-factor priority calculation on each job, possibly using worker script, like validator
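The flow above can be sketched in code. This is a rough illustration of step 3, deriving per-user/bank usage from the jobinfo db on a periodic interval; the SQLite `jobs` schema here (`userid`, `bank`, `t_run`, `t_inactive`, `nnodes`) is a hypothetical stand-in, not an actual flux-core schema:

```python
# Hypothetical sketch: aggregate node-seconds of usage per (user, bank)
# from a jobinfo SQLite database, to be fed into the accounting db.
# The "jobs" table layout here is an assumption for illustration only.
import sqlite3

def aggregate_usage(jobinfo_path):
    """Return {(userid, bank): node_seconds} for completed jobs."""
    conn = sqlite3.connect(jobinfo_path)
    cur = conn.execute(
        "SELECT userid, bank, t_run, t_inactive, nnodes FROM jobs "
        "WHERE t_inactive > 0"  # completed (inactive) jobs only
    )
    usage = {}
    for userid, bank, t_run, t_inactive, nnodes in cur:
        usage.setdefault((userid, bank), 0.0)
        usage[(userid, bank)] += (t_inactive - t_run) * nnodes
    conn.close()
    return usage
```

A periodic task in flux-accounting could run a query like this and add the deltas to each association's historical usage.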

This design can have 3 work streams going in parallel:

  • accounting/fairshare db and multi-factor priority script (flux-accounting)
  • jobinfo db (flux-core) -- we'll need this anyway for system instance
  • job-manager priority plugin (flux-core)

Each of these can go in parallel once the interfaces have been agreed upon. Interfaces include:

  • job-manager priority plugin: How does a script-based priority worker get jobspec, userid, t_submit, etc.? (JSON on stdin?)
  • jobinfo db: gather requirements from flux-accounting for query interface
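To make the first interface question concrete, here is one possible shape for a script-based priority worker, assuming a JSON-object-per-line protocol on stdin/stdout modeled loosely on the job-ingest validator. The field names (`id`, `t_submit`, `urgency`) and the age-based formula are illustrative assumptions, not a settled interface:

```python
# Hypothetical priority worker: reads one JSON job description per line
# on stdin, writes {"id": ..., "priority": ...} per line on stdout.
import json
import sys
import time

def compute_priority(job):
    """Toy calculation: urgency-scaled base plus a job-age bonus."""
    age = max(0.0, time.time() - job["t_submit"])
    return int(job.get("urgency", 16) * 1000 + age)

def main():
    for line in sys.stdin:
        if not line.strip():
            continue
        job = json.loads(line)
        resp = {"id": job["id"], "priority": compute_priority(job)}
        sys.stdout.write(json.dumps(resp) + "\n")
        sys.stdout.flush()  # respond promptly, one job at a time

# To run as a worker process, call main() here.
```

The job-manager side would spawn this as a subprocess and stream jobs through it, the same way the validator handles jobspec.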

from flux-accounting.

grondo commented on August 11, 2024

FYI - as a comparison, here is the list of factors Slurm uses in its multi-factor plugin:

https://slurm.schedmd.com/priority_multifactor.html#mfjppintro

Edit: note especially that fairshare is just one factor in a multi-factor priority calculation
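As a sketch of that point: in a Slurm-style calculation, each factor is normalized to the range [0.0, 1.0] and scaled by a configurable weight, with fairshare contributing just one term to the sum. The weights below are made-up values for illustration:

```python
# Hypothetical factor weights -- in Slurm these would correspond to the
# PriorityWeight* configuration parameters.
WEIGHTS = {"age": 1000, "fairshare": 10000, "jobsize": 500, "qos": 2000}

def multifactor_priority(factors, weights=WEIGHTS):
    """factors: factor name -> normalized value, clamped to [0, 1]."""
    return int(sum(weights[name] * min(max(value, 0.0), 1.0)
                   for name, value in factors.items()))
```

With a large fairshare weight, fairshare dominates but never fully overrides the other factors.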


grondo commented on August 11, 2024

> Can this be its own sub-project when some of the data sources that it requires would come from flux-core? For example, queue time?

As part of job-manager priority plugin development we would design an interface that would allow all known information to be shared, e.g. t_submit (queue time), primary priority, etc.


cmoussa1 commented on August 11, 2024

Here's a summary about what we talked about. If I missed anything/incorrectly summarized something, feel free to correct me:

Instead of defining partitions in their own tables (where limits would be defined in a second location, since they are also defined in a cluster_association_table), @SteVwonder had a good idea: we could instead provide a label that users specify when submitting jobs in order to associate the job with the maximum amount of resources it can utilize. Example: a debug label would limit a user to 30 minutes and to only half the nodes available on a cluster.
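A minimal sketch of that label idea, assuming each label maps to a set of limits checked when a job is submitted; the label name and limit fields below are hypothetical:

```python
# Hypothetical label table: each label caps job duration and the
# fraction of the cluster's nodes a single job may use.
LABELS = {
    "debug": {"max_minutes": 30, "max_node_fraction": 0.5},
}

def check_label(label, duration_minutes, nnodes, cluster_nodes):
    """Return True if the job fits within the label's limits."""
    limits = LABELS[label]
    if duration_minutes > limits["max_minutes"]:
        return False
    if nnodes > cluster_nodes * limits["max_node_fraction"]:
        return False
    return True
```

The nice property is that limits live in one place: the label definition, rather than being duplicated in a partition table and an association table.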

It's necessary to analyze where our gaps are in terms of tracking factors for a multi-factor job priority. I plan on doing this over the next couple of days, eventually posting a table containing all of the factors and where we would include them in our software architecture. This would help us narrow down the large scope that is user/job priority 😅.


SteVwonder commented on August 11, 2024

> Originally, I was under the impression that fairshare values were calculated by passing in a user id, fetching its association id from the accounting database, and performing a Level Fairshare calculation based on the user's association information and current jobs in the queue. Essentially, I had thought that fairshare calculations would be constantly querying information from the accounting database in order to generate a priority value.

FWIW, I think it is totally reasonable to start with this implementation as a proof-of-concept. Once you have a working version of this, you could then refactor for performance to cache certain historical values in memory, etc.
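A proof-of-concept version of that query-then-compute loop could use the Fair Tree style "level fairshare" value LF = S / U, where S is the association's shares normalized against its siblings and U is its usage normalized the same way. The association dicts below stand in for rows fetched from the accounting database:

```python
# Sketch of a level fairshare calculation. Association records are
# hypothetical stand-ins for accounting-database rows.
def level_fairshare(assoc, siblings):
    """LF = (shares / sibling shares) / (usage / sibling usage)."""
    total_shares = sum(a["shares"] for a in siblings)
    total_usage = sum(a["usage"] for a in siblings)
    s = assoc["shares"] / total_shares
    if total_usage == 0 or assoc["usage"] == 0:
        return float("inf")  # no recorded usage: highest possible value
    u = assoc["usage"] / total_usage
    return s / u
```

An LF above 1.0 means the association has used less than its share; sorting siblings by LF at each level of the hierarchy yields a fairshare ordering, which can later be cached rather than recomputed on every query.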


grondo commented on August 11, 2024

> If we decide to go with a unified database within flux-core, do we expect the user and account tables can be tracked there? Seems a bit monolithic...

No I think the flux-core job-info db could be used to store job accounting information, then the flux-accounting project would house the user/account hierarchy, and would query the job accounting db to update user banks, calculate historical usage to get fair-share priority, etc


dongahn commented on August 11, 2024

@grondo: Thank you for starting up the big picture architecture discussion! We definitely need this to push the discussion forward. I have a few questions to make sure we are on the same page.

  1. We still haven't decided whether the multi-factor priority plugin will sort jobs at the job-manager level or the external scheduler (e.g., flux-sched) level. While my preference is to do this at the job-manager level, we have to ensure this will not lead to an "ALLOC" thrashing problem. Let me open up a ticket and reason about whether the "ALLOC" thrashing will be a real issue or not.

  2. It is not immediately clear to me whether flux-accounting can provide a multi-factor priority plugin in its entirety. It will only have a subset of the data needed for the multi-factor priority calculation. I notice you mentioned "a job-manager priority plugin/script". So perhaps flux-accounting can provide a Python command that outputs some factors needed for the multi-factor priority plugin, and the plugin itself will be implemented at the level decided from the further discussion in point 1 above?

BTW, I love your idea of framing this as parallel work streams. We really need that in order to be effective on this item.


grondo commented on August 11, 2024

> While my preference is to do this at the job-manager level, we have to ensure this will not lead to an "ALLOC" thrashing problem. Let me open up a ticket and reason about whether the "ALLOC" thrashing will be a real issue or not.

Yeah, you are right. My thought is that we need to get started somewhere, and this choice has the benefit of dividing up the work even further, which may have a big benefit.

Another benefit is that this approach would allow a user to insert a custom priority plugin at runtime for a non-system flux instance. I'm not sure what exactly you could do with that, but it seems like it would be a nice feature.

> So perhaps flux-accounting can provide a Python command that outputs some factors needed for the multi-factor priority plugin, and the plugin itself will be implemented at the level decided from the further discussion in point 1 above?

That might be a good approach, though I think eventually maybe the advanced multi-factor priority plugin could either be its own sub-project or just included with flux-accounting...


chu11 commented on August 11, 2024

> jobinfo db (flux-core) -- we'll need this anyway for system instance

Had a side discussion with @grondo. In the past it was assumed that there would be two job history databases, a "core" one and a "sched" one, mostly so that we could work in parallel and not have development hindered on either path. Then we could "merge together" if necessary.

@grondo's feeling is that in order to save time, we should nix that, upping the "job-info" job history DB to a higher priority.


dongahn commented on August 11, 2024

> That might be a good approach, though I think eventually maybe the advanced multi-factor priority plugin could either be its own sub-project or just included with flux-accounting...

Can this be its own sub-project when some of the data sources that it requires would come from flux-core? For example, queue time?


dongahn commented on August 11, 2024

> Then we could "merge together" if necessary.

If we decide to go with a unified database within flux-core, do we expect the user and account tables can be tracked there? Seems a bit monolithic...


dongahn commented on August 11, 2024

> Yeah, you are right. My thought is that we need to get started somewhere, and this choice has the benefit of dividing up the work even further, which may have a big benefit.

> Another benefit is that this approach would allow a user to insert a custom priority plugin at runtime for a non-system flux instance. I'm not sure what exactly you could do with that, but it seems like it would be a nice feature.

Like I said, I certainly do hope that our reasoning on the ALLOC thrashing problem can lead us to this architecture.


chu11 commented on August 11, 2024

> No I think the flux-core job-info db could be used to store job accounting information, then the flux-accounting project would house the user/account hierarchy, and would query the job accounting db to update user banks, calculate historical usage to get fair-share priority, etc

Agreed. The job-info module's database effectively stores job history for its own purposes. Anyone else that wants to read from it can do so at their own discretion.

But of course if the internal database changes, any scripts / fair share calculations, etc. would have to adjust. This is the risk of having just 1 job history db.


cmoussa1 commented on August 11, 2024

> But of course if the internal database changes, any scripts / fair share calculations, etc. would have to adjust. This is the risk of having just 1 job history db.

This is a good point. But as long as the core information needed for fair share calculation remains attainable, even if the interface to get the data changes, I think it should be okay.


dongahn commented on August 11, 2024

> But of course if the internal database changes, any scripts / fair share calculations, etc. would have to adjust. This is the risk of having just 1 job history db.

> This is a good point. But as long as the core information needed for fair share calculation remains attainable, even if the interface to get the data changes, I think it should be okay.

Does this call for an RFC for job history database schema, then?


chu11 commented on August 11, 2024

> Does this call for an RFC for job history database schema, then?

Maybe ... after the coffee time talk a few questions came up. I'm putting together a discussion in flux-core.


dongahn commented on August 11, 2024

Sorry I couldn't join. Stuck creating a writeup.


cmoussa1 commented on August 11, 2024

I think we have pretty much settled on the design/implementation for calculating fairshare values now (a combination of using the weighted tree library introduced in #65 and fetching and calculating job usage values from the job-archive DB from #79), so I can close this issue. Don't mind re-opening if others feel otherwise.

