Code Monkey home page Code Monkey logo

Comments (30)

rhc54 avatar rhc54 commented on June 26, 2024

Don't look at me - I haven't looked at the ompi schizo component in a very long time. Command line parsing starts with your parse_cli function.

from prrte.

wckzhang avatar wckzhang commented on June 26, 2024

It looks like it's because the options becomes --pmixmca mca_garbage_param 1
It looks like the ompi_schizo component does not look at this prefix (pmixmca), I can add another loop for it

from prrte.

wckzhang avatar wckzhang commented on June 26, 2024

So it turns out that --mca mca_base_param_files is not ignored, it was just that pmixmca prefixes were all ignored. The relevant code is all in place

from prrte.

wckzhang avatar wckzhang commented on June 26, 2024

Actually let me ask a clarifying question: Should the parameter be prefixed with pmixmca and should this be read by the ompischizo component? (This prefixing seems to happen somewhere before parse_cli is called)

from prrte.

rhc54 avatar rhc54 commented on June 26, 2024

You have a problem. PRRTE and PMIx share a common MCA "base", and so any generic MCA param on the cmd line gets interpreted as belonging to PMIx. Those get converted at the very beginning of prte as we have to process the PMIx and PRRTE params prior to calling their init functions.

Problem you have is that this means all generic MCA params that begin with mca are going to be converted to PMIx. Only solution I can see is to mandate that PMIx MCA params that belong to the MCA "base" must be prefixed with pmix or prte on the cmd line. I'll have to think about that as this feels more like an OMPI problem for failing to prefix their base params, but I suspect your side of the house will strenuously object to having to change.

from prrte.

rhc54 avatar rhc54 commented on June 26, 2024

Tentative proposal: we reject all generic --mca params. All params must be prefixed by their intended project. It's the only solution I can come up with that doesn't unfairly penalize one project over another.

Alternative option: apply that only to params that start with mca_ as they are ambiguous.

from prrte.

rhc54 avatar rhc54 commented on June 26, 2024

@jsquyres @bwbarrett Any thoughts? You two generally care the most about such things.

from prrte.

rhc54 avatar rhc54 commented on June 26, 2024

Alternative option: we reject all generic params that start with mca_ as they are ambiguous.

In the absence of any feedback, I'm going to implement this option. Please note that this applies to more than just the mca_base_param_files option.

from prrte.

naughtont3 avatar naughtont3 commented on June 26, 2024

@rhc54 There was discussion on this point during weekly ompi-dev call, but several key people were out due to travel/conflicts. So not sure if there was any consensus. I'll defer to @wckzhang & others , but just wanted to chime in just in case.

from prrte.

wckzhang avatar wckzhang commented on June 26, 2024

Yes, I've been trying to get @jsquyres or @bwbarrett to chime in but they've been busy. I've seen push back from George and Edgar about the change as it would break lots of user scripts. George proposed everyone just check every argument, and there was some debate about why pmix prefixed an --mca mca_ as --pmixmca, we could leave it generic and have each component check to see if they can handle it.

I didn't want to make a decision either way without more consensus from our community, but I have not forgotten about this issue.

from prrte.

rhc54 avatar rhc54 commented on June 26, 2024

about why pmix prefixed an --mca mca_ as --pmixmca, we could leave it generic and have each component check to see if they can handle it.

I'm afraid that isn't possible. We only change it for the MCA base params, and you cannot have those generally apply as the various layers can't discern which layer is the intended target. It winds up generating a great deal of confusion.

I'm probably not available this coming Tues, but I could participate on the OMPI call the week after if it would help.

from prrte.

jsquyres avatar jsquyres commented on June 26, 2024

Sorry for the huge delay in replying to this; my bandwidth has been sucked up elsewhere. 😦

Hey @rhc54 Could we just add mca_base_param_files to the list of prefixes that are examined by OMPI schizo to know that this parameter belongs to OMPI? We already pass in a list of all of Open MPI's frameworks via the OMPI_MCA_PREFIXES env variable, which is then examined by

static void setup_ompi_frameworks(void)
{
if (ompi_frameworks_setup) {
return;
}
ompi_frameworks_setup = true;
char *env = getenv("OMPI_MCA_PREFIXES");
if (NULL == env) {
return;
}
// If we found the env variable, it will be a comma-delimited list
// of values. Split it into an argv-style array.
char **tmp = PMIX_ARGV_SPLIT_COMPAT(env, ',');
if (NULL != tmp) {
ompi_frameworks = tmp;
}
}
static bool check_generic(char *p1)
{
setup_ompi_frameworks();
/* See if the parameter we were passed belongs to one of the OMPI
frameworks or prefixes */
for (int j = 0; NULL != ompi_frameworks[j]; j++) {
if (0 == strncmp(p1, ompi_frameworks[j], strlen(ompi_frameworks[j]))) {
return true;
}
}
return false;
}

Having OMPI add mca_base_param_files to the comma-delimited list would effectively disambiguate that name and make it "owned" by OMPI (we can/probably should also do the same thing to mca_base_env_list, which is already special-cased -- and potentially any other mca_-prefixed OMPI variable...?).

However, I notice that by the time OMPI schizo is invoked, any --mca CLI tokens have been converted to --pmixmca, making at least the current logic ineffective

if (NULL != (opt = pmix_cmd_line_get_param(results, "mca"))) {

from prrte.

rhc54 avatar rhc54 commented on June 26, 2024

The problem is that PMIx has a duplicate of every MCA base parameter, so you cannot disambiguate them. Root problem is that the MCA base doesn't prefix its params by project, and we now have multiple projects that contain an MCA base.

We do look for frameworks and convert them - no ambiguity there as we already ensure that the params are properly prefixed with the framework/component name. It is just the MCA base that is the problem, and only when the user doesn't prefix the --mca cmd line option.

One possible solution is to say that non-prefixed --mca shall be reserved for OMPI. I can kinda wrap my head around it from the standpoint that direct PRRTE users are outnumbered by pre-existing OMPI users (I don't worry about PMIx users as we are talking about prterun, a PRRTE tool here). Just concerned that it creates more confusion as you can use a non-prefixed cmd line option for PMIx/PRRTE frameworks/components - but now we say you can't for their MCA base.

Probably no ideal solution 🤷‍♂️

from prrte.

bosilca avatar bosilca commented on June 26, 2024

Depends on what perspective you are looking from. From a user perspective what is not ideal is to be forced to learn the internal naming scheme of a software I am using. I am passing an argument to OMPI, and everything behind shall obey it. If PRRTE wants to have their own parameter naming other than --mca their problem, but when I provide a file with the MCA parameters it shall be read and used by the entire software stack.

from prrte.

rhc54 avatar rhc54 commented on June 26, 2024

So this problem has two parts:

  • how to deal with --mca on the cmd line? Like I said, we have no problem with framework/component level params. The issue is what to do with the MCA base params. You have to understand how the MCA system works - i.e., cmd line options get translated into envars, and it is only envars that get processed.
  • how to process param pre-existing param files? People specified ORTE params in those too, and now they will be ignored. Probably need to pass the contents of such files thru the param converter to get them properly prefixed.

Another issue, I suppose, is that we lost the "source" tag for a param - i.e., all our params get converted to envars, and so the MCA system really doesn't know how the envar got set. I'm not sure there is a simple way to resolve that one, but it might merit some thought as it can be useful to know how a param got set.

from prrte.

rhc54 avatar rhc54 commented on June 26, 2024

There is a fourth problem: we don't currently intercept and translate envar params. For example, if someone sets OMPI_MCA_oob_tcp_if_include=foo in their environment, it will currently be ignored instead of transparently translated it to PRTE_MCA_oob_tcp_if_include. It does happen for the cmd line, but not for anything placed in the environment.

from prrte.

jsquyres avatar jsquyres commented on June 26, 2024

We chatted on the OMPI webex about this today and came up with plans for this. I posted the info on the wiki: https://github.com/open-mpi/ompi/wiki/WeeklyTelcon_20230425#pmix-mca-parameter-issues

@rhc54 Let me know if my text accurately reflects what we discussed.

from prrte.

rhc54 avatar rhc54 commented on June 26, 2024

Looks accurate, but incomplete - you missed the fourth problem, that we don't currently translate the environment variables. However, the fix for the param file problem will solve the envar one as well, so the text is probably fine as-is.

from prrte.

qkoziol avatar qkoziol commented on June 26, 2024

I've been reviewing the code, in order to address this issue. I see that the fix that @rhc54 merged into PMIx is exposing parameters that start with "mca_base_" and not parameters that start with just "mca_". I believe that the intention was to expose the latter (which would include the former). @jsquyres - Is that a correct reading of your notes on the wiki?

from prrte.

qkoziol avatar qkoziol commented on June 26, 2024

Looking into the tuned file aspect of it now

from prrte.

rhc54 avatar rhc54 commented on June 26, 2024

Is that a correct reading of your notes on the wiki?

Sigh - I feel like this has become the never ending treadmill of explanations. Let's try one more time.

The MCA component parameters are already dealt with - there is no ambiguity there as we know what framework they belong to, and which project includes that framework. So we can trivially expose those and already did so.

The same is not true for the "mca_base_" params as they all refer to the MCA base, and we have multiple projects with their own MCA bases. So we needed a way of resolving them, which is what I provided in the referenced fix.

The tuned file support already had that implementation (minus the "mca_base_" problem), so it didn't need changing.

from prrte.

qkoziol avatar qkoziol commented on June 26, 2024

So, just the tuned files need updating?

from prrte.

qkoziol avatar qkoziol commented on June 26, 2024

Or, is this completely finished now, and I can close the issue?

from prrte.

qkoziol avatar qkoziol commented on June 26, 2024

This is the current behavior, for command line parameters:

dev-dsk-qkoziol-2b-7eef74db % ./mpirun --np 1 -N 1 --mca mca_garbage_param 1 --mca opal_mca_garbage_param 1 --mca garbage_param 1 --mca mca_base_garbage_param 1 --hostfile ../../hostfile /bin/true
parse_cli:438 - pargv[1] = '--np'
parse_cli:438 - pargv[2] = '1'
parse_cli:438 - pargv[3] = '-N'
parse_cli:438 - pargv[4] = '1'
parse_cli:438 - pargv[5] = '--pmixmca'
parse_cli:438 - pargv[6] = 'mca_garbage_param'
parse_cli:438 - pargv[7] = '1'
parse_cli:438 - pargv[8] = '--mca'
parse_cli:438 - pargv[9] = 'opal_mca_garbage_param'
parse_cli:438 - pargv[10] = '1'
parse_cli:438 - pargv[11] = '--mca'
parse_cli:438 - pargv[12] = 'garbage_param'
parse_cli:438 - pargv[13] = '1'
parse_cli:438 - pargv[14] = '--mca'
parse_cli:438 - pargv[15] = 'mca_base_garbage_param'
parse_cli:438 - pargv[16] = '1'
parse_cli:438 - pargv[17] = '--hostfile'
parse_cli:438 - pargv[18] = '../../hostfile'
parse_cli:438 - pargv[19] = '/bin/true'
parse_cli:438 - pargv[1] = '--np'
parse_cli:438 - pargv[2] = '1'
parse_cli:438 - pargv[3] = '-N'
parse_cli:438 - pargv[4] = '1'
parse_cli:438 - pargv[5] = '--pmixmca'
parse_cli:438 - pargv[6] = 'mca_garbage_param'
parse_cli:438 - pargv[7] = '1'
parse_cli:438 - pargv[8] = '--mca'
parse_cli:438 - pargv[9] = 'opal_mca_garbage_param'
parse_cli:438 - pargv[10] = '1'
parse_cli:438 - pargv[11] = '--mca'
parse_cli:438 - pargv[12] = 'garbage_param'
parse_cli:438 - pargv[13] = '1'
parse_cli:438 - pargv[14] = '--mca'
parse_cli:438 - pargv[15] = 'mca_base_garbage_param'
parse_cli:438 - pargv[16] = '1'
parse_cli:438 - pargv[17] = '--hostfile'
parse_cli:438 - pargv[18] = '../../hostfile'
parse_cli:438 - pargv[19] = '/bin/true'

from prrte.

qkoziol avatar qkoziol commented on June 26, 2024

The first MCA parameter ("mca_garbage_param") is getting the "pmixmca" component prefix, but the others are not. Is this correct behavior?

from prrte.

qkoziol avatar qkoziol commented on June 26, 2024

Also, is there a document that describes the overall (intended) behavior?

from prrte.

rhc54 avatar rhc54 commented on June 26, 2024

What @jsquyres wrote was an accurate description of what they wanted to do - much of it was already implemented. The referenced PR addressed the remaining pieces. The only piece not yet addressed is what to do with envars set by the user prior to invoking an executable.

from prrte.

rhc54 avatar rhc54 commented on June 26, 2024

@qkoziol What the blazes are you talking about? Are you quoting the current behavior on OMPI v5??? If so, try updating the submodule pointers before generating outdated reports.

from prrte.

qkoziol avatar qkoziol commented on June 26, 2024

@qkoziol What the blazes are you talking about? Are you quoting the current behavior on OMPI v5??? If so, try updating the submodule pointers before generating outdated reports.

That output is from a fresh checkout.

from prrte.

rhc54 avatar rhc54 commented on June 26, 2024

That output is from a fresh checkout.

OF WHAT BRANCH?

Truly, I'm going to close this issue as it has passed beyond being useful and is just an irritation.

from prrte.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.