Comments (6)

dun commented on August 22, 2024

If people are allowed to have root on these external machines, and these machines are part of the same MUNGE realm as the Slurm cluster (i.e., they share the same munge.key), then there's nothing MUNGE can do to prevent them from impersonating any user on the cluster. Since they have root access on the external machines, they could undo any restriction added by munged when it encodes the credential. If they have access to the cluster's munge.key, they can create any valid MUNGE credential they choose for use within the Slurm cluster; they could then use this forged credential within the cluster and Slurm would be unable to tell that it originated from an external node.

I don't know if Slurm supports any form of an ID remapping as you propose. I recommend you post on the slurm-dev mailing list for their suggestions. If you're wanting to have MUNGE authenticate the users of these external nodes with the Slurm submit node, you'll want to use separate keys for the external nodes.

from munge.

jbd commented on August 22, 2024

We understood that people with root access could circumvent any restrictions, and of course we don't want that. Even if IP restrictions were implemented, we could not trust the IP address within the credential payload, and they could simply change their host IP address to an "authorized" one (this could be handled with network segmentation and VLANs, but that's not a solution the network team will like :D).

> If you're wanting to have MUNGE authenticate the users of these external nodes
> with the Slurm submit node, you'll want to use separate keys for the external nodes.

Exactly. Are you suggesting that this is possible today with munge?

That's why I proposed something with multiple munge keys and ACLs associated with them. On a purely technical basis, does this suggestion make sense to you? Does it fit with the current munge design, or would it require too much work? If you're interested in implementing that, my employer could sponsor you (with money). I would need some time to formalize that on my side though =)

We've got a support contract with SchedMD, and we contacted them yesterday for suggestions. From a quick look at the code, the socket information (needed to check the source IP address) of the job submission RPC is lost by the munge stage (in batch mode at least), so I don't really see what is possible here. We will let the Slurm experts give us an answer here ;)

EDIT: https://bugs.schedmd.com/show_bug.cgi?id=3324

Thank you for your answer!

dun commented on August 22, 2024

The MUNGE daemon currently only supports the use of a single key. However, you can run multiple MUNGE daemons on a node. Each daemon listens on a unique Unix domain socket and can use a distinct key. The interface was designed to facilitate the testing of a development release while the stable release was in use. But we actually have a few instances where we run multiple MUNGE daemons on a given node in production.
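
As a rough illustration of running one daemon per key, the sketch below builds a separate munged command line for each realm. The `--socket`, `--key-file`, `--pid-file`, `--log-file`, and `--seed-file` options are documented munged options, but the helper name `munged_argv`, the directory layout, and the per-realm file names are assumptions made up for this example, not a recommended layout.

```python
def munged_argv(realm,
                run_dir="/var/run/munge",
                key_dir="/etc/munge",
                log_dir="/var/log/munge",
                lib_dir="/var/lib/munge"):
    """Build the argv for one munged instance serving a single realm.

    Every path below is a hypothetical naming scheme; the point is only that
    each daemon gets its own socket, key, pid file, log file, and seed file.
    """
    return [
        "munged",
        f"--socket={run_dir}/munge.socket.{realm}",
        f"--key-file={key_dir}/munge.key.{realm}",
        f"--pid-file={run_dir}/munged.pid.{realm}",
        f"--log-file={log_dir}/munged.log.{realm}",
        f"--seed-file={lib_dir}/munged.seed.{realm}",
    ]


if __name__ == "__main__":
    # Print the command lines instead of executing them.
    for realm in ("alice", "bob"):
        print(" ".join(munged_argv(realm)))
```

The commands are only printed here; actually starting the daemons (for example through an init script or a unit file per realm) is left out on purpose.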

Support for multiple keys has been on the proverbial To-Do list for some time. In addition to multi-realm support, multi-key support would facilitate transitioning the key for a given realm (#19). This multi-key support has been the defining feature of the ever-elusive 0.6 release. But a new key format would also be the ideal time to upgrade the cryptographic algorithms and key derivation function, expand the credential format, etc. Changing the key and/or credential format is something I try to limit, so I'd prefer to have all of the breakage happen in the same release. Consequently, the 0.6 release has grown in scope while I've been working on other projects.

As for your dilemma, I think you'd ideally want to have each external node where the user has root to be in its own MUNGE realm with a distinct key. The submit node (where these external users don't have root) would either run a single multi-key munged (which doesn't exist yet and would be a substantial development effort) or multiple single-key mungeds. The submit node would also presumably need a new Slurm Authentication Plugin that could handle multiple MUNGE realms and map IDs in one realm to IDs (or a single ID) in another realm.

So this might work, at least in theory. I don't know what your time constraints are for having something that works in production. I also don't know how many of these external nodes you have. I'm just starting to work on the roadmap for 0.5.13. I'll try to read up on Slurm Authentication Plugins over the next week and see if this still makes sense as I get more into the details.

I read Dominik's response at SchedMD. I think we all agree that use of a single MUNGE key/realm won't solve your problem.

jbd commented on August 22, 2024

OK, thank you for this exhaustive answer, this is very much appreciated.

It confirms that I've got a fairly accurate picture of the problem here. The multiple single-key munged approach would indeed require some work on a new Slurm plugin. It could be a reasonable workaround. We'll try to investigate this and keep you posted if we manage to get something useful. It could work as long as you don't have too many of those external nodes =) (We are talking about a dozen of those machines here).

We don't need to have something that works in production; we can certainly find a non-technical solution that won't please our users ;) The multikey feature would be very cool to have, allowing us to distribute munge keys to advanced users who want to submit jobs from the machines they manage. We initially thought it would be interesting to get feedback from you because you're really the only one who could evaluate the amount of work needed. It's really nice that this is already on the roadmap for 0.6. We also understand that this will require quite some work. If you think being sponsored could help you implement the multikey feature sooner, we can try to find a solution.

> I'll try to read up on Slurm Authentication Plugins over the next week and see if this still makes
> sense as I get more into the details.

If it doesn't make sense, please tell us =)

dun commented on August 22, 2024

I've looked at the Slurm Authentication Plugin for MUNGE (auth_munge.c). I think support for your setup could be added with some minor changes to auth_munge.c along with a couple new keywords to the slurm.conf AuthInfo parameter. Furthermore, this proposed design doesn't require any changes to MUNGE so it would work with any existing MUNGE release. Disclaimer: I'm not overly familiar with Slurm internals.

For this setup, I'm picturing a couple of external users that have root on their local desktop, Alice and Bob. Each is running a local munged with a unique key, K(a) and K(b) respectively. The Slurm cluster has its own unique key K(s) shared between the Master, Submit, and all Compute nodes. But the Master node is running a munged process for each unique key: K(a), K(b), and K(s); each munged process here binds to a unique Unix domain socket (e.g., munge.socket.alice, munge.socket.bob, munge.socket.2).

The slurm.conf AuthInfo parameter adds two new keywords:

  1. tag=string
    The tag keyword is used for encoding credentials. It adds a label identifying the key needed to decode it.
  2. map=tag,socket,uid,gid
    The map keyword is used for decoding credentials. If a received credential contains a tag, the map entry with a matching tag will be used to specify the socket of the corresponding munged process. If the credential is valid and a >=0 uid and/or gid value is specified in the map entry, then the mapped uid and/or gid value will be substituted upon successful credential decoding; if the map uid and/or gid value is <0, then no substitution will be performed and the credential's values will be used. If no matching tag is found, the default socket (specified by the existing socket keyword) will be used.
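
To make the proposed map semantics concrete, here is a minimal Python sketch of how a single map value might be parsed and applied. It is not Slurm code: `MapEntry`, `parse_map`, `select_socket`, and `apply_map` are hypothetical names, and the sketch parses only the value after `map=` because the proposal does not settle how several entries would be delimited inside AuthInfo.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class MapEntry(NamedTuple):
    tag: str
    socket: str
    uid: int
    gid: int


def parse_map(value: str) -> MapEntry:
    """Parse the text after 'map=' in the proposed 'tag,socket,uid,gid' form."""
    tag, socket, uid, gid = value.split(",")
    return MapEntry(tag, socket, int(uid), int(gid))


def select_socket(entries: Dict[str, MapEntry], tag: Optional[str],
                  default_socket: str) -> str:
    """Use the mapped socket for a known tag, else the default socket."""
    entry = entries.get(tag) if tag else None
    return entry.socket if entry else default_socket


def apply_map(entry: Optional[MapEntry], cred_uid: int,
              cred_gid: int) -> Tuple[int, int]:
    """Substitute map values that are >= 0; keep the credential's values otherwise."""
    if entry is None:
        return cred_uid, cred_gid
    uid = entry.uid if entry.uid >= 0 else cred_uid
    gid = entry.gid if entry.gid >= 0 else cred_gid
    return uid, gid
```

For example, with the entry `alice,/var/run/munge/munge.socket.alice,1001,100`, a credential tagged `alice` that decodes to UID 1003 would be reported as UID 1001 and GID 100, while an untagged credential would go to the default socket unchanged.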

The auth_munge.c plugin makes the following changes:

  1. The _slurm_auth_credential struct adds a char *tag to store the tag string. It also adds a uid_t map_uid and a gid_t map_gid.
  2. slurm_auth_create() will check opts for a tag string. If found, it will set cred->tag to a copy of this string.
  3. slurm_auth_verify() checks for c->tag; if found, it queries opts for a matching map entry to obtain the corresponding socket, uid, and gid. The socket value is assigned to socket. The uid value is assigned to cred->map_uid. The gid value is assigned to cred->map_gid.
  4. slurm_auth_get_uid() will return cred->map_uid (if set) for a valid credential.
  5. slurm_auth_get_gid() will return cred->map_gid (if set) for a valid credential.
  6. slurm_auth_pack() will pack the cred->tag string, or the empty string if (cred->tag == NULL).
  7. slurm_auth_unpack() will unpack the cred->tag string. If it is the empty string, it will set cred->tag to NULL.
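
The plugin changes listed above can be modelled outside of Slurm to check that they fit together. The following Python sketch mirrors the three new fields and the tag pack/unpack rule (empty string on the wire, NULL in memory). The `Credential` class, the `encoded` field name, and the length-prefixed wire format are assumptions for illustration only; the real plugin is C and would use Slurm's own pack routines.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class Credential:
    encoded: str                    # encoded MUNGE credential (hypothetical field name)
    uid: int                        # UID recovered by decoding
    gid: int                        # GID recovered by decoding
    tag: Optional[str] = None       # proposed: label naming the key needed to decode
    map_uid: Optional[int] = None   # proposed: substituted UID, None when unset
    map_gid: Optional[int] = None   # proposed: substituted GID, None when unset


def pack_tag(cred: Credential) -> bytes:
    """Pack the tag as a length-prefixed string; a missing tag becomes ''."""
    raw = (cred.tag or "").encode("utf-8")
    return struct.pack("!I", len(raw)) + raw


def unpack_tag(buf: bytes) -> Optional[str]:
    """Unpack the tag; the empty string is turned back into None."""
    (length,) = struct.unpack_from("!I", buf, 0)
    return buf[4:4 + length].decode("utf-8") or None


def effective_uid(cred: Credential) -> int:
    """Return the mapped UID when one was set, otherwise the decoded UID."""
    return cred.map_uid if cred.map_uid is not None else cred.uid


def effective_gid(cred: Credential) -> int:
    """Return the mapped GID when one was set, otherwise the decoded GID."""
    return cred.map_gid if cred.map_gid is not None else cred.gid
```

Using `None` for "not set" sidesteps the question of which sentinel value the C struct would use for an unset `map_uid` or `map_gid`, which the proposal leaves open.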

Continuing my example setup above, Alice would have an AuthInfo section with tag=alice, and Bob would have an AuthInfo section with tag=bob. The Master node would have an AuthInfo section with map=alice,/var/run/munge/munge.socket.alice,1001,100 and map=bob,/var/run/munge/munge.socket.bob,1002,100. I didn't look too closely at what would be needed to add support for multiple map entries or what the maximum string length is, but I'm sure those are surmountable. This is just a proposed configuration syntax.

If Alice tried to impersonate Cindy (who has more funding for CPU time) by su'ing to her account, Alice's desktop would create a credential with Cindy's UID (e.g., 1003) using K(a). This credential would be sent to the Master node along with a tag=alice label. On Master, the auth_munge plugin would extract the alice tag, look up the map entry associated with this tag, obtain the /var/run/munge/munge.socket.alice socket, send the credential to munged over the munge.socket.alice Unix domain socket for decoding, receive a response indicating a valid credential with UID=1003, and map that back to UID=1001, thereby identifying Alice. Note that if Alice were to change her slurm.conf to specify tag=cindy, the credential would fail to decode on the Master node since Alice does not have Cindy's key K(c).
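
The impersonation scenario can be walked through with a toy model. In the Python sketch below, an HMAC over the UID stands in for a MUNGE credential; this is not the MUNGE credential format, and the key bytes, the `REALM_KEYS` and `UID_MAP` tables, and the function names are all invented for the example. It only demonstrates the two outcomes described above: a claimed UID is overridden by the map, and a credential sent under the wrong tag fails to decode.

```python
import hashlib
import hmac
from typing import Optional

# Stand-ins for the per-realm keys held by the munged processes on Master.
REALM_KEYS = {"alice": b"key-a", "bob": b"key-b", "cindy": b"key-c"}
# Stand-in for the uid column of the map entries on Master.
UID_MAP = {"alice": 1001, "bob": 1002, "cindy": 1003}


def encode(key: bytes, uid: int) -> str:
    """Toy credential: the claimed UID plus an HMAC computed with the realm key."""
    payload = str(uid)
    mac = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{mac}"


def decode(key: bytes, cred: str) -> Optional[int]:
    """Return the claimed UID if the HMAC verifies under this key, else None."""
    payload, _, mac = cred.partition(":")
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac.encode(), expected.encode()):
        return None
    return int(payload)


def authenticate(tag: str, cred: str) -> Optional[int]:
    """Decode with the key selected by the tag, then substitute the mapped UID."""
    key = REALM_KEYS.get(tag)
    if key is None or decode(key, cred) is None:
        return None
    return UID_MAP[tag]


if __name__ == "__main__":
    forged = encode(REALM_KEYS["alice"], 1003)   # Alice claims Cindy's UID
    print(authenticate("alice", forged))         # mapped back to Alice's UID
    print(authenticate("cindy", forged))         # rejected: wrong key for this tag
```

Whatever UID Alice places inside a credential made with K(a), the Master only ever reports the UID bound to the alice tag, which is the property the design relies on.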

The primary limitation to this approach is the number of munged processes running on the Master node. A dozen processes should be no problem (aside from the initial setup). If you're not using supplementary group membership authentication, you could run munged with --group-update-time=-1 to disable the periodic processing of supplementary groups. You could also specify --num-threads=1 for the munged processes corresponding to single users since those probably won't benefit from having additional threads. My plan is to include support for systemd socket activation in 0.5.13; that could benefit you here since it would defer the start of a given munged process until its socket was accessed. If startup is quick enough (which it should be), I can also add an option to terminate the daemon once it has been idle for a given duration. That would allow these additional munged processes to only run when needed.
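
The two tuning suggestions translate into extra munged arguments. The small Python helper below picks them based on how a given daemon is used; `--group-update-time` and `--num-threads` are the options named above, while the helper name `tuning_flags` and its two parameters are hypothetical.

```python
def tuning_flags(single_user, uses_supplementary_groups):
    """Return extra munged options for a lightly used per-user daemon."""
    flags = []
    if not uses_supplementary_groups:
        # Disable the periodic processing of supplementary groups.
        flags.append("--group-update-time=-1")
    if single_user:
        # A daemon serving one user is unlikely to benefit from more threads.
        flags.append("--num-threads=1")
    return flags


if __name__ == "__main__":
    print(" ".join(["munged"] + tuning_flags(True, False)))
```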

As I mentioned earlier, true multi-realm support is planned for the ever-elusive 0.6 release, but I have no ETA for when that might happen. If the above design doesn't meet your needs and/or you're interested in sponsoring some work on multikey, we could discuss that later.

jbd commented on August 22, 2024

Hello dun,

Wow, thank you again for such an exhaustive answer. That's more than we need to wrap our heads around a possible solution that will match our use case. We'll try to evaluate the impact of modifying the auth plugin on future Slurm releases with the help of the SchedMD team.

> As I mentioned earlier, true multi-realm support is planned for the ever-elusive 0.6 release, but I
> have no ETA for when that might happen. If the above design doesn't meet your needs and/or
> you're interested in sponsoring some work on multikey, we could discuss that later.

I'll keep that in mind and let my managers decide on the strategy. It will depend on our local Slurm experts and SchedMD's feedback.

I'll try to keep you posted on our experiments.
