Code Monkey home page Code Monkey logo

Comments (12)

GoogleCodeExporter avatar GoogleCodeExporter commented on August 22, 2024

The more I dig the more confused I get. munge mixes pthreads and getgrent calls, why was this not migrated to the reentrant versions of the calls?

Original comment by ericew on 28 Oct 2010 at 2:34

from munge.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 22, 2024

Without patch:

-bash-3.2$ id
uid=18070(ew2193) gid=972(faculty) 
groups=972(faculty),1000(shared),1015(strnmr),1016(mullin),1024(lcg),1067(ritsta
ff),1106(evalcon),1131(bsmlab),1223(pvc)
-bash-3.2$ munge -n -g mullin > /tmp/bug_test
-bash-3.2$ unmunge < /tmp/bug_test
unmunge: Error: Unauthorized credential for client uid=18070 gid=972
-bash-3.2$ newgrp mullin
bash-3.2$ unmunge < /tmp/bug_test
STATUS:           Success (0)
ENCODE_HOST:      hermes.rit.albany.edu (169.226.65.36)
ENCODE_TIME:      2010-10-28 15:04:44 (1288292684)
DECODE_TIME:      2010-10-28 15:05:16 (1288292716)
TTL:              300
CIPHER:           aes128 (4)
MAC:              sha1 (3)
ZIP:              none (0)
UID:              ew2193 (18070)
GID:              faculty (972)
GID_RESTRICTION:  mullin (1016)
LENGTH:           0

With patch:

-bash-3.2$ id
uid=18070(ew2193) gid=972(faculty) 
groups=972(faculty),1015(strnmr),1016(mullin),1024(lcg),1067(ritstaff),1106(eval
con),1131(bsmlab),1223(pvc)
-bash-3.2$ munge -n -g mullin > /tmp/bug_test
-bash-3.2$ unmunge < /tmp/bug_test
STATUS:           Success (0)
ENCODE_HOST:      hermes-04.rit.albany.edu (169.226.65.69)
ENCODE_TIME:      2010-10-28 15:06:16 (1288292776)
DECODE_TIME:      2010-10-28 15:06:23 (1288292783)
TTL:              300
CIPHER:           aes128 (4)
MAC:              sha1 (3)
ZIP:              none (0)
UID:              ew2193 (18070)
GID:              faculty (972)
GID_RESTRICTION:  mullin (1016)
LENGTH:           0

Right now it's a big patch to gids.c, but really it should be cleaned up some since there are essentially two different ways to fix the problem. The easier one removes all the hashing code and uses getgrouplist which is nscd/ldap friendly and works. The other is a fix to the existing group scanning code to use reentrant (yes, it makes a difference on some systems). Really if getgrouplist is on the system then we should jettison a lot of code and drop some command line arguments.

Original comment by ericew on 28 Oct 2010 at 7:11

from munge.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 22, 2024

As a work-around, the gids code path can be disabled by running munged with --group-update-time=-1. The only functionality you'll lose is restricting credentials by supplementary GIDs.

So far, I've been unable to reproduce this behavior here under RHEL 5.5. One difference between our systems is that we're not running LDAP.

The same UID can be mapped to different user names. User names and group names are treated as case-sensitive strings like they are with the standard password & group routines. I don't understand what the problem is with this behavior. I'm not very familiar with LDAP.

Re: comment 1, getgrent_r() and getpwnam_r() were not used since they are only called in/from _gids_hash_create(), and only one instance of that routine can be running at a time within munged. As such, the static buffers in getgrent() and getpwnam() didn't seem to be an issue.

Re: comment 2, getgrouplist() is non-standard. It's not available on AIX or Solaris, so an alternate method would still be required.

Also, the comment 2 "without patch" excerpt is curious. I don't see why newgrp was needed there unless you had disabled the gids hash (e.g., via --group-update-time=-1). If you build or configure with CFLAGS='-DGIDS_DEBUG', the gids hash will be dumped to stdout every time it is rebuilt. Of course, this is only useful if you're running munged in the foreground (-F).

I had a similar bug report a while back with 0.5.8 under RHEL 5.2 (glibc-2.5-25) where munged would segfault in getpwnam(). That user was also running LDAP. I created a small reproducer (gids-test.c attached) that mostly mimicked the calls in gids.c. He confirmed the reproducer worked (i.e., segfaulted). Note that it is not threaded. Try it and let me know whether it works for you.

I'm still tracing through gids.c, but nothing major has jumped out at me yet.

gids-test.c.gz

Original comment by chris.m.dunlap on 29 Oct 2010 at 3:55

from munge.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 22, 2024

Results from gids-test was identical to munge. The test end in a

*ERROR* cannot query group info: Permission denied

Which mirrors the error I was getting

2010-10-27 20:21:21 Error:     Unable to query group info: Permission denied

As I have worked on this there are a number of factors that have come up.

  1. getgrent and getgrent_r are two different code paths in glibc. Altering gids-test to be re-entrant causes the error to go away. There also appears to be some hostility in the glibc community against using the getgrent for anything more than legacy code. (below is diff to gids-test.c).

    --- gids-test.c 2010-10-29 07:53:54.000000000 -0400
    +++ gids-test-r.c       2010-10-29 08:13:16.000000000 -0400
    @@ -6,6 +6,8 @@
    #include <string.h>
    #include <sys/types.h>
    
    +#define BUFLEN 4096
    +
    int
    main (int argc, char *argv[])
    {
    @@ -17,21 +19,21 @@
     uid_t           uid;
     gid_t           gid_tmp;
    
    +    struct group grp;
    +    char buf[BUFLEN];
    +    int i;
    +
     setgrent ();
    
     while (1) {
    -        errno = 0;
    -        gr_ptr = getgrent ();
    +       i = getgrent_r(&grp, buf, BUFLEN, &gr_ptr);
    +
    +       if (i == ERANGE) {
    +               fprintf(stderr, "*ERROR* ERANGE Insufficient buffer space supplied. Try again with larger buffer.\n");
    +       }
    +       if (i)
    +               break;
    
    -        if (gr_ptr == NULL) {
    -            if ((errno == 0) || (errno == ENOENT))
    -                break;
    -            if (errno == EINTR)
    -                continue;
    -            fprintf (stderr, "*ERROR* cannot query group info: %s",
    -                    strerror (errno));
    -            exit (EXIT_FAILURE);
    -        }
         gid = gr_ptr->gr_gid;
         printf ("  group [%s] (gid=%u)\n",
                 (gr_ptr->gr_name ? gr_ptr->gr_name : "*NULL*"), gid);

    http://cygwin.ru/ml/libc-alpha/2006-08/msg00029.html

  2. The error does not crash munge right away, but causes some lingering problem that crashes the daemon on the next refresh of the group information. If the first run fails we should either be bailing out completely with some sane error message or we should be turning off refreshes with a note to the logs. Silently failing on the next refresh was infuriating to track down. I see at least one other case in the slurm mailing lists where this appears to be the case.

  3. Even if this bug is fixed the supplemental gid information will still be incomplete and inaccurate for some systems. On many enterprise systems including some LDAP, NIS, and AD style authentication a non-root user does not have access to a dump of the passwd or group databases. The way around this on linux and bsd systems with nscd is to use the glibc getgrouplist. I understand that this is not a standard call which is why my patch also modifies the autoconf to test for its presence before enabling that feature. When found it jettisons 100% of the code for gid hashing and replaces the supplemental group test function with a call to getgrouplist.

I plan on cleaning up and submitting 3 patches. I hope to have them by the end of the day barring any interruptions.

  1. Changes gids.c to re-entrant code. This should be pretty small and benign.
  2. A patch for getgrouplist support.
  3. A patch against 0.5.9 configure.in and configure so that people don't have to rebuild it themselves.

Original comment by ericew on 29 Oct 2010 at 1:04

from munge.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 22, 2024

Thanks for running this down, especially since I've been unable to reproduce it here. I'm somewhat surprised getgrent() was the culprit under the circumstances, but gids-test.c & gids-test-r.c seem to confirm it. Fortunately, it's a fairly straightforward correction to make.

Re: 2), this matches the comment 2 "without patch" curious behavior and explains why newgrp was needed there. In gids.c, return values are checked and errors are logged. If an error is encountered while building the next map, the previous map is kept and updates continue as scheduled since the error is likely transient. I'm guessing the silent crash-on-next-update behavior was memory corruption from getgrent() that will go away with the switch to getgrent_r() & getpwnam_r(). Time will tell.

Re: 3), munged can be run as root if needed; clients still won't need root privileges. I have concerns with getgrouplist() in the credential decode path. Response time and availability of a network service are two issues that immediately come to mind. Then there's the issue of performance since you also need getpwuid() to convert the uid within the credential into a username for getgrouplist(). I'm not comfortable with adding support for getgrouplist() at this time.

Original comment by chris.m.dunlap on 31 Oct 2010 at 8:53

  • Changed title: munged crashes when processing supplementary group info under glibc
  • Changed state: Accepted

from munge.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 22, 2024
  1. What about a lazy caching scheme? getgrouplist and getpwuid results saved for a time using the hashes. Timecode would determine if getgrouplist needs to be refreshed. Possibly merging the two ways to get group info where available and offering it as a command line option or even having a stub, like the gids-test, that can provide some feedback to the user to see which method might be more appropriate.

The lazy caching scheme would add a field to the gid hash and otherwise be low impact. The getgrouplist implementation can be in parallel code not enabled by default while it "bakes". Maybe I can also add a one-off check after the decode path to confirm that getgrent_r and getgrouplist agree and throw a warning into the logs if not.

I'll rework my patch to separate the code paths and not jettison the original code. That will allow the patch back in without any compromises at this point. Users can enable it with a flag if it's appropriate for their environment.

Original comment by ericew on 1 Nov 2010 at 1:03

from munge.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 22, 2024

This issue was updated by svn:r854 / 842ec31.

Created issue-2 branch.

Original comment by chris.m.dunlap on 3 Nov 2010 at 8:19

  • Changed state: Started

from munge.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 22, 2024

getgrent -> getgrent_r patch. It's pretty minimal. I have been swamped, but I do plan on submitting a getgrouplist branch that can be enabled via command line switch.

munge-issue2.patch.gz

Original comment by ericew on 3 Nov 2010 at 9:48

from munge.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 22, 2024

Thanks for confirming that getgrent_r() here fixes the problem under glibc.

I can't simply replace getgrent() with getgrent_r() -- some platforms don't have it, and others use different prototypes. It's somewhat of a pain to deal with, which is one reason why the non-reentrant routines were used in the first place. Ditto for getpwnam(). But I'll deal with it.

As for getgrouplist(), I won't allow that to be called from a worker thread. The reason the gids map is built in a separate thread is to prevent it from adversely impacting credential processing time.

Original comment by chris.m.dunlap on 4 Nov 2010 at 3:16

from munge.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 22, 2024

Out of curiosity, does running gids-test under valgrind provide any more information as to where the underlying problem lies?

Original comment by chris.m.dunlap on 4 Nov 2010 at 3:19

from munge.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 22, 2024

This issue was closed by svn:r889 / dfe5ea5.

Original comment by chris.m.dunlap on 20 Jan 2011 at 1:42

  • Changed state: Fixed

from munge.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 22, 2024

Original comment by chris.m.dunlap on 4 Feb 2011 at 3:31

  • Added labels: Milestone-0.5.10

from munge.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.