
hwloc's People

Contributors

awlauria, bgoglin, cbordage, civodul, clementfoyer, dawid-lukwinski, ggouaillardet, grzegorz-andrejczuk, haampie, hannesweisbach, jjhursey, jpeyton52, jsquyres, jyvet, mark-mb, michalbiesek, miketxli, ncorgan, nfurmento, ompiteam, pavanbalaji, philippemilink, pioy, pnacht, roblatham00, scivision, sthibaul, tavisrudd, tkoeppe, xiongzubiao

hwloc's Issues

convert between "L2cache"-like string and (type,depth) and actual depth within the tree

Currently, we usually cannot convert between a type and a depth when the type is cache or group, because we ignore the corresponding depth attribute. People have to handle HWLOC_TYPE_DEPTH_MULTIPLE manually.

Since r3255, hwloc-calc uses this internally:
int hwloc_obj_type_sscanf(const char *string, hwloc_obj_type_t *typep, unsigned *depthattrp)
It could be added to the public interface. But we need to make sure that unsigned depthattr is the only object attribute that may ever appear in the complete type name. It currently works for L2Cache and Group3. I don't know what else we could ever have.

Then, we could have something converting between (hwloc_obj_type_t type, unsigned depthattr) and an actual depth in the tree (unsigned depth). But we need good names for these. We already have get_type_depth and get_depth_type. Maybe this:

int hwloc_get_type_depthattr_depth(topology, type, depthattr, &depth);
int hwloc_get_depth_type_depthattr(topology, depth, &type, &depthattr);

Ticket #50 talks about adding instruction caches to hwloc; the aforementioned functions would then need to handle them. depthattr may then become a union containing an int for groups (depth) and two ints for caches (depth + cachetype).
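
As a rough illustration of the string-to-(type, depthattr) step discussed above, here is a minimal self-contained sketch. It does not use hwloc; the `sketch_` names and the two-value enum are hypothetical stand-ins, and only the "L2Cache" and "Group3" spellings mentioned in this ticket are handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <strings.h>

/* Hypothetical stand-ins for hwloc_obj_type_t values; not the real enum. */
enum sketch_type { SKETCH_CACHE, SKETCH_GROUP };

/* Parse "L2Cache" or "Group3" into a type plus its depth attribute.
 * Returns 0 on success, -1 if the string carries no depth attribute. */
static int sketch_type_sscanf(const char *s, enum sketch_type *typep,
                              unsigned *depthattrp)
{
    char *end;
    assert(s && typep && depthattrp);

    /* "L<n>Cache": the depth attribute sits between 'L' and "Cache". */
    if ((s[0] == 'L' || s[0] == 'l') && isdigit((unsigned char)s[1])) {
        unsigned long d = strtoul(s + 1, &end, 10);
        if (strcasecmp(end, "cache") == 0) {
            *typep = SKETCH_CACHE;
            *depthattrp = (unsigned)d;
            return 0;
        }
        return -1;
    }
    /* "Group<n>": the depth attribute is the numeric suffix. */
    if (strncasecmp(s, "group", 5) == 0 && isdigit((unsigned char)s[5])) {
        unsigned long d = strtoul(s + 5, &end, 10);
        if (*end == '\0') {
            *typep = SKETCH_GROUP;
            *depthattrp = (unsigned)d;
            return 0;
        }
    }
    return -1;
}
```

Mapping the resulting (type, depthattr) pair to an actual depth in the tree would still need the topology, which this sketch leaves out on purpose.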

hwloc_distribute should handle asymmetric topologies

hwloc_distribute currently assumes that all children of an object have
the same weight. There should be at least variants which take into
account the cpuset/gpuset etc.

It should also likely ignore children with empty CPU sets (happens with CPU-less NUMA nodes).
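
One possible shape for a weight-aware variant is sketched below. It is not hwloc code: `sketch_distribute` is a hypothetical helper, and the weights are assumed to be something like the number of CPUs in each child's cpuset. Children with weight 0 (for example CPU-less NUMA nodes) receive nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Split n items among children in proportion to their weights.
 * share[i] receives the number of items given to child i. */
static void sketch_distribute(unsigned n, const unsigned *weight,
                              size_t nchildren, unsigned *share)
{
    unsigned total = 0, given = 0, acc = 0;
    size_t i;

    for (i = 0; i < nchildren; i++)
        total += weight[i];
    assert(total > 0);

    /* Cumulative rounding: child i gets the items that fall between the
     * running weight boundaries, so the shares always sum to n. */
    for (i = 0; i < nchildren; i++) {
        unsigned upto;
        acc += weight[i];
        upto = (unsigned)(((unsigned long long)n * acc) / total);
        share[i] = upto - given;
        given = upto;
    }
}
```

With weights {4, 0, 2, 2} and 8 items, the children would receive 4, 0, 2 and 2 items respectively.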

Hwloc build errors on SPARC Solaris with native compiler

Attempting a build on Solaris 10 with Sun compiler tools v 5.9, configured with:
./configure --target=sparc-sun-solaris2.10 --prefix=.../hwloc-sparc --enable-debug=no --disable-xml --disable-cairo --disable-visibility CC="cc -xc99=all" CXX="CC" CFLAGS="-m64" CXXFLAGS="-m64" LDFLAGS=""

hwloc_have_cpuid() was undefined (and referenced), as was hwloc_cpuid(). I couldn't find a configuration option that would fix this, so ended up changing include/private/cpuid.h as shown in the attachment. The config.h comes up with HWLOC_HAVE_CPUID=1.

Properly gather/support Linux Cgroup/Cpuset in remote topologies

http://www.open-mpi.org/community/lists/hwloc-devel/2010/12/1717.php

We need to:

  • gather /proc/mounts
  • gather the relevant cpuset/cgroup mount point in hwloc-gather-topology.sh (or warn if we didn't gather it)
  • make sure we properly read those in src/topology-linux.c when fsroot was changed
  • update the expected topologies of tests/linux/cpuset (might be wrong right now)
  • stop ignoring failures at the end of test-gather-topology.sh.in (or at least only ignore when Linux cpuset/cgroup are enabled)

Note that you need to mount a cpuset or cgroup/cpuset mount point to reproduce the problem.

support get_area_membind on Linux

Do get_mempolicy (with MPOL_F_ADDR) on each virtual page in the area and combine the result. This should work because get_mempolicy seems to only look at VMA mempolicy (not at current task policy) in this case.

Requested by Alfredo Buttari.

Pave the way for network support

  • the top object may not always be a system.
  • objects may not have a cpuset.
  • there is no global notion of cpuset, only relative to a tree of
    objects representing a machine.

get nbprocs on the command-line

As suggested by Samuel, we could have an easy way to get the number of processors from the command line:

shell$ lstopo --n<proc|core|socket|node|machine|system>
4

vsnprintf warnings

When using super-picky compilation warning flags, hwloc gets warnings about vsnprintf:

{{{
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/cpuset.c:
In function 'hwloc_snprintf':
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/cpuset.c:37:
warning: implicit declaration of function 'vsnprintf'
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/cpuset.c:37:
warning: nested extern declaration of 'vsnprintf'
}}}

Here are the flags that Pavan used to generate these warnings:

{{{
libtool: compile: gcc -DHAVE_CONFIG_H -I.
-I/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src
-I../include/private -I../include/hwloc
-I/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/include
-I../include -std=c99 -Wall -Wmissing-prototypes -Wundef -Wpointer-arith
-Wcast-align -O2 -Wall -Wextra -Wno-missing-field-initializers
-Wstrict-prototypes -Wmissing-prototypes -DGCC_WALL
-Wno-unused-parameter -Wno-unused-label -Wshadow -Wmissing-declarations
-Wno-long-long -Wfloat-equal -Wdeclaration-after-statement -Wundef
-Wno-endif-labels -Wpointer-arith -Wbad-function-cast -Wcast-align
-Wwrite-strings -Wno-sign-compare -Waggregate-return
-Wold-style-definition -Wno-multichar -Wno-deprecated-declarations
-Wpacked -Wnested-externs -Winvalid-pch -Wno-pointer-sign
-Wvariadic-macros -std=c89 -Wno-format-zero-length -Wno-type-limits
-D_POSIX_C_SOURCE=199506L -MT cpuset.lo -MD -MP -MF .deps/cpuset.Tpo -c
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/cpuset.c
-o cpuset.o >/dev/null 2>&1
}}}

Pavan suggested the following (on #16):

The vsnprintf warnings occur because snprintf and vsnprintf are present only in C99, not C89. There are a few solutions possible:

  1. Check in configure to (i) add a prototype for snprintf/vsnprintf where needed and (ii) add an alternative implementation for them for platforms that don't provide them.

  2. An alternative (simpler) solution is to include MPL ( https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/src/mpl) into hwloc and just use MPL_snprintf and friends everywhere.

  3. Check if snprintf/vsnprintf exist in configure and abort if they don't. Other libraries relying on hwloc can see this error and not build hwloc in those cases.

Not sure if any of these approaches is acceptable for you guys, so I'm leaving this ticket as closed. Please reopen if appropriate.

misc TODO

Tools

  • bind process on 2 cores "near" physical proc id 3 ?
    • hwloc-calc: add an option to request a cpuset consisting of n close entries among the generated cpuset
  • internationalize the output of lstopo? object types and memory size units
  • hwloc-top, like lstopo, but keeps printing every 3s or so, and show bound threads as well as the used CPU%

Doc

  • automatically generate the pngs?
    • see doc/images/HACKING

Support

  • add info about supported instructions (sse, avx, ...)
  • add info about available execution units (fpu)
    • and say if they are shared between threads/cores
      • this could help improving the current ambiguity between two real cores, one hyperthreaded core, and AMD dual-fake-core compute units
  • reduce distance matrices so that parent objects get distances between them as well (just like we do when computing group distances after inserting groups)
  • parallelize the discovery ? :)

I/O

  • CCI interoperability to get cci_device and/or cci_device->name locality
    • use cci_device->pci.{domain,bus,dev,func} to retrieve the PCI device
    • wait for the CCI API to be stable
  • Add an OFED plugin to gather OFED device info without relying on Linux sysfs
    • Not sure whether ofed works the same on other OS anyway

Backends and Ports

  • Try to make the distance grouping code a separate component ?
  • QNX
    • _syspage_ptr() SYSPAGE_ENTRY(entry)
    • ThreadCtl/Thread_ctl_r(_NTO_TCTL_RUNMASK)
  • BSD
    • sys/sched.h: sched_bind/sched_unbind, but that's in-kernel only for now.
  • AIX
  • Cray Catamount?

Fix icc warnings

There's a truckload of warnings generated when icc 11.1.056 is used to compile hwloc. Most are in one of three types:

  • Variable/parameter is never referenced
  • Variable is set but never used
  • Mix enum with another type

The first two should probably be fixed; we may or may not care about fixing the third.

32 bit builds fail

hwloc fails to build when CFLAGS=-m32. It has shown up in nightly Open MPI test builds, but is easy to reproduce in standalone builds:

{{{
$ ./configure CFLAGS=-m32
...
$ make
...
Making all in src
make[1]: Entering directory `/nfs/rinfs/san/homedirs/jsquyres/svn/hwloc/src'
depbase=`echo topology-x86.lo | sed 's|[^/]*$|.deps/&|;s|.lo$||'`;\
/bin/sh ../libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I../include/private -I../include/hwloc -I/u/jsquyres/svn/hwloc/include -I/u/jsquyres/svn/hwloc/include -I/u/jsquyres/svn/hwloc/include -I/u/jsquyres/svn/hwloc/include -std=gnu99 -fvisibility=hidden -I/usr/include/libxml2 -std=gnu99 -fvisibility=hidden -m32 -pipe -I/u/jsquyres/svn/hwloc/include -Wall -Wunused-parameter -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -MT topology-x86.lo -MD -MP -MF $depbase.Tpo -c -o topology-x86.lo topology-x86.c &&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../include/private -I../include/hwloc -I/u/jsquyres/svn/hwloc/include -I/u/jsquyres/svn/hwloc/include -I/u/jsquyres/svn/hwloc/include -I/u/jsquyres/svn/hwloc/include -std=gnu99 -fvisibility=hidden -I/usr/include/libxml2 -std=gnu99 -fvisibility=hidden -m32 -pipe -I/u/jsquyres/svn/hwloc/include -Wall -Wunused-parameter -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic -MT topology-x86.lo -MD -MP -MF .deps/topology-x86.Tpo -c topology-x86.c -fPIC -DPIC -o .libs/topology-x86.o
/u/jsquyres/svn/hwloc/include/private/cpuid.h: In function ‘hwloc_cpuid’:
/u/jsquyres/svn/hwloc/include/private/cpuid.h:54: error: can't find a register in class ‘BREG’ while reloading ‘asm’
make[1]: *** [topology-x86.lo] Error 1
make[1]: Leaving directory `/nfs/rinfs/san/homedirs/jsquyres/svn/hwloc/src'
make: *** [all-recursive] Error 1
}}}

Any ideas?

distances

INTRO:
Some people want a fake/virtual/topological distance between random pairs of objects, from the tree point of view, not from the physical point of view. The distance between A and B is basically the depth difference between the highest of A and B and their lowest common ancestor. This is probably too simple to deserve some discussion here. At most, we'll add a new helper.
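
A helper along those lines could look like the sketch below. It is self-contained and does not use hwloc: `struct sketch_obj` is a hypothetical stand-in that keeps only a parent pointer and a depth, and both objects are assumed to belong to the same tree.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a topology object: parent link and depth only. */
struct sketch_obj {
    struct sketch_obj *parent;
    unsigned depth; /* 0 at the root */
};

/* Lowest common ancestor of two objects of the same tree. */
static struct sketch_obj *sketch_common_ancestor(struct sketch_obj *a,
                                                 struct sketch_obj *b)
{
    /* Bring both objects to the same depth, then climb together. */
    while (a->depth > b->depth) a = a->parent;
    while (b->depth > a->depth) b = b->parent;
    while (a != b) { a = a->parent; b = b->parent; }
    return a;
}

/* Depth difference between the highest of a and b (the one closest to
 * the root) and their lowest common ancestor. */
static unsigned sketch_tree_distance(struct sketch_obj *a, struct sketch_obj *b)
{
    struct sketch_obj *anc;
    unsigned highest;

    assert(a && b);
    anc = sketch_common_ancestor(a, b);
    highest = a->depth < b->depth ? a->depth : b->depth;
    return highest - anc->depth;
}
```

Under this definition two cores of the same socket are at distance 1, two cores of different sockets below a common root are at distance 2, and an object is at distance 0 from any of its ancestors.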

DONE:
Some people want to know the actual physical distance between objects (especially numa nodes). We already had the full matrices of distances between all pairs of objects of the same level (when given by the BIOS/OS). In the distances branch, we are now also exporting this "latency" matrix (after normalization to floats with 1.0 on the diagonal).

If we ever have the distances between a subset of objects, we store the matrix in the common ancestor instead of the root. No problem there, we can group objects and report distances the same.

Distance matrices may also be given by the user between init and load (as unsigned right now, maybe use float there too?).

TODO:
Some people may want the topological graph connecting objects, which means we have a number of hops (or a route?) between peers instead of a latency. It could be another matrix (unsigned, 0 on the diagonal).

NUMA topology could also be exported as a series of proximity domains (like solaris's lgrps). ** TODO explain what this means **

Instruction Cache

We currently only detect Data and Unified caches. Some people want Instruction caches as well.

  • We can easily detect those and add a cache type attribute. But we would add a new level to most existing topologies (L1i above or below L1d).
  • We can make the detection depend on a new topology flag, so that the topology does not change much with the next release.
  • We can store both data and instruction sizes in the same object. Unfortunately, AMD Bulldozer has L1i and L1d with different sharing.

We'll have to take this new attribute into account in #41.

support user-defined processor restriction

Use sched_getaffinity etc. to restrict discovery to the current cpumask.

(1) Add a configuration flag to limit the discovery to the current binding of the process. Could let the user choose between using the CPU or using the memory binding, and between using the current process or the current thread binding. But those variants are not very important and they can be implemented with (2) anyway. So just keep the important variant(s?). I'd say the current thread CPU binding (HWLOC_TOPOLOGY_FLAG_RESTRICT_TO_BINDING).

(2) Add a configuration function to limit the discovery to a given cpuset. To get the current binding of the process, one has to run a first discovery, then use get_cpubind, then run a second one with the configuration. This is tedious, but that is how the API works.

hwloc_topology_restrict_to_cpuset(topology, cpuset);

(no need for a nodeset flavor, it won't be used often, and we have conversion functions anyway)

(3) Add a function to restrict a discovered topology to a given cpuset. This looks like within the scope of the functions we are thinking about for network use (extract part of a topology, merge topologies).

network topology support

How do we gather multiple machine information and store them in the same big topology so that the process manager has a global knowledge of the cluster?

  • Need a way to merge multiple "local" topology in a single big one

    /* create a topology with only a System object root */
    hwloc_topology_create_empty()
    /* load a XML topology and insert it below a given object */
    hwloc_topology_insert_xml_by_parent()

A new utility would use these to aggregate multiple XML topologies. You
would have to run lstopo foo.xml on each node and run this new utility
to create the global XML topology. Finally, you can run hwloc with the
new global topology and do whatever you want.

mpirun lstopo .xml
hwloc_xml_agregate cluster.xml *.xml
export HWLOC_XMLFILE=cluster.xml

  • Need to extend cpusets, either by extracting the local part before binding, or by adding a "network id" attribute internally, or a local flag to objects (set to 0 by default when aggregating topologies; a process can then set it back to 1 on its own Machine object and children).
  • Need network topology detection

Note: For OFED, ibnetdiscover provides the network topology

add memory binding API

  • add hwloc_set_membind(topology, beginaddr, endaddr, HWLOC_MEMBIND_BIND/FIRSTTOUCH/INTERLEAVE, hwloc_cpuset_t)
    • size instead of endaddr?
    • if beginaddr=endaddr=NULL, setmempolicy?
    • reverse routine?
    • no level = empty mask, and we may want an easy alias for "whole machine"
  • allocation with a given policy
    • get Samuel's code from pm2's marcel_sysdep.c
  • apply a policy to a given area (not all OSes support that).

hwloc fails to link with gcc >= 4.3 with -fexceptions

Initially reported by LANL on the OMPI trac (https://svn.open-mpi.org/trac/ompi/ticket/2778), I have similar linking problems if I use a hand-installed gcc 4.5 on RHEL5. Notes:

  • Happens with hand-install gcc 4.5 on RHEL5, but not the built-in gcc 4.1
  • Happens with hwloc 1.1.2, but not with hwloc 1.2 or trunk

Here's the specific link failure I get if I compile hwloc by itself (i.e., not as part of OMPI):

{{{
[11:36] svbu-mpi:/svn/hwloc-1.1 % ./configure CFLAGS=-fexceptions
...lots of output...
[11:36] svbu-mpi:/svn/hwloc-1.1 % make
Making all in src
make[1]: Entering directory `/home/jsquyres/svn/hwloc-1.1/src'
CC topology.lo
CC traversal.lo
CC topology-synthetic.lo
CC bind.lo
CC cpuset.lo
CC misc.lo
CC topology-xml.lo
CC topology-linux.lo
CC topology-x86.lo
CCLD libhwloc.la
.libs/traversal.o: In function `__pthread_cleanup_routine':
traversal.c:(.text+0x0): multiple definition of `__pthread_cleanup_routine'
.libs/topology.o:topology.c:(.text+0x0): first defined here
.libs/topology-synthetic.o: In function `__pthread_cleanup_routine':
topology-synthetic.c:(.text+0x0): multiple definition of `__pthread_cleanup_routine'
.libs/topology.o:topology.c:(.text+0x0): first defined here
.libs/bind.o: In function `__pthread_cleanup_routine':
bind.c:(.text+0x0): multiple definition of `__pthread_cleanup_routine'
.libs/topology.o:topology.c:(.text+0x0): first defined here
.libs/cpuset.o: In function `__pthread_cleanup_routine':
cpuset.c:(.text+0x0): multiple definition of `__pthread_cleanup_routine'
.libs/topology.o:topology.c:(.text+0x0): first defined here
.libs/misc.o: In function `__pthread_cleanup_routine':
misc.c:(.text+0x0): multiple definition of `__pthread_cleanup_routine'
.libs/topology.o:topology.c:(.text+0x0): first defined here
.libs/topology-xml.o: In function `__pthread_cleanup_routine':
topology-xml.c:(.text+0x0): multiple definition of `__pthread_cleanup_routine'
.libs/topology.o:topology.c:(.text+0x0): first defined here
.libs/topology-linux.o: In function `__pthread_cleanup_routine':
topology-linux.c:(.text+0x0): multiple definition of `__pthread_cleanup_routine'
.libs/topology.o:topology.c:(.text+0x0): first defined here
.libs/topology-x86.o: In function `__pthread_cleanup_routine':
topology-x86.c:(.text+0x0): multiple definition of `__pthread_cleanup_routine'
.libs/topology.o:topology.c:(.text+0x0): first defined here
collect2: ld returned 1 exit status
make[1]: *** [libhwloc.la] Error 1
make[1]: Leaving directory `/home/jsquyres/svn/hwloc-1.1/src'
make: *** [all-recursive] Error 1
}}}

Need to investigate this more to see if it's worthwhile to issue a 1.1.3 or not.

PLPA-like API (or at least PLPA-like information retrieval)

Most core/socket/processor-id conversion routines are easy to implement.

One thing that we miss is the number of offline processors.
We could add offline_procs to struct topology_info, but should we put ignored procs there as well (in case of cpuset or other administrator-disabling thing)?
Might be worth fixing before 0.9.1 so that we don't change struct topology_info later.

Or we could just drop struct topology_info since it became very small now (we didn't want ten different accessors but it's not the case anymore). Maybe make it hwloc_get_topology_depth() and hwloc_is_thissystem() ?

Add cpulist-string to/from cpuset conversion routines?
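
For the string-to-cpuset direction, a minimal sketch is shown below. It assumes the Linux-style list syntax ("0-3,8,10-11") and, to stay self-contained, stores the result in a 64-bit mask rather than an hwloc cpuset; `sketch_cpulist_to_mask` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a CPU list such as "0-3,8,10-11" into a 64-bit mask.
 * Returns 0 on success, -1 on malformed input or an index >= 64. */
static int sketch_cpulist_to_mask(const char *s, unsigned long long *maskp)
{
    unsigned long long mask = 0;
    assert(s && maskp);

    while (*s) {
        char *end;
        unsigned long lo, hi, i;

        if (*s < '0' || *s > '9')
            return -1;
        lo = hi = strtoul(s, &end, 10);
        if (*end == '-') {              /* range "lo-hi" */
            s = end + 1;
            if (*s < '0' || *s > '9')
                return -1;
            hi = strtoul(s, &end, 10);
        }
        if (hi < lo || hi >= 64)
            return -1;
        for (i = lo; i <= hi; i++)
            mask |= 1ULL << i;
        if (*end == ',')                /* skip the separator */
            end++;
        else if (*end != '\0')
            return -1;
        s = end;
    }
    *maskp = mask;
    return 0;
}
```

The reverse conversion would walk the mask and emit a range each time a run of consecutive set bits ends.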

multinode graphical lstopo output

lstopo currently uses boxes for everything, except when a system object contains multiple Machine objects (it draws a network).

With custom topologies, we can now easily get multiple levels of Groups between Machine and System. And we can also get multiple System levels if we assemble multiple times.

Ideally, this special drawing would even be used as soon as we have objects with cpusets above objects with cpusets.

throughput distance matrix

Add a throughput matrix on the side of the existing latency one (basically the same behavior except that the grouping code looks at maximum instead of minimum values). set_distance() doesn't have a latency/throughput parameter. So we will look at the matrix to find out if it's throughput (diagonal is maximum) or latency (all other cases).
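
The detection rule could be sketched as follows. This is an illustration only: `sketch_matrix_kind` is a hypothetical helper, the matrix is assumed to be stored row-major as floats, and "diagonal is maximum" is read here as "no entry in a row exceeds that row's diagonal entry". Note that a uniform matrix is then reported as throughput, which may or may not be the desired tie-break.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_kind { SKETCH_LATENCY, SKETCH_THROUGHPUT };

/* Classify an n x n row-major matrix: throughput if every diagonal
 * entry is the maximum of its row, latency in all other cases. */
static enum sketch_kind sketch_matrix_kind(const float *m, size_t n)
{
    size_t i, j;

    assert(m && n > 0);
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            if (m[i * n + j] > m[i * n + i])
                return SKETCH_LATENCY;
    return SKETCH_THROUGHPUT;
}
```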

If we rework the distance API because of tickets #48, #67 and #68, it might be good to add a parameter specifying if the given matrix is latency/throughput/number-of-hops/...

XLC/AIX build warnings

Reported by Mathieu Faverge.

Summary of warnings:

{{{
"lstopo-text.c", line 292.12: 1506-077 (E) The wchar_t value 0x250c is not valid.
}}}

  • Just a warning that it won't work in a non-UTF-8 locale. We already check that at runtime, so it is not a problem.

heterogeneous topology support

what if we have a machine with different processors? for instance if one socket has a shared L3 and the other one doesn't?

  • sthibaul: should work fine already

support levels that do not cover the whole machine? (no L3 above the cores of the second socket above)

  • sthibaul: I do not understand

support object whose children are not in the exact next level? (socket pointing to cores instead of cache above)

  • sthibaul: should work fine already

It needs to be decided whether we want to represent GPUs as hwloc_obj_t; see ticket:5

Make hwloc CLI commands all default to same index bias

After #25, make all hwloc CLI commands uniformly default to both output and accept as input either physical/OS or hwloc-logical index values.

Simple example: hwloc-bind should accept as input the index values output by the default output of lstopo.

See also the thread started here:

http://www.open-mpi.org/community/lists/hwloc-devel/2009/12/0456.php

distances vs multinode

Some users want distances in multinode topologies.

  1. It's currently disabled because we use cpusets/nodesets to find/create a common ancestor where the matrix is attached. Multinode objects have no cpusets/nodesets.

One way to solve this would be to add a hostset or machineset bitmap to each object to identify the hosts/machines it corresponds to.

  2. We'll need a better way to identify objects in multinode topologies. Most distance insertion routines currently use physical indexes, but those are meaningless in multinode systems (and not always meaningful even in single-node systems, because core ids are not always unique).

See also ticket #67 when people don't care about grouping.

"./configure --enable-xml" doesn't fail if XML can't be built

Andreas Kupries noticed that if you configure hwloc with:

{{{
./configure --enable-xml ...
}}}

but configure fails to find XML support, it'll still continue and just give you an hwloc without XML support.

This violates the Law of Least Astonishment. If someone asks for an --enable- option and configure fails to find the Right Stuff for it, then configure should abort.

--enable-xml is definitely broken in this regard; the other --enable- and --with- options should be checked for this kind of behavior as well.

hwloc-calc hierarchical output formatting

From http://www.open-mpi.org/community/lists/hwloc-users/2011/02/0276.php

hwloc-calc may currently convert anything into a list of objects given as type:index. The above message suggests that it may be useful to report as type1:index1.type2:index2, but there is no easy way to guess what type1 and type2 should be (and the user may want more levels).

So maybe do hwloc-calc --ho socket,core to report a hierarchical output as socket:X.core:Y

If multiple cores are included in the input, just append another socket:T.core:Z string.

If the input is smaller than a single core, two solutions:

  • socket:X.core:Y.L1Cache:Z
  • socket:X.core:Y and specify in the doc that the output may be larger than the input
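
The formatting side of such a hierarchical output could be sketched as below. The code is self-contained and does not touch hwloc: `struct sketch_level` and `sketch_format_path` are hypothetical, and the caller is assumed to have already resolved which object of each requested type contains the input.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One level of a hierarchical location, e.g. {"socket", 1}. */
struct sketch_level {
    const char *type;
    unsigned index;
};

/* Write "type1:index1.type2:index2..." into buf.
 * Returns the length written, or -1 if buf is too small. */
static int sketch_format_path(char *buf, size_t size,
                              const struct sketch_level *levels, size_t n)
{
    size_t used = 0, i;

    assert(buf && size > 0);
    buf[0] = '\0';
    for (i = 0; i < n; i++) {
        int r = snprintf(buf + used, size - used, "%s%s:%u",
                         i ? "." : "", levels[i].type, levels[i].index);
        if (r < 0 || (size_t)r >= size - used)
            return -1;
        used += (size_t)r;
    }
    return (int)used;
}
```

When the input spans several cores, the caller would invoke this once per core and join the resulting strings with spaces.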

I/O device support

Updated TODO-list:

  • Add iterators to find GPUs, NICs, ...
  • Update documentation
  • Find a pci lib for MacOSX (neither pciutils nor pciaccess seems available, and pciaccess doesn't expose the hierarchy of bridges anyway)
  • Add some hwloc_insert_object_by_pcisomething, e.g. for a CUDA plugin which provides extended information to the object (e.g. number of streaming processors, etc.), which the core merges with the objects created by the libpci module.
    • provide functions like:

hwloc_obj_t hwloc_get_path_obj(hwloc_topology_t topo, const char *path);
hwloc_obj_t hwloc_get_fd_obj(hwloc_topology_t topo, int fd);

(the latter may return a network device or a disk device, depending on whether it's a socket or a file. Mmm and how about nfs-mounted files!)

hwloc build fails with strict compiler flags

Here's a snippet of the error:

{{{
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src -I../include/private -I../include/hwloc -I/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/include -I../include -std=c99 -Wall -Wmissing-prototypes -Wundef -Wpointer-arith -Wcast-align -Wall -Wextra -Wno-missing-field-initializers -Wstrict-prototypes -Wmissing-prototypes -DGCC_WALL -Wno-unused-parameter -Wno-unused-label -Wshadow -Wmissing-declarations -Wno-long-long -Wfloat-equal -Wdeclaration-after-statement -Wundef -Wno-endif-labels -Wpointer-arith -Wbad-function-cast -Wcast-align -Wwrite-strings -Wno-sign-compare -Waggregate-return -Wold-style-definition -Wno-multichar -Wno-deprecated-declarations -Wpacked -Wnested-externs -Winvalid-pch -Wno-pointer-sign -Wvariadic-macros -std=c89 -Wno-format-zero-length -Wno-type-limits -D_POSIX_C_SOURCE=199506L -g -MT topology.lo -MD -MP -MF .deps/topology.Tpo -c /home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology.c -fPIC -DPIC -o .libs/topology.o
In file included from /home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology.c:20:
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/include/hwloc.h: In function 'hwloc_get_obj_by_type':
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/include/hwloc.h:425: warning: declaration of 'index' shadows a global declaration
/usr/include/string.h:309: warning: shadowed declaration is here

[...snip...]

/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology.c:1313: warning: ISO C90 forbids mixed declarations and code
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology.c:1337: warning: ISO C90 forbids mixed declarations and code
make[2]: *** [topology.lo] Error 1
make[2]: Leaving directory `/home/balaji/projects/mpich2/hydra/build/tools/bind/hwloc/hwloc/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/balaji/projects/mpich2/hydra/build/tools/bind/hwloc/hwloc'
make: *** [all-recursive] Error 1
}}}

This is causing MPICH2's builds to fail when configured with strict compiler options.

hwloc-ps core dump on Altix, with CPU set interactions

hwloc 1.2's hwloc-ps dumps core when executed from a user's non-root
CPU set:

{{{
cs@altix-02$ cat /proc/self/cpuset
/
cs@altix-02$ hwloc-ps
cs@altix-02$ echo $$ | sudo cpuset -a /test
cpuset: attached one pid to cpuset
cs@altix-02$ cat /proc/self/cpuset
/test
cs@altix-02$ hwloc-ps
Segmentation fault (core dumped)
}}}

After rebuilding hwloc-1.2 for debugging, here's what I learned
with gdb:

{{{
cs@altix-02$ gdb /usr/local/bin/hwloc-ps
GNU gdb 6.2.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "ia64-suse-linux"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) run
Starting program: /usr/local/bin/hwloc-ps

Program received signal SIGSEGV, Segmentation fault.
hwloc_obj_type_snprintf (string=0x60000fffffffad00 "Machine", size=64,
obj=0x0, verbose=1) at traversal.c:188
188 hwloc_obj_type_t type = obj->type;
(gdb) bt
#0 hwloc_obj_type_snprintf (string=0x60000fffffffad00 "Machine", size=64,
obj=0x0, verbose=1) at traversal.c:188
#1 0x4000000000002c20 in main (argc=1, argv=0x60000fffffffb1a8)
at hwloc-ps.c:144
(gdb) up
#1 0x4000000000002c20 in main (argc=1, argv=0x60000fffffffb1a8)
at hwloc-ps.c:144
144 hwloc_obj_type_snprintf(type, sizeof(type), obj, 1);
(gdb) list 140
135 hwloc_bitmap_asprintf(&cpuset_str, cpuset);
136 printf("%s", cpuset_str);
137 } else {
138 hwloc_bitmap_t remaining = hwloc_bitmap_dup(cpuset);
139 int first = 1;
140 while (!hwloc_bitmap_iszero(remaining)) {
141 char type[64];
142 unsigned idx;
143 hwloc_obj_t obj = hwloc_get_first_largest_obj_inside_cpuset(topology, remaining);
144 hwloc_obj_type_snprintf(type, sizeof(type), obj, 1);
(gdb) print topology
$1 = 0x6000000000008010
(gdb) print *topology
$2 = {nb_levels = 3, next_group_depth = 0, level_nbobjects = {1, 2, 1,
0 <repeats 125 times>}, levels = {0x60000000000090d0, 0x600000000000ab70,
0x600000000000acc0, 0x0 <repeats 125 times>}, flags = 0, type_depth = {0,
0, 1, -1, -1, -1, 2, -1, -1}, ignored_types = {HWLOC_IGNORE_TYPE_NEVER,
HWLOC_IGNORE_TYPE_NEVER, HWLOC_IGNORE_TYPE_NEVER, HWLOC_IGNORE_TYPE_NEVER,
HWLOC_IGNORE_TYPE_NEVER, HWLOC_IGNORE_TYPE_NEVER, HWLOC_IGNORE_TYPE_NEVER,
HWLOC_IGNORE_TYPE_KEEP_STRUCTURE, HWLOC_IGNORE_TYPE_NEVER},
is_thissystem = 1, is_loaded = 1, pid = 0,
set_thisproc_cpubind = 0x200000000003ca38 <local+7656>,
get_thisproc_cpubind = 0x200000000003c868 <local+7192>,
set_thisthread_cpubind = 0x200000000003c878 <local+7208>,
get_thisthread_cpubind = 0x200000000003c888 <local+7224>,
set_proc_cpubind = 0x200000000003ca28 <local+7640>,
get_proc_cpubind = 0x200000000003c858 <local+7176>, set_thread_cpubind = 0,
get_thread_cpubind = 0,
get_thisproc_last_cpu_location = 0x200000000003c8a8 <local+7256>,
get_thisthread_last_cpu_location = 0x200000000003c8b8 <local+7272>,
get_proc_last_cpu_location = 0x200000000003ca58 <local+7688>,
set_thisproc_membind = 0, get_thisproc_membind = 0,
set_thisthread_membind = 0x200000000003c8e8 <local+7320>,
get_thisthread_membind = 0x200000000003c8f8 <local+7336>,
set_proc_membind = 0, get_proc_membind = 0,
set_area_membind = 0x200000000003c8c8 <local+7288>, get_area_membind = 0,
alloc = 0x200000000003c828 <local+7128>,
alloc_membind = 0x200000000003c8d8 <local+7304>,
free_membind = 0x200000000003c808 <local+7096>, support = {
discovery = 0x6000000000009070, cpubind = 0x6000000000009090,
membind = 0x60000000000090b0}, os_distances = {{nbobjs = 0, indexes = 0x0,
objs = 0x0, distances = 0x0}, {nbobjs = 0, indexes = 0x0, objs = 0x0,
distances = 0x0}, {nbobjs = 2, indexes = 0x6000000000009700,
objs = 0x60000000000096c0, distances = 0x60000000000096e0}, {nbobjs = 0,
indexes = 0x0, objs = 0x0, distances = 0x0}, {nbobjs = 0, indexes = 0x0,
objs = 0x0, distances = 0x0}, {nbobjs = 0, indexes = 0x0, objs = 0x0,
distances = 0x0}, {nbobjs = 0, indexes = 0x0, objs = 0x0,
distances = 0x0}, {nbobjs = 0, indexes = 0x0, objs = 0x0,
distances = 0x0}, {nbobjs = 0, indexes = 0x0, objs = 0x0,
distances = 0x0}}, backend_type = HWLOC_BACKEND_SYSFS, backend_params = {
sysfs = {root_path = 0x0, root_fd = -1}, synthetic = {arity = {0, 0,
4294967295, 0 <repeats 125 times>}, type = {
HWLOC_OBJ_SYSTEM <repeats 128 times>}, id = {0 <repeats 128 times>},
depth = {0 <repeats 128 times>}}}}
(gdb) print remaining
$3 = 0x6000000000009da0
(gdb) print *remaining
$4 = {ulongs_count = 1, ulongs_allocated = 8, ulongs = 0x600000000000b120,
infinite = 0}
(gdb) break hwloc_get_first_largest_obj_inside_cpuset
Breakpoint 1 at 0x4000000000003092: file helper.h, line 234.
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /usr/local/bin/hwloc-ps

Breakpoint 1, hwloc_get_first_largest_obj_inside_cpuset (
topology=0x6000000000008010, set=0x6000000000009da0) at helper.h:234
234 hwloc_obj_t obj = hwloc_get_root_obj(topology);
(gdb) next
236 if (!hwloc_bitmap_intersects(obj->cpuset, set))
(gdb) next
237 return NULL;
(gdb) break hwloc_bitmap_intersects
Breakpoint 2 at 0x2000000000076ee2: file cpuset.c, line 895.
(gdb) delete 1
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /usr/local/bin/hwloc-ps

Breakpoint 2, hwloc_bitmap_intersects (set1=0x6000000000009890,
set2=0x6000000000009650) at cpuset.c:895
895 for(i=0; i<set1->ulongs_count || i<set2->ulongs_count; i++)
(gdb) print *set1
$5 = {ulongs_count = 1, ulongs_allocated = 8, ulongs = 0x60000000000098b0,
infinite = 0}
(gdb) print *set2
$6 = {ulongs_count = 1, ulongs_allocated = 8, ulongs = 0x6000000000009670,
infinite = 0}
(gdb) next
896 if ((HWLOC_SUBBITMAP_READULONG(set1, i) & HWLOC_SUBBITMAP_READULONG(set2, i)) != HWLOC_SUBBITMAP_ZERO)
(gdb) next
895 for(i=0; i<set1->ulongs_count || i<set2->ulongs_count; i++)
(gdb) next
899 if (set1->infinite && set2->infinite)
(gdb) next
902 return 0;
}}}

Add embedding capabilities

PLPA is "fully embeddable" in larger software projects, meaning:

  • Relevant m4 is available in a standalone file that is m4_include'able.
  • Specific m4 macros are exported in this file that can be called in a higher-level file (e.g., configure.ac).
  • When building in an "embedded" mode, only the library is made (as an LT convenience library); nothing is installed.
  • Prefix name shifting is available for all public symbols.

This capability needs to be brought to hwloc before it can be a wholesale replacement for PLPA.

IRIX support

sysmp(MP_NPROCS/MP_NAPROCS/MP_STAT)
NUMA: /hw : /hw/nodenum/0 -> /hw/module/1/slot/n1/node
/hw/cpunum/0 -> /hw/module/1/slot/n1/node/cpu/a
check through getmntent where hwgfs is mounted
sysmp(MP_MUSTRUN/MP_MUSTRUN_PID)
PTHREAD_SCOPE_BOUND_NP
pthread_setrunon_np()
process_cpulink()
mld_create() mldset_create() numa_acreate() migr_range_migrate()

function to get the current cpu number

It can be useful to know where a thread is currently actually executing. Of course, the information may be outdated shortly after being returned, but that's still useful to monitoring applications.
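
As a rough illustration of what such a function could wrap on Linux, here is a minimal sketch built on the `getcpu` system call (glibc also exposes it as `sched_getcpu()` when `_GNU_SOURCE` is defined). The helper name `current_cpu_number` is made up for this example and is not an hwloc function; other operating systems would need their own implementation.

```c
#include <sys/syscall.h>  /* SYS_getcpu */
#include <unistd.h>       /* syscall(), NULL */

/* Hypothetical helper, not part of hwloc: returns the OS index of the CPU
 * the calling thread is running on right now, or -1 on failure.
 * The answer may already be stale when the caller reads it. */
int current_cpu_number(void)
{
    unsigned cpu = 0;

    /* getcpu(cpu, node, unused): the NUMA node is not needed here. */
    if (syscall(SYS_getcpu, &cpu, NULL, NULL) != 0)
        return -1;
    return (int)cpu;
}
```

A monitoring tool would typically call this periodically and map the returned index back to a topology object.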

dynamic cpusets

The dyncpuset branch might be mergeable now. The remaining possible optimizations are orthogonal and not required before entering trunk. I'd like some feedback about the current implementation.

FWIW, below is the duration (in microseconds) of hwloc_topology_load() depending on the topology size/hierarchy. It's a synthetic topo, so load() does pretty much nothing apart from allocating objects and manipulating tons of cpusets to insert in the tree and compute the {allowed,complete,online}_{cpuset,nodeset}.

{{{
topology                                         size    trunk  dyncpusets
synthetic proc:4                                    4      100         100
synthetic proc:32                                  32      745         413
synthetic node:4 die:4 core:4 proc:4              256     7750        5409
synthetic proc:256                                256    41932       34773
synthetic mach:4 node:4 die:4 core:4 proc:4      1024    44215       49406
synthetic proc:1024                              1024  1237049     1442945
synthetic m:4 n:4 d:4 cache:4 core:4 4           4096        X      700547
synthetic m:4 n:4 d:4 cache:4 cache:4 core:4 4  16384        X    11597185
}}}

In short, dyncpusets decrease memory waste, and do not increase CPU cycles.

1024 is the current static size in the trunk, which is why the trunk is faster than dyncpusets at that size. The dyncpusets branch works up to at least 16384 in the above test, but the lstopo time became too long for me :)
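
To make the trade-off concrete, here is a minimal sketch of a bitmap whose storage grows on demand instead of being a fixed 1024-bit array. It is not the code from the dyncpuset branch: the type and function names are invented for illustration, the field names only mirror the `ulongs_count` / `ulongs_allocated` / `ulongs` fields visible in debugger output, and the `infinite` flag is left out.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Illustrative type only; not hwloc's actual definition. */
struct dyn_bitmap {
    unsigned ulongs_count;      /* words currently in use */
    unsigned ulongs_allocated;  /* words available in the array */
    unsigned long *ulongs;      /* storage, grown on demand */
};

/* Make sure at least `needed` words are in use; new words are zeroed. */
int dyn_bitmap_reserve(struct dyn_bitmap *set, unsigned needed)
{
    if (needed > set->ulongs_allocated) {
        unsigned newalloc = set->ulongs_allocated ? set->ulongs_allocated : 1;
        unsigned long *tmp;

        while (newalloc < needed)   /* double to amortize reallocations */
            newalloc *= 2;
        tmp = realloc(set->ulongs, newalloc * sizeof(*tmp));
        if (!tmp)
            return -1;
        set->ulongs = tmp;
        set->ulongs_allocated = newalloc;
    }
    if (needed > set->ulongs_count) {
        memset(set->ulongs + set->ulongs_count, 0,
               (needed - set->ulongs_count) * sizeof(*set->ulongs));
        set->ulongs_count = needed;
    }
    return 0;
}

/* Set one bit, growing the storage if the bit lies beyond it. */
int dyn_bitmap_set(struct dyn_bitmap *set, unsigned bit)
{
    unsigned word = bit / BITS_PER_ULONG;

    if (dyn_bitmap_reserve(set, word + 1) < 0)
        return -1;
    set->ulongs[word] |= 1UL << (bit % BITS_PER_ULONG);
    return 0;
}

/* Bits beyond the words in use read as unset. */
int dyn_bitmap_isset(const struct dyn_bitmap *set, unsigned bit)
{
    unsigned word = bit / BITS_PER_ULONG;

    if (word >= set->ulongs_count)
        return 0;
    return (int)((set->ulongs[word] >> (bit % BITS_PER_ULONG)) & 1UL);
}
```

A set that only ever holds low bit indexes stays at one word, which is where the memory saving comes from; the cost is the extra bounds check and occasional reallocation on each update.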

USB tree?

I came across the location of a CD-ROM drive:

/sys/devices/pci0000:00/0000:00:02.1/usb1/1-5/1-5:1.0/host6/target6:0:0/6:0:0:0/block:sr0

Windows warning

I'm getting this warning when compiling on RHEL4 with gcc:

{{{
../../src/topology-windows.c: In function `hwloc_look_windows':
../../src/topology-windows.c:194: warning: assignment from incompatible pointer type
}}}

I don't know enough about the Windows code to fix it...

add linux cgroup support (seems to be cpuset-exclusive?)

{{{
if /proc/self/cgroup exists and is not empty, take the path from the 3rd ':'-separated field
read cpuset cpulist in /dev/cgroup//cpuset.cpus
read cpuset memlist in /dev/cgroup//cpuset.mems
}}}
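
The first step above can be sketched as follows, assuming each line of /proc/self/cgroup has the form `id:controllers:path`. The helper name `cgroup_path_from_line` is hypothetical, and reading the file and building the cpuset.cpus / cpuset.mems paths is left to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the 3rd ':'-separated field of one
 * /proc/self/cgroup line into out (without the trailing newline).
 * Returns 0 on success, -1 if the line is malformed or out is too small. */
int cgroup_path_from_line(const char *line, char *out, size_t outlen)
{
    const char *p = strchr(line, ':');   /* end of the 1st field */
    size_t len;

    if (!p)
        return -1;
    p = strchr(p + 1, ':');              /* end of the 2nd field */
    if (!p)
        return -1;
    p++;                                 /* start of the 3rd field */
    len = strcspn(p, "\n");              /* stop before the newline */
    if (len + 1 > outlen)
        return -1;
    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}
```

The third field is taken up to the end of the line, so a path that itself contains ':' is kept whole.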

array of stringified infos

As discussed a while ago, I think we should add something like this to the end of the hwloc_obj structure:

char **infos; /**< \brief Array of string name=value */
unsigned infos_count; /**< \brief Length of the infos array */

We would store in there things like:

DMIBoardVendor=Tyan (currently in obj->attr->machine.dmi_board_vendor)
DMIBoardModel=S4885 (currently in obj->attr->machine.dmi_board_info)
PCIVendor=AMD
PCIModel=Radeon HD4350

Some of them are already used in obj->name but that doesn't need to change.

Some system-wide fields might be interesting too; they should go in the topology or in the widest related object:

Backend=Synthetic
OS=Linux
LinuxCpuset=/foobar
FsRoot=/var/lib/topology/myworderfulmachine
Hostname=foobar
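
To show how such an array could be filled and queried, here is a minimal sketch. The `struct obj_infos` stand-in and the `obj_add_info` / `obj_get_info` helpers are assumptions made for this example; they are not existing hwloc API, and the real fields would live at the end of the hwloc_obj structure.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the tail of the object structure; not the real hwloc_obj. */
struct obj_infos {
    char **infos;          /* array of "name=value" strings */
    unsigned infos_count;  /* length of the infos array */
};

/* Append "name=value"; returns 0 on success, -1 on allocation failure. */
int obj_add_info(struct obj_infos *obj, const char *name, const char *value)
{
    size_t len = strlen(name) + 1 + strlen(value) + 1;
    char *entry = malloc(len);
    char **tmp;

    if (!entry)
        return -1;
    snprintf(entry, len, "%s=%s", name, value);
    tmp = realloc(obj->infos, (obj->infos_count + 1) * sizeof(*tmp));
    if (!tmp) {
        free(entry);
        return -1;
    }
    tmp[obj->infos_count++] = entry;
    obj->infos = tmp;
    return 0;
}

/* Return a pointer to the value stored under name, or NULL if absent. */
const char *obj_get_info(const struct obj_infos *obj, const char *name)
{
    size_t namelen = strlen(name);
    unsigned i;

    for (i = 0; i < obj->infos_count; i++) {
        const char *entry = obj->infos[i];
        if (!strncmp(entry, name, namelen) && entry[namelen] == '=')
            return entry + namelen + 1;
    }
    return NULL;
}
```

Keeping each entry as a single string makes the array trivial to print or export, at the cost of a linear scan on lookup.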

fgets() return value not checked

Pavan reported that when compiling with super-picky compiler flags, we get warnings about not checking the return status of fgets():

{{{
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology-linux.c:
In function 'hwloc_parse_sysfs_unsigned':
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology-linux.c:241:
warning: ignoring return value of 'fgets', declared with attribute
warn_unused_result
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology-linux.c:
In function 'hwloc_read_cpuset_mask':
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology-linux.c:326:
warning: ignoring return value of 'fgets', declared with attribute
warn_unused_result
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology-linux.c:346:
warning: ignoring return value of 'fgets', declared with attribute
warn_unused_result
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology-linux.c:
In function 'look_cpuinfo':
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology-linux.c:863:
warning: ignoring return value of 'fscanf', declared with attribute
warn_unused_result
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology-linux.c:
In function 'hwloc__get_dmi_info':
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology-linux.c:917:
warning: ignoring return value of 'fgets', declared with attribute
warn_unused_result
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/topology-linux.c:931:
warning: ignoring return value of 'fgets', declared with attribute
warn_unused_result
}}}

Here are the super-picky flags that he used:

{{{
libtool: compile: gcc -DHAVE_CONFIG_H -I.
-I/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src
-I../include/private -I../include/hwloc
-I/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/include
-I../include -std=c99 -Wall -Wmissing-prototypes -Wundef -Wpointer-arith
-Wcast-align -O2 -Wall -Wextra -Wno-missing-field-initializers
-Wstrict-prototypes -Wmissing-prototypes -DGCC_WALL
-Wno-unused-parameter -Wno-unused-label -Wshadow -Wmissing-declarations
-Wno-long-long -Wfloat-equal -Wdeclaration-after-statement -Wundef
-Wno-endif-labels -Wpointer-arith -Wbad-function-cast -Wcast-align
-Wwrite-strings -Wno-sign-compare -Waggregate-return
-Wold-style-definition -Wno-multichar -Wno-deprecated-declarations
-Wpacked -Wnested-externs -Winvalid-pch -Wno-pointer-sign
-Wvariadic-macros -std=c89 -Wno-format-zero-length -Wno-type-limits
-D_POSIX_C_SOURCE=199506L -MT cpuset.lo -MD -MP -MF .deps/cpuset.Tpo -c
/home/balaji/projects/mpich2/hydra/hydra/tools/bind/hwloc/hwloc/src/cpuset.c
-o cpuset.o >/dev/null 2>&1
}}}
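
One way to silence this class of warning is to actually test the return value before touching the buffer. The sketch below shows the pattern on a small helper that reads one unsigned value from an open file; `read_unsigned_line` is a hypothetical name and this is not the code in topology-linux.c.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line from f and parse it as an unsigned
 * integer. Returns 0 on success, -1 on read or parse failure. */
int read_unsigned_line(FILE *f, unsigned long *value)
{
    char buf[64];
    char *end;

    if (!fgets(buf, sizeof(buf), f))   /* EOF or I/O error: buf is unusable */
        return -1;
    *value = strtoul(buf, &end, 10);
    if (end == buf)                    /* the line held no digits */
        return -1;
    return 0;
}
```

Besides pleasing the compiler, the check avoids parsing an uninitialized buffer when the file is empty or unreadable.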

hwloc-aware top/ps

Based on discussion here, it looks like some people use top to verify the binding of their MPI processes. They will get physical processor indexes there. But they might want to easily check that the binding is "hwloc-correct" when MPI uses hwloc for binding.

So having a hwloc-aware top (or ps) would be good. No need to reimplement everything, only showing basic top/ps info would be enough. Something like below should be easy:
<socket1.core5>

With some options for filtering with a userid, process name, ...

Maybe print %CPU if it's easy to retrieve (but we may need to make it refresh the display every second or so, which opens the door to lots of useless requests from users). Probably better to let the user revert to the plain top/ps instead of reinventing the wheel.

utils man pages depend on executables

As mentioned in http://www.open-mpi.org/community/lists/hwloc-devel/2009/09/0060.php, there's a causality issue in "make dist": the man pages depend on the executables (because the man pages are generated via help2man).

This causes a problem with the following (e.g., nightly tarball generation):

{{{
svn co ...
cd ...
./autogen.sh
./configure
make dist
}}}

because the executables will try to build, but fail when there is no libhwloc.la.

A workaround, of course, is to "make all" first and then "make dist". But this is somewhat icky; it would be nice to have a better solution.
