Introduction

hwloc provides command line tools and a C API to obtain the hierarchical map
of key computing elements, such as NUMA memory nodes, shared caches,
processor sockets, processor cores, and processing units (logical processors
or "threads"). hwloc also gathers various attributes such as cache and
memory information, and is portable across a variety of different operating
systems and platforms.

hwloc primarily aims at helping high-performance computing (HPC)
applications, but is also applicable to any project seeking to exploit code
and/or data locality on modern computing platforms.

Note that the hwloc project represents the merger of the libtopology project
from INRIA and the Portable Linux Processor Affinity (PLPA) sub-project from
Open MPI. Both of these prior projects are now deprecated. The first hwloc
release is essentially a "re-branding" of the libtopology code base, but
with both a few genuinely new features and a few PLPA-like features added
in. More new features and more PLPA-like features will be added to hwloc
over time. See switchfromplpa for more details about converting your
application from PLPA to hwloc.

hwloc supports the following operating systems:

  * Linux (including old kernels not having sysfs topology information, with
    knowledge of cpusets, offline cpus, ScaleMP vSMP, and Kerrighed support)
  * Solaris
  * AIX
  * Darwin / OS X
  * FreeBSD and its variants, such as kFreeBSD/GNU
  * OSF/1 (a.k.a., Tru64)
  * HP-UX
  * Microsoft Windows

hwloc only reports the number of processors on unsupported operating
systems; no topology information is available.

For development and debugging purposes, hwloc also offers the ability to
work on "fake" topologies:

  * Symmetrical tree of resources generated from a list of level arities
  * Remote machine simulation through the gathering of Linux sysfs topology
    files

hwloc can display the topology in a human-readable format, either in
graphical mode (X11), or by exporting in one of several different formats,
including: plain text, PDF, PNG, and FIG (see CLI Examples below). Note that
some of the export formats require additional support libraries.

hwloc offers a programming interface for manipulating topologies and
objects. It also brings a powerful CPU bitmap API that is used to describe
the location of topology objects on physical/logical processors. See the
Programming Interface below. It may also be used to bind applications
onto certain cores or memory nodes. Several utility programs are also
provided to ease command-line manipulation of topology objects, binding of
processes, and so on.

Installation

hwloc (http://www.open-mpi.org/projects/hwloc/) is available under the BSD
license. It is hosted as a sub-project of the overall Open MPI project
(http://www.open-mpi.org/). Note that hwloc does not require any
functionality from Open MPI -- it is a wholly separate (and much smaller!)
project and code base. It just happens to be hosted as part of the overall
Open MPI project.

Nightly development snapshots are available on the web site. Additionally,
the code can be directly checked out of Subversion:

shell$ svn checkout http://svn.open-mpi.org/svn/hwloc/trunk hwloc-trunk
shell$ cd hwloc-trunk
shell$ ./autogen.sh

Note that GNU Autoconf >=2.63, Automake >=1.10 and Libtool >=2.2.6 are
required when building from a Subversion checkout.

Installation itself follows the fairly common GNU-based process:

shell$ ./configure --prefix=...
shell$ make
shell$ make install

The hwloc command-line tool "lstopo" produces human-readable topology maps,
as mentioned above. It can also export maps to the "fig" file format.
Support for PDF, Postscript, and PNG exporting is provided if the "Cairo"
development package can be found when hwloc is configured and built.
Similarly, lstopo's XML support requires the libxml2 development package.

CLI Examples

On a 4-socket 2-core machine with hyperthreading, the lstopo tool may show
the following graphical output:

                              dudley.png

Here's the equivalent output in textual form:

Machine (16GB)
  Socket L#0 + L3 L#0 (4096KB)
    L2 L#0 (1024KB) + L1 L#0 (16KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#8)
    L2 L#1 (1024KB) + L1 L#1 (16KB) + Core L#1
      PU L#2 (P#4)
      PU L#3 (P#12)
  Socket L#1 + L3 L#1 (4096KB)
    L2 L#2 (1024KB) + L1 L#2 (16KB) + Core L#2
      PU L#4 (P#1)
      PU L#5 (P#9)
    L2 L#3 (1024KB) + L1 L#3 (16KB) + Core L#3
      PU L#6 (P#5)
      PU L#7 (P#13)
  Socket L#2 + L3 L#2 (4096KB)
    L2 L#4 (1024KB) + L1 L#4 (16KB) + Core L#4
      PU L#8 (P#2)
      PU L#9 (P#10)
    L2 L#5 (1024KB) + L1 L#5 (16KB) + Core L#5
      PU L#10 (P#6)
      PU L#11 (P#14)
  Socket L#3 + L3 L#3 (4096KB)
    L2 L#6 (1024KB) + L1 L#6 (16KB) + Core L#6
      PU L#12 (P#3)
      PU L#13 (P#11)
    L2 L#7 (1024KB) + L1 L#7 (16KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#15)

Finally, here's the equivalent output in XML. Long lines were artificially
broken for document clarity (in the real output, each XML tag is on a single
line), and only socket #0 is shown for brevity:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topology SYSTEM "hwloc.dtd">
<topology>
  <object type="Machine" os_level="-1" os_index="0" cpuset="0x0000ffff"
   complete_cpuset="0x0000ffff" online_cpuset="0x0000ffff"
   allowed_cpuset="0x0000ffff"
   dmi_board_vendor="Dell Computer Corporation" dmi_board_name="0RD318"
   local_memory="16648183808">
 <page_type size="4096" count="4064498"/>
 <page_type size="2097152" count="0"/>
 <object type="Socket" os_level="-1" os_index="0" cpuset="0x00001111"
     complete_cpuset="0x00001111" online_cpuset="0x00001111"
     allowed_cpuset="0x00001111">
   <object type="Cache" os_level="-1" cpuset="0x00001111"
       complete_cpuset="0x00001111" online_cpuset="0x00001111"
       allowed_cpuset="0x00001111" cache_size="4194304" depth="3"
       cache_linesize="64">
     <object type="Cache" os_level="-1" cpuset="0x00000101"
         complete_cpuset="0x00000101" online_cpuset="0x00000101"
         allowed_cpuset="0x00000101" cache_size="1048576" depth="2"
         cache_linesize="64">
       <object type="Cache" os_level="-1" cpuset="0x00000101"
           complete_cpuset="0x00000101" online_cpuset="0x00000101"
           allowed_cpuset="0x00000101" cache_size="16384" depth="1"
           cache_linesize="64">
         <object type="Core" os_level="-1" os_index="0" cpuset="0x00000101"
             complete_cpuset="0x00000101" online_cpuset="0x00000101"
             allowed_cpuset="0x00000101">
           <object type="PU" os_level="-1" os_index="0" cpuset="0x00000001"
               complete_cpuset="0x00000001" online_cpuset="0x00000001"
               allowed_cpuset="0x00000001"/>
           <object type="PU" os_level="-1" os_index="8" cpuset="0x00000100"
               complete_cpuset="0x00000100" online_cpuset="0x00000100"
               allowed_cpuset="0x00000100"/>
         </object>
       </object>
     </object>
     <object type="Cache" os_level="-1" cpuset="0x00001010"
         complete_cpuset="0x00001010" online_cpuset="0x00001010"
         allowed_cpuset="0x00001010" cache_size="1048576" depth="2"
         cache_linesize="64">
       <object type="Cache" os_level="-1" cpuset="0x00001010"
           complete_cpuset="0x00001010" online_cpuset="0x00001010"
           allowed_cpuset="0x00001010" cache_size="16384" depth="1"
           cache_linesize="64">
         <object type="Core" os_level="-1" os_index="1" cpuset="0x00001010"
             complete_cpuset="0x00001010" online_cpuset="0x00001010"
             allowed_cpuset="0x00001010">
           <object type="PU" os_level="-1" os_index="4" cpuset="0x00000010"
               complete_cpuset="0x00000010" online_cpuset="0x00000010"
               allowed_cpuset="0x00000010"/>
           <object type="PU" os_level="-1" os_index="12" cpuset="0x00001000"
               complete_cpuset="0x00001000" online_cpuset="0x00001000"
               allowed_cpuset="0x00001000"/>
         </object>
       </object>
     </object>
   </object>
 </object>
 <!-- ...other sockets listed here ... -->
  </object>
</topology>
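The cpuset attributes above are hexadecimal bitmaps in which bit i stands for the PU whose OS (physical) index is i; for example, Core L#0's cpuset 0x00000101 contains the PUs with os_index 0 and 8. The minimal sketch below decodes such a string in plain C. The helper name is hypothetical, it deliberately avoids the hwloc bitmap API, and it assumes the mask fits in one unsigned long, as in these examples; longer bitmaps on larger machines are not handled.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: decode a single-word cpuset string such as
 * "0x00001111" into the OS/physical PU indices it contains.
 * Returns the number of indices written to out[] (at most max). */
static size_t cpuset_string_to_indices(const char *str, unsigned *out,
                                       size_t max)
{
    /* strtoul() with base 16 accepts an optional "0x" prefix. */
    unsigned long mask = strtoul(str, NULL, 16);
    size_t count = 0;
    unsigned bit;

    for (bit = 0; bit < sizeof(mask) * CHAR_BIT && count < max; bit++)
        if (mask & (1UL << bit))
            out[count++] = bit;
    return count;
}
```

Decoding Socket #0's cpuset 0x00001111 yields OS indices 0, 4, 8 and 12, which are exactly the P# values listed under Socket L#0 in the textual output.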

On a 4-socket 2-core Opteron NUMA machine, the lstopo tool may show the
following graphical output:

                              hagrid.png

Here's the equivalent output in textual form:

Machine (32GB)
  NUMANode L#0 (P#0 8190MB) + Socket L#0
    L2 L#0 (1024KB) + L1 L#0 (64KB) + Core L#0 + PU L#0 (P#0)
    L2 L#1 (1024KB) + L1 L#1 (64KB) + Core L#1 + PU L#1 (P#1)
  NUMANode L#1 (P#1 8192MB) + Socket L#1
    L2 L#2 (1024KB) + L1 L#2 (64KB) + Core L#2 + PU L#2 (P#2)
    L2 L#3 (1024KB) + L1 L#3 (64KB) + Core L#3 + PU L#3 (P#3)
  NUMANode L#2 (P#2 8192MB) + Socket L#2
    L2 L#4 (1024KB) + L1 L#4 (64KB) + Core L#4 + PU L#4 (P#4)
    L2 L#5 (1024KB) + L1 L#5 (64KB) + Core L#5 + PU L#5 (P#5)
  NUMANode L#3 (P#3 8192MB) + Socket L#3
    L2 L#6 (1024KB) + L1 L#6 (64KB) + Core L#6 + PU L#6 (P#6)
    L2 L#7 (1024KB) + L1 L#7 (64KB) + Core L#7 + PU L#7 (P#7)

And here's the equivalent output in XML. As above, line breaks were added
and only PU #0 is shown for brevity:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topology SYSTEM "hwloc.dtd">
<topology>
  <object type="Machine" os_level="-1" os_index="0" cpuset="0x000000ff"
   complete_cpuset="0x000000ff" online_cpuset="0x000000ff"
   allowed_cpuset="0x000000ff" nodeset="0x000000ff"
   complete_nodeset="0x000000ff" allowed_nodeset="0x000000ff"
   dmi_board_vendor="TYAN Computer Corp" dmi_board_name="S4881 ">
 <page_type size="4096" count="0"/>
 <page_type size="2097152" count="0"/>
 <object type="NUMANode" os_level="-1" os_index="0" cpuset="0x00000003"
     complete_cpuset="0x00000003" online_cpuset="0x00000003"
     allowed_cpuset="0x00000003" nodeset="0x00000001"
     complete_nodeset="0x00000001" allowed_nodeset="0x00000001"
     local_memory="7514177536">
   <page_type size="4096" count="1834516"/>
   <page_type size="2097152" count="0"/>
   <object type="Socket" os_level="-1" os_index="0" cpuset="0x00000003"
       complete_cpuset="0x00000003" online_cpuset="0x00000003"
       allowed_cpuset="0x00000003" nodeset="0x00000001"
       complete_nodeset="0x00000001" allowed_nodeset="0x00000001">
     <object type="Cache" os_level="-1" cpuset="0x00000001"
         complete_cpuset="0x00000001" online_cpuset="0x00000001"
         allowed_cpuset="0x00000001" nodeset="0x00000001"
         complete_nodeset="0x00000001" allowed_nodeset="0x00000001"
         cache_size="1048576" depth="2" cache_linesize="64">
       <object type="Cache" os_level="-1" cpuset="0x00000001"
           complete_cpuset="0x00000001" online_cpuset="0x00000001"
           allowed_cpuset="0x00000001" nodeset="0x00000001"
           complete_nodeset="0x00000001" allowed_nodeset="0x00000001"
           cache_size="65536" depth="1" cache_linesize="64">
         <object type="Core" os_level="-1" os_index="0"
             cpuset="0x00000001" complete_cpuset="0x00000001"
             online_cpuset="0x00000001" allowed_cpuset="0x00000001"
             nodeset="0x00000001" complete_nodeset="0x00000001"
             allowed_nodeset="0x00000001">
           <object type="PU" os_level="-1" os_index="0" cpuset="0x00000001"
               complete_cpuset="0x00000001" online_cpuset="0x00000001"
               allowed_cpuset="0x00000001" nodeset="0x00000001"
               complete_nodeset="0x00000001" allowed_nodeset="0x00000001"/>
         </object>
       </object>
     </object>
   </object>
 </object>
 <!-- ...more objects listed here ... -->
  </object>
</topology>

On a 2-socket quad-core Xeon (pre-Nehalem, with 2 dual-core dies in each
socket), the lstopo tool may show the following graphical output:

                              emmett.png

Here's the same output in textual form:

Machine (16GB)
  Socket L#0
    L2 L#0 (4096KB)
      L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L1 L#1 (32KB) + Core L#1 + PU L#1 (P#4)
    L2 L#1 (4096KB)
      L1 L#2 (32KB) + Core L#2 + PU L#2 (P#2)
      L1 L#3 (32KB) + Core L#3 + PU L#3 (P#6)
  Socket L#1
    L2 L#2 (4096KB)
      L1 L#4 (32KB) + Core L#4 + PU L#4 (P#1)
      L1 L#5 (32KB) + Core L#5 + PU L#5 (P#5)
    L2 L#3 (4096KB)
      L1 L#6 (32KB) + Core L#6 + PU L#6 (P#3)
      L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)

And the same output in XML (line breaks added, only PU #0 shown):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topology SYSTEM "hwloc.dtd">
<topology>
  <object type="Machine" os_level="-1" os_index="0" cpuset="0x000000ff"
   complete_cpuset="0x000000ff" online_cpuset="0x000000ff"
   allowed_cpuset="0x000000ff" dmi_board_vendor="Dell Inc."
   dmi_board_name="0NR282" local_memory="16865292288">
 <page_type size="4096" count="4117503"/>
 <page_type size="2097152" count="0"/>
 <object type="Socket" os_level="-1" os_index="0" cpuset="0x00000055"
     complete_cpuset="0x00000055" online_cpuset="0x00000055"
     allowed_cpuset="0x00000055">
   <object type="Cache" os_level="-1" cpuset="0x00000011"
       complete_cpuset="0x00000011" online_cpuset="0x00000011"
       allowed_cpuset="0x00000011" cache_size="4194304" depth="2"
       cache_linesize="64">
     <object type="Cache" os_level="-1" cpuset="0x00000001"
         complete_cpuset="0x00000001" online_cpuset="0x00000001"
         allowed_cpuset="0x00000001" cache_size="32768" depth="1"
         cache_linesize="64">
       <object type="Core" os_level="-1" os_index="0" cpuset="0x00000001"
           complete_cpuset="0x00000001" online_cpuset="0x00000001"
           allowed_cpuset="0x00000001">
         <object type="PU" os_level="-1" os_index="0" cpuset="0x00000001"
             complete_cpuset="0x00000001" online_cpuset="0x00000001"
             allowed_cpuset="0x00000001"/>
       </object>
     </object>
     <object type="Cache" os_level="-1" cpuset="0x00000010"
         complete_cpuset="0x00000010" online_cpuset="0x00000010"
         allowed_cpuset="0x00000010" cache_size="32768" depth="1"
         cache_linesize="64">
       <object type="Core" os_level="-1" os_index="1" cpuset="0x00000010"
           complete_cpuset="0x00000010" online_cpuset="0x00000010"
           allowed_cpuset="0x00000010">
         <object type="PU" os_level="-1" os_index="4" cpuset="0x00000010"
             complete_cpuset="0x00000010" online_cpuset="0x00000010"
             allowed_cpuset="0x00000010"/>
       </object>
     </object>
   </object>
 </object>
  <!-- ...more objects listed here ... -->
  </object>
</topology>

Programming Interface

The basic interface is available in hwloc.h. It essentially offers low-level
routines for advanced programmers that want to manually manipulate objects
and follow links between them. Documentation for everything in hwloc.h is
provided later in this document. Developers should also look at
hwloc/helper.h (also documented in this document), which provides good
higher-level topology traversal examples.

To precisely define the vocabulary used by hwloc, a termsanddefs section is
available and should probably be read first.

Each hwloc object contains a cpuset describing the list of processing units
that it contains. These bitmaps may be used for CPU binding and Memory
binding. hwloc offers an extensive bitmap manipulation interface in
hwloc/bitmap.h.
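A common bitmap operation before CPU binding is reducing a core's cpuset to a single hardware thread, which the API Example below does with hwloc_bitmap_singlify(). The sketch here shows one simple policy for that reduction on a plain 32-bit mask, keeping the lowest-numbered bit; the helper is hypothetical and does not claim to reproduce which bit hwloc itself keeps.

```c
#include <stdint.h>

/* Hypothetical helper: keep only the lowest-numbered bit of a
 * single-word CPU mask, e.g. to bind to one thread of an SMT core. */
static uint32_t keep_lowest_cpu(uint32_t mask)
{
    /* Two's-complement trick: mask & (~mask + 1) isolates the lowest
     * set bit; an empty mask stays empty. */
    return mask & (~mask + 1u);
}
```

Applied to Core L#0's cpuset 0x00000101 from the first XML example, this keeps only 0x00000001 (PU P#0).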

Moreover, hwloc also comes with additional helpers for interoperability with
several commonly used environments. See the interoperability section for
details.

The complete API documentation is available in a full set of HTML pages, man
pages, and self-contained PDF files (formatted for both US letter and
A4 formats) in the source tarball in doc/doxygen-doc/.

NOTE: If you are building the documentation from a Subversion checkout, you
will need to have Doxygen and pdflatex installed -- the documentation will
be built during the normal "make" process. The documentation is installed
during "make install" to $prefix/share/doc/hwloc/ and your system's default
man page tree (under $prefix, of course).

Portability

As shown in CLI Examples, hwloc can obtain information on a wide variety of
hardware topologies. However, some platforms and/or operating system
versions will only report a subset of this information. For example, on a
PPC64-based system with 32 cores (each with 2 hardware threads) running a
default 2.6.18-based kernel from RHEL 5.4, hwloc is only able to glean
information about NUMA nodes and processing units (PUs). No information about
caches, sockets, or cores is available.

Similarly, operating systems have varying support for CPU and memory
binding: some provide interfaces for all kinds of CPU and memory bindings,
others provide interfaces for only a limited number of them, and some do not
provide any binding interface at all. When the underlying operating system
does not provide an interface for a given binding operation, hwloc's binding
functions simply return the ENOSYS error (Function not implemented). See CPU
binding and Memory binding for more information on which hwloc binding
functions should be preferred because interfaces for them are usually
available on the supported operating systems.
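Because a binding call may legitimately fail with ENOSYS on some operating systems, portable code can treat that case as "binding unsupported" rather than as a fatal error. A minimal sketch in plain C follows; classify_bind_result and its enum are hypothetical names, and the sketch assumes, as in the API Example below, that a binding function returns non-zero on failure and leaves the reason in errno.

```c
#include <errno.h>

/* Hypothetical classification of a binding call's result. */
enum bind_outcome { BIND_OK, BIND_UNSUPPORTED, BIND_FAILED };

/* ret: the binding function's return value.
 * err: errno captured right after the call, before any other library
 *      call can overwrite it. */
static enum bind_outcome classify_bind_result(int ret, int err)
{
    if (ret == 0)
        return BIND_OK;
    if (err == ENOSYS)          /* the OS offers no interface for this */
        return BIND_UNSUPPORTED;
    return BIND_FAILED;         /* any other error is a real failure */
}

/* Typical use with hwloc (not compiled here):
 *   int ret = hwloc_set_cpubind(topology, cpuset, 0);
 *   switch (classify_bind_result(ret, errno)) { ... }
 */
```

An application can then log BIND_UNSUPPORTED once and continue unbound, while still reporting BIND_FAILED as an error.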

Here's the graphical output from lstopo on the PPC64 platform described above
when Simultaneous Multi-Threading (SMT) is enabled:

                          ppc64-with-smt.png

And here's the graphical output from lstopo on this platform when SMT is
disabled:

                        ppc64-without-smt.png

Notice that hwloc only sees half the PUs when SMT is disabled. PU #15, for
example, seems to change location from NUMA node #0 to #1. In reality, no
PUs "moved" -- they were simply re-numbered when hwloc only saw half as
many. Hence, PU #15 in the SMT-disabled picture probably corresponds to PU
#30 in the SMT-enabled picture.
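The renumbering follows from the definition of logical indices, which always run from 0 to the number of visible objects minus one: when some PUs disappear, the remaining ones are numbered densely. The toy model below illustrates this under a hypothetical assumption that the SMT-enabled numbering places each core's two threads at consecutive indices and that disabling SMT leaves only the even-numbered ones visible; dense_index is an illustrative helper, not an hwloc function.

```c
#include <stdint.h>

/* Hypothetical helper: logical index that PU `pu` (numbered as in the
 * SMT-enabled picture) receives when only the PUs whose bits are set
 * in visible_mask are reported.  Visible PUs are numbered densely, so
 * the result is the count of visible PUs that come before it. */
static unsigned dense_index(uint64_t visible_mask, unsigned pu)
{
    unsigned i, rank = 0;

    for (i = 0; i < pu && i < 64; i++)
        if (visible_mask & ((uint64_t)1 << i))
            rank++;
    return rank;
}
```

With 64 PUs and only the even-numbered ones visible (mask 0x5555555555555555), SMT-enabled PU #30 becomes PU #15, consistent with the guess above.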

This same "PUs have disappeared" effect can be seen on other platforms --
even platforms / OSs that provide much more information than the above PPC64
system. This is an unfortunate side-effect of how operating systems report
information to hwloc.

Note that after upgrading the Linux kernel on the same PPC64 system to
2.6.34, hwloc is able to discover all the topology information. The
following picture shows the entire topology layout when SMT is enabled:

                       ppc64-full-with-smt.png

Developers using the hwloc API or XML output for portable applications
should therefore be extremely careful not to make any assumptions about the
structure of data that is returned. For example, as the RHEL 5.4 PPC64
topology above shows, it is not safe to assume that PUs will always be
descendants of cores.

Additionally, future hardware may insert new topology elements that are not
available in this version of hwloc. Long-lived applications that are meant
to span multiple different hardware platforms should also be careful about
making structure assumptions. For example, there may someday be an element
"lower" than a PU, or perhaps a new element may exist between a core and a
PU.
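One way to follow this advice is to search upward for an ancestor of a given type instead of assuming a fixed parent chain (PU, then Core, then Socket). The sketch below uses a deliberately simplified, hypothetical object struct rather than hwloc's real hwloc_obj_t (whose parent field the API Example also walks), and models the RHEL 5.4 PPC64 case above, where only NUMA nodes and PUs were reported.

```c
#include <stddef.h>

/* Toy object model (hypothetical, far simpler than hwloc_obj_t). */
enum obj_type { OBJ_MACHINE, OBJ_NODE, OBJ_SOCKET, OBJ_CACHE, OBJ_CORE,
                OBJ_PU };

struct obj {
    enum obj_type type;
    struct obj *parent;     /* NULL for the root object */
};

/* Walk up the parent links and return the nearest ancestor of the
 * requested type, or NULL if there is none.  Callers must handle NULL:
 * on some platforms a PU has no Core (or Socket) above it at all. */
static struct obj *find_ancestor(struct obj *o, enum obj_type type)
{
    for (o = o ? o->parent : NULL; o != NULL; o = o->parent)
        if (o->type == type)
            return o;
    return NULL;
}
```

Code written this way keeps working if a future hwloc version inserts new object types between existing levels, since it never counts parent steps.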

API Example

The following small C example (named "hwloc-hello.c") prints the topology
of the machine, binds the process to a single logical processor of the last
core of the machine, and allocates and binds memory on the last NUMA node.

/* Example hwloc API program.
 *
 * Copyright © 2009-2010 INRIA
 * Copyright © 2009-2010 Université Bordeaux 1
 * Copyright Â© 2009-2010 Cisco Systems, Inc.  All rights reserved.
 *
 * hwloc-hello.c
 */

#include <hwloc.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>   /* malloc(), free() */
#include <string.h>

static void print_children(hwloc_topology_t topology, hwloc_obj_t obj,
                        int depth)
{
 char string[128];
 unsigned i;

 hwloc_obj_snprintf(string, sizeof(string), topology, obj, "#", 0);
 printf("%*s%s\n", 2*depth, "", string);
 for (i = 0; i < obj->arity; i++) {
     print_children(topology, obj->children[i], depth + 1);
 }
}

int main(void)
{
 int depth;
 unsigned i, n;
 unsigned long size;
 int levels;
 char string[128];
 int topodepth;
 hwloc_topology_t topology;
 hwloc_cpuset_t cpuset;
 hwloc_obj_t obj;

 /* Allocate and initialize topology object. */
 hwloc_topology_init(&topology);

 /* ... Optionally, put detection configuration here to ignore
    some objects types, define a synthetic topology, etc....

    The default is to detect all the objects of the machine that
    the caller is allowed to access.  See Configure Topology
    Detection. */

 /* Perform the topology detection. */
 hwloc_topology_load(topology);

 /* Optionally, get some additional topology information
    in case we need the topology depth later. */
 topodepth = hwloc_topology_get_depth(topology);

 /*****************************************************************
  * First example:
  * Walk the topology with an array style, from level 0 (always
  * the system level) to the lowest level (always the proc level).
  *****************************************************************/
 for (depth = 0; depth < topodepth; depth++) {
     printf("*** Objects at level %d\n", depth);
     for (i = 0; i < hwloc_get_nbobjs_by_depth(topology, depth);
          i++) {
         hwloc_obj_snprintf(string, sizeof(string), topology,
                    hwloc_get_obj_by_depth(topology, depth, i),
                    "#", 0);
         printf("Index %u: %s\n", i, string);
     }
 }

 /*****************************************************************
  * Second example:
  * Walk the topology with a tree style.
  *****************************************************************/
 printf("*** Printing overall tree\n");
 print_children(topology, hwloc_get_root_obj(topology), 0);

 /*****************************************************************
  * Third example:
  * Print the number of sockets.
  *****************************************************************/
 depth = hwloc_get_type_depth(topology, HWLOC_OBJ_SOCKET);
 if (depth == HWLOC_TYPE_DEPTH_UNKNOWN) {
     printf("*** The number of sockets is unknown\n");
 } else {
     printf("*** %u socket(s)\n",
            hwloc_get_nbobjs_by_depth(topology, depth));
 }

 /*****************************************************************
  * Fourth example:
  * Compute the amount of cache that the first logical processor
  * has above it.
  *****************************************************************/
 levels = 0;
 size = 0;
 for (obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, 0);
      obj;
      obj = obj->parent)
   if (obj->type == HWLOC_OBJ_CACHE) {
     levels++;
     size += obj->attr->cache.size;
   }
 printf("*** Logical processor 0 has %d caches totaling %luKB\n",
        levels, size / 1024);

 /*****************************************************************
  * Fifth example:
  * Bind to only one thread of the last core of the machine.
  *
  * First find out where cores are, or else smaller sets of CPUs if
  * the OS doesn't have the notion of a "core".
  *****************************************************************/
 depth = hwloc_get_type_or_below_depth(topology, HWLOC_OBJ_CORE);

 /* Get last core. */
 obj = hwloc_get_obj_by_depth(topology, depth,
                hwloc_get_nbobjs_by_depth(topology, depth) - 1);
 if (obj) {
     /* Get a copy of its cpuset that we may modify. */
     cpuset = hwloc_bitmap_dup(obj->cpuset);

     /* Get only one logical processor (in case the core is
        SMT/hyperthreaded). */
     hwloc_bitmap_singlify(cpuset);

     /* And try to bind ourself there. */
     if (hwloc_set_cpubind(topology, cpuset, 0)) {
         char *str;
         int error = errno;
         hwloc_bitmap_asprintf(&str, cpuset);
         printf("Couldn't bind to cpuset %s: %s\n", str, strerror(error));
         free(str);
     }

     /* Free our cpuset copy */
     hwloc_bitmap_free(cpuset);
 }

 /*****************************************************************
  * Sixth example:
  * Allocate some memory on the last NUMA node, bind some existing
  * memory to the last NUMA node.
  *****************************************************************/
 /* Get last node. */
 n = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NODE);
 if (n) {
     void *m;
     size_t size = 1024*1024;

     obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, n - 1);

     /* Allocate memory directly bound to that node. */
     m = hwloc_alloc_membind_nodeset(topology, size, obj->nodeset,
             HWLOC_MEMBIND_DEFAULT, 0);
     if (m)
         hwloc_free(topology, m, size);

     /* Bind an existing malloc'ed buffer to that node. */
     m = malloc(size);
     if (m) {
         hwloc_set_area_membind_nodeset(topology, m, size, obj->nodeset,
                 HWLOC_MEMBIND_DEFAULT, 0);
         free(m);
     }
 }

 /* Destroy topology object. */
 hwloc_topology_destroy(topology);

 return 0;
}

hwloc provides a pkg-config file (hwloc.pc) from which the relevant compiler
and linker flags can be obtained. For example, it can be used thusly to
compile applications that utilize the hwloc library (assuming GNU Make):

CFLAGS += $(shell pkg-config --cflags hwloc)
LDLIBS += $(shell pkg-config --libs hwloc)

hwloc-hello: hwloc-hello.c
	$(CC) hwloc-hello.c $(CFLAGS) -o hwloc-hello $(LDLIBS)

On a machine with 4GB of RAM and 2 processor sockets -- each socket of which
has two processing cores -- the output from running hwloc-hello could be
something like the following:

shell$ ./hwloc-hello
*** Objects at level 0
Index 0: Machine(3938MB)
*** Objects at level 1
Index 0: Socket#0
Index 1: Socket#1
*** Objects at level 2
Index 0: Core#0
Index 1: Core#1
Index 2: Core#3
Index 3: Core#2
*** Objects at level 3
Index 0: PU#0
Index 1: PU#1
Index 2: PU#2
Index 3: PU#3
*** Printing overall tree
Machine(3938MB)
  Socket#0
    Core#0
      PU#0
    Core#1
      PU#1
  Socket#1
    Core#3
      PU#2
    Core#2
      PU#3
*** 2 socket(s)
shell$

Questions and Bugs

Questions should be sent to the devel mailing list
(http://www.open-mpi.org/community/lists/hwloc.php). Bugs should be
reported in the tracker (https://svn.open-mpi.org/trac/hwloc/).

If hwloc discovers an incorrect topology for your machine, the very first
thing you should check is to ensure that you have the most recent updates
installed for your operating system. Indeed, most of hwloc's topology
discovery relies on hardware information retrieved through the operating
system (e.g., via the /sys virtual filesystem of the Linux kernel). If
upgrading your OS or Linux kernel does not solve your problem, you may also
want to ensure that you are running the most recent version of the BIOS for
your machine.

If those things fail, contact us on the mailing list for additional help.
To get debugging output, re-run ./configure with the --enable-debug option,
rebuild completely, and attach the resulting lstopo output to your message.

History / Credits

hwloc is the evolution and merger of the libtopology
(http://runtime.bordeaux.inria.fr/libtopology/) project and the Portable
Linux Processor Affinity (PLPA) (http://www.open-mpi.org/projects/plpa/)
project. Because of functional and ideological overlap, these two code bases
and ideas were merged and released under the name "hwloc" as an Open MPI
sub-project.

libtopology was initially developed by the INRIA Runtime Team-Project
(http://runtime.bordeaux.inria.fr/), headed by Raymond Namyst
(http://dept-info.labri.fr/~namyst/). PLPA was initially developed by the
Open MPI development team as a sub-project. Both are now deprecated in favor
of hwloc, which is distributed as an Open MPI sub-project.

Further Reading

The documentation chapters include:

  * termsanddefs
  * tools
  * envvar
  * cpu_mem_bind
  * interoperability
  * threadsafety
  * embed
  * switchfromplpa
  * faq

Make sure to have had a look at those too!

termsanddefs Terms and Definitions

Object
       An interesting part of the system, such as a Core, a Cache, a
       Memory node, etc. The different types detected by hwloc are detailed
       in the hwloc_obj_type_t enumeration.

       They are topologically sorted by CPU set into a tree.

CPU set
       The set of logical processors (or processing units) logically
       included in an object (if it makes sense). They are always expressed
       using physical processor numbers (as announced by the OS). They are
       implemented as the hwloc_bitmap_t opaque structure. hwloc CPU sets
       are just masks; they are not related to any operating-system binding
       notion such as Linux cpusets.

Node set
       The set of NUMA memory nodes logically included in an object (if it
       makes sense). They are always expressed using physical node numbers
       (as announced by the OS). They are implemented as bitmaps with the
       hwloc_bitmap_t opaque structure.

Bitmap
       A possibly-infinite set of bits used for describing sets of objects
       such as CPUs (CPU sets) or memory nodes (Node sets). They are
       implemented with the hwloc_bitmap_t opaque structure.

Parent object
       The object logically containing the current object, for example
       because its CPU set includes the CPU set of the current object.

Ancestor object
       The parent object, or its own parent object, and so on.

Child object(s)
       The object (or objects) contained in the current object because their
       CPU set is included in the CPU set of the current object.

Arity
       The number of children of an object.

Sibling objects
       Objects of the same type which have the same parent.

Sibling rank
       Index to uniquely identify objects of the same type which have the
       same parent, and is always in the range [0, parent_arity).

Cousin objects
       Objects of the same type as the current object.

Level
       Set of objects of the same type.

OS or physical index
       The index that the operating system (OS) uses to identify the object.
       This may be completely arbitrary, or it may depend on the BIOS
       configuration.

Depth
       Nesting level in the object tree, starting from the 0th object.

Logical index
       Index to uniquely identify objects of the same type. It expresses
       proximity in a generic way. This index is always linear and in the
       range [0, num_objs_same_type_same_level). Think of it as ``cousin
       rank.'' The ordering is based on topology first, and then on OS CPU
       numbers, so it is stable across everything except firmware CPU
       renumbering.

Logical processor
Processing unit
       The smallest processing element that can be represented by a hwloc
       object. It may be a single-core processor, a core of a multicore
       processor, or a single thread in an SMT processor.
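The parent and child definitions rely on CPU set inclusion: an object's CPU set includes those of all its children. With single-word masks, inclusion is a one-line bit test. The sketch below is plain C with a hypothetical helper name, checked against masks from the first XML example in CLI Examples. Inclusion alone does not fix the order of objects with identical sets (Core L#0 and its L1 cache there both have cpuset 0x00000101); hwloc orders those by type.

```c
#include <stdint.h>

/* Hypothetical helper: return 1 if every PU in child_set is also in
 * parent_set, i.e. the child's CPU set is included in the parent's
 * (single-word masks only). */
static int cpuset_included(uint32_t child_set, uint32_t parent_set)
{
    /* Any bit set in the child but not in the parent breaks inclusion. */
    return (child_set & ~parent_set) == 0;
}
```

For instance, Core L#0 (0x00000101) is included in Socket L#0 (0x00001111) but not in the L2 cache covering 0x00001010, so it cannot be a child of that cache.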

The following diagram can help to understand the vocabulary of the
relationships by showing the example of a machine with two dual-core sockets
(with no hardware threads); thus, a topology with 4 levels. Each box with
rounded corners corresponds to one hwloc_obj_t, containing the values of the
different integer fields (depth, logical_index, etc.), and arrows show which
other hwloc_obj_t each pointer field (first_child, parent, etc.) points to.

                             diagram.png

It should be noted that for PU objects, the logical index -- as computed
linearly by hwloc -- is not necessarily the same as the OS index.
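Because logical indices follow topology order while OS indices are whatever the operating system announces, translating between them amounts to a lookup in the topology-ordered list of PUs. The minimal sketch below uses a hypothetical helper and, as data, the PU order of the 4-socket hyperthreaded machine in CLI Examples (L#0 is P#0, L#1 is P#8, L#2 is P#4, and so on).

```c
#include <stddef.h>

/* Hypothetical helper: given the OS indices of all PUs listed in
 * logical (topology) order, return the logical index of the PU whose
 * OS index is os_index, or -1 if no such PU is present. */
static long os_to_logical(const unsigned *os_in_logical_order, size_t n,
                          unsigned os_index)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (os_in_logical_order[i] == os_index)
            return (long)i;
    return -1;
}
```

With that machine's order {0, 8, 4, 12, 1, 9, 5, 13, 2, 10, 6, 14, 3, 11, 7, 15}, OS index 8 maps to logical index 1 and OS index 1 maps to logical index 4.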

tools Command-line tools

hwloc comes with an extensive C programming interface and several command
line utilities. Each of them is fully documented in its own manual page; the
following is a summary of the available command line tools.

lstopo

lstopo (also known as hwloc-info and hwloc-ls) displays the hierarchical
topology map of the current system. The output may be graphical or textual,
and can also be exported to numerous file formats such as PDF, PNG, XML, and
others.

Note that lstopo can read XML files and/or alternate chroot filesystems and
display topological maps representing those systems (e.g., use lstopo to
output an XML file on one system, and then use lstopo to read in that XML
file and display it on a different system).

hwloc-bind

hwloc-bind binds processes to specific hardware objects through a flexible
syntax. A simple example is binding an executable to specific cores (or
sockets or bitmaps or ...). The hwloc-bind(1) man page provides much more
detail on what is possible.

hwloc-bind can also be used to retrieve the current process' binding.

hwloc-calc

hwloc-calc is generally used to create bitmap strings to pass to hwloc-bind.
Although hwloc-bind accepts many forms of object specification (bitmap
strings are only one of them), bitmap strings can be useful, compact
representations in shell scripts, for example.

hwloc-calc generates bitmap strings from given hardware objects with the
ability to aggregate them, intersect them, and more. hwloc-calc generally
uses the same syntax as hwloc-bind, but multiple instances may be composed
to generate complex combinations.

Note that hwloc-calc can also generate lists of logical processors or NUMA
nodes that are convenient to pass to some external tools such as taskset or
numactl.

hwloc-distrib

hwloc-distrib generates a set of bitmap strings that are uniformly
distributed across the machine for the given number of processes. These
strings may be used with hwloc-bind to run the processes so that they are
properly spread across the machine, which helps maximize their memory
bandwidth.

hwloc-ps

hwloc-ps is a tool to display the bindings of processes that are currently
running on the local machine. By default, hwloc-ps only lists processes that
are bound; unbound processes (and Linux kernel threads) are not displayed.

envvar Environment variables

The behavior of the hwloc library and tools may be tuned with the
following environment variables.

HWLOC_XMLFILE=/path/to/file.xml
       forces topology discovery from the given XML file, as if
       hwloc_topology_set_xml() had been called. This file may have been
       generated earlier with lstopo file.xml. For convenience, this backend
       provides empty binding hooks which just return success. To have hwloc
       still actually call OS-specific hooks, HWLOC_THISSYSTEM should also be
       set to 1 in the environment, to assert that the loaded file really
       describes the underlying system.

HWLOC_FSROOT=/path/to/linux/filesystem-root/
       switches to reading the topology from the specified Linux filesystem
       root instead of the main file-system root, as if
       hwloc_topology_set_fsroot() had been called. Not using the main
       file-system root causes hwloc_topology_is_thissystem() to return 0.
       For convenience, this backend provides empty binding hooks which just
       return success. To have hwloc still actually call OS-specific hooks,
       HWLOC_THISSYSTEM should also be set to 1 in the environment, to assert
       that the given filesystem root really is the underlying system.

HWLOC_THISSYSTEM=1
       forces the return value of hwloc_topology_is_thissystem(). It makes
       hwloc assume that the selected backend provides the topology of the
       system on which we are running, even if it is not the OS-specific
       backend but, for instance, the XML backend. The binding functions then
       actually call the OS-specific system calls and really bind, whereas
       the XML backend would otherwise provide empty hooks that just return
       success. This can be used for efficiency: detect the topology once,
       save it to an XML file, quickly reload it later through the XML
       backend, and still have binding functions actually bind.

cpu_mem_bind CPU Binding and Memory Binding

Some OSes do not systematically provide separate functions for CPU and
memory binding. This means that CPU binding functions may have effects on
the memory binding policy, and changing the memory binding policy may change
the CPU binding of the current thread. This is often not a problem for the
application, so by default hwloc will make use of these functions when they
provide better binding support.

If the application does not want any CPU binding change when changing the
memory policy, it needs to use the HWLOC_MEMBIND_NOCPUBIND flag to prevent
hwloc from using OS functions which would change the CPU binding.
Conversely, HWLOC_CPUBIND_NOMEMBIND can be passed to CPU binding functions to
prevent hwloc from using OS functions that would change the memory binding
policy. Of course, these flags reduce hwloc's support for binding, so their
use is discouraged.

One can, however, avoid using these flags and still closely control both
memory and CPU binding, by allocating memory and touching it, and then
changing the CPU binding. Memory that has already been allocated and touched
will not be migrated, so even if the CPU binding call changes the memory
binding policy, the intended placement will already have been achieved. When
binding and allocating further memory, the CPU binding should be performed
again in case the memory binding altered the previously-selected CPU binding.

Not all OSes support the notion of a current memory binding policy for the
current process, but those often still provide a way to allocate data on a
given node set. Conversely, some OSes support the notion of a current memory
binding policy but do not permit allocating data on a given node set without
changing the current policy first. hwloc provides functions that set the
current memory binding policy (if supported) as well as functions which
allocate memory bound to a given node set. By default, it does not use the
former to achieve the latter, so that users can use both on OSes where they
are both supported, and get both effects at the same time. For convenience,
hwloc however also provides the hwloc_alloc_membind_policy and
hwloc_alloc_membind_policy_nodeset helpers, which are allowed to change the
current memory binding policy of the process when that is the only way to
achieve memory binding.

interoperability Interoperability with other software

Although hwloc offers its own portable interface, it still may have to
interoperate with specific or non-portable libraries that manipulate similar
kinds of objects. hwloc therefore offers several specific "helpers" to
assist converting between those specific interfaces and hwloc.

Some external libraries may be specific to a particular OS; others may not
always be available. The hwloc core therefore generally does not explicitly
depend on these types of libraries. However, when a custom application uses
or otherwise depends on such a library, it may optionally include the
corresponding hwloc header to extend the hwloc interface with dedicated
helpers.

Linux specific features
       hwloc/linux.h  offers  Linux-specific helpers that utilize some
       non-portable features of the Linux system, such as binding threads
       through their thread ID ("tid") or parsing kernel CPU mask files.

Linux libnuma
       hwloc/linux-libnuma.h provides conversion helpers between hwloc CPU
       sets and libnuma-specific types, such as nodemasks and bitmasks. It
       helps you use libnuma memory-binding functions with hwloc CPU sets.

Glibc
       hwloc/glibc-sched.h offers conversion routines between Glibc and
       hwloc  CPU  sets  in  order to use hwloc with functions such as
       sched_setaffinity().

OpenFabrics Verbs
       hwloc/openfabrics-verbs.h helps interoperability with the OpenFabrics
       Verbs interface. For example, it can return a list of processors near
       an OpenFabrics device.

Myrinet Express
       hwloc/myriexpress.h offers interoperability with the Myrinet Express
       interface. It can return the list of processors near a Myrinet board
       managed by the MX driver.

NVIDIA CUDA
       hwloc/cuda.h and hwloc/cudart.h enable interoperability with NVIDIA
       CUDA Driver and Runtime interfaces. For instance, they may return the
       list of processors near NVIDIA GPUs.

Taskset command-line tool
       The taskset command-line tool is widely used for binding processes.
       It manipulates CPU set strings in a format that is slightly different
       from hwloc's: it does not divide the string into fixed-size subsets
       separated by commas. To ease interoperability, hwloc offers routines
       to convert hwloc CPU sets from/to the taskset-specific string format.
       Most hwloc command-line tools also support the --taskset option to
       manipulate taskset-specific strings.
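In real code, hwloc's own conversion routines should be preferred. Purely to illustrate the format difference, here is a standalone, hypothetical C function that rewrites a comma-separated mask as a single taskset-style mask. It assumes each comma-separated subset is 32 bits wide (at most 8 hex digits), most significant subset first.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite "0x00000001,0x00000003" as "0x100000003".
 * Returns 0 on success, -1 on malformed input or a too-small buffer. */
int hwloc_mask_to_taskset(const char *in, char *out, size_t outlen)
{
    char digits[256];
    size_t n = 0, skip = 0;
    const char *p = in;
    int written;

    while (*p != '\0') {
        const char *start;
        size_t len;

        if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X'))
            p += 2;                      /* optional "0x" per subset */
        start = p;
        while (isxdigit((unsigned char)*p))
            p++;
        len = (size_t)(p - start);
        if (len == 0 || len > 8 || n + 8 > sizeof(digits))
            return -1;

        /* Left-pad each subset to 8 digits so every bit keeps its position. */
        memset(digits + n, '0', 8 - len);
        memcpy(digits + n + 8 - len, start, len);
        n += 8;

        if (*p == ',')
            p++;
        else if (*p != '\0')
            return -1;
    }
    if (n == 0)
        return -1;

    /* taskset needs no leading zeros; keep at least one digit. */
    while (skip + 1 < n && digits[skip] == '0')
        skip++;

    written = snprintf(out, outlen, "0x%.*s", (int)(n - skip), digits + skip);
    return (written < 0 || (size_t)written >= outlen) ? -1 : 0;
}
```

For example, "0x00000001,0x00000003" becomes "0x100000003", and "0x000000f0" becomes "0xf0".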

threadsafety Thread safety

Like most libraries that mainly fill data structures, hwloc is not thread
safe but rather reentrant: all state is held in a hwloc_topology_t instance
without mutex protection. That means, for example, that two threads can
safely operate on and modify two different hwloc_topology_t instances, but
they  should  not simultaneously invoke functions that modify the same
instance.  Similarly,  one thread should not modify a hwloc_topology_t
instance while another thread is reading or traversing it. However, two
threads can safely read or traverse the same hwloc_topology_t instance
concurrently.

When running in multiprocessor environments, be aware that proper thread
synchronization and/or memory coherency protection (e.g., a mutex, a
semaphore, or a memory barrier) is needed to pass hwloc data (such as
hwloc_topology_t pointers) from one processor to another. Note that this is
not a hwloc-specific requirement, but it is worth mentioning.

For reference, hwloc_topology_t modification operations include (but may not
be limited to):

Creation and destruction
       hwloc_topology_init(), hwloc_topology_load(), and
       hwloc_topology_destroy() (see Create and Destroy Topologies) imply
       major modifications of the structure, including freeing some objects.
       No other thread may access the topology or any of its objects at the
       same time.

       Also, references to objects inside the topology are no longer valid
       after these functions return.

Runtime topology modifications
       hwloc_topology_insert_misc_object_by_* (see Tinker with topologies)
       may modify the topology significantly by adding objects inside the
       tree, changing the topology depth, etc.

       Although references to former objects may still be valid after
       insertion, it is strongly advised not to rely on any such guarantee
       and to always re-consult the topology to reacquire new instances of
       objects.

Locating topologies
       hwloc_topology_ignore*, hwloc_topology_set* (see Configure Topology
       Detection) do not modify the topology directly, but they do modify
       internal structures describing the behavior of the next invocation of
       hwloc_topology_load(). Hence, these functions should not be used
       concurrently.

       Note that these functions do not modify the current topology until it
       is actually reloaded; it is possible to use them while other threads
       are only reading the current topology.

embed Embedding hwloc in other software

It can be desirable to include hwloc in a larger software package (be sure
to  check out the LICENSE file) so that users don't have to separately
download  and  install it before installing your software. This can be
advantageous to ensure that your software uses a known-tested/good version
of hwloc, or for use on systems that do not have hwloc pre-installed.

When used in "embedded" mode, hwloc will:

  * not install any header files
  * not build any documentation files
  * not build or install any executables or tests
  * not build libhwloc.* -- instead, it will build libhwloc_embedded.*

There are two ways to put hwloc into "embedded" mode. The first is directly
from the configure command line:

shell$ ./configure --enable-embedded-mode ...

The second requires that your software project uses the GNU Autoconf /
Automake / Libtool tool chain to build your software. If you do this, you
can  directly integrate hwloc's m4 configure macro into your configure
script. You can then invoke hwloc's configuration tests and build setup by
calling an m4 macro (see below).

Using hwloc's M4 Embedding Capabilities

Every project is different, and there are many different ways of integrating
hwloc into yours. What follows is one example of how to do it.

If your project uses recent versions of Autoconf, Automake, and Libtool to
build, you can use hwloc's embedded m4 capabilities. We have tested the
embedded m4 with projects that use Autoconf 2.65, Automake 1.11.1, and
Libtool 2.2.6b. Slightly earlier versions of these tools may also work but
are untested. Autoconf versions prior to 2.65 are almost certain not to work.

You can either copy all the config/hwloc*m4 files from the hwloc source tree
to the directory where your project's m4 files reside, or you can tell
aclocal to find more m4 files in the embedded hwloc's "config" subdirectory
(e.g.,  add  "-Ipath/to/embedded/hwloc/config"  to  your Makefile.am's
ACLOCAL_AMFLAGS).

The following macros can then be used from your configure script (only
HWLOC_SETUP_CORE must be invoked if using the m4 macros):

  * HWLOC_SETUP_CORE(config-dir-prefix, action-upon-success,
    action-upon-failure, print_banner_or_not): Invoke the hwloc
    configuration tests and set up the hwloc tree to build. The first
    argument is the prefix to use for AC_OUTPUT files -- it's where the
    hwloc tree is located relative to $top_srcdir. Hence, if your embedded
    hwloc is located in the source tree at contrib/hwloc, you should pass
    [contrib/hwloc] as the first argument. If HWLOC_SETUP_CORE and the rest
    of configure completes successfully, then "make" traversals of the hwloc
    tree with standard Automake targets (all, clean, install, etc.) should
    behave as expected. For example, it is safe to list the hwloc directory
    in the SUBDIRS of a higher-level Makefile.am. The last argument, if not
    empty, will cause the macro to display an announcement banner that it is
    starting the hwloc core configuration tests.

HWLOC_SETUP_CORE will set the following environment variables and AC_SUBST
them: HWLOC_EMBEDDED_CFLAGS, HWLOC_EMBEDDED_CPPFLAGS, and
HWLOC_EMBEDDED_LIBS. These flags are filled with the values discovered in
the  hwloc-specific m4 tests, and can be used in your build process as
relevant. The _CFLAGS, _CPPFLAGS, and _LIBS variables are necessary to build
libhwloc (or libhwloc_embedded) itself.

HWLOC_SETUP_CORE also sets the HWLOC_EMBEDDED_LDADD environment variable (and
AC_SUBSTs it) to contain the location of the libhwloc_embedded.la
convenience Libtool archive. It can be used in your build process to link an
application or other library against the embedded hwloc library.

NOTE: If the HWLOC_SET_SYMBOL_PREFIX macro is used, it must be invoked
before HWLOC_SETUP_CORE.

  * HWLOC_BUILD_STANDALONE: HWLOC_SETUP_CORE defaults to building hwloc in
    an "embedded" mode (described above). If HWLOC_BUILD_STANDALONE is
    invoked *before* HWLOC_SETUP_CORE, the embedded definitions will not
    apply (e.g., libhwloc.la will be built, not libhwloc_embedded.la).

  * HWLOC_SET_SYMBOL_PREFIX(foo_): Tells hwloc to prefix all of hwloc's
    types and public symbols with "foo_"; meaning that function
    hwloc_topology_init() becomes foo_hwloc_topology_init(). Enum values are
    prefixed with an upper-case translation of the supplied prefix;
    HWLOC_OBJ_SYSTEM becomes FOO_HWLOC_OBJ_SYSTEM. This is recommended
    behavior if you are including hwloc in middleware -- it is possible that
    your software will be combined with other software that links to another
    copy of hwloc. If the two copies of hwloc use different symbol prefixes,
    there will be no type/symbol clashes, and everything will compile, link,
    and run successfully. If you both embed hwloc without changing the
    symbol prefix and also link against an external hwloc, you may get
    multiple symbol definitions when linking your final library or
    application.

  * HWLOC_SETUP_DOCS, HWLOC_SETUP_UTILS, HWLOC_SETUP_TESTS: These three
    macros only apply when hwloc is built in "standalone" mode (i.e., they
    should NOT be invoked unless HWLOC_BUILD_STANDALONE has already been
    invoked).

  * HWLOC_DO_AM_CONDITIONALS: If you embed hwloc in a larger project and
    build it conditionally with Automake (e.g., if HWLOC_SETUP_CORE is
    invoked    conditionally),   you   must   unconditionally   invoke
    HWLOC_DO_AM_CONDITIONALS to avoid warnings from Automake (for the cases
    where  hwloc is not selected to be built). This macro is necessary
    because  hwloc  uses  some  AM_CONDITIONALs  to  build itself, and
    AM_CONDITIONALs cannot be defined conditionally. Note that it is safe
    (but   unnecessary)   to  call  HWLOC_DO_AM_CONDITIONALS  even  if
    HWLOC_SETUP_CORE  is invoked unconditionally. If you are not using
    Automake to build hwloc, this macro is unnecessary (and will actually
    cause errors because it invokes AM_* macros that will be undefined).

NOTE: When using the HWLOC_SETUP_CORE m4 macro, it may be necessary to
explicitly  invoke  AC_CANONICAL_TARGET (which requires config.sub and
config.guess) and/or AC_USE_SYSTEM_EXTENSIONS macros early in the configure
script (e.g., after AC_INIT but before AM_INIT_AUTOMAKE). See the Autoconf
documentation for further information.

Also note that hwloc's top-level configure.ac script uses exactly the macros
described above to build hwloc in a standalone mode (by default). You may
want to examine it for one example of how these macros are used.

Example Embedding hwloc

Here's an example of integrating with a larger project named sandbox that
already uses Autoconf, Automake, and Libtool to build itself:

# First, cd into the sandbox project source tree
shell$ cd sandbox
shell$ cp -r /somewhere/else/hwloc-<version> my-embedded-hwloc
shell$ edit Makefile.am
  1. Add "-Imy-embedded-hwloc/config" to ACLOCAL_AMFLAGS
  2. Add "my-embedded-hwloc" to SUBDIRS
  3. Add "$(HWLOC_EMBEDDED_LDADD)" and "$(HWLOC_EMBEDDED_LIBS)" to
  sandbox's executable's LDADD line.  The former is the name of the
  Libtool convenience library that hwloc will generate.  The latter
  is any dependent support libraries that may be needed by
  $(HWLOC_EMBEDDED_LDADD).
  4. Add "$(HWLOC_EMBEDDED_CFLAGS)" to AM_CFLAGS
  5. Add "$(HWLOC_EMBEDDED_CPPFLAGS)" to AM_CPPFLAGS
shell$ edit configure.ac
  1. Add "HWLOC_SET_SYMBOL_PREFIX(sandbox_hwloc_)" line
  2. Add "HWLOC_SETUP_CORE([my-embedded-hwloc], [happy=yes], [happy=no])" line
  3. Add error checking for happy=no case
shell$ edit sandbox.c
  1. Add #include <hwloc.h>
  2. Add calls to sandbox_hwloc_topology_init() and other hwloc API functions

Now you can bootstrap, configure, build, and run the sandbox as normal --
all calls to "sandbox_hwloc_*" will use the embedded hwloc rather than any
system-provided copy of hwloc.

switchfromplpa Switching from PLPA to hwloc

Although PLPA and hwloc share some of the same ideas, their programming
interfaces are quite different. After much debate, it was decided not to
emulate the PLPA API with hwloc's API because hwloc's API is already far
richer than PLPA's.

More specifically, exploiting modern computing architecture requires the
flexible functionality provided by the hwloc API -- the PLPA API is too
rigid  in  its definitions and practices to handle the evolving server
hardware landscape (e.g., PLPA only understands cores and sockets; hwloc
understands a much larger set of hardware objects).

As such, even though it is fully possible to emulate the PLPA API with hwloc
(e.g., only deal with sockets and cores), and while the documentation below
describes how to do this, we encourage any existing PLPA application authors
to actually re-think their application in terms of more than just sockets
and cores. In short, we encourage you to use the full hwloc API to exploit
all the hardware.

Topology Context vs. Caching

First, all hwloc functions take a topology parameter. This parameter serves
as an internal storage for the result of the topology discovery. It replaces
PLPA's caching abilities and even lets you manipulate multiple topologies at
the same time, if needed.

Thus, all programs should call hwloc_topology_init() (followed by
hwloc_topology_load()) and hwloc_topology_destroy() where they used to call
plpa_init() and plpa_finalize().

Hierarchy vs. Core@Socket

PLPA was designed to understand only cores and sockets. hwloc offers many
more different types of objects (e.g., cores, sockets, hardware threads,
NUMA nodes, and others) and stores them within a tree of resources.

To emulate the PLPA model, it is possible to find sockets using functions
such as hwloc_get_obj_by_type(). Iterating over sockets is also possible
using hwloc_get_next_obj_by_type(). Then, finding a core within a socket may
be done using hwloc_get_obj_inside_cpuset_by_type() or
hwloc_get_next_obj_inside_cpuset_by_type().

It is also possible to directly find an object "below" another object using
hwloc_get_obj_below_by_type() (or hwloc_get_obj_below_array_by_type()).

Logical vs. Physical/OS Indexes

hwloc manipulates logical indexes, meaning indexes specified with regard to
the ordering of objects in the hwloc-provided hierarchical tree. Physical or
OS indexes may be entirely hidden if not strictly required. The reason for
this is that physical/OS indexes may change with the OS or with the BIOS
version. They may be non-consecutive, and multiple objects may have the same
physical/OS indexes, making their manipulation tricky and highly
non-portable.

Note that hwloc tries very hard to always present a hierarchical tree with
the same logical ordering, regardless of physical or OS index ordering.

It is still possible to retrieve physical/OS indexes through the os_index
field of objects, but such practice should be avoided as much as possible
for the reasons described above (except perhaps for pretty-printing or
debugging purposes).

HWLOC_OBJ_PU objects are supposed to have different physical/OS indexes
since the OS uses them for binding. The os_index field of these objects
provides  the  identifier  that  may  be  used  for  such binding, and
hwloc_get_proc_obj_by_os_index() finds the object associated with a specific
OS index.

But as mentioned above, we discourage the use of these conversion methods
for actual binding. Instead, hwloc offers its own binding model using the
cpuset  field  of  objects. These cpusets may be duplicated, modified,
combined,  etc.  (see  hwloc/bitmap.h  for details) and then passed to
hwloc_set_cpubind() for binding.

Counting Specification

PLPA offers a countspec parameter to specify whether to count all CPUs, only
the online ones, or only the offline ones. However, some operating systems do
not expose the topology of offline CPUs (i.e., offline CPUs are not reported
at all by the OS). Also, some processors may not be visible to the current
application due to administrative restrictions. Finally, some processors let
you shut down a single hardware thread in a core, making some of the PLPA
features irrelevant.

hwloc stores in the hierarchical tree of objects all CPUs that have known
topology information. It then provides the applications with several cpusets
that contain the list of CPUs that are actually known, that have topology
information, that are online, or that are available to the application.
These cpusets may be retrieved with hwloc_topology_get_online_cpuset() and
other similar functions to filter out the objects that are not relevant.

faq Frequently Asked Questions

I do not want hwloc to rediscover my enormous machine topology every time I
rerun a process

Although the topology discovery is not expensive on common machines, its
overhead may become significant when multiple processes repeat the discovery
on large machines (for instance when starting one process per core in a
parallel application). The machine topology usually does not vary much,
except if some cores are stopped/restarted or if administrative
restrictions are modified. Thus, rediscovering the whole topology again and
again may be wasteful.

For this purpose, hwloc offers XML import/export features. It lets you save
the discovered topology to a file (for instance with the lstopo program) and
reload it later by setting the HWLOC_XMLFILE environment variable. Loading an
XML topology is usually much faster than querying multiple files or calling
multiple functions of the operating system. It is also possible to
manipulate such XML files with the C programming interface, and the
import/export may also be directed to a memory buffer (which may, for
instance, be transmitted between applications through a socket).
  _________________________________________________________________


 Generated on Tue Dec 21 21:04:53 2010 for Hardware Locality (hwloc) by
 doxygen 1.3.9.1
