Copyright © 2009 CNRS
Copyright © 2009-2011 INRIA. All rights reserved.
Copyright © 2009-2011 Université Bordeaux 1
Copyright © 2009-2011 Cisco Systems, Inc. All rights reserved.

$COPYRIGHT$

Additional copyrights may follow

$HEADER$

===========================================================================

This file contains the main features as well as overviews of specific
bug fixes (and other actions) for each version of hwloc since version
0.9 (as initially released as "libtopology", then re-branded to "hwloc"
in v0.9.1).


Version 1.2.2
-------------
* Fix build on AIX 5.2, thanks to Utpal Kumar Ray for the report.
* Fix XML import of very large page sizes or counts on 32-bit platforms,
  thanks to Karsten Hopp for the RedHat ticket.
* Fix crash when administrator limitations such as Linux cgroup require
  restricting distance matrices. Thanks to Ake Sandgren for reporting the
  problem.
* Fix the removal of objects such as AMD Magny-Cours dual-node sockets
  in case of administrator restrictions.
* Improve error reporting and messages in case of wrong synthetic topology
  description.
* Several other minor internal fixes and documentation improvements.


Version 1.2.1
-------------
* Improve support of AMD Bulldozer "Compute-Unit" modules by detecting
  logical processors with different core IDs on Linux.
* Fix hwloc-ps crash when listing processes from another Linux cpuset.
  Thanks to Carl Smith for reporting the problem.
* Fix build on AIX and Solaris. Thanks to Carl Smith and Andreas Kupries
  for reporting the problems.
* Fix cache size detection on Darwin. Thanks to Erkcan Özcan for reporting
  the problem.
* Make configure fail if --enable-xml or --enable-cairo is given and
  proper support cannot be found. Thanks to Andreas Kupries for reporting
  the XML problem.
* Fix spurious L1 cache detection on AIX. Thanks to Hendryk Bockelmann
  for reporting the problem.
* Fix hwloc_get_last_cpu_location(THREAD) on Linux. Thanks to Gabriele
  Fatigati for reporting the problem.
* Fix object distance detection on Solaris.
* Add pthread_self weak symbol to ease static linking.
* Minor documentation fixes.


Version 1.2.0
-------------
* Major features
  + Expose latency matrices in the API as an array of distance structures
    within objects. Add several helpers to find distances.
  + Add hwloc_topology_set_distance_matrix() and environment variables
    to provide a matrix of distances between a given set of objects.
  + Add hwloc_get_last_cpu_location() and hwloc_get_proc_last_cpu_location()
    to retrieve the processors where a process or thread recently ran.
    - Add the corresponding --get-last-cpu-location option to hwloc-bind.
  + Add hwloc_topology_restrict() to restrict an existing topology to a
    given cpuset.
    - Add the corresponding --restrict option to lstopo.
* Minor API updates
  + Add hwloc_bitmap_list_sscanf/snprintf/asprintf to convert between
    bitmaps and strings such as 4-5,7-9,12,15-
  + hwloc_bitmap_set/clr_range() now support infinite ranges.
  + Clarify the difference between inserting Misc objects by cpuset or by
    parent.
  + hwloc_insert_misc_object_by_cpuset() now returns NULL in case of error.
* Discovery improvements
  + x86 backend (for FreeBSD): add x2APIC support
  + Support standard device-tree phandle, to get better support on e.g.
    ARM systems providing it.
  + Detect cache size on AIX. Thanks Christopher and IBM.
  + Improve grouping to support asymmetric topologies.
* Tools
  + Command-line tools now support "all" and "root" special locations
    consisting of the entire topology, as well as type names with depth
    attributes such as L2 or Group4.
  + hwloc-calc improvements:
    - Add --number-of/-N option to report the number of objects of a given
      type or depth.
    - -I is now equivalent to --intersect for listing the indexes of
      objects of a given type or depth that intersect the input.
    - Add -H to report the output as a hierarchical combination of types
      and depths.
  + Add --thissystem to lstopo.
  + Add lstopo-win, a console-less lstopo variant on Windows.
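
The list syntax mentioned for hwloc_bitmap_list_sscanf() in 1.2.0 (strings
such as 4-5,7-9,12,15-) can be illustrated with a small standalone parser.
This is only a sketch and not hwloc's implementation: parse_index_list() is
a hypothetical helper, it fills a fixed 64-bit mask instead of a dynamically
allocated bitmap, and it assumes that an open-ended range such as 15- means
"up to the last bit", whereas hwloc bitmaps can hold truly infinite ranges.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Highest index representable in the 64-bit mask used by this sketch. */
#define LAST_BIT 63UL

/* Hypothetical helper, not part of hwloc: parse a list string such as
 * "4-5,7-9,12,15-" into a 64-bit mask.  Returns 0 on success and -1 on
 * malformed input or when an index does not fit in the mask. */
int parse_index_list(const char *s, uint64_t *mask)
{
    uint64_t result = 0;
    unsigned long first, last, i;
    char *end;

    assert(s != NULL && mask != NULL);

    while (*s != '\0') {
        /* Every item must start with a decimal index. */
        if (*s < '0' || *s > '9')
            return -1;
        first = strtoul(s, &end, 10);
        last = first;
        s = end;

        if (*s == '-') {
            s++;
            if (*s >= '0' && *s <= '9') {
                /* Closed range "first-last". */
                last = strtoul(s, &end, 10);
                s = end;
            } else {
                /* Open-ended range "first-": assumed to run to the end. */
                last = LAST_BIT;
            }
        }

        if (first > last || last > LAST_BIT)
            return -1;
        for (i = first; i <= last; i++)
            result |= (uint64_t)1 << i;

        if (*s == ',') {
            s++;
            if (*s == '\0')
                return -1;      /* trailing comma */
        } else if (*s != '\0') {
            return -1;          /* unexpected character */
        }
    }

    *mask = result;
    return 0;
}
```

With this sketch, parsing "0-3" yields the mask 0xF, and inputs such as
"5-2" or "1,,2" are rejected with -1.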
* Miscellaneous
  + Remove C99 usage from code base.
  + Rename hwloc-gather-topology.sh into hwloc-gather-topology
  + Fix AMD cache discovery on FreeBSD when there is no L3 cache, thanks
    Andriy Gapon for the fix.


Version 1.1.2
-------------
* Fix a segfault in the distance-based grouping code when some objects
  are not placed in any group. Thanks to Bernd Kallies for reporting the
  problem and providing a patch.
* Fix the command-line parsing of hwloc-bind --mempolicy interleave.
  Thanks to Guy Streeter for reporting the problem.
* Stop truncating the output in hwloc_obj_attr_snprintf() and in the
  corresponding lstopo output. Thanks to Guy Streeter for reporting the
  problem.
* Fix object levels ordering in synthetic topologies.
* Fix potential inconsistency between device tree and kernel information,
  when SMT is disabled on Power machines.
* Fix and document the behavior of hwloc_topology_set_synthetic() in case
  of invalid argument. Thanks to Guy Streeter for reporting the problem.
* Add some verbose error message reporting when it looks like the OS
  gives erroneous information.
* Do not include unistd.h and stdint.h in public headers on Windows.
* Move config.h files into their own subdirectories to avoid name
  conflicts when AC_CONFIG_HEADERS adds -I's for them.
* Stop declaring variables inside "for" loops.
* Some other minor fixes.
* Many minor documentation fixes.


Version 1.1.1
-------------
* Add hwloc_get_api_version() which returns the version of hwloc used
  at runtime. Thanks to Guy Streeter for the suggestion.
* Fix the number of hugepages reported for NUMA nodes on Linux.
* Fix hwloc_bitmap_to_ulong() right after allocating the bitmap.
  Thanks to Bernd Kallies for reporting the problem.
* Fix hwloc_bitmap_from_ith_ulong() to properly zero the first ulong.
  Thanks to Guy Streeter for reporting the problem.
* Fix hwloc_get_membind_nodeset() on Linux.
  Thanks to Bernd Kallies for reporting the problem and providing a patch.
* Fix some file descriptor leaks in the Linux discovery.
* Fix the minimum width of NUMA nodes, caches and the legend in the
  graphical lstopo output. Thanks to Jirka Hladky for reporting the problem.
* Various fixes to bitmap conversion from/to taskset-strings.
* Fix and document snprintf functions behavior when the buffer size is too
  small or zero. Thanks to Guy Streeter for reporting the problem.
* Fix configure to avoid spurious enabling of the cpuid backend.
  Thanks to Tim Anderson for reporting the problem.
* Cleanup error management in hwloc-gather-topology.sh.
  Thanks to Jirka Hladky for reporting the problem and providing a patch.
* Add a manpage and usage for hwloc-gather-topology.sh on Linux.
  Thanks to Jirka Hladky for providing a patch.
* Memory binding documentation enhancements.


Version 1.1.0
-------------
* API
  + Increase HWLOC_API_VERSION to 0x00010100 so that API changes may be
    detected at build-time.
  + Add a memory binding interface.
  + The cpuset API (hwloc/cpuset.h) is now deprecated. It is replaced by
    the bitmap API (hwloc/bitmap.h) which offers the same features with
    more generic names since it applies to CPU sets, node sets and more.
    Backward compatibility with the cpuset API and ABI is still provided
    but it will be removed in a future release.
    Old types (hwloc_cpuset_t, ...) are still available as a way to
    clarify what kind of hwloc_bitmap_t each API function manipulates.
    Upgrading to the new API only requires replacing hwloc_cpuset_
    function calls with the corresponding hwloc_bitmap_ calls, with the
    following renaming exceptions:
    - hwloc_cpuset_cpu -> hwloc_bitmap_only
    - hwloc_cpuset_all_but_cpu -> hwloc_bitmap_allbut
    - hwloc_cpuset_from_string -> hwloc_bitmap_sscanf
  + Add an `infos' array in each object to store pairs of info names and
    values. It enables generic storage of things like the old dmi board
    infos that were previously stored in machine specific attributes.
  + Add linesize cache attribute.
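
The value 0x00010100 given to HWLOC_API_VERSION in 1.1.0 reads as version
1.1.0 if one assumes a (major << 16) + (minor << 8) + release packing. This
file does not state that layout, so treat it as an assumption. The names
struct api_version and decode_api_version() below are hypothetical and only
serve the illustration; they are not part of the hwloc API.

```c
/* Hypothetical type, not part of hwloc: the three components of a
 * packed API version number. */
struct api_version {
    unsigned long major;
    unsigned long minor;
    unsigned long release;
};

/* Hypothetical helper: split a packed version number, assuming the
 * layout (major << 16) + (minor << 8) + release. */
struct api_version decode_api_version(unsigned long version)
{
    struct api_version v;

    v.major = (version >> 16) & 0xffffUL;
    v.minor = (version >> 8) & 0xffUL;
    v.release = version & 0xffUL;
    return v;
}
```

Under that assumption, decode_api_version(0x00010100UL) gives major 1,
minor 1 and release 0, which matches the release that introduced the value.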
* Features
  + Bitmaps (and thus CPU sets and node sets) are dynamically
    (re-)allocated, the maximal number of CPUs (HWLOC_NBMAXCPUS) has been
    removed.
  + Improve the distance-based grouping code to better support irregular
    distance matrices.
  + Add support for device-tree to get cache information (useful on Power
    architectures).
* Helpers
  + Add NVIDIA CUDA helpers in cuda.h and cudart.h to ease interoperability
    with CUDA Runtime and Driver APIs.
  + Add Myrinet Express helper in myriexpress.h to ease interoperability.
* Tools
  + lstopo now displays physical/OS indexes by default in graphical mode
    (use -l to switch back to logical indexes). The textual output still
    uses logical by default (use -p to switch to physical indexes).
  + lstopo prefixes logical indexes with `L#' and physical indexes with
    `P#'. Physical indexes are also printed as `P#N' instead of `phys=N'
    within object attributes (in parentheses).
  + Add a legend at the bottom of the lstopo graphical output, use
    --no-legend to remove it.
  + Add hwloc-ps to list process' bindings.
  + Add --membind and --mempolicy options to hwloc-bind.
  + Improve tools command-line options by adding a generic --input option
    (and more) which replaces the old --xml, --synthetic and --fsys-root.
  + Cleanup lstopo output configuration by adding --output-format.
  + Add --intersect in hwloc-calc, and replace --objects with --largest.
  + Add the ability to work on standard input in hwloc-calc.
  + Add --from, --to and --at in hwloc-distrib.
  + Add taskset-specific functions and command-line tools options to
    manipulate CPU set strings in the format of the taskset program.
  + Install hwloc-gather-topology.sh on Linux.


Version 1.0.3
-------------
* Fix support for Linux cpuset when emulated by a cgroup mount point.
* Remove unneeded runtime dependency on libibverbs.so in the library and
  all utils programs.
* Fix hwloc_cpuset_to_linux_libnuma_ulongs in case of non-linear OS-indexes
  for NUMA nodes.
* lstopo now displays physical/OS indexes by default in graphical mode
  (use -l to switch back to logical indexes). The textual output still uses
  logical by default (use -p to switch to physical indexes).


Version 1.0.2
-------------
* Public headers can now be included directly from C++ programs.
* Solaris fix for non-contiguous cpu numbers. Thanks to Rolf vandeVaart for
  reporting the issue.
* Darwin 10.4 fix. Thanks to Olivier Cessenat for reporting the issue.
* Revert 1.0.1 patch that ignored sockets with unknown ID values since it
  only slightly helped POWER7 machines with old Linux kernels while it
  prevents recent kernels from getting the complete POWER7 topology.
* Fix hwloc_get_common_ancestor_obj().
* Remove arch-specific bits in public headers.
* Some fixes in the lstopo graphical output.
* Various man page clarifications and minor updates.


Version 1.0.1
-------------
* Various Solaris fixes. Thanks to Yannick Martin for reporting the issue.
* Fix "non-native" builds on x86 platforms (e.g., when building 32
  bit executables with compilers that natively build 64 bit).
* Ignore sockets with unknown ID values (which fixes issues on POWER7
  machines). Thanks to Greg Bauer for reporting the issue.
* Various man page clarifications and minor updates.
* Fixed memory leaks in hwloc_setup_group_from_min_distance_clique().
* Fix cache type filtering on MS Windows 7. Thanks to Αλέξανδρος
  Παπαδογιαννάκ for reporting the issue.
* Fixed warnings when compiling with -DNDEBUG.


Version 1.0.0
-------------
* The ABI of the library has changed.
* Backend updates
  + Add FreeBSD support.
  + Add x86 cpuid based backend.
  + Add Linux cgroup support to the Linux cpuset code.
  + Support binding of entire multithreaded process on Linux.
  + Fix and enable Group support in Windows.
  + Cleanup XML export/import.
* Objects
  + HWLOC_OBJ_PROC is renamed into HWLOC_OBJ_PU for "Processing Unit",
    its stringified type name is now "PU".
  + Use new HWLOC_OBJ_GROUP objects instead of MISC when grouping
    objects according to NUMA distances or arbitrary OS aggregation.
  + Rework memory attributes.
  + Add different cpusets in each object to specify processors that are
    offline, unavailable, ...
  + Cleanup the storage of object names and DMI infos.
* Features
  + Add support for looking up specific PID topology information.
  + Add hwloc_topology_export_xml() to export the topology in an XML file.
  + Add hwloc_topology_get_support() to retrieve the supported features
    for the current topology context.
  + Support non-SYSTEM object as the root of the tree, use MACHINE in
    most common cases.
  + Add hwloc_get_*cpubind() routines to retrieve the current binding
    of processes and threads.
* API
  + Add HWLOC_API_VERSION to help detect the currently used API version.
  + Add missing ending "e" to *compare* functions.
  + Add several routines to emulate PLPA functions.
  + Rename and rework the cpuset and/or/xor/not/clear operators to output
    their result in a dedicated argument instead of modifying one input.
  + Deprecate hwloc_obj_snprintf() in favor of
    hwloc_obj_type/attr_snprintf().
  + Clarify the use of parent and ancestor in the API, do not use father.
  + Replace hwloc_get_system_obj() with hwloc_get_root_obj().
  + Return -1 instead of HWLOC_OBJ_TYPE_MAX in the API since the latter
    isn't public.
  + Relax constraints in hwloc_obj_type_of_string().
  + Improve displaying of memory sizes.
  + Add 0x prefix to cpuset strings.
* Tools
  + lstopo now displays logical indexes by default, use --physical to
    revert back to OS/physical indexes.
  + Add colors in the lstopo graphical outputs to distinguish between
    online, offline, reserved, ... objects.
  + Extend lstopo to show cpusets, filter objects by type, ...
  + Renamed hwloc-mask into hwloc-calc which supports many new options.
* Documentation
  + Add a hwloc(7) manpage containing general information.
  + Add documentation about how to switch from PLPA to hwloc.
  + Cleanup the distributed documentation files.
* Miscellaneous
  + Many compiler warning fixes.
  + Cleanup the ABI by using the visibility attribute.
  + Add project embedding support.


Version 0.9.4 (unreleased)
--------------------------
* Fix resetting colors to normal in lstopo -.txt output.
* Fix Linux pthread_t binding error report.


Version 0.9.3
-------------
* Fix autogen.sh to work with Autoconf 2.63.
* Fix various crashes in particular conditions:
  - xml files with root attributes
  - offline CPUs
  - partial sysfs support
  - unparseable /proc/cpuinfo
  - ignoring NUMA level while Misc levels have been generated
* Tweak documentation a bit
* Do not require the pthread library for binding the current thread on Linux
* Do not erroneously consider the sched_setaffinity prototype is the old
  version when there is actually none.
* Fix _syscall3 compilation on archs for which we do not have the
  sched_setaffinity system call number.
* Fix AIX binding.
* Fix library dependencies: now only lstopo depends on libtermcap, fix
  binutils-gold link
* Have make check always build and run hwloc-hello.c
* Do not limit size of a cpuset.


Version 0.9.2
-------------
* Trivial documentation changes.


Version 0.9.1
-------------
* Re-branded to "hwloc" and moved to the Open MPI project, relicensed
  under the BSD license.
* The prefix of all functions and tools is now hwloc, and some public
  functions were also renamed for real.
* Group NUMA nodes into Misc objects according to their physical distance
  that may be reported by the OS/BIOS.
  May be ignored by setting HWLOC_IGNORE_DISTANCES=1 in the environment.
* Ignore offline CPUs on Solaris.
* Improved binding support on AIX.
* Add HP-UX support.
* CPU sets are now allocated/freed dynamically.
* Add command line options to tune the lstopo graphical output, add
  semi-graphical textual output
* Extend topobind to support multiple cpusets or objects on the command
  line as topomask does.
* Add an Infiniband-specific helper hwloc/openfabrics-verbs.h to retrieve
  the physical location of IB devices.


Version 0.9 (libtopology)
-------------------------
* First release.