Version | Date | Description |
2.6.1 |
15-Dec-2022 |
Latest Release
MD5sum: 56a49be3c2fe1c021ceeb8780a73757d
Includes:
- Support for
- Score-P v8.0, incl. OpenMP measurements via OMPT
- OTF2 v3.0
- Cube C Writer v4.8
- Build system improvements:
- Added proper support for additional compiler suites via --with-nocross-compiler-suite=<suite>:
- amdclang: AMD ROCm compilers (amdclang, amdclang++)
- nvhpc: NVIDIA HPC Toolkit compilers (nvc, nvc++)
- oneapi: Intel oneAPI compilers (icx, icpx)
- Automatic trace analyzer changes & improvements:
- Fixed a performance regression when analyzing traces including OpenMP with few instrumented user regions.
- Improved metric data aggregation and writing.
- Measurement nexus (scan) changes:
- Removed stalled resources counter from default POP multi-run preset and added an enhanced preset pop-with-stalled-resources.
|
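As an illustration of the compiler-suite option listed for 2.6.1, the following minimal sketch builds a configure command line for one of the three suites named above. The `./configure` invocation, the helper name `configure_command`, and the dictionary are assumptions made for this example and are not part of Scalasca; only the option name and the suite names come from the release notes.

```python
# Suites and compiler names as listed in the 2.6.1 release notes above.
# The dictionary and function names are hypothetical, chosen for illustration.
SUITES_ADDED_IN_2_6_1 = {
    "amdclang": ("amdclang", "amdclang++"),
    "nvhpc": ("nvc", "nvc++"),
    "oneapi": ("icx", "icpx"),
}


def configure_command(suite):
    """Return an argument list selecting one of the suites listed above."""
    if suite not in SUITES_ADDED_IN_2_6_1:
        known = ", ".join(sorted(SUITES_ADDED_IN_2_6_1))
        raise ValueError(f"unknown suite {suite!r}; expected one of: {known}")
    # Assumes an autotools-style ./configure script in the current directory.
    return ["./configure", f"--with-nocross-compiler-suite={suite}"]


if __name__ == "__main__":
    print(" ".join(configure_command("oneapi")))
```

The sketch only knows the suites added in this release; other values accepted by the option are deliberately rejected here rather than guessed.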
2.6 |
19-Apr-2021 |
MD5sum: 566657db21f7bf87a7009653d330d8bf
Includes:
- Build system improvements:
- Auto-detect Cray XC platforms with ARM CPUs, supporting Cray, ARM, and GCC compilers
- Added support for Clang and AMD AOCC compilers
- Updated support for Spectrum MPI
- Automatic trace analyzer changes & improvements:
- Revised 'Early Reduce' wait state definition.
- Added calculation of 'Early Reduce' delay costs.
- Fixed various delay cost calculation and propagation issues.
- Fixed various inconsistencies between wait-state and root-cause analysis.
- Made POSIX threads analysis consistent with Score-P by avoiding thread function stub call paths underneath pthread_create. This also fixes a deadlock when analyzing traces containing "orphaned threads".
- Measurement nexus (scan) changes:
- Added preset mode for multi-run measurements, with a preset for POP analysis requirements as a use case.
- Added support for multiple file systems in SCAN_TRACE_FILESYS by using a colon-separated list of paths.
- Analysis report postprocessing changes:
- Added metric hierarchies for CUDA, OpenCL, and OpenACC. (NOTE: The trace analysis still only supports host-side events!)
- Renamed '-c' command-line option of square to '-C' for running sanity checks on newly created reports.
- Added new '-c' command-line option to square to allow specifying the number of counters considered during report scoring (for consistency with scorep-score).
- Added new '-x' command-line option to square to allow passing options directly through to scorep-score.
- Avoided unnecessary aggregation/postprocessing of reports with multi-run experiments.
- Substantial code cleanup.
|
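The colon-separated form of SCAN_TRACE_FILESYS introduced in 2.6 can be pictured with a small sketch of how such a value splits into individual paths. This is not Scalasca's own code; the function names and the example paths are hypothetical, and dropping empty entries is a choice made for this example.

```python
import os


def split_trace_filesys(value):
    """Split a colon-separated path list, skipping empty entries."""
    return [entry for entry in value.split(":") if entry]


def trace_filesys_from_env(environ=os.environ):
    """Read SCAN_TRACE_FILESYS from a mapping and return its paths."""
    return split_trace_filesys(environ.get("SCAN_TRACE_FILESYS", ""))


if __name__ == "__main__":
    # Hypothetical example paths, for illustration only.
    print(split_trace_filesys("/p/scratch:/p/work"))
```

An unset or empty variable yields an empty list, so callers can treat "no file system configured" and "variable missing" the same way.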
2.5 |
22-Mar-2019 |
MD5sum: c4ea190408ef34592008f9706dc30a0c
Includes:
- Support for
- Score-P v5.0, incl. virtual process/thread topologies
- Automatic trace analyzer changes & improvements:
- Various fixes and improvements in timestamp correction algorithm.
- Fixed 'Late Receiver' instance tracking.
- Slightly improved analysis report data collation.
- Added support for multi-run experiments.
- Code refactoring and various bug fixes.
- Improved user documentation:
- Revised User Guide including command reference.
- Added man pages.
|
2.4 |
14-May-2018 |
MD5sum: c9d09b71721a8345f172fc05debc38b3
Includes:
- Support for
- Build system improvements:
- Fixed build issues with compilers defaulting to C++11 or higher (e.g., Intel 2017, PGI 17).
- Fixed build issues with PGI 16+ compilers (pgCC no longer available).
- Fixed build issues on Cray systems; the CRAYPE_LINK_TYPE setting is now also properly taken into account.
- Automatic trace analyzer changes & improvements:
- Fixed a rare crash/deadlock in critical-path/delay analysis while analyzing MPI persistent communication.
- Improved memory management.
- Improved handling of OTF2 traces in SIONlib containers.
- Improved trace reading times, especially at scale.
- Fixed detection of wait states in active-target synchronization based on EPIK traces.
- Code refactoring and various bug fixes.
|
2.3.1 |
20-May-2016 |
MD5sum: a83ced912b9d2330004cb6b9cefa7585
Includes:
- Build system improvements:
- Fixed build issue with GCC 6.1.
- Fixed build issue on the Intel Xeon Phi platform.
|
2.3 |
14-Apr-2016 |
MD5sum: de782c8b6ecfce0e16a4b143ba7a9b5a
Includes:
- Support for
- Automatic trace analyzer changes & improvements:
- Experimental support for Score-P traces collected using sampling (see OPEN_ISSUES for limitations).
- Improved analysis report postprocessing:
- Revised metric hierarchies (organization, metric naming, etc.).
- Suppress calculation of performance properties that are only relevant for unused parallel programming models.
- Performance property documentation fixes & improvements.
- Build system improvements.
- Code refactoring and various bug fixes.
|
2.2.2 |
19-Jun-2015 |
MD5sum: 2bafce988b0522d18072f7771e491ab9
Includes:
- Platform support:
- Fixed a build issue on the Intel Xeon Phi platform.
- Improved support for the ibrun launcher.
- Automatic trace analyzer changes & improvements:
- Worked around rare run-time issue with MVAPICH2.
|
2.2.1 |
08-May-2015 |
MD5sum: e5083a75160257f8e2051fbe113272cd
Includes:
- Platform support:
- Added build system support for Power8/Linux.
- Added build system support for 64-bit ARM/Linux (AArch64).
- Prefer linking static over dynamic Cube/OTF2 libraries on Fujitsu K/FX10/FX100.
- Automatic trace analyzer changes & improvements:
- Fixed delay-cost propagation through OpenMP barrier wait states.
- Various algorithmic optimizations reducing overall analysis time for traces of multi-threaded applications:
- Improved memory management.
- Improved trace preprocessing.
- Improved timestamp correction.
- Code refactoring and various bug fixes.
|
2.2 |
30-Jan-2015 |
MD5sum: 06e0380c612811a1ff9c183fed9331a9
Includes:
- Support for
- Score-P v1.4
- OTF2 v1.5, incl. full SIONlib support (if configured)
- Cube v4.3
- Platform support:
- Added support for Intel Xeon Phi, native mode only.
- Added support for Fujitsu FX100 (thanks to T. Nakamura, Fujitsu Ltd).
- Automatic trace analyzer changes & improvements:
- Added basic support for POSIX threads.
- Added basic support for OpenMP tasking.
- Added lock contention analysis (OpenMP & POSIX threads).
- Added root-cause/delay analysis (MPI & OpenMP).
- New command-line options '--[no-]rootcause'.
- Code refactoring and various bug fixes.
|
2.1 |
29-Aug-2014 |
MD5sum: bab9c2b021e51e2ba187feec442b96e6
Includes:
- Support for
- Platform support:
- Added support for Fujitsu FX10 & K computer.
- Improved support for Cray systems.
- Automatic trace analyzer changes & improvements:
- Added Critical-path analysis.
- Improved Late Receiver detection.
- New command-line options '--[no-]critical-path' and '--single-pass'.
- Fixed crash in data collation when number of OpenMP threads varied among MPI processes.
- Code refactoring and various small bug fixes.
- Initial version of updated User Guide (still work in progress).
|
2.0 |
13-Aug-2013 |
MD5sum: 0a666d4aef8ec5d32b77d1e034321fd1
Includes:
- Support for
- Score-P v1.2
- OTF2 v1.2
- Cube v4.2
- New build system based on GNU autotools.
- Significant amount of code refactoring.
- Automatic trace analyzer changes & improvements:
- Support for arbitrarily deep system trees.
- Improved performance of timestamp correction.
- Pattern instance tracking and statistics are now enabled by default.
- New command-line options '--verbose', '--[no-]time-correct', and '--[no-]statistics'.
- Limited backward-compatibility support for handling existing traces in EPILOG format generated by Scalasca v1.
|
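The MD5sums listed in the table can be compared against a downloaded archive. The following is a minimal sketch using Python's standard `hashlib` module; the helper names and the archive file name in the usage note are assumptions made for this example, not names taken from the release notes.

```python
import hashlib


def md5_hex(chunks):
    """Return the hex MD5 digest of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_chunks(path, size=1 << 20):
    """Yield a file's contents in fixed-size blocks to bound memory use."""
    with open(path, "rb") as handle:
        while True:
            block = handle.read(size)
            if not block:
                break
            yield block


def matches_listed_md5(path, listed):
    """Compare a file's MD5 digest with a value copied from the table."""
    return md5_hex(file_chunks(path)) == listed.strip().lower()
```

For example, `matches_listed_md5("scalasca-2.6.1.tar.gz", "56a49be3c2fe1c021ceeb8780a73757d")` would check an archive against the 2.6.1 entry, assuming the download was saved under that (hypothetical) file name. An MD5 match guards against a corrupted download but is not a strong defence against deliberate tampering.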