Prepare a program for profiling
In most cases, if your program is already compiled with debugging symbols (-g), you do not need to recompile your program to use it with MAP. However, in some cases it might need to be relinked, as explained in Linking.
Typically you should keep optimization flags enabled when profiling (rather than profiling a debug build). This will give more representative results.
The recommended compilation flags are:
Arm Compiler for Linux:
-g1 -O3 -fno-inline-functions -fno-optimize-sibling-calls
Cray Fortran:
-G2 -O3 -h ipa0
Cray Clang C and C++:
-g1 -O3 -fno-inline-functions -fno-optimize-sibling-calls
GNU:
-g1 -O3 -fno-inline -fno-optimize-sibling-calls
Intel:
-debug minimal -O3 -fno-inline -no-ip -no-ipo -fno-omit-frame-pointer -fno-optimize-sibling-calls
NVIDIA HPC:
-g -O3 -Meh_frame -Mnoautoinline
These flags preserve most performance optimizations whilst enabling file and line number information and maximizing stack trace readability by disabling features that might prevent MAP from obtaining stack traces (such as function inlining and tail call optimization). Minimal debug information is also used to reduce memory usage during profiling.
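The flag sets listed above can be collected in a small helper for a build script. This is a minimal sketch, not part of Linaro Forge: the function name map_profiling_flags and the compiler family labels (arm, cray-fortran, cray-clang, gnu, intel, nvhpc) are hypothetical names chosen for illustration. The flag strings are copied from the list above.

```shell
# Print the recommended profiling flags for a compiler family.
# The function name and the family labels are hypothetical; only the
# flag strings come from the list of recommended flags.
map_profiling_flags() {
    case "$1" in
        arm|cray-clang)
            echo "-g1 -O3 -fno-inline-functions -fno-optimize-sibling-calls" ;;
        cray-fortran)
            echo "-G2 -O3 -h ipa0" ;;
        gnu)
            echo "-g1 -O3 -fno-inline -fno-optimize-sibling-calls" ;;
        intel)
            echo "-debug minimal -O3 -fno-inline -no-ip -no-ipo -fno-omit-frame-pointer -fno-optimize-sibling-calls" ;;
        nvhpc)
            echo "-g -O3 -Meh_frame -Mnoautoinline" ;;
        *)
            # Unknown label: report on stderr and signal failure.
            echo "unknown compiler family: $1" >&2
            return 1 ;;
    esac
}
```

A build script could then set, for example, CFLAGS="$(map_profiling_flags gnu)" before invoking the compiler.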
Debug symbols
If your compiler supports minimal debug info, consider using it (for file and line number information only) instead of full debug info. For GCC, Arm® Compiler for Linux, and Intel, this means using -g1 for compiling instead of -g.
Although this can cause inlined functions not to be shown in profiles, it can significantly reduce the memory overhead when profiling.
This is particularly relevant for complex C++ codes, memory-constrained compute nodes, or when profiling many processes per node.
You can also use MAP on programs without debug information. In this case, inlined functions and source code are not shown, but other features work as expected.
For some compilers, it is necessary to explicitly enable frame pointer information to ensure stack traces, particularly when debug information has been disabled. This is normally done with -fno-omit-frame-pointer (or -Meh_frame for NVIDIA HPC).
Cray compiler
For the Cray compiler, we recommend using the -G2 option.
CUDA programs
When compiling CUDA kernels, do not generate debug information for device code (the -G or --device-debug flag), as this can significantly impair runtime performance. Use -lineinfo instead, for example:
nvcc device.cu -c -o device.o -g -lineinfo -O3
Disable function inlining
While compilers can inline functions, their ability to include sufficient information to reconstruct the original call tree varies between vendors and is not possible if compiling your program with minimal debug info (file & line info only) or without debug info.
To maximize readability of call trees, we recommend that you disable function inlining using the appropriate compiler-specific flags (see Prepare a program for profiling).
Note
Some compilers might still inline functions even when they are explicitly instructed not to do so.
There is typically a small performance penalty for disabling function inlining or enabling profiling information.
Disable tail call optimization
A function can return the result of calling another function, for example:
int someFunction()
{
...
return otherFunction();
}
In this case, the compiler can change the call to otherFunction into a jump. This means that, when inside otherFunction, the calling function, someFunction, no longer appears on the stack.
This optimization, called tail call optimization, can be disabled by passing the -fno-optimize-sibling-calls argument to most compilers.
Linking
To collect data from your program, MAP uses two small profiler libraries, map-sampler and map-sampler-pmpi. These profiler libraries must be linked with your program. On most systems MAP can do this automatically without any action by you. This is done via the system’s LD_PRELOAD mechanism, which allows an extra library to be loaded into your program when starting it.
Note
Although these libraries contain the word map, they are used for Linaro MAP and Linaro Performance Reports.
This automatic linking when you start your program only works if your program is dynamically-linked. Programs can be dynamically-linked or statically-linked. For MPI programs this is normally determined by your MPI library. Most MPI libraries are configured with --enable-dynamic by default, and mpicc/mpif90 produce dynamically-linked executables that Linaro MAP can automatically collect data from.
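One way to check in advance whether an executable is dynamically linked is to inspect it with ldd. The sketch below is a minimal example, not part of Linaro Forge: is_dynamically_linked is a hypothetical helper name, and it assumes that ldd marks each resolved shared library with "=>", as the glibc and musl versions do. Because ldd can run code from the file it inspects, use it only on binaries you trust.

```shell
# Return 0 if the file looks like a dynamically linked executable,
# 2 if the file does not exist, and another non-zero value otherwise.
# Hypothetical helper; assumes ldd prints "=>" for each resolved library.
is_dynamically_linked() {
    [ -f "$1" ] || return 2
    # Statically linked files and non-executables produce no "=>" lines,
    # so grep fails and the function returns non-zero.
    ldd "$1" 2>/dev/null | grep -q '=>'
}
```

For example, `is_dynamically_linked ./hello && echo "dynamic"` prints a message only when the check succeeds.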
The map-sampler-pmpi library is a temporary file that is precompiled and copied or compiled at runtime in the directory ~/.allinea/wrapper.
If your home directory will not be accessible by all nodes in your cluster, you can change where the map-sampler-pmpi library will be created by altering the shared directory as described in No shared home directory. The temporary library will be created in the .allinea/wrapper subdirectory of this shared directory.
For Cray X-Series Systems the shared directory is not applicable. Instead, map-sampler-pmpi is copied into a hidden .allinea sub-directory of the current working directory.
If Linaro MAP warns you that it could not pre-load the Linaro Forge sampler libraries, this often means that your MPI library was not configured with --enable-dynamic, or that the LD_PRELOAD mechanism is not supported on your platform. You now have three options:
Try compiling and linking your code dynamically. On most platforms this allows MAP to use the LD_PRELOAD mechanism to automatically insert its libraries into your program at runtime.
Link MAP’s map-sampler and map-sampler-pmpi libraries with your program at link time manually. See Dynamic linking on Cray X-Series systems, or Static linking and Static linking on Cray X-Series systems.
Finally, it may be that your system supports dynamic linking but you have a statically-linked MPI. You can try to recompile the MPI implementation with --enable-dynamic, or find a dynamically-linked version on your system and recompile your program using that version. This will produce a dynamically-linked program that MAP can automatically collect data from.
Dynamic linking on Cray X-Series systems
If the LD_PRELOAD mechanism is not supported on your Cray X-Series system, you can try to dynamically link your program explicitly with the MAP sampling libraries.
Compile the MPI Wrapper Library
Compile the MPI wrapper library for your system using the make-profiler-libraries --platform=cray --lib-type=shared command.
Note
Performance Reports also uses this library.
user@login:~/myprogram$ make-profiler-libraries --platform=cray --lib-type=shared
Created the libraries in /home/user/myprogram:
   libmap-sampler.so       (and .so.1, .so.1.0, .so.1.0.0)
   libmap-sampler-pmpi.so  (and .so.1, .so.1.0, .so.1.0.0)

To instrument a program, add these compiler options:
   compilation for use with MAP - not required for Performance Reports:
      -g (or '-G2' for native Cray Fortran) (and -O3 etc.)
   linking (both MAP and Performance Reports):
      -dynamic -L/home/user/myprogram -lmap-sampler-pmpi -lmap-sampler -Wl,--eh-frame-hdr

Note: These libraries must be on the same NFS/Lustre/GPFS filesystem as your program.

Before running your program (interactively or from a queue), set
LD_LIBRARY_PATH:
   export LD_LIBRARY_PATH=/home/user/myprogram:$LD_LIBRARY_PATH
   map ...
or add -Wl,-rpath=/home/user/myprogram when linking your program.
Link with the MPI wrapper library
mpicc -G2 -o hello hello.c -dynamic -L/home/user/myprogram \
    -lmap-sampler-pmpi -lmap-sampler -Wl,--eh-frame-hdr
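Unless you link with -Wl,-rpath, the directory containing the libraries must be on LD_LIBRARY_PATH before you run the program. The export form shown in the tool output leaves a trailing colon when LD_LIBRARY_PATH is empty or unset, and the dynamic loader treats an empty entry as the current working directory. The following is a minimal sketch of a helper that avoids this. The name prepend_ld_library_path is hypothetical and not part of Linaro Forge.

```shell
# Prepend a directory to LD_LIBRARY_PATH.
# The ${VAR:+...} expansion adds the ":" separator only when the variable
# is already set and non-empty, so no empty entry is created.
prepend_ld_library_path() {
    LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}
```

For example, call prepend_ld_library_path /home/user/myprogram before launching the program with map.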
NVIDIA HPC Compiler
When linking OpenMP programs, you must pass the -Bdynamic command line argument to the compiler when linking dynamically.
Static linking
If you compile your program statically, that is, your MPI uses a static library or you pass the -static option to the compiler, you must explicitly link your program with the Linaro Forge sampler and MPI wrapper libraries.
Compile the MPI Wrapper Library
Compile the MPI wrapper library for your system using the make-profiler-libraries --lib-type=static command.
Note
Performance Reports also uses this library.
user@login:~/myprogram$ make-profiler-libraries --lib-type=static
Created the libraries in /home/user/myprogram:
   libmap-sampler.a
   libmap-sampler-pmpi.a

To instrument a program, add these compiler options:
   compilation for use with MAP - not required for Performance Reports:
      -g (and -O3 etc.)
   linking (both MAP and Performance Reports):
      -Wl,@/home/user/myprogram/allinea-profiler.ld ... EXISTING_MPI_LIBRARIES
   If your link line specifies EXISTING_MPI_LIBRARIES (e.g. -lmpi), then
   these must appear *after* the Forge sampler and MPI wrapper libraries in
   the link line. There's a comprehensive description of the link ordering
   requirements in the 'Prepare a Program for Profiling' section of
   userguide-forge.pdf, located in /opt/linaro/forge/x.y.z/doc/.
Link with the MPI wrapper library. The -Wl,@/home/user/myprogram/allinea-profiler.ld syntax tells the compiler to look in /home/user/myprogram/allinea-profiler.ld for instructions on how to link with the Linaro Forge sampler. Usually this is sufficient, but not in all cases. The rest of this section explains how to manually add the Linaro Forge sampler to your link line.
NVIDIA HPC Compiler
The NVIDIA HPC C runtime static library contains an undefined reference to __kmpc_fork_call. This causes compilation to fail when linking allinea-profiler.ld. Add --undefined __wrap___kmpc_fork_call to your link line before linking to the Linaro Forge sampler.
Cray
When linking C++ programs you might encounter a conflict between the Cray C++ runtime and the GNU C++ runtime used by the Linaro MAP libraries with an error similar to the one below:
/opt/cray/cce/8.2.5/CC/x86-64/lib/x86-64/libcray-c++-rts.a(rtti.o)
: In function '__cxa_bad_typeid':
/ptmp/ulib/buildslaves/cfe-82-edition-build/tbs/cfe/lib_src/rtti.c
:1062: multiple definition of '__cxa_bad_typeid'
/opt/gcc/4.4.4/snos/lib64/libstdc++.a(eh_aux_runtime.o):/tmp/peint
/gcc/repackage/4.4.4c/BUILD/snos_objdir/x86_64-suse-linux/libstdc++-v3/libsupc++/../../../../xt-gcc-4.4.4/libstdc++-v3/libsupc++/eh_aux_runtime.cc:46: first defined here
You can resolve this conflict by removing -lstdc++ and -lgcc_eh from allinea-profiler.ld.
When linking your program you might encounter undefined references similar to the ones below:
ld.lld: error: undefined symbol: pstart_pes
ld.lld: error: undefined symbol: pshmem_init
ld.lld: error: undefined symbol: p_my_pe
ld.lld: error: undefined symbol: pshmem_barrier_all
ld.lld: error: undefined symbol: pshmem_finalize
You can resolve this by ensuring that the cray-openshmemx and cray-pmi modules are loaded.
-lpthread
When linking, -Wl,@allinea-profiler.ld must go before the -lpthread command-line argument, if present.
Manual Linking
When linking your program you must add the path to the profiler libraries (-L/path/to/profiler-libraries), and the libraries themselves (-lmap-sampler-pmpi, -lmap-sampler).
The MPI wrapper library (-lmap-sampler-pmpi) must go:
After your program’s object (.o) files.
After your program’s own static libraries, for example -lmylibrary.
After the path to the profiler libraries (-L/path/to/profiler-libraries).
Before the MPI’s Fortran wrapper library, if any. For example -lmpichf.
Before the MPI’s implementation library, usually -lmpi.
Before the Linaro Forge sampler library -lmap-sampler.
The Linaro Forge sampler library -lmap-sampler must go:
After the MPI wrapper library.
After your program’s object (.o) files.
After your program’s own static libraries, for example -lmylibrary.
After -Wl,--undefined,allinea_init_sampler_now.
After the path to the profiler libraries (-L/path/to/profiler-libraries).
Before -lstdc++, -lgcc_eh, -lrt, -lpthread, -ldl, -lm and -lc.
For example:
mpicc hello.c -o hello -g -L/users/ddt/linaro \
    -lmap-sampler-pmpi \
    -Wl,--undefined,allinea_init_sampler_now \
    -lmap-sampler -lstdc++ -lgcc_eh -lrt \
    -Wl,--whole-archive -lpthread \
    -Wl,--no-whole-archive \
    -Wl,--eh-frame-hdr \
    -ldl \
    -lm

mpif90 hello.f90 -o hello -g -L/users/ddt/linaro \
    -lmap-sampler-pmpi \
    -Wl,--undefined,allinea_init_sampler_now \
    -lmap-sampler -lstdc++ -lgcc_eh -lrt \
    -Wl,--whole-archive -lpthread \
    -Wl,--no-whole-archive \
    -Wl,--eh-frame-hdr \
    -ldl \
    -lm
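The ordering rules above are easy to get wrong on a long link line, so it can help to check them mechanically. The following is a minimal sketch, assuming the link arguments are passed as separate words. check_link_order is a hypothetical helper, not part of Linaro Forge, and it checks only the relative order of the libraries named in this section (it does not look at object files, -L paths, or your own static libraries).

```shell
# Return 0 if the profiler libraries appear in the documented order, 1 otherwise.
# Hypothetical helper: it records the position of each library of interest
# and then compares positions. Variables are global (POSIX sh has no "local").
check_link_order() {
    _pmpi=0; _sampler=0; _mpi=0; _sys=0; _i=0
    for _arg in "$@"; do
        _i=$((_i + 1))
        case "$_arg" in
            -lmap-sampler-pmpi) _pmpi=$_i ;;
            -lmap-sampler)      _sampler=$_i ;;
            -lmpi|-lmpichf)
                # Remember only the first MPI library.
                if [ "$_mpi" -eq 0 ]; then _mpi=$_i; fi ;;
            -lstdc++|-lgcc_eh|-lrt|-lpthread|-ldl|-lm|-lc)
                # Remember only the first system library.
                if [ "$_sys" -eq 0 ]; then _sys=$_i; fi ;;
        esac
    done
    # Both profiler libraries must be present, with the MPI wrapper first.
    [ "$_pmpi" -gt 0 ] && [ "$_sampler" -gt 0 ] || return 1
    [ "$_pmpi" -lt "$_sampler" ] || return 1
    # The MPI wrapper must precede the first MPI library, if one is listed.
    if [ "$_mpi" -gt 0 ] && [ "$_pmpi" -gt "$_mpi" ]; then return 1; fi
    # The sampler must precede the first system library, if one is listed.
    if [ "$_sys" -gt 0 ] && [ "$_sampler" -gt "$_sys" ]; then return 1; fi
    return 0
}
```

Passing the library arguments of the mpicc example to check_link_order returns 0. Swapping -lmap-sampler-pmpi and -lmap-sampler, or placing -lpthread before -lmap-sampler, makes it return 1.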
Static linking on Cray X-Series systems
Compile the MPI Wrapper Library
On Cray X-Series systems, you can compile the MPI wrapper library using make-profiler-libraries --platform=cray --lib-type=static:
Created the libraries in /home/user/myprogram:
   libmap-sampler.a
   libmap-sampler-pmpi.a

To instrument a program, add these compiler options:
   compilation for use with MAP - not required for Performance Reports:
      -g (or -G2 for native Cray Fortran) (and -O3 etc.)
   linking (both MAP and Performance Reports):
      -Wl,@/home/user/myprogram/allinea-profiler.ld ... EXISTING_MPI_LIBRARIES
   If your link line specifies EXISTING_MPI_LIBRARIES (e.g. -lmpi), then
   these must appear *after* the Forge sampler and MPI wrapper libraries in
   the link line. There's a comprehensive description of the link ordering
   requirements in the 'Prepare a Program for Profiling' section of
   userguide-forge.pdf, located in /opt/linaro/forge/x.y.z/doc/.
Link with the MPI wrapper library using:
cc hello.c -o hello -g -Wl,@allinea-profiler.ld
ftn hello.f90 -o hello -g -Wl,@allinea-profiler.ld
Dynamic and static linking on Cray X-Series systems using the modules environment
If your system has the Linaro Forge module files installed, you can load them and build your program as usual. See map-link modules installation on Cray X-Series.
module load forge or ensure that make-profiler-libraries is in your PATH.
module load map-link-static or module load map-link-dynamic.
Recompile your program.
map-link modules installation on Cray X-Series
To facilitate dynamic and static linking of user programs with the MPI wrapper library and Linaro Forge sampler libraries, Cray X-Series system administrators can integrate the map-link-dynamic and map-link-static modules into their module system. Templates for these modules are supplied as part of the Linaro Forge package.
Copy files share/modules/cray/map-link-* into a dedicated directory on the system.
For each of the two module files copied:
Find the line starting with conflict and correct the prefix to refer to the location where the module files were installed. For example, forge/map-link-static. The correct prefix depends on the subdirectory (if any) under the module search path where the map-link-* module files were installed.
Find the line starting with set MAP_LIBRARIES_DIRECTORY "NONE" and replace NONE with a user writable directory accessible from the login and compute nodes.
After installation, you can verify whether the prefix has been set correctly with module avail. The prefix shown by this command for the map-link-dynamic and map-link-static modules should match the prefix set in the conflict line of the module sources.
Unsupported user applications
Ensure that the program to be profiled does not set or unset the SIGPROF signal handler. This interferes with the MAP profiling function and can cause it to fail.
We recommend that you do not use Linaro MAP to profile programs that contain instructions to perform MPI profiling using MPI wrappers and the MPI standard profiling interface, PMPI. This is because MAP’s own MPI wrappers may conflict with those contained in the program, producing incorrect metrics.