Prepare a program for profiling

In most cases, if your program is already compiled with debugging symbols (-g), you do not need to recompile your program to use it with MAP. However, in some cases it might need to be relinked, as explained in Linking.

Typically you should keep optimization flags enabled when profiling (rather than profiling a debug build). This will give more representative results.

The recommended compilation flags are:

  • Arm Compiler for Linux: -g1 -O3 -fno-inline-functions -fno-optimize-sibling-calls

  • Cray Fortran: -G2 -O3 -h ipa0

  • Cray Clang C and C++: -g1 -O3 -fno-inline-functions -fno-optimize-sibling-calls

  • GNU: -g1 -O3 -fno-inline -fno-optimize-sibling-calls

  • Intel: -debug minimal -O3 -fno-inline -no-ip -no-ipo -fno-omit-frame-pointer -fno-optimize-sibling-calls

  • NVIDIA HPC: -g -O3 -Meh_frame -Mnoautoinline

These flags preserve most performance optimizations whilst enabling file and line number information and maximizing stack trace readability by disabling features that might prevent MAP from obtaining stack traces (such as function inlining and tail call optimization). Minimal debug information is also used to reduce memory usage during profiling.
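As a rough illustration of how the list above might be used in a build script, the sketch below maps a compiler family to its recommended flags. The function name map_profiling_flags and the family keywords (arm, cray-clang, cray-fortran, gnu, intel, nvidia) are hypothetical and not part of Linaro Forge. Only the flag strings are taken from the list above.

```shell
#!/bin/sh
# Hypothetical helper: print the recommended MAP profiling flags for a compiler family.
# The keywords are illustrative; the flag strings are copied from the list above.
map_profiling_flags() {
    case "$1" in
        arm|cray-clang) echo "-g1 -O3 -fno-inline-functions -fno-optimize-sibling-calls" ;;
        cray-fortran)   echo "-G2 -O3 -h ipa0" ;;
        gnu)            echo "-g1 -O3 -fno-inline -fno-optimize-sibling-calls" ;;
        intel)          echo "-debug minimal -O3 -fno-inline -no-ip -no-ipo -fno-omit-frame-pointer -fno-optimize-sibling-calls" ;;
        nvidia)         echo "-g -O3 -Meh_frame -Mnoautoinline" ;;
        *)              echo "unknown compiler family: $1" >&2; return 1 ;;
    esac
}

# Show the compile command for a GNU toolchain without running it.
# hello.c and mpicc are placeholders for your own source file and compiler wrapper.
echo "mpicc $(map_profiling_flags gnu) -c hello.c -o hello.o"
```

Keeping the flags in one place like this makes it easier to switch between a profiling build and a normal optimized build.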

Debug symbols

If your compiler supports minimal debug info, consider using it (for file and line number information only) instead of full debug info. For GCC, Arm® Compiler for Linux, and Intel, this means using -g1 for compiling instead of -g.

Although this can cause inlined functions not to be shown in profiles, it can significantly reduce the memory overhead when profiling.

This is particularly relevant for complex C++ codes, memory-constrained compute nodes, or when profiling many processes per node.

You can also use MAP on programs without debug information. In this case, inlined functions and source code are not shown, but other features work as expected.

For some compilers, you must explicitly enable frame pointer information so that stack traces can be obtained, particularly when debug information has been disabled. This is normally done with -fno-omit-frame-pointer (or -Meh_frame for NVIDIA HPC).

Cray compiler

For the Cray compiler we recommend using the -G2 option.

CUDA programs

When compiling CUDA kernels, do not generate debug information for device code (the -G or --device-debug flag) as this can significantly impair runtime performance. Use -lineinfo instead, for example:

nvcc device.cu -c -o device.o -g -lineinfo -O3

Disable function inlining

Compilers can inline functions, but their ability to include enough information to reconstruct the original call tree varies between vendors. Reconstruction is not possible if you compile your program with minimal debug info (file and line info only) or without debug info.

To maximize readability of call trees, we recommend that you disable function inlining using the appropriate compiler-specific flags (see Prepare a program for profiling).

Note

Some compilers might still inline functions even when they are explicitly instructed not to do so.

There is typically a small performance penalty for disabling function inlining or enabling profiling information.

Disable tail call optimization

A function can return the result of calling another function, for example:

int someFunction()
{
   ...
   return otherFunction();
}

In this case, the compiler can change the call to otherFunction into a jump. This means that, when inside otherFunction, the calling function, someFunction, no longer appears on the stack.

This optimization, called tail call optimization, can be disabled by passing the -fno-optimize-sibling-calls argument to most compilers.

Linking

To collect data from your program, MAP uses two small profiler libraries, map-sampler and map-sampler-pmpi. These profiler libraries must be linked with your program. On most systems MAP can do this automatically without any action by you. This is done via the system’s LD_PRELOAD mechanism, which allows an extra library to be loaded into your program when it starts.

Note

Although these libraries contain the word map, they are used for Linaro MAP and Linaro Performance Reports.

This automatic linking when you start your program only works if your program is dynamically-linked. Programs can be dynamically-linked or statically-linked. For MPI programs this is normally determined by your MPI library. Most MPI libraries are configured with --enable-dynamic by default, and mpicc/mpif90 produce dynamically-linked executables that Linaro MAP can automatically collect data from.
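If you are unsure whether an executable is dynamically linked, a quick check with the standard ldd tool can help. The sketch below is a heuristic rather than a Linaro Forge feature: the function name is_dynamically_linked and the program name ./hello are placeholders, and it assumes that ldd is available and lists shared libraries using "=>". On some systems ldd may execute the program it inspects, so only use it on binaries you trust.

```shell
#!/bin/sh
# Heuristic: treat an executable as dynamically linked if ldd reports at least
# one resolved shared library (a line containing "=>").
# Static executables, missing files, and non-ELF files produce no such line.
is_dynamically_linked() {
    ldd "$1" 2>/dev/null | grep -q '=>'
}

prog="${1:-./hello}"   # ./hello is a placeholder program name
if is_dynamically_linked "$prog"; then
    echo "$prog: dynamically linked, automatic preloading should be possible"
else
    echo "$prog: not dynamically linked or not found, manual linking may be needed"
fi
```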

The map-sampler-pmpi library is a temporary file that is either precompiled and copied into, or compiled at runtime in, the directory ~/.allinea/wrapper.

If your home directory is not accessible from all nodes in your cluster, you can change where the map-sampler-pmpi library is created by altering the shared directory, as described in No shared home directory.

The temporary library is then created in the .allinea/wrapper subdirectory of this shared directory.

On Cray X-Series systems the shared directory is not applicable. Instead, map-sampler-pmpi is copied into a hidden .allinea subdirectory of the current working directory.

If Linaro MAP warns you that it could not pre-load the Linaro Forge sampler libraries, this often means that your MPI library was not configured with --enable-dynamic, or that the LD_PRELOAD mechanism is not supported on your platform. You now have three options:

  • Try compiling and linking your code dynamically. On most platforms this allows MAP to use the LD_PRELOAD mechanism to automatically insert its libraries into your program at runtime.

  • Link MAP’s map-sampler and map-sampler-pmpi libraries with your program at link time manually.

    See Dynamic linking on Cray X-Series systems, or Static linking and Static linking on Cray X-Series systems.

  • Finally, it may be that your system supports dynamic linking but you have a statically-linked MPI. You can try to recompile the MPI implementation with --enable-dynamic, or find a dynamically-linked version on your system and recompile your program using that version. This will produce a dynamically-linked program that MAP can automatically collect data from.

Dynamic linking on Cray X-Series systems

If the LD_PRELOAD mechanism is not supported on your Cray X-Series system, you can try to dynamically link your program explicitly with the MAP sampling libraries.

Compile the MPI Wrapper Library

  1. Compile the MPI wrapper library for your system using the make-profiler-libraries --platform=cray --lib-type=shared command.

    Note

    Performance Reports also uses this library.

    user@login:~/myprogram$ make-profiler-libraries --platform=cray --lib-type=shared
    
       Created the libraries in /home/user/myprogram:
       libmap-sampler.so       (and .so.1, .so.1.0, .so.1.0.0)
       libmap-sampler-pmpi.so  (and .so.1, .so.1.0, .so.1.0.0)
    
       To instrument a program, add these compiler options:
       compilation for use with MAP - not required for Performance Reports:
          -g (or '-G2' for native Cray Fortran) (and -O3 etc.)
       linking (both MAP and Performance Reports):
          -dynamic -L/home/user/myprogram -lmap-sampler-pmpi -lmap-sampler -Wl,--eh-frame-hdr
    
       Note: These libraries must be on the same NFS/Lustre/GPFS filesystem as your program.
    
       Before running your program (interactively or from a queue), set
       LD_LIBRARY_PATH:
       export LD_LIBRARY_PATH=/home/user/myprogram:$LD_LIBRARY_PATH
       map  ...
       or add -Wl,-rpath=/home/user/myprogram when linking your program.
    
  2. Link with the MPI wrapper library

    mpicc -G2 -o hello hello.c -dynamic -L/home/user/myprogram \
          -lmap-sampler-pmpi -lmap-sampler -Wl,--eh-frame-hdr
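
The make-profiler-libraries output above asks you to set LD_LIBRARY_PATH before running the program. A minimal sketch of one way to do this is shown below. The helper name prepend_ld_library_path is hypothetical, and /home/user/myprogram is the example directory from the output above. Unlike a plain export, the helper avoids leaving an empty entry when LD_LIBRARY_PATH was previously unset, because the dynamic loader treats an empty entry as the current working directory.

```shell
#!/bin/sh
# Prepend a directory to LD_LIBRARY_PATH without leaving an empty entry
# when the variable was previously unset or empty.
prepend_ld_library_path() {
    if [ -n "${LD_LIBRARY_PATH:-}" ]; then
        LD_LIBRARY_PATH="$1:$LD_LIBRARY_PATH"
    else
        LD_LIBRARY_PATH="$1"
    fi
    export LD_LIBRARY_PATH
}

# /home/user/myprogram is the example library directory used in this section.
prepend_ld_library_path /home/user/myprogram
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

Alternatively, adding -Wl,-rpath=/home/user/myprogram at link time, as the output above suggests, removes the need to set the variable in each job script.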
    

NVIDIA HPC Compiler

When dynamically linking OpenMP programs, you must pass the -Bdynamic command line argument to the compiler.

Static linking

If you compile your program statically, that is, your MPI uses a static library or you pass the -static option to the compiler, you must explicitly link your program with the Linaro Forge sampler and MPI wrapper libraries.

Compile the MPI Wrapper Library

  1. Compile the MPI wrapper library for your system using the make-profiler-libraries --lib-type=static command.

    Note

    Performance Reports also uses this library.

    user@login:~/myprogram$ make-profiler-libraries --lib-type=static
    
    Created the libraries in /home/user/myprogram:
        libmap-sampler.a
        libmap-sampler-pmpi.a
    
    To instrument a program, add these compiler options:
        compilation for use with MAP - not required for Performance Reports:
          -g (and -O3 etc.)
          linking (both MAP and Performance Reports):
          -Wl,@/home/user/myprogram/allinea-profiler.ld ... EXISTING_MPI_LIBRARIES
          If your link line specifies EXISTING_MPI_LIBRARIES (e.g. -lmpi), then
          these must appear *after* the Forge sampler and MPI wrapper libraries in
          the link line.  There's a comprehensive description of the link ordering
          requirements in the 'Prepare a Program for Profiling' section of
          userguide-forge.pdf, located in /opt/linaro/forge/x.y.z/doc/.
    
  2. Link with the MPI wrapper library. The -Wl,@/home/user/myprogram/allinea-profiler.ld syntax tells the compiler to look in /home/user/myprogram/allinea-profiler.ld for instructions on how to link with the Linaro Forge sampler. Usually this is sufficient, but not in all cases. The rest of this section explains how to manually add the Linaro Forge sampler to your link line.

NVIDIA HPC Compiler

The NVIDIA HPC C runtime static library contains an undefined reference to __kmpc_fork_call. This causes the build to fail when linking with allinea-profiler.ld. Add --undefined __wrap___kmpc_fork_call to your link line before linking to the Linaro Forge sampler.

Cray

When linking C++ programs you might encounter a conflict between the Cray C++ runtime and the GNU C++ runtime used by the Linaro MAP libraries with an error similar to the one below:

/opt/cray/cce/8.2.5/CC/x86-64/lib/x86-64/libcray-c++-rts.a(rtti.o)
    : In function '__cxa_bad_typeid':
/ptmp/ulib/buildslaves/cfe-82-edition-build/tbs/cfe/lib_src/rtti.c
    :1062: multiple definition of '__cxa_bad_typeid'
/opt/gcc/4.4.4/snos/lib64/libstdc++.a(eh_aux_runtime.o):/tmp/peint
   /gcc/repackage/4.4.4c/BUILD/snos_objdir/x86_64-suse-linux/libstdc++-v3/libsupc++/../../../../xt-gcc-4.4.4/libstdc++-v3/libsupc++/eh_aux_runtime.cc:46: first defined here

You can resolve this conflict by removing -lstdc++ and -lgcc_eh from allinea-profiler.ld.
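One possible way to make this edit without touching the generated file by hand is sketched below. It assumes that allinea-profiler.ld is a plain-text list of linker options in which -lstdc++ and -lgcc_eh appear as separate whitespace-delimited tokens. Check the contents of your own file first, because its exact layout is not described here. The function name strip_gnu_cxx_runtime and the output file name are hypothetical.

```shell
#!/bin/sh
# Remove the -lstdc++ and -lgcc_eh tokens from linker-option text read on stdin.
# Only whole tokens are removed, so a name such as -lstdc++fs is left untouched.
strip_gnu_cxx_runtime() {
    sed -E \
        -e 's/(^|[[:space:]])-lstdc\+\+([[:space:]]|$)/\1\2/g' \
        -e 's/(^|[[:space:]])-lgcc_eh([[:space:]]|$)/\1\2/g'
}

# Illustrative input only, not the real contents of allinea-profiler.ld.
printf '%s\n' '-lmap-sampler -lstdc++ -lgcc_eh -lrt' | strip_gnu_cxx_runtime

# Typical use, writing to a copy so the generated file is kept:
# strip_gnu_cxx_runtime < allinea-profiler.ld > allinea-profiler-cray.ld
```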

When linking your program you might encounter undefined references similar to the ones below:

ld.lld: error: undefined symbol: pstart_pes
ld.lld: error: undefined symbol: pshmem_init
ld.lld: error: undefined symbol: p_my_pe
ld.lld: error: undefined symbol: pshmem_barrier_all
ld.lld: error: undefined symbol: pshmem_finalize

You can resolve this by ensuring that the cray-openshmemx and cray-pmi modules are loaded.

-lpthread

When linking, -Wl,@allinea-profiler.ld must go before the -lpthread command-line argument, if present.
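The ordering rule above can be checked mechanically before a long build. The sketch below is a hypothetical helper, not a Linaro Forge tool: it scans a list of link arguments and succeeds only if an option ending in allinea-profiler.ld is present and comes before every -lpthread.

```shell
#!/bin/sh
# Succeed only if the -Wl,@...allinea-profiler.ld option is present in the
# given link arguments and appears before every -lpthread argument.
profiler_ld_before_pthread() {
    seen_ld=no
    for arg in "$@"; do
        case "$arg" in
            -Wl,@*allinea-profiler.ld) seen_ld=yes ;;
            -lpthread) [ "$seen_ld" = yes ] || return 1 ;;
        esac
    done
    [ "$seen_ld" = yes ]
}

# hello.o is a placeholder object file name.
if profiler_ld_before_pthread hello.o -Wl,@allinea-profiler.ld -lpthread; then
    echo "link order OK"
else
    echo "move -Wl,@allinea-profiler.ld before -lpthread"
fi
```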

Manual Linking

When linking your program you must add the path to the profiler libraries (-L/path/to/profiler-libraries), and the libraries themselves (-lmap-sampler-pmpi, -lmap-sampler).

The MPI wrapper library (-lmap-sampler-pmpi) must go:

  1. After your program’s object (.o) files.

  2. After your program’s own static libraries, for example -lmylibrary.

  3. After the path to the profiler libraries (-L/path/to/profiler-libraries).

  4. Before the MPI’s Fortran wrapper library, if any, for example -lmpichf.

  5. Before the MPI’s implementation library, usually -lmpi.

  6. Before the Linaro Forge sampler library (-lmap-sampler).

The Linaro Forge sampler library (-lmap-sampler) must go:

  1. After the MPI wrapper library.

  2. After your program’s object (.o) files.

  3. After your program’s own static libraries, for example -lmylibrary.

  4. After -Wl,--undefined,allinea_init_sampler_now.

  5. After the path to the profiler libraries (-L/path/to/profiler-libraries).

  6. Before -lstdc++, -lgcc_eh, -lrt, -lpthread, -ldl, -lm and -lc.

    For example:

    mpicc hello.c -o hello -g -L/users/ddt/linaro \
        -lmap-sampler-pmpi \
        -Wl,--undefined,allinea_init_sampler_now \
        -lmap-sampler -lstdc++ -lgcc_eh -lrt \
        -Wl,--whole-archive -lpthread \
        -Wl,--no-whole-archive \
        -Wl,--eh-frame-hdr \
        -ldl \
        -lm
    
    mpif90 hello.f90 -o hello -g -L/users/ddt/linaro \
        -lmap-sampler-pmpi \
        -Wl,--undefined,allinea_init_sampler_now \
        -lmap-sampler -lstdc++ -lgcc_eh -lrt \
        -Wl,--whole-archive -lpthread \
        -Wl,--no-whole-archive \
        -Wl,--eh-frame-hdr \
        -ldl \
        -lm
    

Static linking on Cray X-Series systems

Compile the MPI Wrapper Library

  1. On Cray X-Series systems, you can compile the MPI wrapper library using make-profiler-libraries --platform=cray --lib-type=static:

    Created the libraries in /home/user/myprogram:
       libmap-sampler.a
       libmap-sampler-pmpi.a
    
    To instrument a program, add these compiler options:
       compilation for use with MAP - not required for Performance Reports:
       -g (or -G2 for native Cray Fortran) (and -O3 etc.)
       linking (both MAP and Performance Reports):
       -Wl,@/home/user/myprogram/allinea-profiler.ld ... EXISTING_MPI_LIBRARIES
       If your link line specifies EXISTING_MPI_LIBRARIES (e.g. -lmpi), then
       these must appear *after* the Forge sampler and MPI wrapper libraries in
       the link line.  There's a comprehensive description of the link ordering
       requirements in the 'Prepare a Program for Profiling' section of
       userguide-forge.pdf, located in /opt/linaro/forge/x.y.z/doc/.
    
  2. Link with the MPI wrapper library using:

    cc hello.c -o hello -g -Wl,@allinea-profiler.ld
    
    ftn hello.f90 -o hello -g -Wl,@allinea-profiler.ld
    

Dynamic and static linking on Cray X-Series systems using the modules environment

If your system has the Linaro Forge module files installed, you can load them and build your program as usual. See map-link modules installation on Cray X-Series.

  1. module load forge or ensure that make-profiler-libraries is in your PATH.

  2. module load map-link-static or module load map-link-dynamic.

  3. Recompile your program.

Unsupported user applications

Ensure that the program to be profiled does not set or unset the SIGPROF signal handler. This interferes with the MAP profiling function and can cause it to fail.
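A simple text search can give a first indication of whether a code base touches SIGPROF. The sketch below is a heuristic only: the function name find_sigprof_uses is hypothetical, it assumes a grep that supports the widely available -r option, and a match does not prove that the handler is changed at run time. It also cannot see handlers installed by third-party libraries that you do not have the source for.

```shell
#!/bin/sh
# List lines that mention SIGPROF under a directory (default: current directory).
# Output has the usual grep form file:line:text.
find_sigprof_uses() {
    grep -rn 'SIGPROF' "${1:-.}" 2>/dev/null
}

# Pass your source directory as the first argument.
find_sigprof_uses "${1:-.}" || echo "no SIGPROF references found in ${1:-.}"
```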

We recommend that you do not use Linaro MAP to profile programs that contain instructions to perform MPI profiling using MPI wrappers and the MPI standard profiling interface, PMPI. This is because MAP’s own MPI wrappers may conflict with those contained in the program, producing incorrect metrics.