Profiling settings

Issue

Your profiling session exhausts the free memory on a node while Linaro MAP or Linaro Performance Reports is launching, profiling, or analyzing the results. This occurs when using Open MPI 4+ (excluding Open MPI (Compatibility)), MPICH 4, or Cray MPICH under SLURM (MPMD).

Solution

Set the FORGE_MULTIPLEXER_RATIO environment variable to trade increased time overhead during launch and analysis for reduced memory overhead. Choose the value according to how much memory overhead you need to remove. The value takes the form N:M, which is the ratio of Linaro MAP worker processes (N) to user processes (M). For example:

  • FORGE_MULTIPLEXER_RATIO=1:2 results in one Linaro MAP worker process per two ranks.

    This gives the largest memory saving relative to the added time overhead.

  • FORGE_MULTIPLEXER_RATIO=1:3 results in one Linaro MAP worker process per three ranks.

    This will further reduce memory overhead for a further increase in time overhead.

Additionally, FORGE_MULTIPLEXER_RATIO=0 results in one Linaro MAP worker process for all ranks. Of all the settings, this gives the greatest memory reduction and the largest increase in time overhead.
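
The worker counts implied by these settings can be estimated with simple arithmetic. The following Python sketch is illustrative only: the function name estimated_workers is hypothetical, and rounding up when the rank count is not a multiple of the ratio is an assumption rather than documented Linaro MAP behavior.

```python
def estimated_workers(ratio: str, ranks: int) -> int:
    """Estimate the number of Linaro MAP worker processes for a ratio setting.

    Assumption: a partial group of ranks still needs its own worker,
    so the result is rounded up.
    """
    if ranks < 1:
        raise ValueError("ranks must be at least 1")
    if ratio == "0":
        # Special case from the text: one worker process for all ranks.
        return 1
    workers_part, ranks_part = ratio.split(":")
    n, m = int(workers_part), int(ranks_part)
    if n < 1 or m < 1:
        raise ValueError("ratio must have the form N:M with positive integers")
    # N workers per M ranks, using integer ceiling division (assumed rounding).
    return (ranks * n + m - 1) // m


# Compare the settings described above for a 12-rank job.
for setting in ("1:2", "1:3", "0"):
    print(setting, estimated_workers(setting, ranks=12))
```

Fewer worker processes means less memory overhead, so the estimate gives a rough way to compare candidate ratios before rerunning a job.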

Issue

Your program appears to run correctly during profiling, but runs out of memory while Linaro MAP or Linaro Performance Reports collects the results.

Solution

Define FORGE_REDUCE_MEMORY_USAGE=1 in your environment and then rerun Linaro MAP. This environment variable causes the results for each process on a node to be processed sequentially instead of in parallel.

This reduces the amount of free memory needed on each node, but takes longer to complete.
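
One way to apply this setting to a single run, without changing the parent environment, is to pass a modified copy of the environment to the launch command. This is a minimal Python sketch: the helper names forge_env and run_profile are hypothetical, and the commented command line is an assumed example that you should replace with your usual launch command.

```python
import os
import subprocess


def forge_env(base=None):
    """Return a copy of the environment with FORGE_REDUCE_MEMORY_USAGE=1 set."""
    env = dict(os.environ if base is None else base)
    env["FORGE_REDUCE_MEMORY_USAGE"] = "1"
    return env


def run_profile(command):
    """Run the given profiling command line with the reduced-memory setting."""
    return subprocess.run(command, env=forge_env(), check=True)


# Example (assumed command line; replace with your own):
# run_profile(["map", "--profile", "mpirun", "-n", "4", "./myapp"])
```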