Linaro MAP
Linaro MAP is a parallel profiler that shows you the longest-running lines of code, and explains why. Linaro MAP does not require any complicated configuration, and you do not need to have experience with profiling tools to use it.
Linaro MAP supports:
MPI, OpenMP, and single-threaded programs.
Small data files. All data is aggregated on the cluster and only a few megabytes are written to disk, regardless of the size or duration of the run.
A sophisticated source code view, enabling you to analyze performance across individual functions.
Both interactive and batch modes for gathering profile data.
A rich set of metrics that show memory usage, floating-point calculations, and MPI usage across processes, including:
Percentage of vectorized instructions, including AVX extensions, used in each part of the code.
Time spent in memory operations, and how it varies over time and across processes, so that you can check for cache bottlenecks.
A visual overview across aggregated processes and cores that highlights any regions of imbalance in the code.