OpenMP programs
For an OpenMP or multi-threaded program (or a mixed-mode MPI+OpenMP
program) you will also see other colors used:
- Light green: Multi-threaded computation time. For an OpenMP program this
is time inside OpenMP regions. When profiling an OpenMP program you
want to see as much light green as possible, because that is the only
time you are using all available cores. Time spent in dark green is a
potential bottleneck because it is serial code outside an OpenMP
region.
- Light blue: Multi-threaded MPI communication time. This is MPI time spent
waiting for MPI communication while inside an OpenMP region or on a
pthread. As with the normal blue MPI time you will want to minimize
this, but also maximize the amount of multi-threaded computation
(light green) that is occurring on the other threads while this MPI
communication is taking place.
- Dark gray: Time inside an OpenMP region in which a core is idle or
waiting to synchronize with the other
OpenMP threads. In theory, during an OpenMP region all threads are
active all of the time. In practice there are significant
synchronization overheads involved in setting up parallel regions and
synchronizing at barriers. These will be seen as dark gray holes in
the otherwise good light green of optimal parallel computation. If
you see these there may be an opportunity to improve performance with
better loop scheduling or division of the work to be done.
- Pale blue: Thread synchronization time. Time spent waiting for
synchronization between non-OpenMP threads (for example, a
pthread_join). Whether this time can be reduced depends on the
purpose of the threads in question.
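As an illustration of the loop-scheduling suggestion for dark gray time, the following is a minimal C sketch, not code from the profiled application. It assumes a loop whose iterations get more expensive as the index grows, so equal-sized static chunks would leave some threads idle near the end of the region. The names row_work and triangular_sum and the chunk size of 16 are hypothetical choices made for this example. The schedule(dynamic, 16) clause hands out iterations in small batches as threads become free, which may reduce idle time at the cost of some scheduling overhead.

```c
/* Hypothetical unbalanced workload: iteration i costs O(i) operations,
   so late iterations take much longer than early ones. */
static double row_work(int i)
{
    double acc = 0.0;
    for (int j = 0; j <= i; ++j)
        acc += (double)j;
    return acc;
}

/* Sums row_work(i) for i in [0, n). */
double triangular_sum(int n)
{
    double total = 0.0;

    /* Threads take 16 iterations at a time as they become free, instead of
       one fixed block each; the chunk size is an assumption to tune. */
    #pragma omp parallel for schedule(dynamic, 16) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += row_work(i);

    return total;
}
```

Build with OpenMP enabled (for example, -fopenmp with GCC). Calling triangular_sum(100) returns 166650. Whether dynamic scheduling helps depends on the workload, so compare the profile before and after the change.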
In the screenshot above you can see that 12.8% of the time is spent
calling neighbor.build(atom) and 75.3% of the time is spent calling
force->compute(atom, neighbor, comm, comm.me). The graphs show a
mixture of light green indicating an OpenMP region and dark
gray indicating OpenMP overhead. OpenMP overhead is time spent in the
OpenMP runtime rather than in the user code inside an OpenMP region.
Hover the mouse over a line to see the exact percentage of time spent
in overhead; visually you can already see that it is significant but
not dominant here.
Increasingly, programs use both MPI and OpenMP to parallelize their
workloads efficiently. Arm® MAP fully and transparently supports this model
of working. Note that the graphs reflect the application's activity
over time:
- A large section of blue in a mixed-mode MPI code means that all the
processes in the program were inside MPI calls during this
period. Try to reduce these, especially if they have a triangular
shape suggesting that some processes were waiting inside MPI while
others were still computing.
- A large section of dark green means that all the processes were
running single-threaded computations during that period. Avoid this
in an MPI+OpenMP code, or you might as well leave out the OpenMP
sections altogether.
- Ideally you want to achieve large sections of light green, showing
OpenMP regions being effectively used across all processes
simultaneously.
- It is possible to call MPI functions from within an OpenMP
region. Arm® MAP only supports this if the OpenMP primary
thread is the one that makes the MPI calls. In this case, the blue
blocks of MPI time are smaller, demonstrating that one OpenMP
thread is in an MPI function while the rest are doing something else,
such as useful computation.
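The last point above can be sketched in C as follows. To keep the sketch free of an MPI dependency, exchange_placeholder is a hypothetical stand-in for an MPI call made by the primary thread; overlap_step and the chunk size of 64 are also assumptions made for illustration. In a real program the MPI library would typically be initialized with MPI_Init_thread, requesting at least MPI_THREAD_FUNNELED.

```c
/* Hypothetical stand-in for an MPI call (for example, a halo exchange).
   Here it only copies data locally so the sketch builds without MPI. */
static void exchange_placeholder(const double *send, double *recv, int n)
{
    for (int i = 0; i < n; ++i)
        recv[i] = send[i];
}

/* Performs the "communication" on the primary thread while the other
   threads compute a sum of squares over the interior data. */
double overlap_step(const double *send, double *recv, int n_halo,
                    const double *interior, int n_interior)
{
    double sum = 0.0;

    #pragma omp parallel
    {
        /* Only the primary thread communicates. The master construct has
           no implied barrier, so the other threads do not wait here. */
        #pragma omp master
        exchange_placeholder(send, recv, n_halo);

        /* The other threads start on this loop immediately. With dynamic
           scheduling the primary thread joins in once its call returns. */
        #pragma omp for schedule(dynamic, 64) reduction(+:sum)
        for (int i = 0; i < n_interior; ++i)
            sum += interior[i] * interior[i];
    }

    return sum;
}
```

The computation must not read the data being received (recv here) until after the parallel region, or after an explicit barrier, because the exchange may still be in progress on the primary thread.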