A short description of the main features of the OpenMP Regions view.
Note
If you are using MPI and OpenMP, this view summarizes all cores across all nodes, not just one node.
The OpenMP Regions view shows the OpenMP parallel regions in your program, along with the time spent in each. In this example, the region is inside the update function at line 207. Clicking on this shows the region in the Source Code viewer. Within the region, time is spent in the do_math function.
Hovering over a line, or clicking on the [–] symbol, collapses the view down to show just the figures for how much time each entry takes. Within the do_math function, the line containing (sqtau * (values[i-1] ...)) takes the longest, with 13.7% of the total core hours across all cores used in the job. The line sqtau = tau * tau is the next most expensive, taking 10.5% of the total core hours.

From this you can see that the region is optimized for OpenMP usage, that is, it has very low overhead. If you want to improve performance, look at the calculations on the highlighted lines in conjunction with the CPU instruction metrics.
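The code discussed above might look like the following sketch. Only the names update, do_math, sqtau, tau and the values[i-1] expression come from the example; the stencil itself and the surrounding structure are assumptions for illustration.

```c
#include <stddef.h>

/* Hypothetical sketch of the kind of code described above. */
static double do_math(const double *values, size_t i, double tau)
{
    double sqtau = tau * tau;  /* the second most expensive line in the example (10.5%) */
    /* the (sqtau * (values[i-1] ...)) line, the most expensive in the example (13.7%) */
    return sqtau * (values[i - 1] - 2.0 * values[i] + values[i + 1]);
}

/* The OpenMP region reported by the view, inside update() (line 207 in the example). */
void update(const double *values, double *out, size_t n, double tau)
{
    #pragma omp parallel for
    for (size_t i = 1; i + 1 < n; i++)
        out[i] = values[i] + do_math(values, i, tau);
}
```

In a shape like this, the per-iteration arithmetic dominates and the region's OpenMP overhead is small, which matches the low-overhead profile the example describes.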
See Metrics view for more information on CPU instruction metrics.
Click on any line of the OpenMP Regions view to jump to that line of code in the Source Code viewer.
The percentage of OpenMP synchronization time gives an idea as to how well your program is scaling to multiple cores and highlights the OpenMP regions that are causing the greatest overhead. A common cause of OpenMP synchronization is threads waiting at the implicit barrier at the end of a parallel region or work-sharing loop; if the code that follows does not depend on the loop's results, you can remove this barrier with the nowait OpenMP keyword, if appropriate.

Note
Time spent in omp atomic does not appear as synchronization time. It is generally implemented as a locking modifier to CPU instructions, so overuse of the atomic operator instead shows up as large amounts of time spent in memory accesses and on the lines immediately following an atomic pragma.

When parallelizing with OpenMP it is extremely important to achieve good single-core performance first. If a single CPU core is already bottlenecked on memory bandwidth, splitting the computations across additional cores rarely solves the problem.
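As an illustration of removing an implicit end-of-loop barrier, as mentioned above, consider this sketch (not from the example program) of two independent work-sharing loops inside one parallel region:

```c
#include <stddef.h>

/* Illustrative sketch, not from the example program: the first loop's
   implicit end-of-loop barrier is removed with nowait.  This is only
   safe because no iteration of the second loop depends on the results
   of the first loop. */
void scale_both(double *a, double *b, size_t n, double factor)
{
    #pragma omp parallel
    {
        #pragma omp for nowait  /* threads move on to the next loop without waiting */
        for (size_t i = 0; i < n; i++)
            a[i] *= factor;

        #pragma omp for         /* the barrier at the end of the parallel region remains */
        for (size_t i = 0; i < n; i++)
            b[i] *= factor;
    }
}
```

With the barrier removed, a thread that finishes its share of the first loop early can start on the second loop immediately instead of idling, which is exactly the synchronization time this view highlights.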