A short description of the main features of the OpenMP Regions view.
Note
If you are using MPI and OpenMP, this view summarizes all cores across all nodes, not just one node.
The OpenMP Regions view shows:

- An OpenMP region in the update function at line 207. Clicking on this shows the region in the Source Code viewer.
- A do_math function called from within that region. Hovering on the line or clicking on the [–] symbol collapses the view down to show the figures for how much time the region takes as a whole.
- Of the lines inside do_math, the (sqtau * (values[i-1] ...) one takes longest, with 13.7% of the total core hours across all cores used in the job.
- sqtau = tau * tau is the next most expensive line, taking 10.5% of the total core hours.

From this you can see that the region is well optimized for OpenMP usage, that is, it has very low overhead. If you want to improve performance, look at the calculations on the highlighted lines in conjunction with the CPU instruction metrics.
See Metrics view for more information on CPU instruction metrics.
Click on any line of the OpenMP Regions view to jump to the Source Code viewer to show that line of code.
The percentage of OpenMP synchronization time gives an idea of how well your program is scaling to multiple cores and highlights the OpenMP regions that are causing the greatest overhead. Typical causes of OpenMP synchronization include implicit barriers at the end of parallel regions and worksharing constructs, explicit barriers, locks and critical sections, and load imbalance that leaves fast threads waiting for slow ones.
Note
omp atomic does not appear as synchronization time. It is generally implemented as a locking modifier to CPU instructions, so overuse of the atomic operator shows up as large amounts of time spent in memory accesses and on the lines immediately following an atomic pragma.

To reduce time spent waiting at implicit barriers, consider the OpenMP nowait clause where appropriate.

When parallelizing with OpenMP it is extremely important to achieve good single-core performance first. If a single CPU core is already bottlenecked on memory bandwidth, splitting the computations across additional cores rarely solves the problem.
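To make the synchronization point concrete, here is a hedged sketch (not from the profiled program) contrasting a critical-section accumulator, whose serialization would show up as synchronization time, with an equivalent reduction that avoids the contention:

```c
#include <stddef.h>

/* Summing with a critical section: every iteration serializes on a
   lock, so threads spend much of their time waiting on each other. */
double sum_critical(const double *a, size_t n)
{
    double total = 0.0;
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++) {
        #pragma omp critical
        total += a[i];
    }
    return total;
}

/* The same sum with a reduction clause: each thread accumulates a
   private partial, and the partials are combined once at the end. */
double sum_reduction(const double *a, size_t n)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

Replacing the critical section with an omp atomic update would remove the lock, but as the note above explains, its cost would then surface as memory-access time rather than synchronization time; the reduction form sidesteps both.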