CPU metrics breakdown

This section presents key CPU performance measurements gathered using the Linux perf event subsystem.

Note

Metrics described in this section are available only on Arm®v8 and IBM Power systems. These metrics are not available on virtual machines. Linux perf event performance counters must be accessible on all systems on which the target program runs. See Armv8 (AArch64) known issues or POWER8 and POWER9 (POWER 64-bit) in Platform notes and known issues.

Cycles per instruction

The average number of CPU cycles that elapsed for each retired instruction. This metric is affected by CPU frequency scaling and by other factors, particularly hardware interrupt counts.
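
As a minimal sketch of the definition above, the metric can be read as a simple quotient of two counter totals. The function name and the counter values here are hypothetical and only illustrate the arithmetic; they are not taken from the tool's implementation.

```python
def cycles_per_instruction(cycles: int, instructions: int) -> float:
    """Average number of CPU cycles per retired instruction."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


# Hypothetical counter totals, for illustration only.
cpi = cycles_per_instruction(cycles=3_000_000, instructions=2_000_000)
print(f"CPI = {cpi:.2f}")  # prints "CPI = 1.50"
```

A lower value means that, on average, each instruction retired in fewer cycles.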

Stalled cycles

Note

This metric is available on Arm®v8 and IBM Power 9 systems only.

The percentage of elapsed CPU cycles in which no instructions were issued.
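
The following sketch shows how such a percentage could be derived from two counter totals. It assumes a stalled-cycle count and a total cycle count are both available; the names and values are hypothetical.

```python
def stalled_cycles_percent(stalled_cycles: int, total_cycles: int) -> float:
    """Share of elapsed cycles in which no instruction was issued, in percent."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return 100.0 * stalled_cycles / total_cycles


# Hypothetical counter totals, for illustration only.
pct = stalled_cycles_percent(stalled_cycles=250_000, total_cycles=1_000_000)
print(f"{pct:.1f}% of cycles stalled")  # prints "25.0% of cycles stalled"
```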

L2 cache misses

Note

This metric is available on Arm®v8 systems only.

The ratio of L2 data cache accesses that resulted in a miss to instructions completed.
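
Because this metric is normalized by instructions completed rather than by cache accesses, it is a misses-per-instruction figure, not a miss rate. A minimal sketch with hypothetical names and counter values:

```python
def l2_misses_per_instruction(l2_misses: int, instructions: int) -> float:
    """L2 data cache misses divided by instructions completed."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return l2_misses / instructions


# Hypothetical counter totals, for illustration only.
ratio = l2_misses_per_instruction(l2_misses=5_000, instructions=1_000_000)
print(f"{ratio:.4f} L2 misses per instruction")  # prints "0.0050 L2 misses per instruction"
```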

L3 cache miss per instruction

Note

This metric is available on IBM Power 9 systems only.

The ratio of L3 data cache demand loads to instructions completed.

FLOPS scalar lower bound

This is a lower bound because its value is calculated from the FLOPS vector lower bound metric, which does not account for the length of vector operations.

Note

This metric is available on IBM Power 8 systems only.

The rate at which scalar floating-point operations finished.

FLOPS vector lower bound

This is a lower bound because the counted value does not account for the length of vector operations.

Note

This metric is available on IBM Power 8 systems only.

The rate at which vector floating-point instructions completed.
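
To illustrate why a count that ignores vector length gives a lower bound, the sketch below compares a rate that counts each vector instruction once with a rate that multiplies by the number of elements per instruction. The function names, the element count, and the counter values are assumptions made for illustration; the tool's actual calculation is not described here.

```python
def vector_flops_lower_bound(vector_instructions: int, elapsed_seconds: float) -> float:
    """Rate with each vector instruction counted as one operation."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return vector_instructions / elapsed_seconds


def vector_flops_with_length(vector_instructions: int, elements: int,
                             elapsed_seconds: float) -> float:
    """Rate if every instruction operated on `elements` values (assumed known)."""
    return elements * vector_flops_lower_bound(vector_instructions, elapsed_seconds)


# Hypothetical values: 4,000,000 vector instructions completed in 2 seconds.
lower = vector_flops_lower_bound(4_000_000, 2.0)
wider = vector_flops_with_length(4_000_000, elements=2, elapsed_seconds=2.0)
print(f"lower bound: {lower:,.0f} FLOPS")       # prints "lower bound: 2,000,000 FLOPS"
print(f"with 2 elements: {wider:,.0f} FLOPS")   # prints "with 2 elements: 4,000,000 FLOPS"
```

Since every vector instruction operates on at least one element, the first figure can never exceed the second.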

Memory accesses (IBM Power 8)

Note

This metric is available on IBM Power 8 systems only.

The rate at which the processor’s data cache was reloaded, because of a demand load, from a memory location (including L4 cache) that is local, remote, or distant.