Intel Xe GPU Breakpoints
Breakpoints affect all GPU threads and cause the program to stop when a thread reaches the breakpoint. Threads of the same sub-group reach the breakpoint together, and the kernel pauses once per sub-group.
The number of breakpoints hit in a GPU kernel can be refined using:
Conditional Breakpoints, see Conditional breakpoints.
Hit Limits, see Set breakpoints.
Divergent Execution
Where kernels have divergent execution within a sub-group, GPU threads that are currently inactive (diverged) are not shown by Linaro DDT.
For example, consider a divergent if-else statement with a breakpoint set on each branch. At the first breakpoint, only the GPU threads that satisfy the if-condition stop and are shown. GPU threads that do not satisfy the if-condition are not shown.
Once the first breakpoint has been hit by all GPU threads of a sub-group, the GPU threads progress to the next branch of the if-else statement, where the GPU threads that do not satisfy the else condition are hidden.
Attempting to select an inactive GPU thread results in an error message.
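The behaviour described above can be modelled in a few lines of plain C++. This is only an illustrative sketch, not a Linaro DDT or GDB interface: the `BranchView` type and the `lanes_at_breakpoint` function are hypothetical names, and a sub-group is assumed to be representable as one boolean per GPU thread holding the result of the if-condition.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of one sub-group stopped at a breakpoint inside a
// divergent if-else statement. Not a debugger API.
struct BranchView {
    std::vector<std::size_t> shown;   // active GPU threads, listed by the debugger
    std::vector<std::size_t> hidden;  // inactive (diverged) GPU threads, not listed
};

// Splits the GPU threads of a sub-group by the per-thread result of the
// if-condition. Pass in_if_branch = true for the breakpoint in the if-branch
// and false for the breakpoint in the else-branch.
BranchView lanes_at_breakpoint(const std::vector<bool>& condition,
                               bool in_if_branch) {
    BranchView view;
    for (std::size_t lane = 0; lane < condition.size(); ++lane) {
        if (condition[lane] == in_if_branch) {
            view.shown.push_back(lane);
        } else {
            view.hidden.push_back(lane);
        }
    }
    return view;
}
```

Calling the function twice with the same condition vector, first with `true` and then with `false`, mirrors the two stops: the threads shown at the first breakpoint are exactly the ones hidden at the second.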
Conditional Breakpoints
Breakpoints can be applied to individual work-item global IDs, work-item local IDs or work-group IDs.
To apply breakpoints to individual work-items based on their global ID, use the GDB convenience variable $_workitem_global_id.
To apply breakpoints to individual work-items based on their local ID and work-group ID, use the GDB convenience variables $_workitem_local_id and $_thread_workgroup respectively.
These GDB convenience variables are 3-dimensional, and each dimension can be accessed using the respective array element, for example $_workitem_global_id[0], $_workitem_local_id[1] or $_thread_workgroup[2].
Example: To apply a breakpoint only to the work-item with the global ID <<<4,6,9>>>, use the following condition:
$_workitem_global_id[0] == 4 && $_workitem_global_id[1] == 6 && $_workitem_global_id[2] == 9
Please note that the global work-item ID might be transposed.
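Writing the three comparisons by hand is error-prone when many IDs are involved. As a minimal sketch, the condition text could be generated with a small helper such as the one below. The `make_condition` function is a hypothetical name introduced here for illustration and is not part of Linaro DDT or GDB; it only builds the string that would then be entered as the breakpoint condition.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: builds the condition text for one of the 3-dimensional
// GDB convenience variables, comparing each dimension to the given value.
std::string make_condition(const std::string& variable,
                           const std::array<int, 3>& id) {
    std::string condition;
    for (std::size_t dim = 0; dim < id.size(); ++dim) {
        if (dim > 0) {
            condition += " && ";  // all three dimensions must match
        }
        condition += variable + "[" + std::to_string(dim) + "] == " +
                     std::to_string(id[dim]);
    }
    return condition;
}
```

For the variable name `$_workitem_global_id` and the values 4, 6 and 9, the helper produces the same condition as in the example above. The same helper works for `$_workitem_local_id` and `$_thread_workgroup`, since all three variables are indexed the same way.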
Note
Breakpoints of GPU threads are reported in the order they are scheduled on Execution Units (EU).
Due to internal GPU thread scheduling behavior, it might take a significant amount of time until a conditional breakpoint is hit.