Intel Xe GPU Breakpoints

Breakpoints affect all GPU threads, and cause the program to stop when a thread reaches the breakpoint. Threads of the same sub-group reach the breakpoint together and the kernel pauses once per sub-group.

The number of breakpoints hit in a GPU kernel can be refined using:

Divergent Execution

Where kernels have divergent execution within a sub-group, GPU threads which are currently in-active (diverge) will not be shown by Linaro DDT.

For example, a divergent if-else statement with a breakpoint set on each branch, will result only in GPU threads stopping at the first breakpoint that satisfy the if-condition. GPU threads which do not satisfy the if-condition will not be shown.

When the first breakpoint has been hit by all GPU threads of a sub-group, the GPU threads will progress to the next branch of the if-else statement, but with the GPU threads who do not satisfy the else-statement condition being hidden.

Attempting to select an in-active GPU thread will result in an error message.

Conditional Breakpoints

Breakpoints can be applied to individual work-item global IDs, work-item local IDs or work-group IDs.

  • To apply breakpoints to individual work-items, based on their global ID use the GDB convenience variable $_workitem_global_id.

  • To apply breakpoints to individual work-items, based on their local ID and work-group ID use the GDB convenience variables $_workitem_local_id and $_thread_workgroup respectively.

These GDB convenience variables are 3-dimensional and each dimension can be accessed using the respective array element, for example $_workitem_global_id[0], $_workitem_local_id[1] or $_thread_workgroup[2].

Example: To apply a breakpoint only to work-item with the global ID <<<4,6,9>>> use the following condition: $_workitem_global_id[0] == 4 && $_workitem_global_id[1] == 6 && $_workitem_global_id[2] == 9. Please note that the global work-item ID might be transposed.

Note

Breakpoints of GPU threads are reported in the order they are scheduled on Execution Units (EU).

Due to internal GPU thread scheduling behavior, it might take a significant amount of time until a conditional breakpoint is hit.