Cray MPT
This section only applies when using aprun. For srun (Native SLURM mode), see SLURM.
Linaro DDT and Linaro MAP have been tested with Cray XT 5/6, XE6, XK6/7, and XC30 systems. Linaro DDT can launch and support debugging jobs on more than 700,000 cores. Linaro Performance Reports has been tested with Cray XK7 and XC30 systems.
Several template files are included in the distribution for launching applications from within the queue, using the Linaro Forge job submission interface. These might require some minor editing to cope with local differences on your batch system.
To attach to a running job on a Cray system, the MOM nodes where aprun is launched must be accessible using ssh from the node where Linaro DDT is running. You can either specify the aprun host manually in the Attach dialog when scanning for jobs, or configure a hosts list containing all nodes.
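The ssh requirement above can be checked before attempting to attach. The following is a minimal sketch, not part of Linaro Forge: the function name check_ssh_access and the hostname mom1 are hypothetical, and the use of BatchMode (which makes ssh fail rather than prompt for a password) is an assumption about how your site configures access.

```shell
# Hypothetical helper: succeed only if a non-interactive ssh login to the
# given host works from the current node.
# BatchMode=yes makes ssh fail instead of prompting for a password;
# ConnectTimeout=5 gives up after 5 seconds if the host does not answer.
check_ssh_access() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Example usage (mom1 is a placeholder for one of your MOM nodes):
#   check_ssh_access mom1 && echo "mom1 reachable" || echo "mom1 NOT reachable"
```

Run the check from the node where Linaro DDT will run, once for each MOM node you expect aprun to be launched from.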
Preloading Linaro DDT memory debugging libraries is not supported with aprun.
If the program is dynamically linked, Linaro MAP and Linaro Performance Reports support preloading the profile libraries with aprun (with aprun/ALPS 4.1 or later).
Preloading is not supported in MPMD mode and requires that sampling libraries are linked with the application before running on this platform.
See the Linking section in Prepare a program for profiling for more information.
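Because preloading only applies to dynamically linked programs, it can help to check how your executable was linked. This is a rough sketch that assumes glibc's ldd is available where you run it; the function name is_dynamically_linked and the path ./myapp are hypothetical.

```shell
# Hypothetical helper: succeed if the given executable is dynamically linked.
# Assumes glibc's ldd, which exits non-zero for static binaries and for
# files it cannot inspect.
is_dynamically_linked() {
  ldd "$1" >/dev/null 2>&1
}

# Example usage (./myapp is a placeholder for your program):
#   if is_dynamically_linked ./myapp; then
#     echo "dynamically linked: preloading may be possible"
#   else
#     echo "not dynamically linked: link the sampling libraries explicitly"
#   fi
```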
By default, scripts wrapping Cray MPT are not detected. However, you can force the detection before starting Linaro DDT, Linaro MAP, or Linaro Performance Reports, by setting the environment variable to yes.
Using DDT with Cray ATP (the Abnormal Termination Process)
Linaro DDT is compatible with the Cray ATP system, which is the default on some XE systems. This runtime addition to applications automatically gathers crashing process stacks, and can be used to let Linaro DDT attach to a job before it is cleaned up during a crash.
To debug after a crash when an application is run with ATP but without a debugger, initialize the environment variable before launching the job. For a large Petascale system, a value of 5 is sufficient, giving 5 minutes for the attach to complete.
The following example shows the typical output of an ATP session:
n10888@kaibab:~> aprun -n 1200 ./atploop
Application 1110443 is crashing. ATP analysis proceeding...
Stack walkback for Rank 23 starting:
_start@start.S:113
__libc_start_main@libc-start.c:220
main@atploop.c:48
__kill@0x4b5be7
Stack walkback for Rank 23 done
Process died with signal 11: 'Segmentation fault'
View application merged backtrace tree file 'atpMergedBT.dot'
with 'statview'
You may need to 'module load stat'.
atpFrontend: Waiting 5 minutes for debugger to attach...
To debug the application at this point, launch Linaro DDT.
Linaro DDT can attach using the Attaching dialogs described in Attach to running programs. Alternatively, given the PID of the aprun process, you can specify the debugging set from the command line.
For example, to attach to the entire job:
ddt --attach-mpi=12772
If only a particular subset of processes is required, use the subset notation to select particular ranks:
ddt --attach-mpi=12772 --subset=23,100-112,782,1199
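When attaching from a script, the two command forms above can be assembled from the aprun PID and an optional rank list. The following is a sketch only: the helper name build_attach_cmd is hypothetical, and it uses only the --attach-mpi and --subset options shown above. It prints the command rather than running it, so you can inspect it first.

```shell
# Hypothetical helper: print the ddt attach command for a given aprun PID.
#   $1  PID of the aprun process
#   $2  optional rank subset, for example 23,100-112,782,1199
build_attach_cmd() {
  pid="$1"
  subset="${2:-}"
  if [ -n "$subset" ]; then
    printf 'ddt --attach-mpi=%s --subset=%s\n' "$pid" "$subset"
  else
    printf 'ddt --attach-mpi=%s\n' "$pid"
  fi
}

# Example usage, assuming a single aprun process owned by the current user:
#   pid=$(pgrep -u "$USER" -x aprun)
#   build_attach_cmd "$pid" 23,100-112
```

Printing the command keeps the helper free of side effects. Run the printed line yourself once it looks correct.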