SLURM support
On Cray X-series systems, only native SLURM is supported. Hybrid mode is not supported.
If you are using Slurm 21.08.0x, where x <= 4, you might see one of these error messages:
Invalid generic resource (gres) specification
or
Invalid Trackable RESource (TRES) specification
If you see either of these errors, you can set FORGE_USE_SSH_STARTUP=1 to get startup working. Setting FORGE_USE_SSH_STARTUP=1 disables the Forge scalable launch mechanism, which could lead to performance issues when starting a job with many processes. If you encounter performance issues, try
FORGE_DEBUG_SRUN_ARGS="%jobid% --mem-per-cpu=0 -I -W0 --gpus=0 --overlap"
with the added caveat that this might lead to more issues if you are launching
on a GPU compute node.
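
The workaround above can be applied conditionally from a shell startup or job script. The following is a minimal sketch, not an official Forge procedure. It assumes that "21.08.0x, where x <= 4" means Slurm releases 21.08.0 through 21.08.4, and that `sinfo --version` prints a line such as `slurm 21.08.4`. The helper name `slurm_version_affected` is hypothetical and introduced only for illustration.

```shell
# Hypothetical helper: succeeds (returns 0) when the version string passed as
# $1 is in the assumed affected range, 21.08.0 through 21.08.4.
slurm_version_affected() {
    case "$1" in
        21.08.[0-4] | 21.08.[0-4]-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read the installed Slurm version if the Slurm client tools are on PATH.
# Assumption: the version is the last field of the `sinfo --version` output.
if command -v sinfo >/dev/null 2>&1; then
    slurm_version="$(sinfo --version | awk '{print $NF}')"
else
    slurm_version=""
fi

if slurm_version_affected "$slurm_version"; then
    # Works around the gres/TRES specification errors, at the cost of
    # disabling the Forge scalable launch mechanism.
    export FORGE_USE_SSH_STARTUP=1

    # If startup of large jobs becomes slow, the setting below is the one to
    # try. It is left commented out because it might cause further issues on
    # GPU compute nodes.
    # export FORGE_DEBUG_SRUN_ARGS="%jobid% --mem-per-cpu=0 -I -W0 --gpus=0 --overlap"
fi
```

Because the variables are exported only when the detected version falls in the affected range, the same script can be left in place after Slurm is upgraded.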