Slurm Interactive Jobs

Interactive jobs are an effective way to debug and troubleshoot workload steps. Unlike batch jobs (sbatch), which run unattended and report errors and output through aggregated files, interactive jobs (salloc, srun) let you access compute resources with a desired allocation and walk through job steps to identify problems or optimizations.

This is especially important for new or experimental workloads, where using sbatch typically involves an inefficient cycle of running all job steps, waiting for results, modifying input, and repeating until all problems are resolved.

Interactive jobs let you work through this cycle in a single session and provide more control over, and visibility into, workload behavior.

Our recommended steps for troubleshooting or preparing new workloads are:

  1. Request or acquire software and dependencies

  2. Perform data transfer, staging, and lightweight testing on the login node to verify basic functionality.

  3. Run an interactive job using salloc or srun

  4. Execute any known environment modifications and/or commands required for your workload software components

  5. Attempt to identify and correct errors or unexpected behaviors and note all additional steps taken

  6. Repeat step 5 until the software generates the expected results

  7. Create a job script that includes all steps used to correct errors and/or behaviors, in the appropriate order

  8. Exit the interactive session, and confirm that the job script executes in batch

  9. Make scientific discoveries
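As a rough illustration of steps 7 and 8, the sketch below writes a job script and submits it only if sbatch is available. The file name myjob.sh, the job name, and the output file pattern are hypothetical, and the placeholder comments stand in for whatever environment changes and commands you noted during your interactive session. The -N1 -n16 values simply mirror the allocation example used on this page.

```shell
# Write a hypothetical batch script that replays the steps found interactively.
# The quoted 'EOF' keeps $SLURM_JOB_ID and $(hostname) unexpanded until the job runs.
cat > myjob.sh <<'EOF'
#!/bin/bash
# Hypothetical job name; pick your own.
#SBATCH --job-name=myjob
# Same node and task counts as the interactive session.
#SBATCH -N1
#SBATCH -n16
# %j expands to the job ID.
#SBATCH --output=myjob-%j.out

# Step 4: environment modifications noted during the session (placeholder)
# module load <your-software-module>

# Steps 5-6: the corrected commands, in the order that worked (placeholder)
echo "Job $SLURM_JOB_ID running on $(hostname)"
EOF

# Step 8: submit only where Slurm is available.
if command -v sbatch >/dev/null 2>&1; then
    sbatch myjob.sh
else
    echo "sbatch not found; wrote myjob.sh only"
fi
```

Keeping the resource options in the script identical to those used interactively makes it more likely that the batch run behaves the same way as the session in which the steps were validated.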

Interactive Jobs Using salloc

salloc is the preferred utility for interactive jobs.

To request an interactive job allocation using salloc, modify and/or append to the following generalized syntax with your desired resource allocation:

salloc -N1 -n16

Upon success, Slurm will allocate resources according to your parameters and create a shell session on the root compute node of the allocation, from which you can begin validating your job steps:

salloc: Pending job allocation 588384
salloc: job 588384 queued and waiting for resources
salloc: job 588384 has been allocated resources
salloc: Granted job allocation 588384

[hpcuser@node123 ~]$
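Once the prompt appears, it can help to confirm what was actually granted before running any job steps. The sketch below prints a few of the environment variables Slurm normally exports into an allocation. The function name show_allocation is hypothetical, and SLURM_NTASKS is typically only set when a task count such as -n16 was requested.

```shell
# Print the allocation details Slurm exports into the session.
# Outside an allocation the variables are unset, so "not set" is shown instead.
show_allocation() {
    echo "job id:    ${SLURM_JOB_ID:-not set}"
    echo "node list: ${SLURM_JOB_NODELIST:-not set}"
    echo "tasks:     ${SLURM_NTASKS:-not set}"
}

show_allocation
```

If every value reads "not set", the shell is most likely not running inside the allocation, for example because the request is still pending or the session has already exited.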

Interactive Jobs Using srun

srun is a legacy approach to interactive jobs and has recently been deprecated for that purpose.

For convenience, interactive jobs using srun are still supported, but as of March 31, 2023 an environment change is required to use srun for interactive jobs:

module load slurm/auhpc
srun -N1 -n16 [optional parameters] --pty /bin/bash

The first command (module load slurm/auhpc) changes your environment to use an intermediary script that determines the requested job type (e.g. interactive or direct execution) and appends any additional parameters that type requires.

To disable this behavior and return to the standard srun utility, load the default slurm module in place of slurm/auhpc. You can confirm which srun is active with which srun:

module load slurm/auhpc
which srun
/tools/scripts/auhpc/srun

module load slurm
which srun
/cm/shared/apps/slurm/current/bin/srun
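The same check can be scripted, for instance at the top of a helper script that depends on one behavior or the other. This is a minimal sketch: the function name srun_kind is hypothetical, the wrapper path is the one shown in the listing above, and any other non-empty path is assumed to be the standard utility.

```shell
# Classify an srun path as the site wrapper or the standard binary.
# Any path other than the wrapper path is treated as the standard srun.
srun_kind() {
    case "$1" in
        /tools/scripts/auhpc/srun) echo "wrapper" ;;
        "") echo "not found" ;;
        *) echo "standard" ;;
    esac
}

# command -v prints nothing when srun is not on PATH.
srun_kind "$(command -v srun 2>/dev/null)"
```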