.. _easley_interactive:

Slurm Interactive Jobs
======================

Interactive jobs are an effective way to debug and troubleshoot workload steps. Unlike batch jobs (``sbatch``), which run unattended and report errors and output through aggregated files, interactive jobs (``salloc``, ``srun``) let you access compute resources with a desired allocation and walk through job steps to identify problems or optimizations.

This is especially important for new or experimental workloads, where using ``sbatch`` typically involves an inefficient cycle of running all job steps, waiting for results, modifying input, and repeating until all problems are resolved. Interactive jobs let you work through that cycle in a single session and provide more control over, and visibility into, workload behavior.

Our recommended steps for troubleshooting or preparing new workloads are:

1. Request or acquire software and dependencies.
2. Perform data transfer, staging, and lightweight testing on the login node to verify basic functionality.
3. Run an interactive job.
4. Execute any known environment modifications and/or commands required for your workload's software components.
5. Attempt to identify and correct errors or unexpected behaviors, and note all additional steps taken.
6. Repeat step 5 until the software generates the expected results.
7. Create a job script that includes all steps used to correct errors and/or behaviors, in the appropriate order.
8. Exit the interactive session and confirm that the job script executes in batch.
9. Make scientific discoveries.

Interactive Jobs Using ``salloc``
---------------------------------

``salloc`` is the preferred utility for interactive jobs. To request an interactive job allocation using ``salloc``, modify and/or append to the following general syntax with your desired resource allocation:

.. hpc-prompt:: hpcterm >

    salloc -N1 -n16

Upon success, Slurm allocates resources according to your parameters and creates a shell session on the root compute node, from which you can begin validating your job steps:

.. hpc-prompt:: hpcterm >

    salloc: Pending job allocation 588384
    salloc: job 588384 queued and waiting for resources
    salloc: job 588384 has been allocated resources
    salloc: Granted job allocation 588384
    [hpcuser@node123 ~]$

Interactive Jobs Using ``srun``
-------------------------------

``srun`` is a legacy approach that has recently been deprecated. For convenience, interactive jobs using ``srun`` are still supported, but **as of 03.31.23 an environment change is required to use srun for interactive jobs**:

.. hpc-prompt:: hpcterm >

    module load slurm/auhpc
    srun -N1 -n16 [optional parameters] --pty /bin/bash

The first command changes your environment to use an intermediary script that determines the requested job type (e.g. interactive or direct execution) and appends any additional parameters required for that type. To disable this behavior, return to the standard ``srun`` utility by loading the default ``slurm`` module.

.. hpc-prompt:: hpcterm >

    module load slurm/auhpc
    which srun
    /tools/scripts/auhpc/srun
    module load slurm
    which srun
    /cm/shared/apps/slurm/current/bin/srun
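
To check from a script which ``srun`` is currently active, the command found on your ``PATH`` can be compared against the two paths shown in the ``which srun`` output above. The following is a minimal sketch, not a site-provided tool: the function name ``classify_srun`` is hypothetical, and the two paths are assumed to still match the output above.

```shell
#!/usr/bin/env bash
# Sketch only: classify_srun is a hypothetical helper, not a site-provided tool.
# The two paths are copied from the `which srun` output above and are assumed
# to be current; adjust them if the modules change.

classify_srun() {
    # $1: path of the srun executable to classify (may be empty)
    case "$1" in
        /tools/scripts/auhpc/srun)
            echo "wrapper" ;;    # intermediary script from slurm/auhpc
        /cm/shared/apps/slurm/current/bin/srun)
            echo "standard" ;;   # srun from the default slurm module
        "")
            echo "missing" ;;    # no srun found on PATH
        *)
            echo "unknown" ;;    # some other srun executable
    esac
}

# Classify whichever srun is first on PATH right now.
classify_srun "$(command -v srun 2>/dev/null)"
```

If the paths are unchanged, running this after ``module load slurm/auhpc`` should report the wrapper, and running it after ``module load slurm`` should report the standard utility.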