.. _gpu_lab:

*************
GPU Workloads
*************

.. _Slides: https://hpc.auburn.edu/hpc/docs/hpcdocs/build/html/easley/hpc_training_gpu.pdf
.. _Repository: https://github.com/auburn-research-computing/gpu_lab.git

Presentation Slides_ | GitHub Repository_

Introduction
^^^^^^^^^^^^

Easley GPU nodes are equipped with NVIDIA Tesla T4 devices. Two generally accessible GPU partitions exist, distinguished by the number of GPU devices present per node.

.. code-block:: console
   :linenos:

   sinfo -p gpu2,gpu4 -O gres,partitionname,nodes

GPU Modules & Locations
^^^^^^^^^^^^^^^^^^^^^^^

The CUDA Toolkit and a related programming framework module form the base of your GPU workload environment. A number of code examples and scripts are also available in the /tools/gpu subdirectory ...

.. code-block:: console

   module show cuda11.0/toolkit
   ls /tools/gpu/cuda_simple

Basic CUDA C++ Example
^^^^^^^^^^^^^^^^^^^^^^

Copy the sample source files to a location in your home directory ...

.. code-block:: console

   cd ~
   mkdir hpc_gpu
   cd hpc_gpu
   cp -R /tools/gpu/tutorials/* .
   ls
   cd hello

Now let's take a quick look at the source code, compile it, and run it ...

.. code-block:: console

   module load cuda11.0/toolkit
   cat hello.cu
   nvcc -o hello hello.cu
   srun -N1 -n1 --partition=gpu4 --gres=gpu:tesla:1 ./hello
   cat hello_threads.cu
   nvcc -o threads hello_threads.cu
   srun -N1 -n1 --partition=gpu4 --gres=gpu:tesla:1 ./threads

And to be sure, let's use the CUDA profiler to see exactly how the programs are using the GPU ...

.. code-block:: console

   srun -N1 -n1 -p gpu4 --gres=gpu:tesla:1 nvprof $HOME/hpc_gpu/hello/hello
   srun -N1 -n1 -p gpu4 --gres=gpu:tesla:1 nvprof $HOME/hpc_gpu/hello/threads

Using CUDA with PyTorch
^^^^^^^^^^^^^^^^^^^^^^^

PyTorch is one of the most popular deep learning frameworks among data scientists. To set up and run CUDA operations, PyTorch provides the torch.cuda package. This package adds support for CUDA tensor types, which implement the same functions as CPU tensors but use GPUs for computation.

Installing PyTorch
^^^^^^^^^^^^^^^^^^

A virtual environment needs to be created before installing PyTorch. For more information about virtual environments, please visit: https://hpc.auburn.edu/hpc/docs/hpcdocs/build/html/easley/python.html#python-virtual-environments

.. code-block:: console

   cd ~/hpc_gpu
   module load python
   python3 -m virtualenv pytorch_lab
   source pytorch_lab/bin/activate

At this point, the virtual environment has been created and activated using the source command. Now that the pytorch_lab virtual environment is active, install PyTorch ...

.. code-block:: console

   cd pytorch_lab
   pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
   pip list installed
   deactivate

torch.cuda
^^^^^^^^^^

In this first exercise we will use the torch.cuda package to check the availability of the CUDA device and to gather information about it.

.. code-block:: console

   srun -N1 -n1 --partition=gpu2 --gres=gpu:tesla:1 --pty /bin/bash
   module load python
   module load cuda11.0/toolkit
   cd ~/hpc_gpu/pytorch_lab
   source bin/activate
   python3
   import torch

To check whether your system supports CUDA, use the following command. is_available() returns True if your system supports CUDA and False otherwise.

.. code-block:: console

   torch.cuda.is_available()
   True
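Once availability is confirmed, a common PyTorch idiom is to build a device object up front and fall back to the CPU when no GPU is present. The snippet below is a minimal sketch of that pattern; the tensor and variable names are illustrative and not part of the lab files.

.. code-block:: python

   import torch

   # Prefer the GPU when one is available, otherwise fall back to the CPU
   device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

   # Tensors created on (or moved to) the device are computed there
   x = torch.rand(3, 3, device=device)
   y = (x @ x).sum()
   print(device, y.item())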
The current_device() command returns the index of the current CUDA device ...

.. code-block:: console

   torch.cuda.current_device()
   0

Using the device index returned above, you can also retrieve the name of the device ...

.. code-block:: console

   torch.cuda.get_device_name(0)
   'Tesla T4'

To get even more information about the device, query its properties using the same index ...

.. code-block:: console

   torch.cuda.get_device_properties(0)
   _CudaDeviceProperties(name='Tesla T4', major=7, minor=5, total_memory=15109MB, multi_processor_count=40)

Let's finish up the interactive job. First, exit the Python interpreter ...

.. code-block:: console

   exit()

Then copy the exercise script from the tutorial's pytorch directory into pytorch_lab, run it, and exit the interactive job ...

.. code-block:: console

   cp ../pytorch/pytorch.py .
   ./pytorch.py
   exit

Once back on the login node, deactivate the virtual environment ...

.. code-block:: console

   deactivate

Batch Job Submission
^^^^^^^^^^^^^^^^^^^^

Create a batch script named pytorch_lab.sh and place the following in it ...

.. code-block:: console

   nano pytorch_lab.sh

.. code-block:: console

   #!/bin/bash

   #SBATCH --partition=gpu4
   #SBATCH --time=5:00
   #SBATCH --nodes=1
   #SBATCH --ntasks-per-node=1
   #SBATCH --gres=gpu:tesla:1
   #SBATCH --job-name=pytorch_lab

   module load python
   module load cuda11.0/toolkit/11.0.3
   source ~/hpc_gpu/pytorch_lab/bin/activate
   python3 pytorch.py > results.out

Submit the job ...

.. code-block:: console

   sbatch pytorch_lab.sh
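If you want to test the batch workflow before the lab files are in place, a script along the following lines works as a stand-in. This is a hypothetical sketch, not the contents of the provided pytorch.py: it reports the selected device and runs a small matrix multiplication to confirm the GPU path works end to end.

.. code-block:: python

   #!/usr/bin/env python3
   # Hypothetical stand-in for the lab's pytorch.py, for illustration only.
   import torch

   # Select the GPU if one is visible to the job, otherwise the CPU
   device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
   print(f"Running on: {device}")

   if device.type == "cuda":
       print(f"Device name: {torch.cuda.get_device_name(0)}")

   # Small matrix multiplication to exercise the device
   a = torch.rand(1024, 1024, device=device)
   b = torch.rand(1024, 1024, device=device)
   c = a @ b
   print(f"Result checksum: {c.sum().item():.4f}")

When submitted with the batch script above, the printed output lands in results.out, which you can inspect on the login node after the job completes.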