.. _gpu_lab:

*************
GPU Workloads
*************

.. _Slides: https://hpc.auburn.edu/hpc/docs/hpcdocs/build/html/easley/hpc_training_gpu.pdf
.. _Repository: https://github.com/auburn-research-computing/gpu_lab.git

Presentation Slides_ | GitHub Repository_

Introduction
^^^^^^^^^^^^

Easley GPU nodes are equipped with NVIDIA Tesla T4 devices. Two generally accessible GPU partitions exist, distinguished by the number of GPU devices present per node.

.. code-block:: console
   :linenos:

   sinfo -p gpu2,gpu4 -O gres,partitionname,nodes

GPU Modules & Locations
^^^^^^^^^^^^^^^^^^^^^^^

The CUDA Toolkit and a related programming framework module form the base of your GPU workload environment. A number of code examples and scripts are also available in the /tools/gpu subdirectory ...

.. code-block:: console

   module show cuda11.0/toolkit
   ls /tools/gpu/cuda_simple

Basic CUDA C++ Example
^^^^^^^^^^^^^^^^^^^^^^

Copy the sample source files to a location in your home directory ...

.. code-block:: console

   cd ~
   mkdir hpc_gpu
   cd hpc_gpu
   cp -R /tools/gpu/tutorials/* .
   ls
   cd hello

Now let's take a quick look at the source code, compile it, and run it ...

.. code-block:: console

   module load cuda11.0/toolkit
   cat hello.cu
   nvcc -o hello hello.cu
   srun -N1 -n1 --partition=gpu4 --gres=gpu:tesla:1 ./hello
   cat hello_threads.cu
   nvcc -o threads hello_threads.cu
   srun -N1 -n1 --partition=gpu4 --gres=gpu:tesla:1 ./threads

And to be sure, let's use the CUDA profiler to see exactly how the programs are using the GPU ...

.. code-block:: console

   srun -N1 -n1 -p gpu4 --gres=gpu:tesla:1 nvprof $HOME/hpc_gpu/hello/hello
   srun -N1 -n1 -p gpu4 --gres=gpu:tesla:1 nvprof $HOME/hpc_gpu/hello/threads

Using CUDA with PyTorch
^^^^^^^^^^^^^^^^^^^^^^^

PyTorch is one of the most popular deep learning frameworks among data scientists. To set up and run CUDA operations, PyTorch provides the torch.cuda package. This package adds support for CUDA tensor types, which implement the same functions as CPU tensors but use GPUs for computation.

Installing PyTorch
^^^^^^^^^^^^^^^^^^

A virtual environment needs to be created before installing PyTorch. For more information about virtual environments, please visit: https://hpc.auburn.edu/hpc/docs/hpcdocs/build/html/easley/python.html#python-virtual-environments

.. code-block:: console

   cd ~/hpc_gpu
   module load python
   python3 -m virtualenv pytorch_lab
   source pytorch_lab/bin/activate

At this point, the virtual environment has been created and activated using the source command. Now that the pytorch_lab virtual environment is active, install PyTorch ...

.. code-block:: console

   cd pytorch_lab
   pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
   pip list installed
   deactivate

torch.cuda
^^^^^^^^^^

In this first exercise we will use the torch.cuda package to check the availability of the CUDA device and to gather information about it.

.. code-block:: console

   srun -N1 -n1 --partition=gpu2 --gres=gpu:tesla:1 --pty /bin/bash
   module load python
   module load cuda11.0/toolkit
   cd ~/hpc_gpu/pytorch_lab
   source bin/activate
   python3
   import torch

To check whether your system supports CUDA, use the following command. is_available() returns True if your system supports CUDA and False otherwise.

.. code-block:: console

   torch.cuda.is_available()
   True
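Once availability is confirmed, a common PyTorch idiom is to build a device object up front and fall back to the CPU when no GPU is present. The snippet below is a minimal sketch of that pattern; the tensor and variable names are illustrative and not part of the lab files.

.. code-block:: python

   import torch

   # Prefer the GPU when one is available, otherwise fall back to the CPU
   device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

   # Tensors created on (or moved to) the device are computed there
   x = torch.rand(3, 3, device=device)
   y = (x @ x).sum()
   print(device, y.item())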
The current_device() command returns the index of the current CUDA device ...

.. code-block:: console

   torch.cuda.current_device()
   0

Using the device index returned above, you can also retrieve the name of the device ...

.. code-block:: console

   torch.cuda.get_device_name(0)
   'Tesla T4'

To get even more information about the device, query its properties using the same index ...

.. code-block:: console

   torch.cuda.get_device_properties(0)
   _CudaDeviceProperties(name='Tesla T4', major=7, minor=5, total_memory=15109MB, multi_processor_count=40)

Let's finish up the interactive job. First, exit the Python interpreter ...

.. code-block:: console

   exit()

Then copy the exercise script from the tutorial's pytorch directory into pytorch_lab, run it, and exit the interactive job ...

.. code-block:: console

   cp ../pytorch/pytorch.py .
   ./pytorch.py
   exit

Once back on the login node, deactivate the virtual environment ...

.. code-block:: console

   deactivate

Batch Job Submission
^^^^^^^^^^^^^^^^^^^^

Create a batch script named pytorch_lab.sh and place the following in it ...

.. code-block:: console

   nano pytorch_lab.sh

.. code-block:: console

   #!/bin/bash

   #SBATCH --partition=gpu4
   #SBATCH --time=5:00
   #SBATCH --nodes=1
   #SBATCH --ntasks-per-node=1
   #SBATCH --gres=gpu:tesla:1
   #SBATCH --job-name=pytorch_lab

   module load python
   module load cuda11.0/toolkit/11.0.3
   source ~/hpc_gpu/pytorch_lab/bin/activate
   python3 pytorch.py > results.out

Submit the job ...

.. code-block:: console

   sbatch pytorch_lab.sh
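If you want to test the batch workflow before the lab files are in place, a script along the following lines works as a stand-in. This is a hypothetical sketch, not the contents of the provided pytorch.py: it reports the selected device and runs a small matrix multiplication to confirm the GPU path works end to end.

.. code-block:: python

   #!/usr/bin/env python3
   # Hypothetical stand-in for the lab's pytorch.py, for illustration only.
   import torch

   # Select the GPU if one is visible to the job, otherwise the CPU
   device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
   print(f"Running on: {device}")

   if device.type == "cuda":
       print(f"Device name: {torch.cuda.get_device_name(0)}")

   # Small matrix multiplication to exercise the device
   a = torch.rand(1024, 1024, device=device)
   b = torch.rand(1024, 1024, device=device)
   c = a @ b
   print(f"Result checksum: {c.sum().item():.4f}")

When submitted with the batch script above, the printed output lands in results.out, which you can inspect on the login node after the job completes.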