.. _python_lab:

****************************
Python Fundamentals Training
****************************

.. _Slides: https://hpc.auburn.edu/hpc/docs/hpcdocs/build/html/easley/hpc_training_python1.pdf
.. _Repository: https://github.com/auburn-research-computing/python_multiprocessing.git

| Presentation Slides_
| GitHub Repository_

|
|

Introduction
^^^^^^^^^^^^

When you first log in to Easley, the system Python is available by default. However, this may not be the ideal version of Python for your workload. The following commands report the version of Python and the path where it is installed.

The first command below reports the version of Python.

.. hpc-prompt:: hpcterm >,(env)...$ auto

   > python "-V"
   Python 2.7.5

The second command reports the path.

.. hpc-prompt:: hpcterm >,(env)...$ auto

   > which python
   /usr/bin/python

By default, Python 2.7.5 is available for use. However, if you wish to choose another version of Python or prefer to use Anaconda, run the following command to list the available modules:

.. hpc-prompt:: hpcterm >,(env)...$ auto

   > module av
   ------------- Languages & Environments --------
   python/anaconda/2.7.14    python/anaconda/3.6.3   python/anaconda/3.7.0   python/intel/3.7.9
   python/anaconda/2.7.15    python/anaconda/3.6.4   python/anaconda/3.7.4   python/3.8.6
   python/anaconda/3.5.2-0   python/anaconda/3.6.5   python/anaconda/3.8.6   python/3.9.2 (D)

Under Languages & Environments, you will see a variety of Python modules to choose from. The default module, marked ``(D)``, is ``python/3.9.2``, so loading ``python`` without a version selects it.

.. hpc-prompt:: hpcterm >

   module load python

.. hpc-prompt:: hpcterm >,(env)...$ auto

   > python -V
   Python 3.9.2

.. hpc-prompt:: hpcterm >,(env)...$ auto

   > which python
   /tools/python-3.9.2/bin/python

Listing Available Packages
^^^^^^^^^^^^^^^^^^^^^^^^^^

The two most popular package managers for installing and listing packages are pip and conda. There are several system-wide packages installed and available for use.
They vary depending on the version of Anaconda or Python you load. To list the available packages, use the commands below.

For the python modules, use the following:

.. hpc-prompt:: hpcterm >

   module load python
   pip list

For the python/anaconda modules, use the following:

.. hpc-prompt:: hpcterm >

   module unload python
   module load python/anaconda/3.8.6
   conda list

Python or Anaconda?
^^^^^^^^^^^^^^^^^^^

Anaconda is an open-source Python distribution that contains hundreds of data science libraries. Users who specialize in data science may benefit from using the python/anaconda modules.

Virtual Environments and installing packages locally
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you load a version of Python that does not include a package necessary for your workload, you can submit a request to our office. However, you can also install the package(s) locally. To work around limited privileges, you can use your home directory ``/home/``, where you have full rights, to install packages or even entire instances of Python.

.. hpc-prompt:: hpcterm >

   pip install --user package_name

However, if you plan to use multiple versions of the same package for different projects, you may run into dependency issues. Virtual environments solve this problem. They are recommended for Python-based projects, and we recommend creating a new environment for every project.

The Python ``virtualenv`` module allows us to create an encapsulated Python instance where we can install modules without administrative privileges and avoid software conflicts. The ``virtualenv`` feature actually creates a copy of the core binaries and libraries needed to run a Python program, including an isolated ``site-packages`` location in your home directory (or other specified location to which you have write permissions) where modules are typically installed.
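
One quick way to see which interpreter is running, and whether an isolated environment is in effect, is to ask Python itself. The following is a minimal sketch rather than part of the training material. The helper name ``in_virtual_environment`` is made up for illustration, and the ``real_prefix`` check is included only on the assumption that an older ``virtualenv`` release might be in use.

.. code-block:: python

   import sys
   import sysconfig


   def in_virtual_environment():
       """Return True when the running interpreter belongs to a virtual environment."""
       # Assumption: older virtualenv releases set sys.real_prefix.
       if hasattr(sys, "real_prefix"):
           return True
       # venv and current virtualenv releases leave base_prefix pointing at the
       # original installation while prefix points at the environment.
       return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


   if __name__ == "__main__":
       print("interpreter:  ", sys.executable)
       print("site-packages:", sysconfig.get_paths()["purelib"])
       print("virtual env:  ", in_virtual_environment())

Run once with a plain module-provided Python and again from an activated environment, the last line should change from ``False`` to ``True``.
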
This feature is very helpful for shared systems like HPC clusters, because you can customize Python for your research workflow without administrative privileges.

Python 3 with pip
=================

This section demonstrates how to set up a virtual environment using Python 3 with the pip package manager.

Step 1: Get Home
++++++++++++++++

First, change into your home directory by typing the following command:

.. hpc-prompt:: hpcterm >

   cd

You can always use the following command to check which directory you are currently in.

.. hpc-prompt:: hpcterm >

   pwd

Step 2: Set the Python Environment
++++++++++++++++++++++++++++++++++

Before creating the virtual environment, check which modules are loaded using ``module list``. Load any additional modules needed for your workflow with ``module load``, and unload any unnecessary modules with ``module unload``. For this example, we use the default (latest) version of Python ...

.. hpc-prompt:: hpcterm >

   module load python

Most versions of Python available on HPC will have the ``virtualenv`` feature available for use globally, but it's probably still a good idea to create an instance of the feature in our home directory. The following ``pip`` command uses the ``--user`` option, which installs ``virtualenv`` to a location in our home directory.

.. hpc-prompt:: hpcterm >

   pip3 install --user virtualenv

If the command is successful, you should be able to see which files were installed by looking into a special hidden path (".local") in your home directory. The ``pip`` command, when used with ``--user``, creates this path to hold user-specific Python files and configuration items ...

.. hpc-prompt:: hpcterm >

   ls -al ~/.local/lib/python-/site-packages

.. caution::

   ``pip3 install --user`` is similar to Python virtual environments, in that it creates a location in your home directory where you can install modules.
   However, it's important to note that simply using ``pip --user`` does not make all of the changes needed to work in a fully isolated virtual environment.

Step 3: Create the Virtual Environment
++++++++++++++++++++++++++++++++++++++

With ``virtualenv`` installed, we need to create a new location in our home directory to hold the virtual environment files. You may end up with multiple virtual environments for various projects, so it's a good idea to give the new path a descriptive name (e.g. pytorch_project1). Here we just call our environment "env1" as a generic identifier, but you can name the path whatever you like.

.. hpc-prompt:: hpcterm >

   mkdir ~/env1
   cd ~/env1/

From within the new directory, we can now create a new virtual environment by telling Python to copy all of the essential files to a location within our new project folder ("env1"). We use "env1_python" here, but you can name it whatever you want ...

.. hpc-prompt:: hpcterm >

   python3 -m virtualenv env1_python

Because Python has to perform a number of file copies in order to create the virtual environment, this command may take a little while to complete. We can see what has been copied by looking into the newly created "env1_python" directory ...

.. hpc-prompt:: hpcterm >

   ls -al env1_python
   ls -al env1_python/bin
   ls -al env1_python/lib/python

Step 4: Activate the Virtual Environment
++++++++++++++++++++++++++++++++++++++++

You may notice that several "activate" scripts have been created for us in ``env1_python/bin``. These files perform all of the necessary tasks to ensure that your Linux shell is configured with the appropriate paths, libraries, etc. needed to run in the isolated virtual environment. In order to actually switch to this specific execution context, we use the ``source`` command along with the location of the ``activate`` script ...

.. hpc-prompt:: hpcterm >

   source env1_python/bin/activate

With the environment now set after calling ``source``, you should see that your prompt has changed to indicate the name of the virtual environment ...

.. hpc-prompt:: hpcterm (env1_python)...> auto

   (env1_python) hpcuser@easley01:env1_python >

.. hpc-prompt:: hpcterm (env1_python)...> auto

   (env1_python) hpcuser@easley01:env1_python > which python
   /home/hpcuser/env1/env1_python/bin/python

Step 5: Install Packages
++++++++++++++++++++++++

From within this new Python execution context, we can now perform operations as we normally would for a locally installed Python. Most importantly, you can now install packages using pip ...

.. hpc-prompt:: hpcterm (env1_python)...> auto

   (env1_python) hpcuser@easley01:env1_python > pip3 install

.. hpc-prompt:: hpcterm (env1_python)...> auto

   (env1_python) hpcuser@easley01:env1_python > python3

To confirm, we can take a look in the virtual environment's dedicated ``site-packages`` directory, and we should see that any modules we have installed while activated are placed in that location ...

.. hpc-prompt:: hpcterm (env1_python)...> auto

   (env1_python) hpcuser@easley01:env1_python > ls env1_python/lib/python3.9/site-packages/

And we can run interactively in our customized shell to test some code ...

.. hpc-prompt:: hpcterm (env1_python)...> auto

   (env1_python) hpcuser@easley01:env1_python > python3

Before creating the virtual environment, check which modules are loaded, and load any additional modules needed. For this example, use the latest version of Python 3. Move to the next section if you plan on using conda as the package manager.

Step 6: Deactivate Virtual Environment
++++++++++++++++++++++++++++++++++++++

The final step is to deactivate the virtual environment with the following command:

.. hpc-prompt:: hpcterm (env1_python)...> auto

   (env1_python) hpcuser@easley01:env1_python > deactivate

Anaconda Python 3 with conda
============================

Anaconda is a special *distribution* of Python that is purposed for scientific workloads. Anaconda claims to provide everything you need for data science development and experimentation, including its own specialized package manager. Some researchers prefer to use Anaconda for their Python workloads. The following steps describe the recommended method for creating an Anaconda (conda) virtual environment ...

Step 1: Housekeeping
++++++++++++++++++++

First, ensure that you are in your home directory and have a clean environment, with no other Python-specific modules loaded...

.. hpc-prompt:: hpcterm >

   cd
   module list

If you see any Python modules loaded, it might be a good idea to log out, then back in, to reset your environment.

Step 2: Load Anaconda
+++++++++++++++++++++

Once you have confirmed that your environment is clean, you can load one of the Anaconda Python modules from which you can begin configuring your virtual environment. Here, we load the default (latest) version...

.. hpc-prompt:: hpcterm >

   module av python

.. admonition:: Command Line Interpreter: *bash*
   :class: terminal_small

   | (env2) hpcuser@easley01:~ > module av python
   |
   | - - - - - - - - - - - - Programming Languages & Environments - - - - - - - - - - - -
   | python/3.8.6 python/anaconda/2.7.14 python/anaconda/3.7.0 ...
   | python/3.9.2 python/anaconda/2.7.15 python/anaconda/3.7.4 python/intel/3.7.9
   |

.. hpc-prompt:: hpcterm >

   module load python/anaconda

Step 3: Create a New Virtual Environment
++++++++++++++++++++++++++++++++++++++++

With Anaconda Python loaded, your environment should now be set to use the Anaconda Python binaries. Because the Python installation is purposed for a multi-user environment, the path for the default package location is not writable by normal users.
In order to use our own location for packages, we need to set the ``CONDA_PKGS_DIRS`` environment variable so that ``conda`` does not attempt to write to the shared location...

.. hpc-prompt:: hpcterm >

   export CONDA_PKGS_DIRS=~/.conda/pkgs

Now we can create a new virtual environment with ``conda create``, using the ``-n`` argument to give it a specific name. For this example, the virtual environment name will be ``env1`` ...

.. hpc-prompt:: hpcterm >

   conda create -n env1

We can see from the output that conda wants to create the environment in a location in our home directory, ``~/.conda/envs/env1``. If ``conda create`` was successful, we should see some new files and directories in that location ...

.. admonition:: Command Line Interpreter: *bash*
   :class: terminal_small

   | $ ls ~/.conda/
   | environments.txt envs pkgs
   |
   | $ ls ~/.conda/envs/env1/
   | conda-meta

Step 4: Activate the Environment
++++++++++++++++++++++++++++++++

Also from the ``conda create`` output, we can see that Anaconda Python has advised us to use ``conda activate env1`` to use the environment. However, if you have not yet used Anaconda Python, you might see an error when running that command ...

.. admonition:: Command Line Interpreter: *bash*
   :class: terminal_small

   | $ conda activate env1
   |
   | CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
   | To initialize your shell, run
   |
   | conda init

.. warning::

   We *could* certainly run ``conda init``, but doing this will alter our ``~/.bashrc`` file with settings specific to the loaded version of Python. Because ``.bashrc`` is sourced every time we log on to the cluster, this could potentially cause a conflict if we ever want to change our Python environment.

To avoid this, our recommended method for activating the environment is to use ``source activate`` ...

.. hpc-prompt:: hpcterm >

   source activate env1

If the activation is successful, your command prompt should change to indicate the environment in which you are running ...

.. admonition:: Command Line Interpreter: *bash*
   :class: terminal_small

   | hpcuser@easley01:~ > source activate env1
   | (env1) hpcuser@easley01:~ >

Step 5: Install Packages
++++++++++++++++++++++++

Now, we should be able to install packages into our virtual environment using ``conda install``. Let's see if we can install the curl package ...

.. hpc-prompt:: hpcterm >

   conda install curl

Step 6: Exiting
+++++++++++++++

To exit the execution context of the virtual environment and get back to our regular shell, run ``conda deactivate`` ...

.. admonition:: Command Line Interpreter: *bash*
   :class: terminal_small

   | (env1) hpcuser@easley01:hpcuser > conda deactivate
   | hpcuser@easley01:hpcuser >

Python Concurrency and Parallelism
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Sample Code
===========

Begin by copying the sample code from ``/tools/docs/tutorials/python/multi`` ...

.. hpc-prompt:: hpcterm >

   cd ~
   mkdir hpc_pylab
   cd hpc_pylab/
   cp /tools/docs/tutorials/python/multi/* .

Or, clone the public repository, which is the authoritative source for any updated code ...

.. hpc-prompt:: hpcterm >

   cd ~
   git clone https://github.com/auburn-research-computing/python_multiprocessing.git
   mv python_multiprocessing hpc_pylab
   cd hpc_pylab

You should now have two files, ``threads.py`` and ``procs.py``.

.. hpc-prompt:: hpcterm >,(env)...$ auto

   > ls -al
   -rwxr-x--- 1 hpcuser hpcuser 1004 Sep 2 13:48 procs.py
   -rw-r--r-- 1 hpcuser hpcuser 1614 Sep 2 13:03 threads.py

The ``procs.py`` file contains a code sample that employs process parallelism for a prime number calculation. The ``threads.py`` file demonstrates the use of threads for I/O-bound workloads.

Multithreading with Python
^^^^^^^^^^^^^^^^^^^^^^^^^^

Let's start with a basic Python program to experiment with threading.
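
For orientation, the general pattern can be sketched in a few lines. This is not the contents of ``threads.py``. It is a minimal illustration that assumes ``time.sleep`` is an acceptable stand-in for a slow web request, and the names ``fake_request`` and ``timed_run`` are hypothetical.

.. code-block:: python

   import time
   from concurrent.futures import ThreadPoolExecutor


   def fake_request(task_id, delay=0.05):
       """Stand-in for a slow I/O call: sleep, then return the task id."""
       time.sleep(delay)
       return task_id


   def timed_run(num_threads, num_tasks=8):
       """Run simulated requests on a thread pool; return (results, seconds)."""
       start = time.perf_counter()
       with ThreadPoolExecutor(max_workers=num_threads) as pool:
           # map() preserves input order, so results line up with task ids.
           results = list(pool.map(fake_request, range(num_tasks)))
       return results, time.perf_counter() - start


   if __name__ == "__main__":
       for n in (1, 4, 8):
           _, seconds = timed_run(n)
           print(f"{n} thread(s): {seconds:.2f} s")

Because the simulated requests spend nearly all of their time waiting, the elapsed time should fall as threads are added, up to the point where there are as many threads as tasks.
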
Remember, threading is recommended for programs that spend much of their time waiting for input or output operations. The ``threads.py`` code issues a number of HTTP (web) requests, which provides a good simulation of (relatively) slow I/O. The syntax for using ``threads.py`` looks something like ...

.. code-block:: console

   python threads.py [number_of_threads]

If the optional parameter ``number_of_threads`` is not provided, a single thread will be requested.

Now let's submit an interactive job so that we can experiment with the code without worrying about overloading the login node ...

.. hpc-prompt:: hpcterm >,(env)...$ auto

   > srun -N1 -n1 --pty /bin/bash
   node001>

Here, we request a single core from one available compute node, and once the scheduler has allocated our resources, we should be dropped onto a compute node where we can run commands interactively.

First, let's make sure we are in the location where we copied our sample code, and set our environment to use a recent version of Python ...

.. hpc-prompt:: hpcterm node001>

   cd ~/hpc_pylab
   module load python

Now, let's experiment to see whether we get any benefit from using threads. We'll run the sample code with a single thread first, then increase the thread count to see whether performance improves ...

.. hpc-prompt:: hpcterm node001>,(env)...$ auto

   node001> python threads.py 1
   ...take note of the total execution time...
   node001> python threads.py 4
   ...take note of the total execution time...
   node001> python threads.py 8
   ...take note of the total execution time...

Multiprocessing with Python
^^^^^^^^^^^^^^^^^^^^^^^^^^^

For CPU-bound workloads, like math operations, we can use parallel processes instead of threads. First, let's be sure to exit our current interactive job and resubmit using ``--ntasks`` and ``--cpus-per-task`` ...

.. hpc-prompt:: hpcterm >,node001> auto

   > srun --ntasks=1 --cpus-per-task=8 --pty /bin/bash
   node001> cd ~/hpc_pylab

Run the ``procs.py`` sample code with varied numbers of cores and observe the performance impact ...

.. hpc-prompt:: hpcterm node001>,(env)...$ auto

   node001> python procs.py 1
   ... take note of the total execution time ...
   node001> python procs.py 4
   ... take note of the total execution time ...
   node001> python procs.py 8
   ... take note of the total execution time ...

Job Submission
^^^^^^^^^^^^^^

To demonstrate the importance of job submission parameters, let's try running the ``procs.py`` (parallel process) program using the more standard node and core allocation ...

First, exit any existing interactive job if you haven't done so already. Then issue another job submission with ...

.. hpc-prompt:: hpcterm >,node001> auto

   > srun -N1 -n1 --pty /bin/bash
   node001> cd ~/hpc_pylab
   node001> module load python
   node001> python procs.py 1
   ...take note of execution time...
   node001> python procs.py 8
   ...take note of execution time...

|

You should notice that the execution time remains very similar regardless of the number of cores you request in the program. With only a single core allocated to the job (``-N1 -n1``), the worker processes have to share that core. It's important to make sure that your job submission parameters are set according to the Python parallel model you want to use. As a general guideline, we recommend always using ``--ntasks`` and ``--cpus-per-task`` for Python programs that use multiprocessing or threading functions.

Additional Services
^^^^^^^^^^^^^^^^^^^

Ralph Brown Draughon Library Research Data Services now offers computational support. Researchers can meet one-on-one with an expert in Python, R, and many other data science and programming languages. More information can be found on their website: https://libguides.auburn.edu/researchdata