Python on AU HPC Systems

Python is a general-purpose programming language that has become a popular choice in a number of research domains, primarily because of its portability, extensibility, and emphasis on human-friendly syntax.

Python is an interpreted language, which makes it easy to move code from one system to another, but it can present scalability problems on HPC clusters. The vast majority of Python code uses a shared-memory model (for example, threading), which limits it to the resources available on a single compute node.

Additionally, because AU HPC clusters are multi-user, shared systems where certain privileges are limited to the designated system administrators, the extensibility of Python through modules can be somewhat constrained. You cannot modify the system-installed Python instances the way you might modify Python on your local workstation.
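To see this constraint for a particular interpreter, a short script can list the directories where that interpreter looks for installed modules and report whether your account can write to them. This is only a sketch: the function name site_packages_report is made up for illustration, and the paths it prints depend entirely on which Python you run it with.

```python
import os
import site
import sys


def site_packages_report():
    """Return (path, writable) pairs for this interpreter's package directories."""
    # System-wide locations; on a shared cluster these are usually not writable.
    paths = list(site.getsitepackages())
    # Per-user location, normally under the home directory (it may not exist yet).
    paths.append(site.getusersitepackages())
    # os.access() also returns False for a directory that does not exist.
    return [(path, os.access(path, os.W_OK)) for path in paths]


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    for path, writable in site_packages_report():
        status = "writable" if writable else "not writable or missing"
        print(f"{status:>24}  {path}")
```

On a system-installed Python, the system-wide entries would typically be reported as not writable, which is the situation virtual environments are meant to address.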

These caveats are derived from feedback and experience from HPC researchers over the years, and are intended to promote awareness of the limitations of Python on HPC systems, not to discourage you from using Python for your HPC workloads.

The good news is that you can still run parallel Python code that scales up to 48 cores on an Easley Intel node, or up to 128 cores on an AMD node. Even more encouraging is the use of GPUs with Python, which has proven very effective in some science domains such as machine learning.
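As a rough illustration of single-node parallelism, the sketch below spreads a CPU-bound task across the cores available to the job using the standard library's multiprocessing module. Several things here are assumptions rather than facts about AU systems: the work function is a made-up placeholder, and the SLURM_CPUS_PER_TASK variable is only set if the job runs under a Slurm-style scheduler that exports it. When it is absent, the code falls back to the CPU affinity mask or the node's core count.

```python
import math
import os
from multiprocessing import Pool


def available_cores():
    """Number of cores this process may use on the current node."""
    # Assumption: a Slurm-style scheduler exports SLURM_CPUS_PER_TASK.
    from_scheduler = os.environ.get("SLURM_CPUS_PER_TASK", "")
    if from_scheduler.isdigit() and int(from_scheduler) > 0:
        return int(from_scheduler)
    # On Linux, respect the CPU affinity mask applied to the job.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def work(n):
    """Hypothetical CPU-bound task: sum of square roots of 0..n-1."""
    return sum(math.sqrt(i) for i in range(n))


def run_parallel(inputs, workers=None):
    """Apply work() to every input, using one process per available core."""
    workers = workers or available_cores()
    with Pool(processes=workers) as pool:
        return pool.map(work, inputs)


if __name__ == "__main__":
    results = run_parallel([200_000] * 8)
    print(f"{len(results)} tasks finished using {available_cores()} core(s)")
```

Separate processes are used here instead of threads because each process has its own interpreter, so CPU-bound work can occupy several cores at once. The worker count is still capped by what a single node provides.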

Python Virtual Environments

To work around the limited privileges described above, you can use your home directory, /home/<userid>, where you have full rights, to install complete, separate instances of Python or (more efficiently) to create virtual environments.

Additionally, if you plan to use multiple versions of the same package for different projects, you may run into dependency conflicts. Virtual environments solve this problem as well. We recommend virtual environments for Python-based projects, with a new environment for every project.

The Python virtualenv module allows us to create an encapsulated Python instance in which we can install modules without administrative privileges and avoid software conflicts. The virtualenv feature creates a copy of the core binaries and libraries needed to run a Python program, along with an isolated site-packages location, where modules are typically installed, in your home directory (or another location to which you have write permission). This is very helpful on shared systems like HPC clusters because it lets you customize Python for your research workflow.
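The idea can be sketched in a few lines of Python. Note that this example uses venv, the virtual environment module in the Python 3 standard library, in place of the virtualenv package named above; the two serve the same purpose but are not identical. The function name create_project_env and the location ~/envs/my_project are hypothetical and chosen only for illustration.

```python
import venv
from pathlib import Path


def create_project_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return its path."""
    env_path = Path(env_dir).expanduser()
    # with_pip=True bootstraps pip inside the environment, so packages
    # can be installed there without administrative privileges.
    venv.create(env_path, with_pip=with_pip)
    return env_path


if __name__ == "__main__":
    # Hypothetical location under the home directory; one per project.
    env = create_project_env("~/envs/my_project")
    print("Created", env)
    print("Activate with: source", env / "bin" / "activate")
```

After the environment is activated in a shell, pip installs packages into that environment's own site-packages directory instead of the system-installed Python.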