Python in HPC: Glossary

Key Points

Python in HPC
  • Speeding up Python can be approached in two main ways: single-core and multi-core techniques

  • Single-core speed-up techniques include Cython, Numba and CFFI

  • For multi-core applications, MPI is the main route to true parallelisation in Python

Timing Code and Simple Speed-up Techniques
  • Performance profiling is used to analyse where an application spends its execution time and to identify where it can be improved.

  • Never try to optimise your code on the first attempt. Get the code correct first.

  • Most often, about 90% of the execution time is spent in about 10% of the code.

  • The lru_cache() decorator reduces a function's execution time via memoisation, discarding the least recently used cache entries first.
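As a minimal sketch of the memoisation technique mentioned above, functools.lru_cache can turn an exponential-time recursive Fibonacci into a fast one by caching subproblem results (the function name and maxsize choice here are illustrative):

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # maxsize=None caches every result; a bound evicts LRU entries first
def fib(n):
    """Naive recursive Fibonacci; exponential time without memoisation."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))            # fast: each subproblem is computed once
print(fib.cache_info())   # hit/miss statistics for the cache
```

fib.cache_info() is useful when profiling, as it shows how often the cache actually saved a call.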

Numba
  • Numba only compiles individual functions rather than entire scripts.

  • The recommended compilation modes are @jit(nopython=True) and its shorthand @njit

  • Numba is constantly changing, so keep checking for new versions.
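A minimal sketch of compiling a single function with Numba's njit decorator; the ImportError fallback (a pass-through decorator) is only there so the sketch runs even where Numba is not installed, and the example function is hypothetical:

```python
try:
    from numba import njit          # JIT-compiles individual functions, not whole scripts
except ImportError:                 # fallback so the sketch runs without Numba
    def njit(func):
        return func

@njit                               # compiled in nopython mode on the first call
def sum_of_squares(n):
    total = 0
    for i in range(n):              # plain loops like this benefit most from Numba
        total += i * i
    return total

print(sum_of_squares(1000))
```

Note that the first call includes compilation time; benchmark on subsequent calls.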

Cython
  • Cython IS Python, only with C data types

  • Working from the terminal, a .pyx source file, a main.py and a setup.py are required.

  • From the terminal the code can be run with python setup.py build_ext --inplace

  • Cython ‘typeness’ can be inspected using the %%cython -a cell magic, where lines are tinted yellow in proportion to how much Python interaction they still require.

  • The main methods of improving Cython speed-up include static type declarations, declaring functions with cdef, using cimport, and utilising fast indexing of C NumPy arrays and types.

  • Compiler directives can be used to turn off certain Python features.
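As a minimal sketch of static type declarations in a .pyx file (the module and function names here are hypothetical), compiled with the setup.py command shown above:

```cython
# fib.pyx -- compiled to C by Cython via: python setup.py build_ext --inplace
def fib_typed(int n):
    # cdef declares C variables, avoiding Python object overhead in the loop
    cdef int i
    cdef long a = 0, b = 1
    for i in range(n):
        a, b = b, a + b
    return a
```

Running %%cython -a on this code should show the loop body with little or no yellow tint, since it compiles to pure C.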

C Foreign Function Interface for Python
  • CFFI is an external package that provides a C Foreign Function Interface for Python, allowing interaction with almost any C code

  • The Application Binary Interface (ABI) mode is easier to use, but slower

  • The Application Programmer Interface (API) mode is more complex, but faster
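A minimal sketch of the easier ABI mode: declare the C signature, open a shared library at run time, and call into it. The library name "m" (the C maths library) is platform-dependent, so the sketch falls back to math.sqrt if CFFI or the library is unavailable:

```python
try:
    from cffi import FFI
    ffi = FFI()
    ffi.cdef("double sqrt(double x);")   # declare the C signature we intend to call
    lib = ffi.dlopen("m")                # ABI mode: open the shared maths library at run time
    result = lib.sqrt(9.0)
except Exception:                        # fallback so the sketch runs without CFFI/libm
    import math
    result = math.sqrt(9.0)

print(result)
```

API mode instead compiles a small C wrapper with ffi.set_source() and ffi.compile(), which is more work but avoids the per-call overhead of ABI mode.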

MPI with Python
  • MPI is the standard way to achieve true parallelism in Python

  • mpi4py is an unofficial but widely used library that provides MPI bindings for Python

  • A communicator is a group containing all the processes that will participate in communication

  • A rank is the logical ID number given to a process within a communicator, and provides the way to address each process individually

  • Point-to-point communication is communication between two processes: a source process sends a message to a destination process, which must then receive it
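A minimal point-to-point sketch with mpi4py, intended to be launched as e.g. mpirun -n 2 python script.py (the tag value and message contents are illustrative; the guards let the file also run harmlessly as a single process or without mpi4py installed):

```python
try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD             # communicator containing all launched processes
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:                   # fallback so the sketch runs without MPI
    rank, size = 0, 1

message = None
if size >= 2:                         # point-to-point needs at least two processes
    if rank == 0:
        comm.send({"a": 7}, dest=1, tag=11)    # source sends to destination...
    elif rank == 1:
        message = comm.recv(source=0, tag=11)  # ...which must explicitly receive
print(rank, size, message)
```

Note that comm.send/comm.recv (lowercase) pickle arbitrary Python objects; the uppercase Send/Recv variants work on buffers such as NumPy arrays and are faster.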

Non-blocking and collective communications
  • In some cases, unintended serialisation is worse than a deadlock, because the code still runs and you may not realise it is inhibited by poor performance

  • Collective communication transmits data among all processes in a communicator, and must be called by all processes in a group

  • MPI is best used from C and Fortran, as in Python some function calls are either not available or suffer from poor performance
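A minimal sketch of a collective operation: broadcasting data from one root rank to every process in the communicator. Every process must make the same bcast call (the payload here is illustrative, and an ImportError fallback keeps the sketch runnable without mpi4py):

```python
try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    # all processes in the communicator must call bcast
    data = {"answer": 42} if rank == 0 else None
    data = comm.bcast(data, root=0)    # after this, every rank holds the root's data
except ImportError:                    # fallback so the sketch runs without MPI
    data = {"answer": 42}

print(data)
```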

Dask
  • Dask allows task parallelisation of problems

  • Identifying the tasks in a problem and understanding its memory usage are crucial

  • Dask Array is a method to create a set of tasks automatically through operations on arrays

  • Tasks are created by splitting larger arrays into chunks, meaning problems too large to fit in memory can be handled
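A minimal sketch of task parallelisation with dask.delayed: calls are recorded lazily into a task graph, and nothing executes until .compute() is called, at which point independent tasks can run in parallel (the inc function is illustrative, and a fallback keeps the sketch runnable without Dask):

```python
def inc(x):
    return x + 1

try:
    from dask import delayed
    # delayed() records each call as a task instead of running it
    tasks = [delayed(inc)(i) for i in range(4)]
    total = delayed(sum)(tasks).compute()  # builds the graph, then runs the tasks
except ImportError:                        # fallback so the sketch runs without Dask
    total = sum(inc(i) for i in range(4))

print(total)
```

Dask Array applies the same idea automatically: operations on a large array become a graph of tasks over smaller chunks.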

GPUs with Python
  • GPUs contain several thousand cores and offer far more raw compute throughput than CPUs, but have much less memory available

Glossary

The glossary would go here, formatted as:

{:auto_ids}
key word 1
:   explanation 1

key word 2
:   explanation 2

({:auto_ids} is needed at the start so that Jekyll will automatically generate a unique ID for each item to allow other pages to hyperlink to specific glossary entries.) This renders as:

key word 1
explanation 1
key word 2
explanation 2