Python in HPC
|
Python speed-up can be approached in two ways: on a single core and across multiple cores
Single-core speed-up techniques include Cython, Numba and CFFI
MPI is the true method of parallelisation in Python for multi-core applications
|
Timing Code and Simple Speed-up Techniques
|
Performance profiling is used to identify where an application spends its execution time and to analyse how it can be improved.
Never try to optimise your code on the first pass. Get the code correct first.
Most often, about 90% of the execution time is spent in 10% of the code.
The lru_cache() decorator from functools reduces the execution time of a function through memoisation, caching previous results and discarding the least recently used entries first, as sketched below.
|
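A minimal sketch of memoisation with lru_cache; the fibonacci function here is only an illustration:

```python
from functools import lru_cache

@lru_cache(maxsize=128)      # keep up to 128 of the most recently used results
def fibonacci(n):
    """Naive recursive Fibonacci; the cache avoids recomputing subproblems."""
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(35))         # fast, because repeated calls are answered from the cache
```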
Numba
|
Numba only compiles individual functions rather than entire scripts.
The recommended compilation mode is nopython mode, enabled with nopython=True or the equivalent njit decorator, as in the sketch below
Numba is constantly changing, so keep checking for new versions.
|
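A minimal sketch of nopython-mode compilation with the njit decorator; the pairwise_sum function is only an illustration:

```python
import numpy as np
from numba import njit       # njit is shorthand for jit(nopython=True)

@njit
def pairwise_sum(x):
    """Sum of all pairwise products, compiled to machine code on first call."""
    total = 0.0
    for i in range(x.shape[0]):
        for j in range(x.shape[0]):
            total += x[i] * x[j]
    return total

x = np.random.rand(1000)
print(pairwise_sum(x))       # first call triggers compilation; later calls run at native speed
```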
Cython
|
Cython IS Python, only with C datatypes
Working from the terminal, .pyx, main.py and setup.py files are required.
From the terminal the code can be run with python setup.py build_ext --inplace
Cython ‘typeness’ can be inspected using the %%cython -a cell magic, where lines are tinted yellow in proportion to how much Python interaction they contain.
The main methods of improving Cython speed-up include static type declarations, declaring functions with cdef, using cimport and utilising fast indexing of typed C/NumPy arrays, as in the sketch below.
Compiler directives can be used to turn off certain Python features, such as bounds checking and negative-index wraparound.
|
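A minimal sketch of the terminal workflow, assuming an illustrative module called example: a statically typed function lives in example.pyx and setup.py builds the extension.

```cython
# example.pyx  (module name is illustrative)
# cython: boundscheck=False, wraparound=False   # compiler directives turning off Python checks

def mean(double[:] data):
    """Mean of a 1-D array of doubles, using static types and a typed memoryview."""
    cdef double total = 0.0
    cdef Py_ssize_t i
    for i in range(data.shape[0]):
        total += data[i]
    return total / data.shape[0]
```

```python
# setup.py -- build with: python setup.py build_ext --inplace
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("example.pyx"))
```

main.py can then simply import example and call example.mean() on a NumPy array of doubles.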
C Foreign Function Interface for Python
|
CFFI is an external package that provides a C Foreign Function Interface for Python, allowing one to interact with almost any C code
The Application Binary Interface (ABI) mode is easier to use, but slower, as in the sketch below
The Application Programming Interface (API) mode is more complex, but faster
|
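A minimal ABI-mode sketch, calling sqrt from the system maths library; the library name libm.so.6 is Linux specific and would need adjusting on other platforms:

```python
from cffi import FFI

ffi = FFI()
ffi.cdef("double sqrt(double x);")   # declare the C signature we want to call

# ABI mode: open an existing shared library at run time, no compilation step needed.
libm = ffi.dlopen("libm.so.6")       # Linux-specific name

print(libm.sqrt(2.0))
```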
MPI with Python
|
MPI is the true way to achieve parallelism
mpi4py is an unofficial library that provides MPI bindings for Python
A communicator is a group containing all the processes that will participate in communication
A rank is the logical ID number given to a process, and querying the rank is how a process identifies itself within the communicator
Point-to-point communication is communication between two processes, where a source process sends a message to a destination process, which then has to receive it, as in the sketch below
|
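A minimal point-to-point sketch with mpi4py, assumed to be launched with something like mpirun -np 2 python script.py:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD        # communicator containing all launched processes
rank = comm.Get_rank()       # this process's logical ID within the communicator

# Point-to-point: rank 0 is the source, rank 1 the destination.
if rank == 0:
    comm.send({"greeting": "hello"}, dest=1, tag=11)
elif rank == 1:
    data = comm.recv(source=0, tag=11)
    print("rank 1 received", data)
```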
Non-blocking and collective communications
|
In some cases, serialisation of communication is worse than a deadlock: a deadlock is obvious, whereas you may not notice that serialised communication is degrading performance
Collective communication transmits data among all processes in a communicator, and must be called by all processes in a group, as in the sketch below
MPI is best used in C and Fortran, as in Python some function calls are either not available or suffer from poor performance
|
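A minimal sketch of a non-blocking exchange followed by a collective reduction, again assuming a launch such as mpirun -np 4 python script.py:

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Non-blocking exchange of NumPy buffers with neighbouring ranks;
# useful work could be overlapped with the communication before the wait.
send_buf = np.full(10, rank, dtype=np.float64)
recv_buf = np.empty(10, dtype=np.float64)
requests = [
    comm.Isend(send_buf, dest=(rank + 1) % size),
    comm.Irecv(recv_buf, source=(rank - 1) % size),
]
MPI.Request.Waitall(requests)

# Collective communication: every rank must make the same call.
local = np.array([float(rank)])
total = np.empty(1)
comm.Allreduce(local, total, op=MPI.SUM)
if rank == 0:
    print("sum of ranks:", total[0])
```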
Dask
|
Dask allows task-based parallelisation of problems
Identifying the tasks and understanding the memory usage are crucial
Dask Array is a method to create a set of tasks automatically through operations on arrays, as in the sketch below
Tasks are created by splitting larger arrays into chunks, meaning problems larger than the available memory can be handled
|
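A minimal Dask Array sketch; each chunk of the array becomes a task in the graph:

```python
import dask.array as da

# A 10000 x 10000 array split into 1000 x 1000 chunks; each chunk is a separate task.
x = da.random.random((10000, 10000), chunks=(1000, 1000))

# Operations only build the task graph; nothing is computed yet.
y = (x + x.T).mean(axis=0)

# compute() executes the graph, scheduling chunks so the whole array
# never has to fit in memory at once.
result = y.compute()
print(result.shape)
```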
GPUs with Python
|
|