<center><img src="../../fig/ICHEC_Logo.jpg" alt="Drawing" style="width: 500px;"/></center>

<center> <img src="../../fig/notebooks/cythonlogo.png" alt="Drawing" style="width: 200px;"/> </center>

<br>

# <center> Overview </center>
******
***

**Cython** is a programming language that makes writing C extensions for the Python language as easy as Python itself. The source code gets **translated into optimised C/C++ code** and **compiled as Python extension modules**. 

The code is executed in the CPython runtime environment, but at the speed of compiled C with the ability to call directly into C libraries, whilst keeping the original interface of the Python source code.

This enables Cython's **two major use cases**:
   * extending the CPython interpreter with fast binary modules
   * interfacing Python code with external C libraries

**REMEMBER -** Cython **IS** Python, just with C data types, so some basic C knowledge is recommended.

<br>

# <center> Typing </center>

**Cython** supports **static type declarations**, thereby turning readable Python code into plain C performance.

**Static Typing**:
   * type checking is performed during compile-time
   * e.g. x = 4 + 'e' would not compile
   * can detect type errors in rarely used code paths

**Dynamic Typing**:
   * type checking is performed during run-time
   * e.g. x = 4 + 'e' would result in a runtime type error
    
Static typing is what allows for fast program execution and tight integration with external C libraries.
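
To make the dynamic case concrete, here is a minimal pure-Python sketch. The function name `rarely_used_path` is hypothetical; the point is that the faulty expression is only reported when the branch containing it actually runs.

```python
def rarely_used_path(flag):
    # The faulty expression is only evaluated when flag is True,
    # so Python cannot report the problem until that branch runs.
    if flag:
        return 4 + 'e'
    return 4

# The bad branch is never reached, so no error is raised here.
print(rarely_used_path(False))

# The error only appears at run-time, once the branch is taken.
try:
    rarely_used_path(True)
except TypeError as err:
    print("Runtime type error:", err)
```

A statically typed language would reject the faulty line during compilation, whether or not the branch is ever taken.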

# <center> Fundamentals of Implementing Cython </center>

Cython can be used easily in Jupyter notebooks through cell magics; we will also show you how to use Cython outside the notebook environment.

There are a couple of ways to use Cython:
   * Use cell magics (`%%`) to either
       * run everything in Jupyter notebooks, or
       * create external files which you can then compile
   * Use another IDE (VSCode, Spyder, Eclipse) or vim and work from the terminal

### **1. Using Standard Python**

#### ***Utilising the Jupyter notebook***

In [None]:
def fib(n):
    # Prints the Fibonacci series up to n.
    a, b = 0, 1
    while b < n:
        print(b)
        a, b = b, a + b

In [None]:
fib(10)

We will use cell magics to create the files `fibonacci.py` and `fibonacci_main.py`

In [None]:
%%writefile fibonacci.py

def fib(n):
    # Prints the Fibonacci series up to n.
    a, b = 0, 1
    while b < n:
        print(b)
        a, b = b, a + b

In [None]:
%%writefile fibonacci_main.py

from fibonacci import fib

fib(10)

These files have now been written: [fibonacci.py](fibonacci.py) contains the function, and [fibonacci_main.py](fibonacci_main.py) imports and calls it.

(Try running this command in a terminal.)

In [None]:
!python fibonacci_main.py

### **2. Using Cython**

#### ***A) Using Jupyter notebook***

First, load the Cython extension into the Jupyter notebook using:

In [None]:
%load_ext cython

In [None]:
import cython
cython.__version__

<div class="alert alert-block alert-info">
<b>This only needs to be done once!</b>
</div>

Now we can use Cython in our Jupyter notebook

In [None]:
%%cython

def fib_cyt(n):
    # Prints the Fibonacci series up to n.
    a, b = 0, 1
    while b < n:
        print(b)
        a, b = b, a + b

In [None]:
fib_cyt(10)

#### ***B) Using a `setup.py` (recommended)***

Running it from the terminal requires a bit more work, but it is the best-practice approach.

We need to write the cell to a file with a `.pyx` extension.

In [None]:
%%writefile fibonacci_cyt.pyx

def fib_cyt(n):
    # Prints the Fibonacci series up to n.
    a, b = 0, 1
    while b < n:
        print(b)
        a, b = b, a + b

This `.pyx` file is compiled by Cython to a `.c` file, which is then compiled by a C compiler to a shared library (a `.so` file on Linux and macOS, or a `.pyd` file on Windows).

There are a few ways to build your extension module. The method used here is to create a `setup.py`, which can be viewed as a Python Makefile.

In [None]:
%%writefile setup_fib.py

from setuptools import setup  # distutils was removed in Python 3.12
from Cython.Build import cythonize

setup(ext_modules = cythonize("fibonacci_cyt.pyx"))

<br>
Let's see what our current directory looks like at present.

In [None]:
!ls

<br>

At this stage all we have are our original Python files, our `.pyx` file and `setup_fib.py`. Now let's run our `setup_fib.py` and see how that changes.

We use `build_ext --inplace` to compile the extension for use in the current directory.

In [None]:
!python setup_fib.py build_ext --inplace

<br>

Let's see what has happened to our current directory.

In [None]:
!ls

We have a few new additions:

* a `.c` file, generated by Cython, which is then compiled using a C compiler
* a `build` directory, which contains the `.o` file generated by the compiler
* a `.so` file, which is the compiled library
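
The full name of the compiled library includes a tag that depends on the platform and the Python version. As a small sketch using only the standard library, the suffix your interpreter expects can be inspected as follows (the module name `fibonacci_cyt` matches the file written above; the variable names are for illustration only).

```python
import sysconfig
from importlib.machinery import EXTENSION_SUFFIXES

# Suffix that this interpreter gives to freshly built extension modules.
# Fall back to the first importable suffix if the config variable is unset.
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX") or EXTENSION_SUFFIXES[0]
print("Expected file name:", "fibonacci_cyt" + ext_suffix)

# All suffixes the import system will accept for extension modules.
print("Importable suffixes:", EXTENSION_SUFFIXES)
```

This can help when checking that the file listed by `ls` is the one Python will actually import.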

Next we add the `main` file which we will use to run our program.

In [None]:
%%writefile fibonacci_cyt_main.py 

from fibonacci_cyt import fib_cyt

fib_cyt(10)

In [None]:
!python fibonacci_cyt_main.py

And that's it, you have successfully compiled and used a Cython file. 

This is only the start, however: compiling unmodified Python code with Cython is the bare minimum, and more work is needed to get a significant speedup.

#### **REMEMBER the 3 things you need!**
1. A `.pyx` file containing your Cython code
2. A `setup.py` file to build the extension
3. A `module_main.py` file with which you can use the extension

****
# <center> [Exercise 1 ~ 10 mins](exercise/03-Cython-Exercise.ipynb) </center>

****



# <center>Accelerating Cython: Part 1</center>
    
Compiling with Cython is fine, but on its own it doesn't speed up our code enough to make a significant difference. We need to use the C features that Cython was designed for.
    
There are a number of different methods we can use.

###  **1. Static type declarations**

These allow Cython to step out of the dynamic nature of the Python code and generate simpler and faster C code - sometimes faster by orders of magnitude.

This is often the simplest and quickest way to achieve a significant speedup, but the code can become more verbose and less readable.

Types are declared with the `cdef` keyword.


**A: With cell magics ONLY**

In [None]:
import time
from random import random

def pi_montecarlo(n=1000):
    '''Calculate PI using Monte Carlo method'''
    in_circle = 0
    for i in range(n):
        x, y = random(), random()
        if x ** 2 + y ** 2 <= 1.0:
            in_circle += 1
        
    return 4.0 * in_circle / n

N = 100000

t0 = time.time()
pi_approx = pi_montecarlo(N)
t_python = time.time() - t0
print("Pi Estimate:", pi_approx)
print("Time Taken", t_python)

In [None]:
%%cython
import time
from random import random

def pi_montecarlo(n=1000):
    '''Calculate PI using Monte Carlo method'''
    in_circle = 0
    for i in range(n):
        x, y = random(), random()
        if x ** 2 + y ** 2 <= 1.0:
            in_circle += 1
        
    return 4.0 * in_circle / n

N = 100000

t0 = time.time()
pi_approx = pi_montecarlo(N)
t_cython0 = time.time() - t0
print("Pi Estimate:", pi_approx)
print("Time Taken", t_cython0)

**B: Implementing static type declarations**

In [None]:
%%cython
import time
from random import random

def pi_montecarlo(int n=1000):
    '''Calculate PI using Monte Carlo method'''
    cdef int in_circle = 0, i
    cdef double x, y
    for i in range(n):
        x, y = random(), random()
        if x ** 2 + y ** 2 <= 1.0:
            in_circle += 1
        
    return 4.0 * in_circle / n

N = 100000

t0 = time.time()
pi_approx = pi_montecarlo(N)
t_cython1 = time.time() - t0
print("Pi Estimate:", pi_approx)
print("Time Taken", t_cython1)


As you can see, this gives a significant speedup, even with this minimal example!
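
A single `time.time()` difference can be noisy. As a hedged sketch, the pure-Python version can be timed with the standard-library `timeit` module, keeping the best of several runs; the same pattern would apply to the compiled functions. The helper name `best_time` is an assumption for illustration.

```python
import timeit
from random import random

def pi_montecarlo(n=1000):
    '''Calculate PI using Monte Carlo method'''
    in_circle = 0
    for i in range(n):
        x, y = random(), random()
        if x ** 2 + y ** 2 <= 1.0:
            in_circle += 1
    return 4.0 * in_circle / n

def best_time(func, *args, repeat=5):
    # Run func(*args) once per repeat and keep the fastest run,
    # which is the measurement least disturbed by other processes.
    timings = timeit.repeat(lambda: func(*args), number=1, repeat=repeat)
    return min(timings)

print("Best of 5 runs (s):", best_time(pi_montecarlo, 100000))
```

In a notebook, the `%timeit` magic offers a similar repeated measurement.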

### **2. Typing Function Calls**

As with 'typing' variables, you can also 'type' functions. Function calls in Python can be expensive, and can be even more expensive in Cython as one might need to convert to and from Python objects to do the call.

There are two ways to declare C-style functions in Cython:
* Declaring a C-type function with `cdef` (the same keyword used to declare a variable)
* Declaring a C-type function together with a Python wrapper using `cpdef`

A side-effect of cdef is that the function is no longer available from Python-space, so Python won't know how to call it
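
The cost of a Python-level call can be seen even without Cython. The following sketch, using only the standard library, compares cubing numbers inline with cubing them through a function call. The names `inline_loop` and `call_loop` are for illustration only, and the timings will vary by machine.

```python
import timeit

def cube(x):
    return x ** 3

def inline_loop(n):
    # Same arithmetic, with no function call inside the loop.
    total = 0.0
    for i in range(n):
        total += float(i) ** 3
    return total

def call_loop(n):
    # One Python function call per iteration.
    total = 0.0
    for i in range(n):
        total += cube(float(i))
    return total

n = 100000
print("inline:", min(timeit.repeat(lambda: inline_loop(n), number=1, repeat=5)))
print("call:  ", min(timeit.repeat(lambda: call_loop(n), number=1, repeat=5)))
```

Both loops compute the same total; any difference in time comes from the call itself, which is the overhead that `cdef` functions aim to reduce.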

**C: Reducing function call overheads**

In [None]:
%%cython

def cube(double x):
    return x ** 3

In [None]:
%time cube(3)

In [None]:
%%cython

cdef double cube_cdef(double x):
    return x ** 3

In [None]:
# Purposeful error!
%time cube_cdef(3)

<div class="alert alert-block alert-info">
<b>A side-effect of cdef is that the function is no longer available from Python-space, so Python won't know how to call it. If we want to call the function from Python, for example with the time magic command, we should use cpdef instead.</b>
</div>

In [None]:
%%cython
import time
from random import random

cdef double pi_montecarlo(int n=1000):
    '''Calculate PI using Monte Carlo method'''
    cdef int in_circle = 0, i
    cdef double x, y
    for i in range(n):
        x, y = random(), random()
        if x ** 2 + y ** 2 <= 1.0:
            in_circle += 1
        
    return 4.0 * in_circle / n

N = 100000

t0 = time.time()
pi_approx = pi_montecarlo(N)
t_cython2 = time.time() - t0
print("Pi Estimate:", pi_approx)
print("Time Taken", t_cython2)


In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

plt.figure()
results = [t_python, t_cython0, t_cython1, t_cython2]
labels = ["Python", "Cython", "Cython\nStatic", "Cython\nFunc."]
plt.bar(range(len(results)), results)
plt.xticks(range(len(results)),labels)
plt.title("Pi Monte Carlo")

plt.ylabel('Time (sec)')

### **Where should I add types? (Profiling and annotation)**

For those new to Cython and the concept of declaring types, there is a tendency to 'type' everything in sight. This reduces readability and flexibility and, in certain situations, can even slow things down.

It is also possible to kill performance by forgetting to 'type' a critical loop variable. Tools we can use are **profiling** and **annotation**.

Profiling is the first step of any optimisation effort and can tell you where the time is being spent. Cython's annotation can tell you why your code is taking so long.

Using the `-a` switch with the cell magic, or running `cython -a cython_module.pyx` from the terminal, creates an HTML report of the Cython code and the generated C code. Alternatively, pass the `annotate=True` parameter to `cythonize()` in the `setup.py` file (note that you may have to delete the `.c` file and compile again to produce the HTML report).

In [None]:
%%cython -a

from random import random

cpdef pi_montecarlo_cy(int n=1000):
    '''Calculate PI using Monte Carlo method'''
    cdef int in_circle = 0
    cdef int i
    cdef double x, y
    for i in range(n):
        x, y = random(), random()
        if x ** 2 + y ** 2 <= 1.0:
            in_circle += 1
        
    return 4.0 * in_circle / n

<br>
Lines are coloured according to "typedness":

* White lines translate to pure C (as fast as normal C code)
* <span style='background :yellow' >Yellow</span> lines require the Python C-API (the darker the yellow, the more Python interaction)
* Lines with a `+` have been translated to C code, which can be viewed by clicking on them



By default, Cython code does not show up in profiles produced by cProfile. In a Jupyter notebook cell or a source file, profiling can be enabled by including the following on the first line:

```python
# cython: profile=True
```

Alternatively, profiling can be controlled on a function-by-function basis:
* Exclude a specific function while profiling the rest of the code

```python
# cython: profile=True
import cython
@cython.profile(False)
cdef func():
    pass  # function body goes here
```
* Profile only the decorated function

```python
# cython: profile=False
import cython
@cython.profile(True)
cdef func():
    pass  # function body goes here
```

To run the profile in Jupyter, we can use the `%prun` magic, for example `%prun func()`.

In [None]:
%%cython 
# cython: profile=True

from random import random
import cython


cpdef pi_montecarlo_cy(int n=1000):
    '''Calculate PI using Monte Carlo method'''
    cdef int in_circle = 0
    cdef int i
    cdef double x, y
    for i in range(n):
        x, y = random(), random()
        if x ** 2 + y ** 2 <= 1.0:
            in_circle += 1
        
    return 4.0 * in_circle / n

In [None]:
%prun pi_montecarlo_cy(10000000)
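
Outside Jupyter, a comparable report can be produced with the standard-library `cProfile` and `pstats` modules. The sketch below profiles a pure-Python version of the function (named `pi_montecarlo_py` here for illustration); a compiled Cython function would only appear in such a report if the `profile=True` directive is set as described above.

```python
import cProfile
import io
import pstats
from random import random

def pi_montecarlo_py(n=1000):
    '''Calculate PI using Monte Carlo method'''
    in_circle = 0
    for i in range(n):
        x, y = random(), random()
        if x ** 2 + y ** 2 <= 1.0:
            in_circle += 1
    return 4.0 * in_circle / n

# Collect profiling data only around the call of interest.
profiler = cProfile.Profile()
profiler.enable()
pi_montecarlo_py(100000)
profiler.disable()

# Sort by cumulative time and show the ten most expensive entries.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```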

# <center>Accelerating Cython: Part 2</center>
    
Static type declarations and reduced function call overheads can significantly reduce runtime. However, if you are dealing with NumPy arrays, there are additional things you can do to speed things up further.
    
### **3. NumPy Arrays with Cython**
    


In [None]:
import numpy as np

In [None]:
def powers_array(N, M):
    data = np.arange(M).reshape(N,N)
    
    for i in range(N):
        for j in range(N):
            data[i,j] = i**j
    return(data[2])


In [None]:
%time powers_array(15,225)

In [None]:
%%cython

import numpy as np # Normal NumPy import
cimport numpy as cnp # Import for NumPy C-API

def powers_array_cy(int N, int M): # declarations can be made only in function scope
    cdef cnp.ndarray[cnp.int_t, ndim=2] data
    data = np.arange(M).reshape((N, N))


    for i in range(N):
        for j in range(N):
            data[i,j] = i**j
    return(data[2])

In [None]:
%time powers_array_cy(15,225)

Note that for a small array like this the speedup is not significant; you may even have seen a slowdown. In this situation the operation suffers from unnecessary typing, as we have already discussed.

Just because you can doesn't always mean you should!

For larger problems with larger arrays, using typed `cnp` arrays is recommended!

### **4. Compiler Directives**

These change how Cython generates code, for example by switching off checks that it would normally perform. Many directives are described in the Cython [documentation](https://cython.readthedocs.io/en/latest/src/userguide/source_files_and_compilation.html); the main ones we will use here are:

* `boundscheck` - If set to False, Cython is free to assume that indexing operations in the code will not cause any IndexErrors to be raised
* `wraparound` - If set to False, Cython is allowed to neither check for nor correctly handle negative indices. This can cause data corruption or segfaults if mishandled.
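
To see what these two checks normally provide, consider plain Python indexing, sketched below with a list. Cython performs the same checks on typed arrays by default, and the directives remove them.

```python
data = [10, 20, 30]

# Wraparound: a negative index counts back from the end.
print(data[-1])  # 30

# Bounds checking: an out-of-range index raises an error instead of
# reading memory that does not belong to the list.
try:
    data[3]
except IndexError as err:
    print("IndexError:", err)
```

With both directives set to `False`, neither behaviour is guaranteed, so every index must be known to lie between `0` and the array length minus one.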

<div class="alert alert-block alert-info">
<b>You should apply these only once you know that the code is working correctly and that any issues that could be raised by the compiler have been resolved</b>
</div>

There are a few ways to implement them;

* Header comment at the top of a `.pyx` file, which must appear before any code

    `# cython: boundscheck=False`
* Passing a directive on the command line using the `-X` switch

    `$ cython -X boundscheck=True ...`
* Or locally for specific functions, for which you first need the `cython` module imported

    ```python
    cimport cython
    ```
    
    ```python
    @cython.boundscheck(False)
    ```

In [None]:
%%cython

import numpy as np # Normal NumPy import
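cimport numpy as cnp # Import for NumPy C-API

cimport cython

@cython.boundscheck(False) # turns off bounds checking on array indexing
@cython.wraparound(False) # turns off negative-index wraparound
def powers_array_cy(int N, int power): # N: array side length, power: row to return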
    cdef cnp.ndarray[cnp.int_t, ndim=2] arr
    cdef int M
    M = N*N
    arr = np.arange(M).reshape((N, N))

    for i in range(N):
        for j in range(N):
            arr[i,j] = i**j
    return(arr[power]) # returns the ascending powers

In [None]:
%time powers_array_cy(15,4)

****
# <center> [Exercise 2 ~ 5 mins](exercise/03-Cython-Exercise.ipynb) </center>

****

***

# <center>Case Study: Mandelbrot</center>

<center> <img src="../../fig/notebooks/Mandle.png" alt="Drawing" style="width: 400px;"/> </center>

Here we give a step-by-step demonstration of speeding up Mandelbrot generation code that was originally written in pure Python.

First let's import our libraries and create a list to keep track of the timings and how they change with each attempt.

### **Attempt 1: Pure Python**

In [None]:
import matplotlib.pyplot as plt
import time
import numpy as np
from numpy import random
%matplotlib inline

mandel_timings = []

In [None]:

def plot_mandel(mandel):
    fig=plt.figure(figsize=(10,10))
    ax = fig.add_subplot(111)
    ax.set_aspect('equal')
    ax.axis('off')
    ax.imshow(mandel, cmap='gnuplot')
    plt.savefig('mandel.png')

def kernel(zr, zi, cr, ci, radius, num_iters):
    count = 0
    while ((zr*zr + zi*zi) < (radius*radius)) and count < num_iters:
        zr, zi = zr * zr - zi * zi + cr, 2 * zr * zi + ci
        count += 1
    return count

def compute_mandel_py(cr, ci, N, bound, radius=1000.):
    t0 = time.time()
    mandel = np.empty((N, N), dtype=int)
    grid_x = np.linspace(-bound, bound, N)

    for i, x in enumerate(grid_x):
        for j, y in enumerate(grid_x):
            mandel[i,j] = kernel(x, y, cr, ci, radius, N)
    return mandel, time.time() - t0

def python_run():
    kwargs = dict(cr=0.3852, ci=-0.2026,
              N=200,
              bound=1.2)
    print("Using pure Python")
    mandel_func = compute_mandel_py       
    mandel_set, runtime = mandel_func(**kwargs)
    print("Mandelbrot set generated in {} seconds\n".format(runtime))
    plot_mandel(mandel_set)
    mandel_timings.append(runtime)



In [None]:
%time python_run()
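
The update inside `kernel` is the complex map z → z² + c written out in real and imaginary parts. As a sanity check, sketched here with Python's built-in `complex` type (the function name `kernel_complex` is an assumption for illustration), the two forms can be compared at a few starting points.

```python
def kernel(zr, zi, cr, ci, radius, num_iters):
    count = 0
    while ((zr*zr + zi*zi) < (radius*radius)) and count < num_iters:
        zr, zi = zr * zr - zi * zi + cr, 2 * zr * zi + ci
        count += 1
    return count

def kernel_complex(z, c, radius, num_iters):
    # Same iteration expressed with complex numbers: z -> z*z + c.
    count = 0
    while (z.real * z.real + z.imag * z.imag) < radius * radius and count < num_iters:
        z = z * z + c
        count += 1
    return count

c = complex(0.3852, -0.2026)
for z0 in (complex(0.0, 0.0), complex(0.5, 0.5), complex(1.2, -1.2)):
    count_real = kernel(z0.real, z0.imag, c.real, c.imag, 1000.0, 200)
    count_cplx = kernel_complex(z0, c, 1000.0, 200)
    print(z0, count_real, count_cplx)
```

The real-valued form is kept in the notebook because plain `double` arguments are straightforward to type in the later Cython attempts.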

<br>

You should have got a time of around 10-15 seconds, depending on your machine. Now we implement the next step:

### **Attempt 2: Compiling with Cython**

Cythonise the appropriate cell using magics

In [None]:
%%cython

import numpy as np
import time

def kernel(zr, zi, cr, ci, radius, num_iters):
    count = 0
    while ((zr*zr + zi*zi) < (radius*radius)) and count < num_iters:
        zr, zi = zr * zr - zi * zi + cr, 2 * zr * zi + ci
        count += 1
    return count

def compute_mandel_cyt(cr, ci, N, bound, radius=1000.):
    t0 = time.time()
    mandel = np.empty((N, N), dtype=int)
    grid_x = np.linspace(-bound, bound, N)

    for i, x in enumerate(grid_x):
        for j, y in enumerate(grid_x):
            mandel[i,j] = kernel(x, y, cr, ci, radius, N)
    return mandel, time.time() - t0

In [None]:
def cython_run():
    kwargs = dict(cr=0.3852, ci=-0.2026,
              N=200,
              bound=1.2)
    print("Using Cython compiler")
    mandel_func = compute_mandel_cyt
    mandel_set, runtime = mandel_func(**kwargs)
    print("Mandelbrot set generated: \n")
    print("Pure Python runtime: ", mandel_timings[0])
    print("Compiled with Cython: {}\n".format(runtime))
    plot_mandel(mandel_set)
    mandel_timings.append(runtime)

In [None]:
%time cython_run()

Not much faster, but still, an improvement. We are going to see more significant improvements when we implement **static type declarations.**

### **Attempt 3: Static Type Declarations**

In [None]:
%%cython

import numpy as np
import time

def kernel(double zr, double zi, double cr, double ci, 
           double radius, int num_iters):
    cdef int count = 0
    while ((zr*zr + zi*zi) < (radius*radius)) and count < num_iters:
        zr, zi = zr * zr - zi * zi + cr, 2 * zr * zi + ci
        count += 1
    return count

def compute_mandel_cyt(cr, ci, N, bound, radius=1000.):
    t0 = time.time()
    mandel = np.empty((N, N), dtype=int)
    grid_x = np.linspace(-bound, bound, N)
    
    for i, x in enumerate(grid_x):
        for j, y in enumerate(grid_x):
            mandel[i,j] = kernel(x, y, cr, ci, radius, N)
    return mandel, time.time() - t0

In [None]:
def cython_run():
    kwargs = dict(cr=0.3852, ci=-0.2026,
              N=200,
              bound=1.2)
    print("Using Cython compiler & static type declarations")
    mandel_func = compute_mandel_cyt
    mandel_set, runtime = mandel_func(**kwargs)
    print("Mandelbrot set generated: \n")
    print("Pure Python runtime: ", mandel_timings[0])
    print("Compiled with Cython: ", mandel_timings[1])
    print("Type declaration in kernel: {}\n".format(runtime))
    mandel_timings.append(runtime) 
    plot_mandel(mandel_set)

In [None]:
%time cython_run()

As you can see, static typing gives a significant speedup, but it can often be overused. There is a tendency to '*type*' everything in sight, which can slow down your code fractionally rather than improve it.

### **Attempt 4: Function call overhead**

In [None]:
%%cython

import numpy as np
import time

cdef int kernel(double zr, double zi, double cr, double ci, 
           double radius, int num_iters):
    cdef int count = 0
    while ((zr*zr + zi*zi) < (radius*radius)) and count < num_iters:
        zr, zi = zr * zr - zi * zi + cr, 2 * zr * zi + ci
        count += 1
    return count

def compute_mandel_cyt(cr, ci, N, bound, radius=1000.):
    t0 = time.time()
    mandel = np.empty((N, N), dtype=int)
    grid_x = np.linspace(-bound, bound, N)
    
    for i, x in enumerate(grid_x):
        for j, y in enumerate(grid_x):
            mandel[i,j] = kernel(x, y, cr, ci, radius, N)
    return mandel, time.time() - t0

In [None]:
def cython_run():
    kwargs = dict(cr=0.3852, ci=-0.2026,
              N=200,
              bound=1.2)
    print("Using Cython compiler, static type declarations & a C kernel function")
    mandel_func = compute_mandel_cyt
    mandel_set, runtime = mandel_func(**kwargs)
    print("Mandelbrot set generated: \n")
    print("Pure Python runtime: ", mandel_timings[0])
    print("Compiled with Cython: ", mandel_timings[1])
    print("Type declaration in kernel: ", mandel_timings[2])
    print("Kernel as a C function: {}\n".format(runtime))
    mandel_timings.append(runtime) 
    plot_mandel(mandel_set)

In [None]:
%time cython_run()

### **Attempt 5: Using NumPy arrays and compiler directives with Cython**

Cython's support for fast indexing of NumPy arrays, with declared types and dimensions, can improve the runtime even more.

To use it, we need to `cimport` NumPy as well as importing it.

We can also speed up the loops by removing the calls to `enumerate` and defining new, typed loop variables.

The code can be sped up further using compiler directives, which turn certain features on or off.

Here we will set `boundscheck` and `wraparound` to `False`, assuming there are no indexing errors and no negative indices.

In [None]:
%%cython

import numpy as np
import time
cimport numpy as cnp

from cython cimport boundscheck, wraparound

cdef int kernel(double zr, double zi, double cr, double ci, 
           double radius, int num_iters):
    cdef int count = 0
    while ((zr*zr + zi*zi) < (radius*radius)) and count < num_iters:
        zr, zi = zr * zr - zi * zi + cr, 2 * zr * zi + ci
        count += 1
    return count

# The directives belong on the function that indexes the arrays
@wraparound(False)
@boundscheck(False)
def compute_mandel_cyt(cr, ci, N, bound, radius=1000.):
    t0 = time.time()
    
    cdef cnp.ndarray[cnp.int_t, ndim=2] mandel
    mandel = np.empty((N, N), dtype=int)
    
    cdef cnp.ndarray[cnp.double_t, ndim=1] grid_x
    grid_x = np.linspace(-bound, bound, N)
    
    cdef:
        int i, j
        double x, y
    
    for i in range(N):
        for j in range(N):
            x = grid_x[i]
            y = grid_x[j]
            
            mandel[i,j] = kernel(x, y, cr, ci, radius, N)
    return mandel, time.time() - t0

In [None]:
def cython_run():
    kwargs = dict(cr=0.3852, ci=-0.2026,
              N=200,
              bound=1.2)
    print("Using Cython compiler")
    mandel_func = compute_mandel_cyt
    mandel_set, runtime = mandel_func(**kwargs)
    print("Mandelbrot set generated: \n")
    print("Pure Python runtime: ", mandel_timings[0])
    print("Compiled with Cython: ", mandel_timings[1])
    print("Type declaration in kernel: ", mandel_timings[2])
    print("Kernel as C function: ", mandel_timings[3])
    print("Fast Indexing and directives: {}\n".format(runtime))
    mandel_timings.append(runtime)
    plot_mandel(mandel_set)
    global speed_up_factor
    speed_up_factor = 10 * round((mandel_timings[0]/mandel_timings[4])/10)
    
    print("Final speedup of around {} times the original Python code!\n".format(speed_up_factor))

In [None]:
%time cython_run()
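
The expression for `speed_up_factor` rounds the measured ratio to the nearest multiple of 10. A small sketch with made-up timings (the numbers below are hypothetical, not measurements, and `rounded_speedup` is a name chosen for illustration) shows the arithmetic, and also that a ratio below 5 rounds down to 0.

```python
def rounded_speedup(t_slow, t_fast):
    # Ratio of the two runtimes, rounded to the nearest multiple of 10.
    return 10 * round((t_slow / t_fast) / 10)

print(rounded_speedup(12.0, 0.25))  # ratio 48 -> 50
print(rounded_speedup(12.0, 4.0))   # ratio 3  -> 0
```

If the ratio measured on your machine is small, printing the unrounded value `mandel_timings[0] / mandel_timings[4]` may be more informative.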

### **Plot Times**

In [None]:
labels = ["Python", "Cython", "Cython\nStatic", "Cython\nFunc.", "Cython\nCompiler"]

plt.bar(range(len(mandel_timings)), mandel_timings)
plt.xticks(range(len(mandel_timings)),labels)

plt.ylabel('Time (sec)')
plt.yscale("log")
plt.title("Mandelbrot")

Now we can do longer calculations, so let's
- zoom in by a factor of **10**
- increase the number of iterations by a factor of **10**

In [None]:
def speed_run():
    kwargs = dict(cr=0.3852, ci=-0.2026,
              N=2000,
              bound=0.12)
    mandel_func = compute_mandel_cyt
    mandel_set, runtime = mandel_func(**kwargs)
    print("Mandelbrot set generated: \n")
    print("Using advanced techniques: {}\n".format(runtime))
    plot_mandel(mandel_set)
    print("Assuming the same speed up factor, our original code would take", (speed_up_factor*runtime)/60, "minutes")

In [None]:
speed_run()

***


# <center>Cheatsheet</center>


It can be difficult to keep track of the steps to take when optimising code with Cython. The absolute essentials for working from the command line are as follows:
    
1. A `.pyx` file containing the code you wish to 'cythonize'
2. A `main.py` where your function(s) can be imported and called
3. A `setup.py` to build your extension
4. Build the extension using `python setup.py build_ext --inplace`

Once those 4 things are done, the rest can be considered 'optional', and there is no need to do them in order. In general, the more of these you can use in the critical parts of your code, the better.

* Use static type declarations (`int`, `double`)
* Reduce overheads by
    * defining functions using `cdef`
    * generating wrappers using `cpdef`
* Use `cimport` and utilise fast-indexing C-NumPy arrays and types
* Use compiler directives to turn off certain Python features

***