Why should I use a HPC cluster?

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • What is HPC?

  • Why would HPC be important to my work?

  • What is the difference between login nodes and compute nodes?

Objectives
  • Understand how a HPC system can benefit you

What is HPC?

A HPC cluster is a collection of ‘computers’ called nodes, all interconnected. Together they are considerably more powerful than a single computer such as your laptop. For scale: your laptop typically has 8 cores, whereas Kay has 40 cores per node and 336 normal compute nodes.
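To put those numbers side by side, here is a minimal Python sketch. It only multiplies out the figures quoted above; the names `LAPTOP_CORES`, `KAY_CORES_PER_NODE`, `KAY_COMPUTE_NODES` and `total_cores` are made up for this illustration.

```python
# Figures quoted in the text above. The laptop value is a typical one, not a rule.
LAPTOP_CORES = 8
KAY_CORES_PER_NODE = 40
KAY_COMPUTE_NODES = 336


def total_cores(nodes: int, cores_per_node: int) -> int:
    """Return the total number of cores across a set of identical nodes."""
    return nodes * cores_per_node


kay_cores = total_cores(KAY_COMPUTE_NODES, KAY_CORES_PER_NODE)
laptops = kay_cores // LAPTOP_CORES

print(f"Kay normal compute nodes: {kay_cores} cores")   # 13440 cores
print(f"Roughly {laptops} eight-core laptops")          # 1680 laptops
```

Core count alone understates the difference, since the nodes are also connected so that a single job can use many of them at once.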

Once connected, you can avail of the computational power and run scripts too heavy to run on your own computer. There are different types of nodes, including the login nodes and compute nodes described later in this episode.

Why would I need to use a HPC cluster?

The best way to answer this is with an example.

Rationale for using a HPC

Imagine you had a 10x10x10 box like the one below, divided up into smaller boxes, each measuring 1x1x1. In one hour, one CPU core can simulate one hour of activity inside one smaller box. If you wanted to simulate what was happening inside the large box for 8 hours, for example a molecular interaction, the climate or the weather, how long would it take to run if we only used one CPU core?

Solution

8000 hours…close to a year!

This is far longer than is practical! But remember, that is utilising just one core. If you had a machine that could simulate each of those smaller boxes simultaneously, and a code that enables the boxes to interact with each other effectively, the whole job would take roughly 8 hours, i.e. one hour of run time per simulated hour (but probably a little more because of issues we will discuss in subsequent episodes).
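The arithmetic behind this exercise can be sketched in a few lines of Python. The function names are hypothetical, and the parallel estimate assumes perfect scaling, with no time lost to the boxes communicating, which real codes do not achieve.

```python
import math


def serial_hours(boxes: int, simulated_hours: int) -> int:
    """Run time on one core: every box is simulated one after another."""
    # From the exercise: one core-hour advances one small box by one hour.
    return boxes * simulated_hours


def ideal_parallel_hours(boxes: int, simulated_hours: int, cores: int) -> int:
    """Best-case run time when the boxes are shared evenly between cores.

    Assumption: perfect scaling, i.e. no communication overhead at all.
    """
    boxes_per_core = math.ceil(boxes / cores)  # the busiest core sets the pace
    return boxes_per_core * simulated_hours


boxes = 10 * 10 * 10  # small 1x1x1 boxes inside the 10x10x10 box

print(serial_hours(boxes, 8))                      # 8000
print(ideal_parallel_hours(boxes, 8, cores=1000))  # 8
print(ideal_parallel_hours(boxes, 8, cores=40))    # 200
```

With one core per box, the run time drops to the simulated duration itself. With the 40 cores of a single node, each core has 25 boxes to work through, so the best case is 200 hours.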

Although the techniques and exact methodologies behind a speedup like this are beyond the scope of this course, hopefully you can now imagine the kinds of things that a supercomputer makes possible, through either the implementation of high performance code or the use of large numbers of cores.

Login and compute nodes

Imagine the login and compute nodes as a catalog retailer like Argos. When you enter the shop, you are on the login nodes. These are the initial entry points into a HPC system. You may well see this reflected in the prompt.

course00@login2:~$

These login nodes can be considered the “foyer” of the HPC system, its external interface. Here you can browse the catalog of the HPC, i.e. check out the different libraries and use UNIX commands. But there is a problem with the login nodes: everyone is using them. It is the same as everyone entering a shop to explore; the more crowded the shop is, the slower things seem to happen.

If you decide to do work in this shop, for example by running code and utilising libraries, you take up more space. The shop’s floor space is limited, and similarly, the login node space is limited. What can happen is that someone decides (wrongly) to run a long code on the login node, which takes up space and therefore impacts everyone else. You may also get in trouble with the system administrators!

Let’s think about our example of Argos. The “foyer”, where you enter into the shop has an open space with a few desks. You may have a few little things like memory sticks or a packet of mints that you can buy on the way to the till.

But the most important feature is the catalogs on the desks, which list all the items available. Next to them are pencils and notepads, where you write down the catalog number of the item(s) that you wish to purchase. You then deliver your list of orders to the till, and an attendant gets the items for you behind the scenes while you wait.

These can be thought of as the action of submission scripts and compute nodes. The compute nodes are like the warehouse of the store: all the important things are stored there, and it is where all the major work is done. On a HPC, the compute nodes are the place where your code is run.

The submission script, i.e. the list of orders in our shop, is sent to the compute nodes to undertake a task, and all the work is done behind the scenes. There are ways to access the compute nodes directly, but that is beyond the scope of this course.

These compute nodes are controlled by a queueing system, and depending on the program submitted and the resources asked for, you will be assigned a queue. The job will wait in the queue until the requested resources are available.
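The idea of a job waiting until its requested resources are free can be illustrated with a toy first-come, first-served queue in Python. This is only a sketch under simplifying assumptions: the function `run_queue` and the job names are hypothetical, and real schedulers use more sophisticated policies than strict submission order.

```python
from collections import deque


def run_queue(jobs, total_nodes):
    """Return the start time (in hours) of each job in a strict FIFO queue.

    jobs: list of (name, nodes_needed, run_hours) in submission order.
    Toy model: the job at the head of the queue must start before
    any job behind it is considered.
    """
    waiting = deque(jobs)
    running = []          # (end_time, nodes) for jobs currently running
    free = total_nodes
    now = 0
    starts = {}

    while waiting:
        name, nodes, hours = waiting[0]
        if nodes > total_nodes:
            raise ValueError(f"{name} asks for more nodes than exist")
        if nodes <= free:
            # Enough nodes are free: start the job straight away.
            waiting.popleft()
            free -= nodes
            running.append((now + hours, nodes))
            starts[name] = now
        else:
            # Not enough nodes: wait for the next running job to finish.
            running.sort()
            end_time, released = running.pop(0)
            now = end_time
            free += released
    return starts


# A 4-node toy cluster: "c" has to wait until both "a" and "b" have finished.
jobs = [("a", 2, 3), ("b", 2, 1), ("c", 3, 2)]
print(run_queue(jobs, total_nodes=4))  # {'a': 0, 'b': 0, 'c': 3}
```

Jobs "a" and "b" fit on the cluster together and start at once, while "c" needs three nodes and so sits in the queue until hour 3.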

For this reason, the login nodes are only to be used for installing packages, moving files, and running small snippets of code that take no longer than a few minutes, just enough to check that your code is working properly.

ICHEC’s national service

ICHEC’s National Service provides three different project classes with differing resource allocations, review levels and review turnaround times. The selection of each class largely depends on the necessities and requirements of the research and applicant’s objectives.

You can use our core calculator to calculate the number of core hours depending on the node type. If you are considering undertaking a project with ICHEC, this can be a useful resource.
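As a rough idea of what such a calculation involves, the Python sketch below multiplies nodes by cores per node by run time. It is not ICHEC's calculator: the function `core_hours` is hypothetical, the 40 cores per node is the figure quoted earlier for Kay, and the "Max. CPU" limit for Class C (listed below) is assumed to be counted in core hours.

```python
CORES_PER_NODE = 40               # Kay's normal compute nodes, as quoted earlier
CLASS_C_MAX_CORE_HOURS = 100_000  # Class C "Max. CPU", assumed to mean core hours


def core_hours(nodes: int, wall_hours: int, cores_per_node: int = CORES_PER_NODE) -> int:
    """Core hours used by a job that holds whole nodes for wall_hours."""
    return nodes * cores_per_node * wall_hours


# Example job: 4 nodes for one day.
job = core_hours(nodes=4, wall_hours=24)
runs_in_class_c = CLASS_C_MAX_CORE_HOURS // job

print(job)              # 3840
print(runs_in_class_c)  # 26
```

Under these assumptions, a Class C allocation covers only 26 such day-long runs, which shows how quickly a budget can be used up by larger jobs.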

Class C Project: Discovery

Class C projects are intended to provide fast access to modest resources with less review overhead. They have multiple possible uses including:

Class C project applicants should note that the maximum resource levels are modest and will be exhausted very quickly if larger scale runs are performed.

Item           Amount
Max. CPU       100,000 h
Max. Storage   250 GB
Max. Duration  12 months
Max. Review    1 week
Proposal       2-3 pages
Applicants     Group Leader/Professor/Lecturer/Post-doc/Graduate Student

Class B project: Regular

Class B projects are intended for the needs of the majority of the research community. Typically applicants will be small research groups or individual researchers. Successful applications are expected to lead to refereed publications.

Item           Amount
Max. CPU       1,000,000 h
Max. Storage   2000 GB
Max. Duration  18 months
Max. Review    8 weeks
Proposal       4-5 pages
Applicants     Group Leader/Professor/Lecturer/Post-doc

Class A project: High Impact

Class A projects are intended for consortia concerned with high impact problems. These groups will require resources representing a substantial fraction of the centre’s resources over a long period of time. Successful applications are expected to yield high-impact scientific publications.

Class A project applicants are expected to have a good knowledge of the characteristics of the code(s) which they intend to use, such as scalability properties, before writing their proposal. For this reason, applicants who are not in such a position are advised to first apply for an exploratory Class C project in order to undertake a basic scalability and performance study. Such an exercise is essential to provide an accurate estimate and a proper justification of the resources requested.

Item           Amount
Max. CPU       10,000,000 h
Max. Storage   5000 GB
Max. Duration  24 months
Max. Review    12 weeks
Proposal       6-10 pages
Applicants     Group Leader/Professor/Lecturer

Key Points

  • High Performance Computing (HPC) involves connecting to large computing systems elsewhere in the world

  • HPCs typically have thousands to hundreds of thousands of cores

  • HPCs have the ability to perform calculations and run simulations that would be impossible on normal machines