Teaching HPC

Gillian Law looks at the complexities of teaching High Performance Computing, and at what needs to be taught to students throughout their computer science education.

Running any course in High Performance Computing is complicated by the fact that you are faced with such a wide range of people wanting to learn. People from computer science backgrounds want to learn more about working on large-scale software projects. People from industry realise they need to know more in order to manage the computers that clients are putting in place. And then chemists, physicists, astronomers and engineers come in, with a basic understanding of computing but a need to run large parallel simulations in order to do their research.

The MSc in High Performance Computing at the University of Edinburgh has to bring all of these disparate students together and make sure each achieves their goals. It can be challenging, both for the teaching staff and the students, says EPCC group manager, and manager of the MSc course, Dr David Henty.

“Basically, each community finds half of the course more familiar, and half quite new, depending on their background,” he says. “If you’re an IT student you’ll find the programming skills and the hardware stuff familiar, but writing simulation codes, big parallel codes, that will be new. The physics and engineering students, on the other hand, have probably already written some big codes to do their simulations but they really have never been taught the formal stuff. Bringing them together actually works quite well, especially as the HPC and parallel programming aspects are new and exciting for everyone.”

The course has been running for eleven years, with around 25 to 30 students each year. With such a broad range of backgrounds coming in, Henty’s main aim is to give each student a thorough grounding in writing real code to solve real problems, in whatever area interests them personally.

“They need to be able to write good code and also to understand the code they’ve written. How fast does it go? Is it behaving as they’d expect? How can it go faster?”

With that understanding in place, a student will be able to work in any area where performance and speed are key, he says.
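The first of those questions, how fast the code goes, can be illustrated with a minimal Python sketch. It is not taken from the MSc course material: the helper name `time_call` and the toy workload `sum_of_squares` are hypothetical, chosen only to show the habit of measuring before trying to optimise.

```python
import time


def time_call(func, *args, repeats=5):
    """Run func(*args) several times and return (result, best wall-clock seconds).

    Taking the best of several runs reduces noise from other activity
    on the machine. The helper name and signature are illustrative only.
    """
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return result, best


def sum_of_squares(n):
    """Toy workload: sum of i*i for i in 0..n-1."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    value, seconds = time_call(sum_of_squares, 1_000_000)
    print(f"result={value}, best time={seconds:.4f} s")
```

Comparing the measured time against a rough expectation is one simple way to check whether the code is behaving as expected before asking how it could go faster.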

The course covers a range of subjects, including fundamental architectures and hardware, plus shared memory and distributed memory programming.

Programming skills are vital, from tools such as revision control for managing the software through to higher-level ‘project management’ skills that ensure development is heading in the right direction.

It’s also essential that students fully understand what High Performance Computing is really used for, and that can be complicated. “You can’t just give a computer science student a physics course!” Henty says. “You have to pick practical examples, using minimal theory, to illustrate what’s being done.”

Other courses include a numerical algorithms course – a grounding in what work is really being done using HPC – plus different parallel programming languages and parallel design patterns.

 

Parallelisation and the undergraduate

These parallel programming courses raise a controversial topic – should the basics of this be taught earlier, at undergraduate level? With the rise in multicore processors in laptops and practically every electronic device, some claim we will soon hit a crisis point, with too few computer science graduates understanding how to code in a multithreaded way.

Henty, however, is not convinced that it’s a subject essential for every student.

“It’s a progression. You start learning about programming, and some people will be interested in performance – and so those people can go on to learn about parallelism. You’re trying to build a pyramid [of interests].” Currently, parallel programming is perhaps not as widespread as some observers claim, he says.

“I’ve got a multicore laptop, but every program doesn’t need to be parallel. One core’s running Internet Explorer, another is running PowerPoint – it’s only when you want to make one application go faster that you need to think about parallel processing,” he says.

Having said that, parallel programming is starting to become essential in an ever-increasing range of everyday applications, he says. This is in addition to large-scale computer simulations in research and industry, for example climate modelling, where massive parallelism “has been the only game in town for more than a decade”.

“There is often confusion between simply running on multicore hardware and actual parallel programming. In any area where speed is important then parallelisation is the only way to benefit from multicore as the individual cores are no longer getting any faster. This is already true in the games market where new releases succeed or fail based on the realism of their graphics and physics engines, both of which need to exploit many cores at once. As the number of cores in laptops and desktops continues to rise, any performance-critical application that fails to go parallel is likely to fall by the wayside.”

Dr Judy Hardy, Director of Studies on the MSc, agrees that, while parallel programming is increasingly important, it’s not essential at undergraduate level. Most undergraduate students, she says, “are not at the point where that would be necessary, not as a compulsory part of teaching. In any subject, and it seems particularly true of computing, you have this huge range of interests and abilities, so there is only a small minority of students that it would [suit],” she says.

Multicore, she says, is “big at the moment, but things change, and there will always be the ‘next big thing’. You have to respond to these changes, but you have to do that in a managed and careful way. What you hope to do is give students skills that will allow them to continue to learn and adapt. Because one thing is for sure – while the actual content of what they learn, the basics, will not change, any type of architecture is going to be different in ten years. So they need the skills that will enable them to keep up to date.”

Talking to a student, though, does bring a different view. Daniel Holmes is now completing a PhD after doing the MSc in High Performance Computing, which he came to from 12 years working in industry. Holmes feels that many undergraduate students would benefit from a basic understanding of parallelisation.

“I had never heard of MPI (Message Passing Interface) until I got here, and that’s basically the dominant way of doing HPC now. How do you get a thousand processors to talk to one another and operate on a task? And nobody in industry knows anything about the fact it even exists. Okay, that’s a pretty sweeping statement, but it’s generally true,” he says.

Using multicore properly takes “a completely different way of approaching thinking about your program. It’s different – not necessarily harder. If you’re a good programmer in a single threaded environment you probably will be in multithreaded, but you need to learn the skills. And without that extra bit of knowledge it’s very difficult.”

Holmes himself worked as a contract programmer before joining the MSc. One of his clients was a large insurance company that had installed a supercomputer to speed up responses on its website. The problem was that no one, including Holmes, knew how to make it run as fast as it should. This very expensive piece of equipment was used to handle more requests simultaneously, but making each request run faster was beyond the skillset of anyone working on it. That was one of the drivers for Holmes signing up for the course in the first place – a recognition that these skills were important, useful – and in high demand.

 

Thorough understanding

The principal aim of the MSc, Henty says, is to produce students with a thorough understanding of computational science, and of using computers to do the enormous simulations needed by communities like astrophysics.

To date, many scientific disciplines have tended to create their own, community-developed and managed pieces of code. These do work reasonably well, but without a thorough understanding of large-scale computing they are inevitably not as efficient as they could be. By bringing the communities together, Henty is keen to help both the science and computing communities to do the best work possible.

“We’re very much taking the view of the user in our course. There are other courses available that take a more computer science view, and those are useful if you want to design a new parallel computer language, or a new parallel processor. But I use the analogy of a Formula One car – we’re teaching the drivers what they need to know to drive the car and drive it fast. You probably couldn’t build one yourself, but you know what’s important and how to make the best of it,” he says.

Hardy agrees, saying that a good grounding in computer science will help any scientist doing simulation work. “Their interest is really in the science. And while I would argue that they don’t need to know a huge amount about what is essentially the ‘plumbing’ – to some extent you can consider this as ‘black boxes’ – it’s also true that if you really want to get the best performance you can, you do need to understand at least the basics of what is going on.”

 

Better control

Holmes says that the knowledge he has gained from the MSc course has changed his whole understanding of coding. While he used to work with modern, high-level languages like C#, on starting the course he found himself going back “20 or 30 years, to C.”

Older, lower-level languages like C “address the computer in the way the computer wants to be addressed,” Holmes says, and give the coder much more control over how each aspect of the code is handled.

Many younger programmers never learn these older languages, and Holmes is now persuaded that they really should. Learning C was “hideously complicated”, he says, but “I think if it was introduced in the right way, at the right time, when [people] are learning to code, it gives you an understanding that you don’t get from the modern, high-level languages. You understand what the high-level is doing for you, and have a better way of judging if the implementation is good or bad.”

 

Bringing industry ways to academia

Holmes is also keen to bring some of his industry knowledge to the world of HPC. His PhD looks at the potential use of higher-level languages in High Performance Computing, and he says that the pragmatism of industry could also be brought to bear on the way code is written.

“A lot of High Performance Computing codes work really fast on the computer, but are really difficult to maintain. The code is written so that the computer understands it really well, but it’s actually hideous for a human being to understand it. And it doesn’t need to be like that – I’ve seen really good examples of code in industry where it’s been designed well so that it is easy to follow for a human being, but it also runs quickly on the computer.”

 

Spreading the word

Henty is keen to spread understanding of HPC more broadly than just the MSc students. A good ‘basic common knowledge’ of what can be done with it is useful for all sorts of people, he says. Just as the average person in the street would know that a car is very unlikely to manage 200 miles to the gallon, he’d like the majority of scientists and programmers to know the basics of HPC.

“If someone says ‘My machine has 16 cores’, what does that mean? If you’re going to run program X, would you expect it to take a second? A minute? An hour? Is it likely to benefit from parallel techniques? Just some broad-brush understanding can make a big difference.”
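One common way to make that kind of broad-brush estimate is Amdahl's law, a standard textbook model that the article itself does not name. It bounds the speedup from running on several cores by the fraction of the work that can actually run in parallel. The sketch below is illustrative only: the function names and the example fractions are assumptions, not figures from Henty or the MSc course.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup under Amdahl's law.

    parallel_fraction: share of the runtime that can be parallelised (0..1).
    cores: number of cores the parallel part is spread across.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def estimated_runtime(serial_seconds: float, parallel_fraction: float, cores: int) -> float:
    """Best-case runtime on `cores` cores for a job taking `serial_seconds` on one core."""
    return serial_seconds / amdahl_speedup(parallel_fraction, cores)


if __name__ == "__main__":
    # Hypothetical job: one hour on a single core, run on a 16-core machine.
    for fraction in (0.5, 0.9, 0.99):
        speedup = amdahl_speedup(fraction, 16)
        runtime = estimated_runtime(3600.0, fraction, 16)
        print(f"parallel fraction {fraction:.2f}: "
              f"speedup <= {speedup:.1f}x, runtime >= {runtime:.0f} s")
```

Under this model a program that is 90 per cent parallel gains at most a factor of 6.4 on 16 cores, which is the sort of order-of-magnitude answer the questions above are asking for. Real programs usually do worse, because the model ignores communication and memory costs.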

Henty and his team also undertake some public outreach work, giving lectures at science festivals and trying to increase the general public’s understanding of what they do. As he says, astrophysicists have done a good job – “they have nice pictures!” – and most people have a vague understanding that ‘cool’ things are happening at CERN – but do they understand the need for supercomputers, and the fundamental role they play in almost all physical science research these days?

“We need to get across to people that computing is fundamental to modern science, to explain the money that’s spent on these huge supercomputers. If people begin to understand what the computing facilities are doing, they will be more supportive of the money that is spent – and may have a greater understanding of the scientific discoveries, in areas like climate change, that come out of the simulations being run,” he says.

 

 

Gillian Law

http://www.techliterate.co.uk

© Gillian Law

 

PlanetHPC,  University of Edinburgh | James Clerk Maxwell Building | Mayfield Road | Edinburgh | EH9 3JZ
