Interview with Alfred Geiger, T-Systems

Dr Alfred Geiger is Head of Solutions & Innovations Scientific & Technical ICT at T-Systems. His team of over 20 experts is pioneering the use of HPC systems in advanced industrial applications. He is responsible for T-Systems’ corporate HPC strategy. PlanetHPC asked him about the work of his group.

In which application areas do you use HPC?

We offer an HPC service to our customers, so they define the application areas we are involved in. The applications run on systems that we own and manage. Our business model is to offer customers an agreed level of performance for a particular application at an agreed price. We can offer this level of performance by using appropriate hardware or by optimising the performance of the code or through a combination of the two. Because most of the codes we deal with have been developed in-house, we have easy access to the sources and can put significant effort into optimising them. We deal with about 10 codes, mostly from the aerospace and energy sectors. A few of the codes are from ISVs, but we treat these as “black boxes”.

Why do you use HPC (e.g. numerical simulation is cheaper or more feasible than physical experimentation)?

Generally our clients use HPC because it’s cheaper than physical simulation and, in some cases, because physical simulation isn’t practical. This is particularly so in the nuclear industry. In the USA, in particular, the nuclear industry has provided a significant stimulus for HPC.

What are the cost benefits to your business of using HPC?

For our customers there are significant cost benefits of using HPC. Of course, this is subject to market pressures. Are numerical simulations always better value than physical experiments? Is the service we offer good value? Can we reduce the cost of our services through optimisation of codes and through the innovative use of hardware? These types of demands are common to any business providing a service.

Which HPC systems do you use in your business? What is their capital value? How scalable are your applications?

Our in-house machines are based on x86 clusters connected using InfiniBand. Currently we own about 20,000 cores in clusters with a value of around €20 million. Of course, this is very much a moving target. For some applications, we buy time on vector systems. Scalability is a major issue for us. It is particularly important for CFD codes where we deal with highly irregular meshes. The important issues here are network bandwidth, latency and packet collisions as data move through the network. We are putting in a lot of work to address these as we optimise codes.
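As an illustration that is not part of the interview, the interplay of latency and bandwidth mentioned above can be sketched with the widely used latency-bandwidth cost model, in which sending one message costs a fixed latency plus the message size divided by the bandwidth. The function name and the figures of 2 microseconds and 5 GB/s are assumptions chosen for the example, not measurements from T-Systems' clusters.

```python
def message_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time to send one message: fixed latency plus size / bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Hypothetical interconnect figures, for illustration only.
LATENCY_S = 2e-6   # assumed per-message latency: 2 microseconds
BANDWIDTH = 5e9    # assumed bandwidth: 5 GB/s

small = message_time(1_000, LATENCY_S, BANDWIDTH)        # 1 kB message
large = message_time(10_000_000, LATENCY_S, BANDWIDTH)   # 10 MB message

# Fraction of each transfer that is spent on latency rather than bandwidth.
small_latency_share = LATENCY_S / small
large_latency_share = LATENCY_S / large

print(f"1 kB message:  {small_latency_share:.1%} of the time is latency")
print(f"10 MB message: {large_latency_share:.1%} of the time is latency")
```

Under these assumed figures, the small message is dominated by latency and the large one by bandwidth, which suggests why a code that exchanges many small messages can be more sensitive to latency than to raw bandwidth.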

For your business does the Cloud offer a viable alternative to owning and managing your own systems?

The Cloud has several critical issues. Virtualisation of resources such as processing, network bandwidth and storage is not sufficiently well advanced for us to take advantage of the price/performance the Cloud offers. We cannot guarantee that an application optimised for one system will run efficiently on the Cloud and this is a major obstacle. In some sense, we try to provide a Cloud service for our customers, but, of course, this is much more restricted than the offerings from Amazon and Google, because we are not dealing with such large numbers of machines and customers.

What are the challenges you see in the development of your HPC capability (e.g. scalability of applications, power consumption, cost of systems)?

As has been mentioned several times now, scalability is critical. Because processors are not clocking any faster, more performance now means more cores and this immediately raises scalability issues. Power consumption is not yet critical, but it certainly is important. It affects the cost and profitability of our services, so it’s more about optimisation. People are now locating data-centres close to sources of cheap electricity from hydro and biomass installations. We are not yet involved in developments for the very large so-called exascale systems and the inherent challenges these will present.
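As an illustration that is not part of the interview, the link between core counts and scalability can be sketched with Amdahl's law, which bounds the speedup of a code when only a fraction of its work can run in parallel. Dr Geiger does not name this model; the function name and the 99% parallel fraction are assumptions chosen for the example.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of a code can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed share of the runtime that parallelises perfectly (illustrative only).
PARALLEL_FRACTION = 0.99

for cores in (16, 256, 4096, 100_000):
    bound = amdahl_speedup(PARALLEL_FRACTION, cores)
    print(f"{cores:>7} cores: speedup at most {bound:.1f}")
```

With 1% of the work remaining serial, the bound stays below 100 however many cores are added, so under this simple model further gains have to come from shrinking the serial and communication parts of the code.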

Are new languages and programming paradigms needed particularly as we move toward exascale systems?

We use MPI and OpenMP, but they have a high administrative overhead on large systems. Because of this we are looking at PGAS (Partitioned Global Address Space) languages with Fraunhofer, who have FVM (Fraunhofer Virtual Machine). We have to do something now to address the needs of our customers and can’t wait for standards. We have to take a pragmatic approach because the success of our business depends on being able to satisfy end-user demands in the short term. There are real issues here related to memory models, how these will scale and what performance will result across a range of applications.

What are your views on GPGPU computing?

Energy considerations may bring GPGPUs into mainstream computing. This is all part of a generalised attached processor model which includes FPGAs and the Intel Larrabee concept. In this respect, the new, experimental 48-core Intel processor is very interesting. There’s a lot going on in this area at the moment and how it will end up is not yet clear.

What are your views on reconfigurable (i.e. FPGA-based) computing particularly in light of developments at Convey?

FPGAs have significant potential in certain areas, but they also have significant limitations. They are still a decade away from being a general-purpose device. For example, the program-design cycle, with its time-consuming place-and-route step, makes these devices very inflexible.

Describe the HPC systems you would like to have available in 3, 5 and 10 years' time.

It’s not so much what we would like; it’s more realistic to say what we would expect. I expect systems with 100,000 cores or more to become commonplace over the next decade. Of course, we will have to find ways to program and manage such systems and to develop applications which will run on such massively parallel systems.

In which HPC research areas would EU-funded programmes benefit your business?

The major issues are scalability and programmability. The EU should encourage developments which lead to standards. The US is driven too much by hardware vendors, rather than by end-users. Consequently there is no group to moderate the roadmaps these vendors are setting out. Systems vendors have become divorced from language developers. Somehow the two need to be brought together. The goal has to be scalability and virtualisation and this will not happen quickly if things are left to the hardware people. We have discussed the issue of scalability several times. Virtualisation is important because it will enable developers to protect their software investments rather than be forced into developments with a short-term life. Only when there are clear standards for virtualisation will it be possible to engage with ISVs to address licensing issues, which have become a major obstacle to the more flexible use of third-party codes.

Our thanks to Dr Geiger for his interesting and profound comments.


PlanetHPC,  University of Edinburgh | James Clerk Maxwell Building | Mayfield Road | Edinburgh | EH9 3JZ
