A research group from Brazil has been working with NVIDIA, using the ISIS facility for preliminary studies of the effects of neutron radiation on NVIDIA GPUs. The results are aiding the development of tailored hardening strategies to increase the reliability of these devices. The development of the CHIPIR test facility at ISIS will open new doors for this line of research in the coming years.
The problem
The collision of cosmic rays with atoms in the Earth’s atmosphere generates showers of particles, including high-energy neutrons. These neutrons can disrupt the normal operation of electronics such as graphics processing units (GPUs), producing multiple errors in the output and leading to a large number of failures. GPUs are central to many complex computer systems; without them, information could not be processed fast enough. These tiny accelerator chips process data quickly and efficiently by manipulating memory. The technology is now so efficient that its applications have extended beyond graphics and gaming, and GPUs are used extensively in supercomputers, air traffic flow analysis and medical image processing. Results from neutron radiation experiments show that multiple errors are detected in the output in more than 50% of cases. With GPUs now used in some safety-critical applications, the effect of neutron radiation on their core reliability is therefore a major concern.
The solution
The solution lies in the design of novel hardening strategies with error-correcting capabilities. Because multiple errors occur, the currently available hardening strategies may become ineffective or inefficient. However, by analysing radiation-induced error distributions, it is possible to optimise and experimentally tune new software-based hardening strategies for GPUs.
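To give a flavour of what a software-based hardening strategy can look like, here is a minimal sketch of one well-known approach, checksum-based error detection and correction for matrix multiplication. This technique is an assumption chosen purely for illustration; the article does not say which strategies the team tuned, and the function names below are hypothetical. The sketch runs on the CPU with small integer matrices, whereas a real implementation would run on the GPU. A single corrupted element is located and repaired from the checksums, while multiple errors are only flagged for recomputation.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def with_checksums(a, b):
    """Multiply a by b while carrying an extra checksum row and column."""
    a_ext = a + [[sum(col) for col in zip(*a)]]   # append a row of column sums to A
    b_ext = [row + [sum(row)] for row in b]       # append a column of row sums to B
    return matmul(a_ext, b_ext)                   # result is (n+1) x (m+1)


def check_and_correct(c_ext):
    """Verify a checksummed product; return (status, data).

    status is "ok", "corrected" (one bad element repaired) or
    "recompute" (error pattern too complex to repair from checksums).
    """
    n, m = len(c_ext) - 1, len(c_ext[0]) - 1
    data = [row[:m] for row in c_ext[:n]]         # copy of the result without checksums
    bad_rows = [i for i in range(n) if sum(data[i]) != c_ext[i][m]]
    bad_cols = [j for j in range(m)
                if sum(data[i][j] for i in range(n)) != c_ext[n][j]]
    if not bad_rows and not bad_cols:
        return "ok", data
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        data[i][j] += c_ext[i][m] - sum(data[i])  # restore the value the row checksum implies
        return "corrected", data
    return "recompute", data


if __name__ == "__main__":
    c = with_checksums([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    c[0][1] += 5                                  # emulate a single radiation-induced error
    print(check_and_correct(c))
```

The "recompute" branch reflects the concern raised above: when several output elements are corrupted at once, a scheme designed around single errors can detect the problem but cannot repair it, which is why knowing the real error distribution matters when tuning a strategy.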
The role of ISIS
NVIDIA graphics processing units being tested with neutron radiation on VESUVIO
ISIS facilitates the testing of electronic components such as GPUs in order to check their reliability. A team from the Federal University of Rio Grande do Sul, Brazil, led by Dr Paolo Rech, simulated processes that run directly on supercomputers such as Titan. They used the VESUVIO instrument at ISIS to emulate atmospheric neutron radiation on these GPUs, so that they could see the effects of the radiation and observe the error rate. Using this information, they have developed an optimised hardening technique for the GPUs, and further tests on devices incorporating this new strategy have proved promising.
“Modern GPUs are cutting-edge processors built with novel technologies and so may be very prone to experiencing radiation-induced failures. Such graphics processing units make up the backbone of a supercomputer like Titan, which has more than 18,000 GPUs running in parallel. GPUs are also starting to be used in satellites. Clearly radiation is an issue for such devices because they are big, novel devices with powerful memory. VESUVIO is great for doing radiation tests on electronics as it gives a very precise neutron count with a very short time slot. The new CHIPIR instrument currently being built at ISIS will further advance our research as it offers an improved set-up for our experiments. CHIPIR will be able to mimic the high-energy spectrum of neutrons, and the microbeam will mean we can irradiate just a small part of the device, which will be fantastic for us as this is not currently possible.” Paolo Rech, Federal University of Rio Grande do Sul, Brazil.
Felice Laake
Research date: January 2014
Further Information
For further information please contact Dr Paolo Rech