AI’s integration into daily life and business has skyrocketed in the past 18 months, transitioning from a luxury to a necessity for both large and small enterprises. This surge has pushed hyperscalers to invest heavily in compute resources, propelling them into the realm of exascale computing. But what does this really mean and what does it even look like?
Graphics processing unit (GPU)
One approach we have seen over the last few years is GPUs (high-performance chipsets optimized for AI workloads) becoming more pervasive throughout traditional hyperscale architectures, serving customers who simply want GPU resources for their own training, customization, or inference of privately hosted models. Whilst this requires hyperscalers to provision those systems with vastly improved power capacity, higher-speed networking for interconnecting systems, and high-speed storage, it is still a Swiss army knife approach: general-purpose systems that can handle many workloads, including AI. It is certainly not optimal for very large or foundational AI models.
However, hyperscalers have also been building something more analogous to laser scalpels than Swiss army knives: AI supercomputers designed from the ground up specifically for handling AI tasks at scale.
Exascale computing
Until recently, hyperscalers consistently delivered petascale compute systems. It was only in May 2022 that the Frontier supercomputer at Oak Ridge National Laboratory, built by HPE with AMD processors and GPUs, became the first publicly verified exascale system, delivering 1.1 exaflops of performance (an exaflop is a staggering quintillion mathematical operations per second; a quintillion is a 1 with 18 zeros after it). The following year, in May 2023, Google announced the A3 supercomputer, offering up to 26 exaflops of AI performance (a figure measured at the lower numerical precisions used for AI workloads, rather than the double-precision benchmark behind Frontier’s result). Not only was this a feat in terms of pure horsepower, but also in the efficiency with which Google delivered it.
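To put an exaflop in perspective, here is a quick back-of-the-envelope comparison. The laptop rate below is an assumed round figure for illustration, not a measurement:

```python
# Back-of-the-envelope: how long would a typical laptop take to do
# the work a 1.1-exaflop machine completes in one second?
EXAFLOP = 1e18                   # operations per second in one exaflop
frontier_rate = 1.1 * EXAFLOP    # Frontier's benchmark rate
laptop_rate = 100e9              # assumed ~100 gigaflops for a laptop

# Laptop runtime equivalent to one second of exascale work
seconds = frontier_rate / laptop_rate
print(f"{seconds:,.0f} seconds (~{seconds / 86_400:.0f} days)")
# → 11,000,000 seconds (~127 days)
```

In other words, a second of exascale compute is roughly four months of laptop time under these assumptions.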
Traditional architectures are rife with bottlenecks. For AI, those bottlenecks include the CPUs themselves, as well as traditional memory, storage, and networks. As an example, loading training data and synchronizing multiple GPUs or servers is traditionally a CPU-bound task, with significant CPU overhead for task synchronization and data pre-processing before data is even made available to the GPUs. This can be compounded by insufficient network performance between nodes, all of which can leave GPUs idling and slow the process.

To remove these bottlenecks, Google has deployed what it calls an IPU (infrastructure processing unit) to take data movement off the CPU, and Nvidia has announced similar technology using NVLink to provide a high-speed backplane between server nodes. This enables direct connection of high-performance GPUs at scale, using optical system-to-system interconnects. With this, not only are they seeing massive gains in delivered performance, but they are able to treat thousands of nodes as a single homogeneous pool of AI compute capacity, enabling customers to scale workloads vertically or horizontally at truly massive scale depending on the application.
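The GPU-idling effect is easy to see in a toy model. The sketch below assumes illustrative per-batch load and compute times (not real measurements) and compares a serial pipeline with one where data loading overlaps compute, the kind of offload that IPUs and NVLink-style fabrics make possible:

```python
# Toy model of the GPU-idling problem: each training step needs data
# loaded and pre-processed (CPU-bound) before the GPU can compute on it.
# All numbers are illustrative assumptions, not measurements.
LOAD_S = 0.08      # CPU time to fetch + preprocess one batch (seconds)
COMPUTE_S = 0.10   # GPU time to process one batch (seconds)
BATCHES = 1000

# Serial pipeline: the GPU sits idle while the CPU prepares each batch.
serial = BATCHES * (LOAD_S + COMPUTE_S)

# Overlapped pipeline: after the first batch is staged, loading hides
# behind GPU compute, so each step costs only max(load, compute).
overlapped = LOAD_S + BATCHES * max(LOAD_S, COMPUTE_S)

print(f"serial: {serial:.1f}s, overlapped: {overlapped:.1f}s")
# → serial: 180.0s, overlapped: 100.1s
print(f"GPU idle fraction (serial): {LOAD_S / (LOAD_S + COMPUTE_S):.0%}")
# → GPU idle fraction (serial): 44%
```

Under these assumed figures, simply hiding data movement behind compute recovers nearly half the wall-clock time, which is why hyperscalers invest so heavily in offload hardware and interconnects.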
Power savings
These systems are power-hungry, which makes it even more challenging for hyperscalers to deliver on their sustainability goals. However, many argue that these systems can complete AI training so much faster than traditional architectures that the time saved translates into net energy savings across a training run. Couple this with other developments in AI acceleration hardware, such as purpose-built LLM compute engines from the likes of Groq and Intel (with its Gaudi AI accelerators), squeezing ever more performance out of silicon at smaller and smaller power footprints.
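The time-versus-power trade-off is simple arithmetic: total energy is average power draw multiplied by runtime. The sketch below compares two hypothetical systems with assumed figures, showing how a higher-draw machine can still finish a training run on less total energy:

```python
# Energy = average power draw x runtime. A denser AI supercomputer can
# draw more power yet finish a training run early enough to consume less
# energy overall. Both configurations are hypothetical assumptions.
general_purpose = {"power_mw": 10, "days": 30}   # slower, lower draw
ai_supercomputer = {"power_mw": 25, "days": 6}   # faster, higher draw

def energy_mwh(cfg):
    """Total energy for one training run, in megawatt-hours."""
    return cfg["power_mw"] * cfg["days"] * 24

print(energy_mwh(general_purpose))   # → 7200 (MWh)
print(energy_mwh(ai_supercomputer))  # → 3600 (MWh)
```

In this hypothetical, the AI supercomputer draws 2.5 times the power but finishes five times sooner, halving the energy bill for the run.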
Removing bottlenecks
For the short- to mid-term future of AI’s demands on hyperscaler infrastructure, the focus is on removing bottlenecks to parallel processing in exascale compute systems, with ever higher performance and density of GPUs and/or other AI accelerators, such as field-programmable gate arrays (FPGAs) or tensor processing units (TPUs), depending on the need. Beyond that, we still await consumer-ready packaging of technologies such as quantum computing, which, although it will bring its own disruption, will massively transform simulation and optimization tasks. In parallel, we have the development of neuromorphic computing, which takes its inspiration from the power efficiency of the human brain. Deep learning software already mirrors aspects of brain function; emulating the same in hardware could transform the power needed to deliver AI at scale yet again.
For now, hyperscalers must walk the line of capitalizing on the AI gold rush, whilst still striving to hit sustainability goals. The question is, are those goals mutually exclusive or can they co-exist?
Read more about how hyperscalers are preparing for the rise in AI demand: The Coming AI Revolution and Hyperscale (aflhyperscale.com)