Zettascale supers might be nuclear unless chips go on a diet • The Register

Within the next 10 years, the world's most powerful supercomputers won't just simulate nuclear reactions, they could effectively run on them. That is, unless we take drastic steps to improve the efficiency of our compute architectures, AMD CEO Lisa Su said during her keynote at the International Solid-State Circuits Conference this week.

The root of the problem is that while companies like AMD and Intel have managed to roughly double the performance of their CPUs and GPUs every 2.4 years, and companies like HPE, Atos, and Lenovo have achieved similar gains roughly every 1.2 years at the system level, Su says power efficiency is lagging behind.

Citing performance and efficiency figures gleaned from the top supercomputers, AMD says gigaflops per watt is doubling roughly every 2.2 years, about half the pace at which the systems themselves are growing.

Assuming this trend continues unchanged, AMD estimates that we'll reach a zettaflop-class supercomputer in about 10 years, give or take. For reference, the US powered on its first exascale supercomputer, Oak Ridge National Laboratory's Frontier system, last year. A supercomputer capable of a zettaflop of FP64 performance would be 1,000x more powerful.

To AMD's credit, its estimate for when we'll cross the zettaflop barrier is at least a bit more conservative than Intel's rather hyperbolic claims that it could cross that threshold by 2027. What's more, the AMD CEO says such a machine won't exactly be practical unless compute architectures get drastically more efficient, and soon.

If things continue on their current trajectory, AMD estimates that a zettaflop-class supercomputer would need somewhere in the neighborhood of 500 megawatts of power. "That's probably too much," Su admits. "That's on the scale of what a nuclear power plant would be."
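A back-of-the-envelope sketch of how a projection like this works, using Frontier's published Top500 figures (roughly 1.1 exaflops of FP64 at about 21 MW) as the starting point. The inputs are approximations and the result lands in the same high-hundreds-of-megawatts ballpark as AMD's figure rather than matching it exactly, since AMD's exact assumptions weren't disclosed:

```python
# Project zettaflop-class power draw, assuming efficiency keeps
# doubling every ~2.2 years, per the trend AMD cites.
frontier_flops = 1.1e18      # approx FP64 FLOPS (Top500 Rmax)
frontier_power_w = 21.1e6    # approx watts
years_out = 10
efficiency_doubling_years = 2.2

eff_now = frontier_flops / frontier_power_w                   # FLOPS per watt today
eff_future = eff_now * 2 ** (years_out / efficiency_doubling_years)

zettaflop = 1e21
power_mw = zettaflop / eff_future / 1e6
print(f"Projected zettaflop power draw: {power_mw:.0f} MW")
```

Under these assumptions the projection comes out around 800 MW, which is why even AMD's more generous 500 MW estimate still puts a zettaflop machine in nuclear-plant territory.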

"This flattening of efficiency becomes the biggest challenge that we have to solve, both from a technology standpoint as well as from a sustainability standpoint," she said. "Our challenge is to figure out how, over the next decade, we think about compute efficiency as the number-one priority."

Correcting course

Part of the problem facing chipmakers is that the means they've traditionally relied on to achieve generational efficiency gains are becoming less effective.

Echoing Nvidia's leather-jacket aficionado and CEO Jensen Huang, Su admits Moore's Law is slowing down. "It's getting much, much harder to get density performance as well as efficiency" out of smaller process tech.

"As we get into the advanced nodes, we still see improvements, but those improvements are at a much slower pace," she added, referencing efforts to shrink process tech much beyond 5nm and even 3nm.

But while improvements in process tech are slowing down, Su argues there are still opportunities to be had, and, perhaps unsurprisingly, most of them center on AMD's chiplet-centric worldview. "The package is the new motherboard," she said.

Over the past few years, several chipmakers have embraced this philosophy. In addition to AMD, which arguably popularized the approach with its Epyc datacenter chips and later brought the tech to its Instinct GPUs, chipmakers including Intel, Apple, and Amazon are now employing multi-die architectures to combat bottlenecks and accelerate workloads.

Chiplets, argues the AMD boss, will allow chipmakers to tackle three of the low-hanging fruits when it comes to compute efficiency: compute energy, communications energy, and memory energy.

Modular chiplet or tile architectures have numerous advantages. For instance, they allow chipmakers to use the optimal process tech for each component. AMD uses some of TSMC's densest process tech for its CPU and GPU dies, but often employs larger nodes for things like I/O and analog signaling, which don't scale as well.

Chiplets also help reduce the amount of power required for communications between the components, since the compute, memory, and I/O can be packaged in closer proximity. And when stacked vertically, as AMD has done with SRAM on its X-series Epycs and Intel is doing with HBM on its Ponte Vecchio GPUs, the gains are even greater, the chipmakers claim.

AMD expects advanced 3D packaging techniques will yield 50x more efficient communications compared to conventional off-package memory and I/O.

That's no doubt why AMD, Intel, and Nvidia have started integrating CPUs, GPUs, and AI accelerators into their next-gen silicon. For example, AMD's upcoming MI300 will combine its Zen 4 CPU cores with its CDNA 3 GPUs and a boatload of HBM memory. Intel's Falcon Shores platform will follow a similar trajectory. Meanwhile, Nvidia's Grace Hopper superchips, while not integrated to the same degree, still co-package an Arm CPU with 512GB of LPDDR5 alongside a Hopper GPU die and 80GB of HBM.

AMD isn't stopping at CPUs, GPUs, or memory either. The company has thrown its support behind the Universal Chiplet Interconnect Express (UCIe) consortium, which is attempting to establish standards for chiplet-to-chiplet communication, so a chiplet from one vendor can be packaged alongside one from another.

AMD is also actively working to integrate IP from its Xilinx and Pensando acquisitions into new products. During her keynote, Su highlighted the potential for co-packaged optical networking, stacked DRAM, and even in-memory compute as opportunities to further improve power efficiency.

Is it time to give AI a crack at HPC?

But while there's opportunity to improve the architecture, Su also suggests it may be time to reevaluate the way we go about conducting HPC workloads, which have traditionally relied on high-precision computational simulation using massive datasets.

Instead, the AMD CEO makes the case that it may be time to make heavier use of AI and machine learning in HPC. And she's not alone in thinking this. Nvidia and Intel have both been pushing the advantages of lower-precision compute, particularly for machine learning, where trading a few decimal places of accuracy can mean the difference between days and hours of training.

Nvidia has arguably been the most egregious, claiming systems capable of multiple "AI exaflops." What they conveniently omit, or bury in the fine print, is the fact that they're talking about FP16, FP8, or Int8 performance, not the FP64 calculations typically used in most HPC workloads.
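The gap between the two kinds of flops is easy to sketch. The throughput ratios below are illustrative assumptions rather than any vendor's spec sheet, but accelerators commonly run FP16 and FP8 math at many times their FP64 rate, which is exactly how headline "AI exaflops" outrun FP64 capability:

```python
# Hypothetical per-precision throughput multipliers relative to FP64.
# These are illustrative placeholders, not published vendor numbers.
assumed_speedup_vs_fp64 = {"FP64": 1, "FP16": 16, "FP8": 32}

def fp64_equivalent_exaflops(marketing_exaflops, precision):
    """Translate a marketing 'AI exaflops' figure into FP64-equivalent exaflops."""
    return marketing_exaflops / assumed_speedup_vs_fp64[precision]

# A "4 AI exaflops" claim measured at FP8 would, under these ratios,
# correspond to a far more modest FP64 machine.
print(fp64_equivalent_exaflops(4.0, "FP8"))   # prints 0.125
```

The point isn't the specific ratios, it's that the precision label in the fine print can shift the headline number by an order of magnitude or more.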

"Just looking at the relative performance over the last 10 years, as much as we've improved in traditional metrics around SpecInt Rate or flops, the AI flops have improved much faster," the AMD chief said. "They've improved much faster because we've had all these mixed-precision capabilities."

One of the first applications of AI/ML for HPC could be what Su refers to as AI surrogate physics models. The general principle is that practitioners employ traditional HPC in a much more targeted way, using machine learning to help narrow the field and reduce the computational power required overall.
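A minimal, stdlib-only sketch of the surrogate idea, under stated assumptions: the "simulation" below is a stand-in function, not real physics, and the surrogate is just a least-squares polynomial rather than a neural network. The workflow is the point: run the expensive solver sparsely, fit a cheap model to its outputs, then use the cheap model to screen candidates and decide where a full-precision run is actually worth it.

```python
import math

def expensive_simulation(x):
    # Stand-in for a costly high-fidelity FP64 run (imagine hours per call).
    return math.sin(3 * x) + 0.5 * x

# 1. Sample the expensive model sparsely.
xs = [i / 10 for i in range(11)]          # just 11 runs across [0, 1]
ys = [expensive_simulation(x) for x in xs]

# 2. Fit a cheap surrogate: degree-3 polynomial by least squares,
#    solving the normal equations with naive Gaussian elimination.
def fit_poly(xs, ys, degree=3):
    n = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):                  # elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * n                    # back-substitution
    for r in reversed(range(n)):
        coeffs[r] = (b[r] - sum(A[r][c] * coeffs[c] for c in range(r + 1, n))) / A[r][r]
    return coeffs

coeffs = fit_poly(xs, ys)

def surrogate(x):
    return sum(c * x ** i for i, c in enumerate(coeffs))

# 3. Screen a thousand candidates with the cheap surrogate instead of
#    a thousand full simulations, then simulate only near the winner.
best = max((i / 1000 for i in range(1001)), key=surrogate)
print(f"surrogate suggests a full-fidelity run near x = {best:.3f}")
```

Real surrogate physics models replace the polynomial with a trained ML model and the toy function with a climate, materials, or weapons-physics code, but the compute-saving structure is the same.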

Several DoE labs are already exploring the use of AI/ML to improve everything from climate models and drug discovery to simulated nuclear weapons testing and maintenance.

"It's early. There's a lot of work to be done on the algorithms here, and there's a lot of work to be done in how to partition the problems," Su said. ®
