AMD: Increasing compute efficiency by 30x is nice, but what about 100x?

Stozzy

Staff
The high energy demands of generative AI and large language models are accelerating the need for more power-efficient systems. AMD CEO Lisa Su is confident that the company is on the right path to increase data center power efficiency by 100x in the next three years.

Everywhere you look, there is a new AI service promising to improve your personal or work life. Google Search now incorporates its Gemini AI to summarize search results, but this comes at the cost of a roughly tenfold increase in energy use (often with poor results) compared to non-AI search. The global popularity of generative AI has accelerated the need for rapid expansion of data centers, and with it, power demand.

Goldman Sachs estimates that data center power requirements will grow by 160% by 2030. This is a huge problem for regions like the US and Europe, where the average age of the power grid is around 50 years and 40 years, respectively. In 2022, data centers consumed 3% of US power, and projections suggest this will increase to 8% by 2030. "There's no way to get there without a breakthrough," says OpenAI co-founder Sam Altman.
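As a quick sanity check on those projections (my own arithmetic, not from the report): if data centers grow 2.6x while their share of the grid goes from 3% to 8%, the two figures are only consistent if total generation stays roughly flat.

```python
# Sanity check on the cited projections (illustrative arithmetic only):
# a 160% increase means data centers draw 2.6x their 2022 load.
dc_share_2022 = 0.03          # 3% of US power in 2022
dc_growth = 1 + 1.60          # +160% by 2030
dc_share_2030 = 0.08          # projected 8% share in 2030

# Implied growth of the total grid if both projections hold:
grid_growth = dc_share_2022 * dc_growth / dc_share_2030
print(f"{grid_growth:.3f}x")  # 0.975x, i.e. a roughly flat grid
```

In other words, the two projections together imply almost no growth in overall generation, which is exactly why aging grids are the bottleneck.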

AMD CEO Lisa Su discussed past successes and future plans to improve compute node efficiency at the ITF World 2024 conference. Back in 2014, AMD committed to making its mobile CPUs 25x more energy efficient by 2020 (the "25x20" goal). The company exceeded that target, achieving a 31.7x improvement.
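To put such multiplicative goals in perspective, here is a back-of-the-envelope calculation (mine, not AMD's) of the compound annual gain that a 25x improvement over six years implies:

```python
# 25x efficiency over 6 years (2014 -> 2020): what annual gain is that?
goal, achieved, years = 25.0, 31.7, 6

goal_rate = goal ** (1 / years)          # compound annual factor
achieved_rate = achieved ** (1 / years)

print(f"goal:     {goal_rate:.2f}x per year")      # ~1.71x
print(f"achieved: {achieved_rate:.2f}x per year")  # ~1.78x
```

Sustaining a 70-80% efficiency gain every single year is far beyond what process shrinks alone deliver, which is why the later goals lean on packaging, architecture, and software as well.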

In 2021, AMD saw the writing on the wall regarding the exponential growth of AI workloads and the power required to operate these complex systems. To help mitigate that demand, AMD set a "30x25" goal: a 30x improvement in compute node energy efficiency by 2025, to be reached by focusing on several key areas.

It starts with improvements in process node and packaging, which are the fundamental building blocks of CPU/GPU manufacturing. By utilizing 3nm Gate-All-Around (GAA) transistors, an evolution of the FinFET 3D transistors, power efficiency and performance-per-watt will be improved. Additionally, the continual refinement of packaging techniques (e.g., chiplets, 3D stacking) gives AMD the flexibility to swap various components into a single package.

The next area of focus is AI-optimized accelerated hardware architectures. These are known as Neural Processing Units (NPUs), which have been in mobile SoCs like Qualcomm's Snapdragon 8 Gen series for years. Earlier this year, AMD released the Ryzen 7 8700G, the first desktop processor with a built-in AI engine. This dedicated hardware allows the CPU to offload compute-intensive AI tasks to the NPU, improving efficiency and lowering power consumption.
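Conceptually, the offload is just dispatch: route AI-heavy work to the dedicated engine when one is present, and fall back to the CPU cores otherwise. A toy sketch (not AMD's actual software stack; the function and task format are invented for illustration):

```python
# Toy dispatcher illustrating NPU offload (hypothetical API, not AMD's):
def run_task(task: dict, npu_available: bool) -> str:
    """Send compute-intensive AI work to the NPU when one exists."""
    if npu_available and task["kind"] == "ai":
        return f"{task['name']} on NPU (low power)"
    return f"{task['name']} on CPU cores"

print(run_task({"kind": "ai", "name": "noise suppression"}, npu_available=True))
# -> noise suppression on NPU (low power)
print(run_task({"kind": "io", "name": "file copy"}, npu_available=True))
# -> file copy on CPU cores
```

The real win is that the NPU executes those operations at a fraction of the energy the general-purpose cores would burn on the same work.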

The final pillars of this 30x25 goal are system-level tuning and software/hardware co-design. System-level tuning is another branch of the advanced packaging initiative, focused on reducing the energy needed to move data physically within these computer clusters. Software/hardware co-design aims to improve AI algorithms to work more effectively with next-generation NPUs.

Lisa Su is confident that AMD is on track to meet the 30x25 goal, and she sees a pathway to a 100x improvement by 2027. AMD and other industry leaders are all working to address the power needs of our AI-enhanced lives in this new era of computing.

 
Given that a current EPYC draws 350W, I'm not sure how in less than six years they will deliver the same compute power at only 3.5W. The nodes will get smaller, but I'm not sure what they are counting on for this increase.
 
Given that a current EPYC draws 350W, I'm not sure how in less than six years they will deliver the same compute power at only 3.5W. The nodes will get smaller, but I'm not sure what they are counting on for this increase.
Wrong way of looking at it. Nvidia claims that Blackwell can be 30x more efficient than the H100, so how can that be if the architectural gain is a mere 20%? By using lower-precision formats, reducing system bottlenecks, and reshaping the way processing is done.

AMD can do the same; look at how much more AVX-512 does in certain workloads using the same amount of power.
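The lower-precision point is easy to quantify: halving the bits halves the memory and memory traffic for the same tensor, and hardware with native low-precision units can execute proportionally more operations per joule. A quick NumPy illustration (my example, not from the post):

```python
import numpy as np

# The same 4096x4096 weight matrix at three precisions: each halving
# of the element size halves the bytes that must be stored and moved.
w_fp32 = np.ones((4096, 4096), dtype=np.float32)
w_fp16 = w_fp32.astype(np.float16)
w_int8 = np.ones((4096, 4096), dtype=np.int8)

for w in (w_fp32, w_fp16, w_int8):
    print(w.dtype, w.nbytes // 2**20, "MiB")  # 64, 32, and 16 MiB
```

Dropping a model from FP32 to FP16 or INT8 where accuracy allows is one of the cheapest multipliers on efficiency available, independent of any process node gains.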
 
Wrong way of looking at it. Nvidia claims that Blackwell can be 30x more efficient than the H100, so how can that be if the architectural gain is a mere 20%? By using lower-precision formats, reducing system bottlenecks, and reshaping the way processing is done.

AMD can do the same; look at how much more AVX-512 does in certain workloads using the same amount of power.
Here is what Gemini responded to the question "What does a 100x efficiency increase mean?":

A 100x efficiency increase means something can accomplish the same task in 1/100th of the time, use 1/100th of the resources, or produce 100 times the output for the same amount of input. It's a very significant improvement.

Here are some examples:

  • Energy: A 100x efficiency increase in a computer chip could allow it to perform the same calculations while using 1% of the power it currently does. (This is what AMD is aiming for by 2027 according to their CEO [1])
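The earlier EPYC objection works through the same arithmetic: a 100x gain in performance-per-watt means either the same work at 1/100th the power, or 100x the work in the same power envelope (the 350W figure below is the commenter's, used purely for illustration):

```python
# 100x efficiency applied to a hypothetical 350 W server part:
power_w, gain = 350, 100

same_work_power = power_w / gain
print(f"{same_work_power} W for the same work")  # 3.5 W for the same work
# Equivalently: 100x the throughput in the unchanged 350 W envelope,
# which is the realistic outcome for data center hardware.
```

Nobody expects 3.5W servers; the practical result is vastly more compute per rack at the same power draw.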
 
It’s funny that changing all of our cars to electric was supposedly fine for the grid (it wasn’t), but more data centers is too much for it.
 
I read the title with disbelief. Then saw that their 25x20 goal for mobile CPUs was to make them 25% more efficient by 2020. That would mean their 30x25 goal for compute efficiency is to make their data centre CPUs 30% more efficient by 2025. Not 30x more efficient or even 100x more efficient.
 
I read the title with disbelief. Then saw that their 25x20 goal for mobile CPUs was to make them 25% more efficient by 2020. That would mean their 30x25 goal for compute efficiency is to make their data centre CPUs 30% more efficient by 2025. Not 30x more efficient or even 100x more efficient.

Yes, there is a problem with the writer's numbers, but it's the other way around. He should have written "x" instead of "%". See the AMD link he cited.
 
If they do increase compute efficiency, Windows will become more bloated and slower to offset it. This is how it has been for decades.

Actually, Windows now boots in seconds, compared to minutes 15 years ago, thanks to many improvements. And laptop CPUs are much more energy efficient at the same time, so they achieve 15+ hours of battery life versus just a few hours.
 
Actually, Windows now boots in seconds, compared to minutes 15 years ago, thanks to many improvements. And laptop CPUs are much more energy efficient at the same time, so they achieve 15+ hours of battery life versus just a few hours.
Boot time =/= bloat. Getting to the desktop happens quickly, but then you sit there for 10-15 seconds as Windows loads all its bloatware.

But I hate having to point this out: he didn't say anything about boot times. He was talking about bloat, which Windows is full of these days. If I'm searching for issues on Windows, I can't look at anything more than 3 or 4 years old because of how much they've changed it. Windows 10 is a completely different OS from launch, and I don't plan on ever using 11, especially considering that 12 is right around the corner.
 
Actually, Windows now boots in seconds, compared to minutes 15 years ago, thanks to many improvements. And laptop CPUs are much more energy efficient at the same time, so they achieve 15+ hours of battery life versus just a few hours.
High-speed SSDs are the main reason Windows boots faster. I've run many comparisons in my shop. Even a low-cost SATA SSD transfers data three times faster than a typical 1TB HDD. NVMe drives, even Gen 3, are nine times faster than an HDD. It's not Windows running faster; it's the hardware.
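Those ratios translate directly into load times; a rough worked example (the drive speeds and data volume are my assumptions, scaled from the 3x and 9x figures above):

```python
# Illustrative sequential-read times, scaled from the post's ratios:
hdd = 150            # MB/s, assumed typical 1TB HDD
sata_ssd = hdd * 3   # ~450 MB/s, near the SATA III ceiling
nvme = hdd * 9       # ~1350 MB/s, a modest Gen 3 NVMe drive

data_mb = 3000       # hypothetical amount of data read during boot
for name, speed in [("HDD", hdd), ("SATA SSD", sata_ssd), ("NVMe", nvme)]:
    print(f"{name}: {data_mb / speed:.1f} s")
# HDD: 20.0 s, SATA SSD: 6.7 s, NVMe: 2.2 s
```

Real boots are dominated by small random reads, where SSDs beat HDDs by far more than these sequential ratios suggest.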
 
Given that a current EPYC draws 350W, I'm not sure how in less than six years they will deliver the same compute power at only 3.5W. The nodes will get smaller, but I'm not sure what they are counting on for this increase.

I think you are confused on two levels. First of all, AMD's Ryzen CPUs were designed to run about 2x more efficiently than Intel's cores, and they are doing that already, today, and Intel is failing as a microprocessor company; have you noticed? AMD has two types of Ryzen cores now: "normal" ones and "efficiency" cores used in datacenter CPUs, and it is also starting to add some "c"-suffix efficiency cores to laptop CPUs. I believe some combinational circuits like the multipliers probably use simpler, deeper/slower logic, maybe some other circuits are not kept pre-charged for certain operations, and the efficiency cores are only 2/3rds the size of the normal cores; otherwise they are completely identical and use a lot less (maybe 30% less) power.

Secondly, this is an article about AI compute units, not about Ryzen or Epyc CPUs. AMD/ATI's AI compute units are not nearly as efficient as Nvidia's; I'd estimate they are somewhere between half and two-thirds as efficient. They absolutely could take some of the ideas they've learned making Ryzen efficiency cores and put them to good use in their MI300 and similar AI parts, which do much more floating-point work, often at single precision (32-bit), half precision (16-bit), or even smaller levels of floating-point precision.

Some of the efficiency improvements come naturally with each new generation of VLSI; it's why Apple's phones use the very latest 3nm process node at TSMC. There is probably a 75% power saving left before Moore's Law is exhausted completely (that's my own SWAG, not supported by any facts). Add a 60% power saving from more efficient communication buses (signal power often has to be amplified 1000x to send data between chiplets, let alone between CPU and RAM, between CPUs, or over the data center network). That, plus lower-power combinational circuits (multipliers) in the NPUs, would be multiplied across the 15,000 or 20,000 AI compute units in a cluster and add up to a 90% power saving overall.
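The compounding in that estimate checks out (using the poster's own speculative percentages):

```python
# 75% saving from process scaling leaves 25% of the power; a further
# 60% saving on interconnect leaves 40% of what remains.
remaining = (1 - 0.75) * (1 - 0.60)
print(f"{remaining:.2f} of original power, a {1 - remaining:.0%} saving")
# 0.10 of original power, a 90% saving
```

A 90% saving is a 10x gain, so even under these generous assumptions the remaining 10x toward a 100x goal has to come from precision, architecture, and software changes.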
 
I think you are confused on two levels. First of all, AMD's Ryzen CPUs were designed to run about 2x more efficiently than Intel's cores, and they are doing that already, today, and Intel is failing as a microprocessor company; have you noticed? AMD has two types of Ryzen cores now: "normal" ones and "efficiency" cores used in datacenter CPUs, and it is also starting to add some "c"-suffix efficiency cores to laptop CPUs. I believe some combinational circuits like the multipliers probably use simpler, deeper/slower logic, maybe some other circuits are not kept pre-charged for certain operations, and the efficiency cores are only 2/3rds the size of the normal cores; otherwise they are completely identical and use a lot less (maybe 30% less) power.

AMD's C-cores are area-saving cores more than efficiency cores. Most of that area reduction comes from the fact that C-cores only need to sustain around 3 GHz instead of nearly 6 GHz. That allows reorganizing different units inside the CPU, a shorter pipeline, and dropping everything that exists only because 6 GHz is the target clock speed.

Unlike Intel, which made a totally different, cut-down core. Also, Intel's hybrid architecture was never supposed to exist, and Thread Director is perhaps the stupidest thing ever seen on a desktop CPU.
 