AMD’s AI Chip Event: Everything Revealed in 8 Minutes

Hi, this is Wayne again with the topic “AMD’s AI Chip Event: Everything Revealed in 8 Minutes”.
[Applause] Good morning, everyone. Welcome to all of you who are joining us here in Silicon Valley and to everyone who’s joining us online from around the world. That’s why I’m so excited today to launch our Instinct MI300X. It’s the highest-performance accelerator in the world for generative AI. MI300X is built on our new CDNA 3 data center architecture, and it’s optimized for performance and power efficiency. CDNA 3 has a lot of new features.

It combines a new compute engine. It supports sparsity and the latest data formats, including FP8. It has industry-leading memory capacity and bandwidth, and we’re going to talk a lot about memory today. And it’s built on the most advanced process technologies and 3D packaging. Now, let’s talk about some of the performance and why it’s so great for generative AI. Memory capacity and bandwidth are really important for performance. If you look at MI300X, we made a very conscious decision to add more flexibility, more memory capacity, and more bandwidth, and what that translates to is 2.4 times more memory capacity and 1.6 times more memory bandwidth than the competition. Now, when you run things like the lower-precision data types that are widely used in LLMs, the new CDNA 3 compute units and memory density actually enable MI300X to deliver 1.3 times more teraflops of FP8 and FP16 performance than the competition. And if you take a look at how we put it together, it’s actually pretty amazing.
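To see why memory capacity and low-precision formats matter so much for LLMs, here is a rough back-of-the-envelope sketch. The 70-billion-parameter model size is an illustrative assumption, not a figure from the event:

```python
# Rough LLM weight-memory estimate: bytes per parameter times parameter count.
# The 70B model size below is an illustrative assumption, not from the talk.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}

def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Approximate memory needed just to hold the model weights, in GB."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

params = 70e9  # a 70B-parameter model, for illustration
for dtype in ("fp32", "fp16", "fp8"):
    print(f"{dtype}: {weight_memory_gb(params, dtype):.0f} GB")
# fp32: 280 GB, fp16: 140 GB, fp8: 70 GB
```

At FP16, the weights of a 70B-parameter model alone need about 140 GB, which is why a 192 GB accelerator can hold such a model on one device while a smaller-memory part must split it across several.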

We start with four IO dies in the base layer, and what we have on the IO dies are 256 megabytes of Infinity Cache and all of the next-gen IO that you need: things like 128-channel HBM3 interfaces, PCIe Gen 5 support, and our fourth-gen Infinity Fabric that connects multiple MI300Xs so that we get 896 gigabytes per second. Then we stack eight CDNA 3 accelerator chiplets, or XCDs, on top of the IO dies, and that’s where we deliver 1.3 petaflops of FP16 and 2.6 petaflops of FP8 performance. We connect these 304 compute units with dense through-silicon vias, or TSVs, which support up to 17 terabytes per second of bandwidth. And of course, to take advantage of all of this compute, we connect eight stacks of HBM3 for a total of 192 GB of memory at 5.3 terabytes per second of bandwidth.
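The aggregate figures quoted above compose straightforwardly; here is a quick sanity check. The 24 GB-per-stack capacity is inferred from the 192 GB total over eight stacks, not stated in the talk:

```python
# Sanity-check the aggregate MI300X memory figure quoted above.
# Per-stack capacity is inferred from the totals, not stated in the talk.
hbm3_stacks = 8
gb_per_stack = 24                 # assumed: 192 GB / 8 stacks
total_memory_gb = hbm3_stacks * gb_per_stack
print(total_memory_gb)            # 192

# FP8 throughput is quoted at exactly twice FP16, as expected when the
# same compute units process twice as many elements per cycle.
fp16_petaflops = 1.3
fp8_petaflops = 2 * fp16_petaflops
print(fp8_petaflops)              # 2.6
```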

That’s a lot of stuff on that slide. What you see here is eight MI300X GPUs connected by our high-performance Infinity Fabric in an OCP-compliant design. Now, what makes that special? This board actually drops right into any OCP-compliant design, which is the majority of AI systems today, and we did this for a very deliberate reason: we want to make this as easy as possible for customers to adopt. So you can take out your other board and put in the Instinct MI300X platform. And if you take a look at the specifications, we actually support all of the same connectivity and networking capabilities as our competition, so PCIe Gen 5, support for 400-gigabit Ethernet, and that 896 gigabytes per second of total system bandwidth, but all of that with 2.4 times more memory and 1.3 times more compute than the competition. So that’s really why we call it the most powerful generative AI system in the world.
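Combining the per-GPU figures from earlier with the eight-GPU platform described here gives the system-level totals. A minimal sketch of that arithmetic (the per-GPU numbers are the ones quoted in the talk):

```python
# System-level totals for the eight-GPU MI300X platform, derived from
# the per-GPU figures quoted in the talk.
gpus = 8
memory_per_gpu_gb = 192
platform_memory_gb = gpus * memory_per_gpu_gb
print(platform_memory_gb)                  # 1536 GB of HBM3 across the board

fp16_petaflops_per_gpu = 1.3
platform_fp16_petaflops = gpus * fp16_petaflops_per_gpu
print(round(platform_fp16_petaflops, 1))   # 10.4 peak FP16 petaflops
```

Roughly 1.5 TB of pooled accelerator memory per board is what lets a single OCP chassis serve today’s largest models without model-parallel sharding across multiple boards.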

We architected ROCm to be modular and open source to enable very broad user accessibility and rapid contribution by the open-source and AI communities. Open source and the ecosystem are really integral to our software strategy, and in fact, open is integral to our overall strategy. This contrasts with CUDA, which is proprietary and closed. Now, the open-source community, everybody knows, moves at the speed of light in deploying and proliferating new algorithms, models, tools, and performance enhancements, and we are definitely seeing the benefits of that in the tremendous ecosystem momentum that we’ve established.

So I’m really super excited that we’ll be shipping ROCm 6 later this month. I’m really proud of what the team has done with this really big release. ROCm 6 has been optimized for generative AI, particularly large language models. It has powerful new features, library optimizations, and expanded ecosystem support, and it increases performance by large factors; it really delivers for AI developers. ROCm 6 supports FP16, BF16, and the new FP8 data types for higher performance while reducing both memory and bandwidth needs. We’ve incorporated advanced graph and kernel optimizations and optimized libraries for improved efficiency.
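The “graph and kernel optimizations” mentioned above often come down to kernel fusion: instead of launching one kernel per operation and materializing intermediates in memory, fused kernels make one pass. A toy sketch in plain Python (standing in for GPU kernels; this is not ROCm API code):

```python
# Toy illustration of kernel fusion for y = relu(a * x + b).
# Unfused: the intermediate (a * x + b) is written to memory and re-read.
# Fused: each element makes one trip through memory, halving traffic.
# Plain Python lists stand in for GPU buffers; this is not ROCm code.

def unfused(x, a, b):
    tmp = [a * v + b for v in x]             # "kernel" 1: writes an intermediate
    return [max(0.0, v) for v in tmp]        # "kernel" 2: re-reads it

def fused(x, a, b):
    return [max(0.0, a * v + b) for v in x]  # one pass, no intermediate buffer

x = [-2.0, -0.5, 1.0, 3.0]
assert unfused(x, 2.0, 1.0) == fused(x, 2.0, 1.0)
print(fused(x, 2.0, 1.0))  # [0.0, 0.0, 3.0, 7.0]
```

For memory-bound operations like these, eliminating the intermediate roughly halves memory traffic, which is exactly the bandwidth pressure the FP8/BF16 data types also target.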

We’re shipping state-of-the-art attention algorithms like FlashAttention-2 and paged attention, which are critical for performant LLMs and other models. In 2021 we delivered the MI250, introducing the third-generation Infinity Architecture. It connected an EPYC CPU to the MI250 GPU through a high-speed bus, Infinity Fabric, that allowed the CPU and the GPU to share a coherent memory space and easily trade data back and forth, simplifying programming and speeding up processing. But today we’re taking that concept one step further, really to its logical conclusion, with the fourth-generation Infinity Architecture: bringing the CPU and the GPU together into one package, sharing a unified pool of memory.
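Paged attention, mentioned above, manages the LLM key-value cache in fixed-size blocks rather than one contiguous buffer per sequence, much like virtual-memory pages. A minimal sketch of the bookkeeping (block size, class names, and structure here are illustrative assumptions, not any particular library’s API):

```python
# Minimal sketch of paged-attention-style KV-cache bookkeeping: each
# sequence gets a block table mapping logical token positions to
# fixed-size physical blocks, so memory is allocated on demand rather
# than reserved contiguously up front. Names and sizes are illustrative.

BLOCK_SIZE = 4  # tokens per block; real systems often use 16

class PagedKVCache:
    def __init__(self, num_blocks=100):
        self.free_blocks = list(range(num_blocks))  # physical block ids
        self.block_tables = {}                      # seq_id -> [block ids]
        self.lengths = {}                           # seq_id -> tokens stored

    def append_token(self, seq_id):
        """Reserve cache space for one more token of a sequence."""
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:  # current block is full (or none allocated yet)
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop(0))
        self.lengths[seq_id] = n + 1

cache = PagedKVCache()
for _ in range(6):             # generate a 6-token sequence
    cache.append_token("seq0")
print(cache.block_tables["seq0"])  # [0, 1] -> only two blocks allocated
```

Because blocks are granted on demand, sequences of different lengths waste at most one partially filled block each, instead of each reserving worst-case contiguous space.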

This is an APU, an accelerated processing unit, and I’m very proud to say that the industry’s first data center APU for AI and HPC, the MI300A, began volume production earlier this quarter and is now being built into what we expect to be the world’s highest-performing system. And let’s talk about that performance: 61 teraflops of double-precision floating point (FP64) and 122 teraflops of single precision (FP32), combined with 128 GB of HBM3 memory at 5.3 terabytes per second of bandwidth. The capabilities of the MI300A are impressive, and they’re impressive too when you compare it to the alternative. When you look at the competition, MI300A has 1.6 times the memory capacity and bandwidth of Hopper. For low-precision operations like FP16, the two are at parity in terms of computational performance, but where precision is needed, MI300A delivers 1.8 times the double- and single-precision (FP64 and FP32) floating-point performance.

So today, I’m very happy to say that we’re launching our Hawk Point Ryzen 8040 series mobile processors. Hawk Point combines all of our industry-leading performance and battery life, and it increases AI TOPS by 60% compared to the previous generation. If you just take a look at some of the performance metrics for the Ryzen 8040 series, at the top of the stack, the Ryzen 9 8945HS is actually significantly faster than the competition in many areas, delivering more performance for multi-threaded applications, 1.8x higher frame rates for games, and 1.4x faster performance across content creation applications. A very, very special thank you to all of our partners who joined us today, and thank you all for joining us.
