Meteor Lake is shaping up to be Intel's most
important product launch in recent memory, with a big impact on the future
of Intel's design direction.
There are many ways to improve a CPU.
Most commonly we talk about changes to the cores inside the CPU, like improved Floating
Point or Integer units, higher clock speeds, more cache or just more CPU cores in
general. All of these aspects are what I would classify as "architectural" changes.
And while Meteor Lake does tackle some of these areas, its true focus and what makes it so
special aren't its architectural changes, but changes that are a level above -
or to be more precise - below that: Meteor Lake will completely alter the very
foundation of how Intel designs CPUs.
With Meteor Lake, Intel is finally entering the
chiplet era, which was started by AMD back in 2019 with Zen 2. But Intel isn't copying
AMD, nor are they trying to play catch up. Meteor Lake is a bold attempt to switch from a
100% monolithic architecture like Raptor Lake to a fully 3D-stacked chiplet design, which
Intel calls a tile-based architecture. And as we will find out later in this video,
the term "tile" is actually very fitting, as Intel not only uses different technology
but is also executing a very different product strategy compared to AMD.
Let's see if Intel is actually able to leapfrog AMD, and let's start our deep dive
into the technology of Meteor Lake.
In order to cover all aspects of Meteor
Lake, this video is structured in three parts. First we will talk about the foundation
for Intel's tile-based architecture and discuss how interconnect and packaging technology
enable Intel's vision of a modular future.
Then we will take a look at what's hidden inside
the tiles that combine to create Meteor Lake, and discover new features, like additional CPU
cores, where you would never expect them to be.
And finally we will compare Intel's tile approach with AMD's chiplet architecture
and discuss the differences in strategy.
The switch from a monolithic to a fully disaggregated design comes with
many challenges. A major one is how to physically connect all the parts that were
previously located on a single piece of silicon.
The simplest method is to use an available interconnect protocol, for example PCI-Express,
and, so to speak, just run copper wires through the substrate to transfer data between the
physically separated chips. This method doesn't require
advanced packaging, it's relatively easy to design and implement and it's low cost. The
downsides are low bandwidth, high latency and high energy costs, as transporting data
off-silicon is very energy intensive.
Then there is 3D stacking, where a large silicon
interposer is placed onto the packaging substrate and the chiplets are placed on top of the silicon
interposer. All data and power connections have to run through the silicon interposer, which
on the one hand increases design and packaging complexity, requiring the use of so-called
through-silicon vias, but on the other hand offers the highest bandwidth, lowest latency
and lowest energy cost, because transferring data through silicon is a lot more efficient.
It's a trade-off between complexity and thus cost on one side and great physical
properties, like fast data transfer and energy efficiency on the other side.
AMD's chiplet architecture is using the less complex approach. The individual chiplets are
clearly physically separated and are connected via the substrate using AMD's Infinity Fabric,
which is a serial interconnect conceptually similar to PCI-Express. It's faster, more
efficient and specialized for AMD's use case, but thinking of AMD's chiplets as being connected
via PCI-Express isn't too far from reality, at least in terms of understanding the concept.
This picture of a Zen 2 package substrate shows all the data paths AMD is using to connect
the individual chiplets. It's a highly scalable and cost-effective design, but we all know
about its drawbacks in the form of latency and bandwidth penalties for chiplet-to-chiplet
communication. That's why
AMD CPUs with a single chiplet are usually still better when it comes to gaming.
If we compare AMD's approach to pictures of Intel's Meteor Lake we can already see
a clear difference. Instead of individual and physically separated chiplets, it almost
looks like a monolithic chip, if it weren't for the thin lines that reveal that Meteor Lake
is in fact four different silicon chips sitting on a large interposer. And just by looking at
it, the name "tiles" does start to make sense, as it actually looks a lot like tiles.
Intel is using its Foveros technology, which is a die-to-die interconnect method using
so-called micro-bumps and through-silicon vias.
During the packaging process, the individual tiles are first
placed and bonded to the silicon interposer using a
method called chip-to-wafer-bonding. Next, the bottom of the silicon interposer is thinned
until the through-silicon-vias are revealed. Then the silicon interposer, with tiles on
top and TSV connection points on the bottom, is placed onto the package substrate. This order
is chosen because the complex advanced packaging steps are performed first, in this example
bonding the individual tiles to the interposer. Only once this step is successfully completed are
the TSVs revealed and the assembly placed onto the substrate.
The version of Foveros Intel is using for Meteor
Lake achieves a 36 micrometer microbump pitch, and the die-to-die interconnect operates at about
0.15 to 0.3 picojoules per bit, which is up to an order of magnitude lower than AMD's Infinity
Fabric at about 1.5 picojoules per bit. This big improvement in data transfer energy efficiency
is the most important factor for Intel. In fact, the potential bandwidth and latency benefits
of using an interposer are not really Intel's focus with this design; it's all about energy
efficiency, and that's where Foveros truly shines.
With Foveros, Intel has created a modern 3D
stacking method that enables flexible multi-tile chips without sacrificing efficiency when
transporting data between each individual tile.
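To put those picojoule-per-bit figures into perspective, here is a quick back-of-the-envelope calculation. Only the energy-per-bit values come from above; the 50 GB/s of die-to-die traffic is just a number I picked for illustration, not an Intel spec.

```python
# Back-of-the-envelope comparison of die-to-die transfer energy.
# Energy-per-bit values as quoted above; the 50 GB/s traffic figure
# is purely an illustrative assumption, not an official spec.

FOVEROS_PJ_PER_BIT = 0.3          # upper end of the 0.15-0.3 pJ/bit range
INFINITY_FABRIC_PJ_PER_BIT = 1.5

traffic_gb_per_s = 50             # assumed sustained die-to-die traffic
bits_per_s = traffic_gb_per_s * 8e9

def interconnect_power_watts(pj_per_bit: float) -> float:
    """Power spent only on moving data between dies."""
    return bits_per_s * pj_per_bit * 1e-12

print(f"Foveros:         {interconnect_power_watts(FOVEROS_PJ_PER_BIT):.2f} W")
print(f"Infinity Fabric: {interconnect_power_watts(INFINITY_FABRIC_PJ_PER_BIT):.2f} W")
# Foveros:         0.12 W
# Infinity Fabric: 0.60 W
```

In a 15 to 28 watt mobile chip, saving roughly half a watt purely on moving data between tiles is a big deal.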
Now that we know what interconnect and packaging
technology Intel is using to enable Meteor Lake, let's take a look at the overall layout of
the chip and then dive into the silicon level, to uncover what's hidden inside the tiles.
With a quick top-down analysis we can identify four different tiles: one really large one,
one medium-sized one, a slim one and a really tiny one. My first impression was that
the large one had to be the CPU tile, with the other three housing GPU, I/O and probably
a machine learning accelerator. But as so often, my first impression was completely off.
The biggest tile, located in the middle of the chip, is actually the SoC tile, housing functions
you would usually find on a chipset. The slim tile to its right is the GPU tile, where the
Alchemist-based Xe iGPU is located. The medium-sized tile to the left is the CPU tile, housing
the CPU cores, and below that we have a tiny I/O tile, most likely providing DisplayPort or other
physical outputs. And of course we can't forget the interposer, also called the base tile, which
is located below the tiles sitting on top. Due to the asymmetrical design, a small part of the
base tile is visible, right below the I/O tile.
And if that layout isn't crazy enough for you,
it gets even wilder when you consider the different process nodes used to produce the
individual tiles. The CPU tile is using the new Intel 4 node, previously known as Intel 7nm
and the first Intel EUV-based process. The GPU tile is based on TSMC's next-gen 3nm process node,
N3B to be precise if SemiAnalysis is correct. The SoC tile utilizes TSMC's N6 process node;
the same is most likely true for the I/O tile. The base tile could either be an older Intel
node, for example 22FFL, now called Intel 16, but there are also rumors it's using an optimized
Intel 7 node, which is the current node used for Alder and Raptor Lake. Combined, there are
four different process nodes being used to produce the five different tiles that make up
Meteor Lake, and the majority of them are not from Intel. Just like AMD, Intel is starting to
contract TSMC for a large portion of its chips.
Now that we have an overview of all the tiles, let's
take a closer look at each of them individually.
I'll start with what most of you are here for: the CPU tile. As you would expect,
it contains the CPU cores; in the case of this specific Meteor Lake SKU it's a 6+8 design,
with six large Redwood Cove performance cores and eight small Crestmont efficiency cores,
as visible in this die shot Intel provided. It's a reduction from the 8+16 setup in Raptor Lake,
but remember that we are looking at a mobile chip; a potential desktop CPU would come with
a larger CPU configuration, made possible by Meteor Lake's flexible tile design.
At the beginning of the video I talked about architectural and foundational changes;
in the case of Redwood Cove and Crestmont we
are in the realm of architectural improvements. Alder Lake combined Golden Cove with Gracemont,
Raptor Lake used Raptor Cove and Gracemont, and with Meteor Lake Intel switches to Redwood
Cove and Crestmont, both with potential IPC and efficiency improvements. Since the
new cores are manufactured on Intel 4, they have to be adapted to the new process node.
SemiAnalysis and Locuza did a great low-level die-shot analysis, but found very few changes.
Redwood Cove comes with 2 megabytes of L2 cache, an upgrade already introduced with Raptor
Cove. Visible changes to the performance cores are very minimal and most likely related
to the re-design required for the new Intel 4 process node. The same is true for the Crestmont
efficiency cores, which seem to be a shrink of the Intel 7-based Gracemont. All in all, while new
CPU core names come with the promise of architectural improvements, it looks like the actual
changes on Meteor Lake's CPU side are very minimal, aside from the process node shrink. And it
does make sense: implementing a new tile-based design and combining it with a node shrink at the
same time is already a lot of engineering work; it would be very unlike Intel to increase the
probability of failure by adding complex architectural changes on top.
To recap, Meteor Lake will most likely offer
very limited IPC improvements on the CPU side, as it looks like all we are getting is a process
node shrink of the Raptor Lake architecture with only minor architectural improvements.
Something that has a lot more potential to be exciting is the GPU tile, which will utilize
Intel's Alchemist architecture and is one of the first chips produced on TSMC's next-gen 3nm node.
With up to 192 Execution Units, the Intel Xe iGPU has the potential for a large performance
uplift, especially if TSMC's N3B performs well. Currently AMD's Phoenix reigns supreme;
I'm hopeful Intel can strike back.
This version of Meteor Lake isn't equipped
with the largest GPU tile and most likely only has somewhere between 64 and 96 Execution
Units, which will still be plenty fast. Integrated GPUs could very soon completely replace
low-end dedicated GPUs in the laptop space, something that not only saves energy but also
allows for thinner and lighter laptop designs.
Even though Alchemist discrete GPUs didn't perform as well as hoped, iGPUs are a whole different
story; just look at AMD's Vega-based iGPUs, which performed much better than their desktop
counterparts. And with Alchemist, Meteor Lake also inherits Intel's excellent encode and decode
unit, supporting all modern codecs, including AV1.
The GPU side of Meteor Lake is definitely something to look out for, with much higher
performance potential than its CPU side. The modular nature of Meteor Lake means Intel can
prepare a number of GPU tiles with different numbers of GPU cores and thus scale its
products, unlike AMD with a fixed number of GPU cores on its monolithic mobile chips
like Rembrandt and Phoenix. AMD has to watch out.
Before we look at the huge SoC tile,
which does come with some cool surprises, let's quickly cover the tiny I/O tile. It's only
around 10 square millimeters in size and possibly houses DisplayPort and/or Thunderbolt ports. It
could also be used for the memory controller, though that remains to be seen. A lot of mystery
for a chip that small. I'm really curious why Intel chose to move some ports out into such a small
separate chip, adding packaging complexity. It might have something to do with power efficiency
and the ability to basically shut off individual tiles completely, but until Intel releases more
information, there's not much to say about it.
But there's a lot to say about the absolutely
massive SoC tile. Why is it so large?
First of all, all modern CPUs are also Systems
on a Chip, meaning they provide much more than just CPU cores. AMD's Zen 4, Intel's Raptor
Lake or Apple's M2 all provide graphics, internal and external I/O, physical ports,
management engines and so on. Just take a look at this die shot of Alder Lake. Yes,
the CPU cores do take up a lot of space, but about 40% of the die is used for other functions.
And Alder Lake is a desktop chip that connects to a chipset located on the motherboard, which
enhances its connectivity options. Meteor Lake, at least on mobile, won't use an external chipset,
meaning all connectivity has to be provided by the SoC tile, including built-in Wi-Fi 6E support.
Then, just as AMD's Phoenix introduced a dedicated on-die AI engine, Meteor Lake will follow. The
new Vision Processing Unit, or VPU for short, is specifically designed to accelerate AI
workloads. Just a year ago AI was this abstract thing of the future; now it seems to have
manifested itself into our reality in the blink of an eye. All future SoCs will have dedicated
AI and machine learning accelerators; Phoenix and Meteor Lake are only the beginning. And they
will take up an increasing amount of die space.
Aside from housing a lot of connectivity and the new AI VPU, there's another reason the SoC tile
is as big as it is. And it's something I never would have expected: there are two additional
Crestmont efficiency cores located within the SoC tile. So when we talk about Meteor Lake
being a 6+8 design, it's actually more of a 6+8+2 design.
Intel calls them "low power E-cores", which is funny because the "E" in E-cores already
stands for efficiency and implies low power, at least compared to the large Performance-cores.
It's a really interesting choice, but something I suspect has been done for a single reason:
to further increase energy efficiency, especially during sleep or low power states.
We have extensively talked about the modular design of Meteor Lake and the energy efficiency
focus of its Foveros interconnect technology. But even with its incredible efficiency, it's
always more efficient to disable large parts of the chip and shut them off completely.
I think that during low power states, for example when you have been away from
your laptop for a while or put it to sleep, the entire CPU tile will be disabled in
order to save energy. During this state, the additional low-power E-cores located inside
the SoC tile will handle all CPU tasks. This is just me guessing, but with Intel's clear
focus on power efficiency for Meteor Lake, which is visible in every aspect of its
design, it's a very plausible use case.
Intel's hybrid CPU is getting more complex over time; now the E-cores have their own E-cores.
I'm wondering if we will get another layer in the future, like one or two high-power P-cores
or a third tier of E-cores. It's the complete opposite of AMD's "one core fits all" design.
To recap, the SoC tile not only contains a lot of
I/O and system functionality, but also a new AI accelerator and two additional efficiency CPU
cores. And that's why Meteor Lake engineering samples show up with these strange core readings,
because yes, Meteor Lake can have 16 physical CPU cores, 6 performance and 10 efficiency
cores, just not in the way we expected.
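As a quick sanity check of those readings, here is the core and thread math, assuming the Redwood Cove P-cores keep Hyper-Threading while the E-cores and low-power E-cores don't, the same split as on Alder and Raptor Lake; treat it as my reading of the leaks, not a confirmed spec.

```python
# Core/thread count for the 6+8+2 Meteor Lake configuration.
# Assumes Hyper-Threading on the P-cores only, as on Alder/Raptor Lake;
# this is a sketch based on leaks, not a confirmed spec.

p_cores = 6        # Redwood Cove, on the CPU tile
e_cores = 8        # Crestmont, on the CPU tile
lp_e_cores = 2     # Crestmont "low power E-cores", on the SoC tile

physical_cores = p_cores + e_cores + lp_e_cores
threads = p_cores * 2 + e_cores + lp_e_cores   # HT only on the P-cores

print(f"{physical_cores} cores / {threads} threads")   # 16 cores / 22 threads
```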
If you think the SoC tile must be the
most interesting part of Meteor Lake, now is the right time to reconsider, because
we have one last tile to talk about: the base tile, which also serves as the interposer.
If it were just a passive silicon interposer for the sole purpose of connecting the tiles via
Foveros, we would call the stacking method 2.5D, as it's still vertical, but not actually active
chip-on-chip. But Meteor Lake is true 3D stacking, because the interposer contains active transistors.
It's quite an ingenious combination of different functions. For one, the interposer contains
the metal layers for I/O, power delivery and die-to-die Foveros routing. In addition, the base
tile holds active silicon for memory and logic, most likely a rather large amount of
cache; almost, but not exactly, like AMD's 3D V-Cache, only placed below the tiles.
This last-level L4 cache, supposedly called Adamantine, made the rounds just a few weeks ago,
creating some buzz in the hardware community. And while getting confirmation of its actual name
and early size indications is super interesting (Moore's Law is Dead is talking about 128 to 512
megabytes of cache), the fact that Meteor Lake could include such a cache system is nothing new.
At Hot Chips in August of last year, Intel revealed architectural details for Meteor
and Arrow Lake. Slide 23 of the presentation clearly states that the base tile contains active
silicon for memory and logic. It was right there, for everyone to see. In addition, a patent filed
in March of 2021 also explained the Adamantine cache, including a detailed overview of Meteor
Lake's layout. It's interesting to note that the interposer does not need to contain cache or
logic; it can potentially also function as a plain 2.5D interposer for lower-priced SKUs.
But back to the fact that the base layer will contain a large amount of L4 cache: how
will it affect Meteor Lake? Can we expect a 3D V-Cache-like performance jump? Maybe it will act
like an Infinity Cache for the GPU too? Of course we can't know for sure until Intel reveals more
information or we get hands-on with Meteor Lake, but I suspect Intel is using the Adamantine
cache for different reasons than what AMD is doing with the caches on its CPUs and GPUs.
Adamantine doesn't directly integrate with and expand the L3 cache of the CPU tile, like 3D
V-Cache does on AMD's X3D CPUs, and it's also not acting as a buffer between the GPU tile and
the memory controller like AMD's Infinity Cache, at least from what I can tell. This means it
won't be able to deliver the same performance improvements. Adamantine won't be Intel's X3D
counter. Of course it won't hurt CPU and GPU performance; more cache is always beneficial, as
we have seen in the past with Intel's Broadwell, which also used an L4 cache. But being faster
than going off-chip to the LPDDR5 memory still isn't nearly as fast as AMD's L3 cache extension.
I think that the actual goal for the L4 cache on Meteor Lake is power efficiency, which seems to be
the main theme in Intel's design choices. For one, a larger cache means more data is
on-die, or in this case on-package, and thus energy-intensive memory accesses will be
reduced, decreasing power draw and increasing efficiency. But I expect Intel is going a
step further. Remember the two low-power E-cores inside the SoC tile and the outsourced
display controller inside the tiny I/O tile?
The whole setup looks like it's made for a low-power always-on display mode. Intel can
completely disable the entire CPU tile, with requests handled by the SoC tile's CPU cores.
The GPU state is then dumped into the large Adamantine cache, while display updates are handled
by the I/O and SoC tiles, which means the GPU tile can also be deactivated. Plus, memory access
isn't required either, since there's a large on-package L4 cache in the form of Adamantine.
As a result, Meteor Lake should be able to shut
down two of its four top tiles and the LPDDR5 memory at the same time, entering an extremely
low-power state while still being responsive and able to react to incoming data. Intel could call
it something like "14th gen sentinel always on mode" or some other BS marketing name and create
a whole line of premium laptops that advertise this "never turn off your laptop" feature. I could
be wrong on this one, but it just fits a bit too well. I'd love to get your thoughts on this idea.
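To make the idea a bit more concrete, here is a toy model of what such an always-on display state could look like. The tile names match the ones discussed above, but which tiles actually get power-gated, and whether the LPDDR5 can really be shut off, is pure speculation on my part.

```python
# Toy model of the speculated always-on display state.
# Tile names follow the breakdown above; which tiles actually power down,
# and whether LPDDR5 can be fully gated, is not confirmed by Intel.

POWER_STATES = {
    "active": {
        "cpu_tile": "on", "gpu_tile": "on",
        "soc_tile": "on", "io_tile": "on",
        "lpddr5": "on",
    },
    "always_on_display": {
        "cpu_tile": "off",   # work shifts to the SoC tile's low-power E-cores
        "gpu_tile": "off",   # last GPU state parked in the Adamantine L4 cache
        "soc_tile": "on",    # LP E-cores and system logic keep running
        "io_tile": "on",     # drives the panel output
        "lpddr5": "off",     # data served from the on-package L4 cache instead
    },
}

def powered_parts(state: str) -> list[str]:
    """Return the parts of the package that stay powered in a given state."""
    return [part for part, mode in POWER_STATES[state].items() if mode == "on"]

print(powered_parts("always_on_display"))   # ['soc_tile', 'io_tile']
```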
In a nutshell, the base layer is the heart of Meteor Lake. It enables the efficient Foveros
3D stacking and supports the entire chip with a large amount of cache. A rather elegant and
smart solution. I'm really happy to see Intel innovating like this, after it felt like
AMD was always the one in front over the last few years.
With Meteor Lake's foundation
and its tiles uncovered, how does it compare to AMD's chiplet approach?
Meteor Lake is clearly a lot more complex, so is Intel actually leapfrogging AMD? The answer is a
clear no, although Intel is finally on a path of innovation again. AMD and Intel use very different
approaches for their modular architectures because they have very different goals.
We know that AMD is able to execute much more complex chiplet and packaging technologies,
like 3D V-Cache and especially MI300, an insane combination of multiple chips,
process nodes and stacking methods. MI300 actually looks a lot more like tiles,
a name that is growing on me.
AMD's Zen chiplet architecture isn't the way it is because AMD can't implement more complex
designs, but because it was designed with a single goal: cost-efficient scalability. With only
three individual tape-outs, an 8-core CPU die, a desktop I/O die and a server I/O die, AMD is
able to scale its entire desktop and server line-up, from entry-level Ryzen CPUs to high-end
Epyc server chips. Just choose the right I/O die for the platform you want and connect any
number of CPU chiplets. No other architecture even comes close, and it quite literally made
AMD into the company it is today.
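To illustrate how far those three tape-outs stretch, here is a small sketch using Zen 2 era parts; the SKU examples are from memory and only meant to show the core-count math, not an exhaustive list.

```python
# Sketch of AMD's "three tape-outs, one line-up" scaling (Zen 2 era).
# Only the core-count math matters here; the SKU pairings are illustrative.

CORES_PER_CCD = 8   # one 8-core Zen 2 CPU chiplet (CCD)

configs = {
    # name: (number of CCDs, I/O die used)
    "Ryzen 7 3700X": (1, "client I/O die"),
    "Ryzen 9 3950X": (2, "client I/O die"),
    "Epyc 7742":     (8, "server I/O die"),
}

for name, (ccds, io_die) in configs.items():
    cores = ccds * CORES_PER_CCD
    print(f"{name}: {ccds} CCD(s) + {io_die} = up to {cores} cores")
```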
Intel's tile approach has a completely
different focus. It's not about low-cost scalability across an entire client-to-server
line-up; it's about creating highly flexible and extremely power-efficient chips. This
architecture was tailored for low-power mobile SoCs that are easily adaptable for very specific
workloads. Intel can swap out different tiles without having to rework other areas of the
chip, reducing follow-up engineering costs. You can combine a 4-core CPU tile with a huge
192-execution-unit GPU tile. You could design a huge CPU tile with an 8+16 configuration and
maybe not add a GPU tile at all. We might not see any "F"-branded chips from Intel in the
future, because there won't be any chips with defective iGPUs to bin. And in the future Intel
will add other tiles into the mix, like dedicated AI tiles.
AMD is focusing on macro scalability while Intel
is focusing on micro scalability. Intel scales CPU performance with its small E-cores, while AMD
scales CPU performance with more chiplets. Two very different approaches, but equally interesting.
After learning so much about Meteor Lake, it's even more of a bummer that, according to current
rumors, we won't see a desktop release of Intel's tile-based design, at least not until Arrow Lake.
But it does make sense if we take a look at what areas Meteor Lake improves over Raptor Lake.
Energy efficiency improvements, especially the new low-power modes, clearly target the mobile
segment and are not that useful in a desktop environment. At the same time, with only 6 P- and
8 E-cores in the CPU tile, disregarding the new SoC cores, it's a clear regression in CPU
performance, even if Redwood Cove and Crestmont do provide some form of IPC gain. Intel 4 might
also initially struggle to achieve the same high clock speeds as Intel 7. That's why Intel will
release a Raptor Lake refresh for desktop this year.
But not to worry, Intel's tile-based architecture
will find its way to the desktop with Arrow Lake. Intel's next-gen architecture will
add new CPU cores with a focus on more IPC. Once Intel has its tile architecture and
packaging dialed in, monolithic CPUs will be a thing of the past, even on the desktop.
Intel is on the right path. For the first time in many years I'm actually truly excited
about Intel's innovation, and I can't wait to see how Meteor Lake and Arrow Lake turn out.
As in every video, I want to know your thoughts and opinions. What do you think about Intel's
fundamental shift towards a tile-based design? Do you think Meteor Lake will meet our
expectations? And what is your favorite feature? Leave a comment down below; I'm looking
forward to reading what you have to say, including any crazy theories to explain the extra SoC
cores, the huge L4 cache and the tiny I/O tile.
You know what to do if you found this video
interesting and see you in the next one!