Meteor Lake – Can Intel leapfrog AMD?

Captions
Meteor Lake is shaping up to be Intel's most important product launch in recent memory, with a big impact on the future of Intel's design direction. There are many ways to improve a CPU. Most commonly we talk about changes to the cores inside the CPU, like improved floating point or integer units, higher clock speeds, more cache or just more CPU cores in general. All of these aspects are what I would classify as "architectural" changes. And while Meteor Lake does tackle some of these areas, its true focus and what makes it so special aren't its architectural changes, but changes that are a level above - or to be more precise - below that: Meteor Lake will completely alter the very foundation of how Intel designs CPUs.

With Meteor Lake, Intel is finally entering the chiplet era, which AMD started back in 2019 with Zen 2. But Intel isn't copying AMD, nor are they trying to play catch-up. Meteor Lake is a bold attempt to switch from a 100% monolithic architecture like Raptor Lake to a fully 3D stacked chiplet design, which Intel calls a tile-based architecture. And as we will find out later in this video, the term "tile" is actually very fitting, as Intel not only uses different technology but is also executing a very different product strategy compared to AMD. Let's see if Intel is actually able to leapfrog AMD and start our deep dive into the technology of Meteor Lake.

In order to cover all aspects of Meteor Lake, this video is structured in three parts. First we will talk about the foundation of Intel's tile-based architecture and discuss how interconnect and packaging technology enable Intel's vision of a modular future. Then we will take a look at what's hidden inside the tiles that together make up Meteor Lake and discover new features, like additional CPU cores, where you would never expect them to be. And finally we will compare Intel's tile approach with AMD's chiplet architecture and discuss the difference in strategies.

The switch from a monolithic to a fully disaggregated design comes with many challenges. A major one is how to physically connect all the parts that were previously located on a single piece of silicon. The simplest method is to use an available interconnect protocol, for example PCI-Express, and, so to speak, just run copper wires through the substrate to transfer data between the physically separated chips. This method doesn't require advanced packaging, it's relatively easy to design and implement, and it's low cost. The downsides are low bandwidth, high latency and high energy costs, as transporting data off-silicon is very energy intensive.

Then there is 3D stacking, where a large silicon interposer is placed onto the packaging substrate and the chiplets are placed on top of the silicon interposer. All data and power connections have to run through the silicon interposer, which on one side increases design and packaging complexity, requiring the use of so-called through-silicon vias, but on the other side offers the highest bandwidth, lowest latency and lowest energy cost, because transferring data through silicon is a lot more efficient. It's a trade-off between complexity, and thus cost, on one side and great physical properties, like fast data transfer and energy efficiency, on the other.

AMD's chiplet architecture uses the less complex approach.
The individual chiplets are clearly physically separated and are connected via the substrate using AMD's Infinity Fabric, which is a serial-to-parallel interconnect based on PCI-Express. It's faster, more efficient and specialized for AMD's use case, but thinking of AMD's chiplets as being connected via PCI-Express isn't too far from reality, at least in terms of understanding the concept. This picture of a Zen 2 PCB shows all the data paths AMD uses to connect the individual chiplets. It's a highly scalable and cost-effective design, but we all know about its drawbacks in the form of latency and bandwidth penalties for chiplet-to-chiplet communication. That's why AMD CPUs with a single chiplet are usually still better when it comes to gaming.

If we compare AMD's approach to pictures of Intel's Meteor Lake, we can already see a clear difference. Instead of individual, physically separated chiplets, it almost looks like a monolithic chip, if it weren't for the thin lines that reveal that Meteor Lake is in fact four different silicon chips sitting on a large interposer. And just by looking at it, the name "tiles" does start to make sense, as it actually looks a lot like tiles.

Intel is using its Foveros technology, which is a die-to-die interconnect method using so-called micro-bumps and through-silicon vias. During the packaging process, in a first step the individual tiles are placed and bonded to the silicon interposer using a method called chip-to-wafer bonding. Next, the bottom of the silicon interposer is thinned until the through-silicon vias are revealed. Then the silicon interposer, with the tiles on top and the TSV connection points on the bottom, is placed onto the package substrate. This order is chosen because the complex advanced packaging steps are performed first, in this case bonding the individual tiles to the interposer. Only once this step is successfully completed are the TSVs revealed and the stack is placed onto the substrate.

The version of Foveros Intel is using for Meteor Lake achieves a 36 micrometer micro-bump pitch, and the die-to-die interconnect operates at about 0.15 to 0.3 picojoules per bit, which is up to an order of magnitude lower than AMD's Infinity Fabric at about 1.5 picojoules per bit. This big improvement in data transfer energy efficiency is the most important factor for Intel. In fact, the potential bandwidth and latency benefits of using an interposer are not really Intel's focus with this design; it's all about the energy efficiency. That's where Foveros truly shines. With Foveros, Intel has created a modern 3D stacking method that enables flexible multi-tile chips without sacrificing efficiency when transporting data between the individual tiles.
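To put those per-bit numbers into perspective, here is a small back-of-the-envelope sketch of my own: the pJ/bit figures are the ones quoted above, while the 100 GB/s transfer rate is an arbitrary assumption purely for illustration.

```python
# Rough power cost of die-to-die traffic: power = energy_per_bit * bits_per_second.
# The pJ/bit figures are the ones quoted above; 100 GB/s is an assumed transfer rate.
PJ = 1e-12  # joules per picojoule

def link_power_watts(pj_per_bit: float, gigabytes_per_s: float) -> float:
    bits_per_s = gigabytes_per_s * 8e9  # 1 GB = 8 * 10^9 bits
    return pj_per_bit * PJ * bits_per_s

for name, pj in [("Foveros (low)", 0.15), ("Foveros (high)", 0.30), ("Infinity Fabric", 1.5)]:
    print(f"{name:16} @ 100 GB/s ~ {link_power_watts(pj, 100):.2f} W")

# Foveros (low)    @ 100 GB/s ~ 0.12 W
# Foveros (high)   @ 100 GB/s ~ 0.24 W
# Infinity Fabric  @ 100 GB/s ~ 1.20 W
```

On a mobile chip with a total power budget of only a few dozen watts, roughly a watt of pure interconnect overhead is a meaningful difference.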
Now that we know what interconnect and packaging technology Intel is using to enable Meteor Lake, let's take a look at the overall layout of the chip and then dive down to the silicon level to uncover what's hidden inside the tiles.

With a quick top-down analysis we can identify four different tiles: one really large, one medium-sized, one slim and one really tiny. My first impression was that the large one had to be the CPU tile, with the other three housing the GPU, I/O and probably a machine learning accelerator. But as so often, my first impression was completely off.

The biggest tile, located in the middle of the chip, is actually the SoC tile, housing functions you would usually find on a chipset. The slim tile to its right is the GPU tile, where the Alchemist-based Xe iGPU is located. The medium-sized tile to the left is the CPU tile, housing the CPU cores, and below that we have a tiny I/O tile, most likely providing DisplayPort or other physical outputs. And of course we can't forget the interposer, also called the base tile, which sits below the tiles on top. Due to the asymmetrical design, a small part of the base tile is visible, right below the I/O tile.

And if that layout isn't crazy enough for you, it gets even wilder when you consider the different process nodes used to produce the individual tiles. The CPU tile uses the new Intel 4 node, previously known as Intel 7nm and the first Intel process based on EUV. The GPU tile is based on TSMC's next-gen 3nm process node, N3B to be precise if Semianalysis is correct. The SoC tile utilizes TSMC's N6 process node, and the same is most likely true for the I/O tile. The base tile could either be an older Intel node, for example 22FFL, now called Intel 16, but there are also rumors it uses an optimized Intel 7 node, which is the current node used for Alder and Raptor Lake. Combined, there are four different process nodes being used to produce the five different tiles that make up Meteor Lake, and the majority of them are not from Intel. Just like AMD, Intel is starting to contract TSMC for a large portion of its chips.
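Before we go through the tiles one by one, here is the layout summarized in one place. Keep in mind that the I/O tile and base tile node assignments are reports and rumors, not something Intel has confirmed:

```python
# Meteor Lake tile overview as described above. Node assignments for the
# I/O tile and base tile are based on reports/rumors, not Intel confirmation.
meteor_lake_tiles = {
    "CPU tile":  {"node": "Intel 4",                         "contents": "CPU cores"},
    "GPU tile":  {"node": "TSMC N3B (per Semianalysis)",     "contents": "Alchemist-based Xe iGPU"},
    "SoC tile":  {"node": "TSMC N6",                         "contents": "chipset-like functions"},
    "I/O tile":  {"node": "TSMC N6 (likely)",                "contents": "DisplayPort / physical outputs"},
    "Base tile": {"node": "Intel 22FFL or Intel 7 (rumored)", "contents": "Foveros interposer"},
}

for tile, info in meteor_lake_tiles.items():
    print(f"{tile:10} | {info['node']:34} | {info['contents']}")
```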
Now that we have an overview of all the tiles, let's take a closer look at each of them individually. I'll start with what most of you are here for: the CPU tile.

As you'd expect, it contains the CPU cores. In the case of this specific Meteor Lake SKU it's a 6+8 design, with six large Redwood Cove performance cores and eight small Crestmont efficiency cores, as visible in the die shot Intel provided. It's a reduction from the 8+16 setup in Raptor Lake, but remember that we are looking at a mobile chip; a potential desktop CPU would come with a larger CPU configuration, made possible by Meteor Lake's flexible tile design.

At the beginning of the video I talked about architectural and foundational changes; in the case of Redwood Cove and Crestmont we are in the realm of architectural improvements. Alder Lake combined Golden Cove with Gracemont, Raptor Lake used Raptor Cove and Gracemont, and with Meteor Lake Intel switches to Redwood Cove and Crestmont, both with potential IPC and efficiency improvements. Since the new cores are manufactured on Intel 4, they have to be adapted to the new process node. Semianalysis and Locuza did a great low-level die-shot analysis, but found very few changes. Redwood Cove comes with 2 megabytes of L2 cache, an upgrade already introduced with Raptor Cove. Visible changes to the performance cores are very minimal and most likely related to the redesign required for the new Intel 4 process node. The same is true for the Crestmont efficiency cores, which seem to be a shrink of the Intel 7 based Gracemont. All in all, while new CPU core names come with the promise of architectural improvements, it looks like the actual changes on Meteor Lake's CPU side are very minimal, aside from the process node shrink. And it does make sense: implementing a new tile-based design and combining it with a node shrink at the same time is already a lot of engineering work; it would be very unlike Intel to increase the probability of failure by adding complex architectural changes on top.

To recap, Meteor Lake will most likely offer very limited IPC improvements on the CPU side, as it looks like all we are getting is a process node shrink of the Raptor Lake architecture with only small architectural tweaks.

Something with a lot more potential to be exciting is the GPU tile, which will utilize Intel's Alchemist architecture and is one of the first chips produced in TSMC's next-gen 3nm node. With up to 192 Execution Units, the Intel Xe iGPU has the potential for a large performance uplift, especially if TSMC's N3B performs well. Currently, AMD's Phoenix reigns supreme; I'm hopeful Intel can strike back. This version of Meteor Lake isn't equipped with the largest GPU tile and most likely only has somewhere between 64 and 96 Execution Units, which will still be plenty fast. Integrated GPUs could very soon completely replace low-end dedicated GPUs in the laptop space, something that not only saves energy but also allows for thinner and lighter laptop designs. Even though Alchemist discrete GPUs didn't perform as well as hoped, iGPUs are a whole different story, like AMD's Vega-based iGPUs, which performed much better relative to their desktop counterparts. And with Alchemist, Meteor Lake also inherits Intel's excellent encode and decode unit, supporting all modern codecs, including AV1.

The GPU side of Meteor Lake is definitely something to look out for, with much higher performance potential than its CPU side. The modular nature of Meteor Lake means Intel can prepare a number of GPU tiles with different amounts of GPU cores and thus scale its products, unlike AMD with a fixed amount of GPU cores on its monolithic mobile chips like Rembrandt and Phoenix. AMD has to watch out.

Before we look at the huge SoC tile, which does come with some cool surprises, let's quickly cover the tiny I/O tile. It's only around 10 square millimeters in size and possibly houses DisplayPort and/or Thunderbolt ports. It could also be used for the memory controller, though that remains to be seen. A lot of mystery for a chip that small. I'm really curious why Intel chose to outsource some ports into such a small chip, adding packaging complexity. It might have something to do with power efficiency and the ability to shut off individual tiles completely, but until Intel releases more information, there's not much to say about it.

But there's a lot to say about the absolutely massive SoC tile. Why is it so large? First of all, all modern CPUs are also systems on a chip, meaning they provide much more than just CPU cores. AMD's Zen 4, Intel's Raptor Lake or Apple's M2: they all provide graphics, internal and external I/O, physical ports, management engines and so on. Just take a look at this die shot of Alder Lake. Yes, the CPU cores do take up a lot, but about 40% of the die is used for other functions. And Alder Lake is a desktop chip that connects to a chipset located on the motherboard, which extends its connectivity options. Meteor Lake, at least on mobile, won't use an external chipset, meaning all connectivity has to be provided by the SoC tile, including built-in WiFi 6E support.

Then, just as AMD's Phoenix introduced a dedicated on-die AI engine, Meteor Lake will follow. The new Vision Processing Unit, or VPU for short, is specifically designed to accelerate AI workloads.
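For software, such a dedicated accelerator simply shows up as another inference device. As a rough sketch of what that could look like, here is how an application might offload a model through OpenVINO; the device string "NPU" is an assumption based on how recent OpenVINO releases expose this class of accelerator, and "model.xml" is a hypothetical placeholder, not an Intel-provided example.

```python
# Minimal sketch (not an official Intel example): offloading inference to a
# dedicated AI accelerator like Meteor Lake's VPU via OpenVINO.
# The "NPU" device name and "model.xml" file are assumptions for illustration.
from openvino.runtime import Core

core = Core()
print(core.available_devices)        # e.g. ['CPU', 'GPU', 'NPU'] on such a system (assumed)

model = core.read_model("model.xml")                     # hypothetical OpenVINO IR model
compiled = core.compile_model(model, device_name="NPU")  # run inference on the VPU instead of CPU/GPU
```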
Just a year ago, AI was this abstract thing of the future; right now it seems like it has manifested itself into our reality in the blink of an eye. All future SoCs will have dedicated AI and machine learning accelerators; Phoenix and Meteor Lake are only the beginning. And they will take up an increasing amount of die space.

Aside from housing a lot of connectivity and the new AI VPU, there's another reason the SoC tile is as big as it is. And it's something I never would have expected: there are two additional Crestmont efficiency cores located within the SoC tile. So when we talk about Meteor Lake being a 6+8 design, it's actually more of a 6+8+2 design. Intel calls them "low power E-cores", which is funny because the "E" in E-cores already stands for efficiency and implies low power, at least compared to the large performance cores.

It's a really interesting choice, but something I suspect has been done for a single reason: to further increase energy efficiency, especially during sleep or low-power states. We have talked extensively about the modular design of Meteor Lake and the energy efficiency focus of its Foveros interconnect technology. But even with its incredible efficiency, it's always more efficient to disable large parts of the chip and shut them off completely. I think that during low-power states, for example when you have been away from your laptop for a while or put it to sleep, the entire CPU tile will be disabled in order to save energy. During this state, the additional low-power E-cores located inside the SoC tile will handle all CPU tasks. This is just me guessing, but with Intel's clear focus on power efficiency for Meteor Lake, which is visible in every aspect of its design, it's a very plausible use case.

Intel's hybrid CPU is getting more complex over time; now the E-cores have their own E-cores. I'm wondering if we will get another layer in the future, like one or two high-power P-cores or a third tier of E-cores. Completely opposite to AMD's "one core fits all" design.

To recap, the SoC tile not only contains a lot of I/O and system functionality, but also a new AI accelerator and two additional efficiency CPU cores. And that's why Meteor Lake engineering samples show up with these strange core readings: because yes, Meteor Lake can have 16 physical CPU cores, 6 performance and 10 efficiency cores, only not in the way we expected.
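Putting the full core layout together: the split across tiles is as described above, while the routing policy at the end is purely my speculation, mirroring the guess about low-power states.

```python
# Meteor Lake core topology as discussed above (6+8+2). The routing logic below is
# speculation: it only illustrates the idea that background work could stay on the
# SoC-tile LP E-cores so the whole CPU tile can remain powered down.
CORES = [
    {"type": "P-core (Redwood Cove)", "tile": "CPU tile", "count": 6},
    {"type": "E-core (Crestmont)",    "tile": "CPU tile", "count": 8},
    {"type": "LP E-core (Crestmont)", "tile": "SoC tile", "count": 2},
]

print("physical cores:", sum(c["count"] for c in CORES))  # 16, matching the engineering-sample readings

def target_cores(workload: str, cpu_tile_powered: bool) -> str:
    """Toy policy: keep light/background work on the SoC tile when the CPU tile is off."""
    if not cpu_tile_powered or workload == "background":
        return "LP E-cores (SoC tile)"
    return "P-cores + E-cores (CPU tile)"

print(target_cores("background", cpu_tile_powered=False))  # LP E-cores (SoC tile)
print(target_cores("game",       cpu_tile_powered=True))   # P-cores + E-cores (CPU tile)
```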
If you think the SoC tile must be the most interesting part of Meteor Lake, now is the right time to reconsider, because we have one last tile to talk about: the base tile, also used as the interposer. If it were just a passive silicon interposer for the sole purpose of connecting the tiles via Foveros, we would call the stacking method 2.5D, as it's still vertical, but not actually active chip-on-chip. But Meteor Lake is true 3D stacking, because the interposer has active transistors.

It's quite a genius combination of different functions. For one, the interposer contains the metal layers for I/O, power delivery and the die-to-die Foveros routing. In addition, the base tile holds active silicon for memory and logic, most likely a rather large amount of cache, almost but not exactly like AMD's 3D V-Cache, only placed below the tiles. This supposedly "Adamantine" named last-level L4 cache made the rounds just a few weeks ago, creating some buzz in the hardware community.

And while getting confirmation of its actual name and early size indications is super interesting (Moore's Law Is Dead is talking about 128 to 512 megabytes of cache), the fact that Meteor Lake could include such a cache system is nothing new. At Hot Chips in August of last year, Intel revealed architectural details for Meteor and Arrow Lake. Slide 23 of the presentation clearly states that the base tile contains active silicon for memory and logic. It was right there, for everyone to see. In addition, a patent filed in March 2021 also described the Adamantine cache, including a detailed overview of Meteor Lake's layout. Interesting to note here is that the interposer does not need to contain cache or logic; it can potentially also function as a plain 2.5D interposer for lower-priced SKUs.

But back to the fact that the base layer will contain a large amount of L4 cache: how will it affect Meteor Lake? Can we expect a 3D V-Cache-like performance jump? Maybe it will act like an Infinity Cache for the GPU too? Of course we can't know for sure until Intel reveals more information or we get hands-on with Meteor Lake, but I suspect Intel is using the Adamantine cache for different reasons than what AMD is doing with cache on its CPUs and GPUs.

Adamantine doesn't directly integrate with and expand the L3 cache of the CPU tile, like 3D V-Cache does on AMD's X3D CPUs, and it's also not acting as a buffer between the GPU tile and the memory controller like AMD's Infinity Cache, at least from what I can tell. This means it won't be able to deliver the same performance improvements. Adamantine won't be Intel's X3D counter. Of course it won't hurt CPU and GPU performance; more cache is always beneficial, as we have seen in the past with Intel's Broadwell, which also used an L4 cache. But being faster than going off-chip to the LPDDR5 memory still isn't nearly as fast as AMD's L3 cache extension.

I think the actual goal of the L4 cache on Meteor Lake is power efficiency, which seems to be the main theme in Intel's design choices. For one, a larger cache means more data is on-die, or in this case on-package, and thus energy-intensive memory accesses will be reduced, decreasing power draw and increasing efficiency. But I expect Intel is going a step further. Remember the two low-power E-cores inside the SoC tile and the outsourced display controller inside the tiny I/O tile? The whole setup looks like it's made for a low-power always-on display mode. Intel can completely disable the entire CPU tile; requests are handled by the SoC tile CPU cores. The GPU state is dumped into the large Adamantine cache, while display updates are handled by the I/O and SoC tiles, which means the GPU tile can also be deactivated. Plus, memory access isn't required either, since there's a large on-package L4 cache in the form of Adamantine.

As a result, Meteor Lake should be able to shut down two of its four top tiles and the LPDDR5 memory at the same time, entering an extremely low-power state while still being responsive and able to react to incoming data. Intel could call it something like "14th gen sentinel always-on mode" or some other BS marketing name and create a whole line of premium laptops that advertise this "never turn off your laptop" feature. I could be wrong on this one, but it just fits a bit too well. I'd love to get your thoughts on this idea.
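To make that speculation concrete, here is a small sketch of which parts of the package would stay powered in such a hypothetical always-on mode versus normal operation. To be clear, both the mode and the exact on/off assignments are my guesses, not anything Intel has confirmed.

```python
# Hypothetical tile power states, illustrating the speculation above.
# Neither the mode names nor the on/off assignments are confirmed by Intel.
POWER_STATES = {
    "active": {
        "CPU tile": "on", "GPU tile": "on", "SoC tile": "on",
        "I/O tile": "on", "LPDDR5": "on",
    },
    "always_on_display (speculative)": {
        "CPU tile": "off",   # LP E-cores in the SoC tile take over
        "GPU tile": "off",   # GPU state parked in the Adamantine L4 cache
        "SoC tile": "on",    # handles wake events and background work
        "I/O tile": "on",    # keeps driving the display outputs
        "LPDDR5": "off",     # working set held in the on-package L4
    },
}

for state, parts in POWER_STATES.items():
    powered = [name for name, s in parts.items() if s == "on"]
    print(f"{state}: powered = {', '.join(powered)}")
```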
In a nutshell, the base layer is the heart of Meteor Lake. It enables the efficient Foveros 3D stacking and supports the entire chip with a large amount of cache. A rather elegant and smart solution. I'm really happy to see Intel innovating like this, when it seemed like over the last few years it was always AMD in front.

With Meteor Lake's foundation and its tiles uncovered, how does it compare to AMD's chiplet approach? Meteor Lake is clearly a lot more complex, so is Intel actually leapfrogging AMD? The answer is a clear no, although Intel is finally on a path of innovation again. AMD and Intel use very different approaches for their modular architectures, because they have very different goals.

We know that AMD is able to execute much more complex chiplet and packaging technologies, like 3D V-Cache and especially, as seen on MI300, an insane combination of multiple chips, process nodes and stacking methods. MI300 looks a lot more like tiles, a name that is actually growing on me. AMD's Zen chiplet architecture isn't the way it is because AMD can't implement more complex designs, but because it was designed with a single goal: cost-efficient scalability. With only three individual tape-outs, an 8-core CPU die, a desktop I/O die and a server I/O die, AMD is able to scale its entire desktop and server line-up, from entry-level Ryzen CPUs to high-end Epyc server chips. Just choose the right I/O die for the platform you want and connect any number of CPU chiplets. No other architecture even comes close, and it quite literally made AMD into the company it is today.
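As a concrete illustration of that scalability, the same 8-core chiplet covers everything from a mid-range desktop part to a 64-core server chip. The SKU examples below are mine, taken from the Zen 2 generation shown in the earlier PCB picture, not from the video itself.

```python
# How far a single 8-core CCD tape-out stretches across AMD's Zen 2 line-up.
# SKU examples are mine, added purely for illustration.
CORES_PER_CCD = 8

lineup = [
    ("Ryzen 5 3600",  "desktop I/O die", 1, 6),   # 1 CCD, two cores disabled for binning
    ("Ryzen 9 3950X", "desktop I/O die", 2, 16),  # 2 CCDs, fully enabled
    ("Epyc 7742",     "server I/O die",  8, 64),  # 8 CCDs around the big server I/O die
]

for name, io_die, ccds, cores in lineup:
    print(f"{name:14}: {ccds} x {CORES_PER_CCD}-core CCD + {io_die} -> {cores} cores")
```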
Intel's tile approach has a completely different focus. It's not about low-cost scalability across an entire client-to-server line-up; it's about creating highly flexible and extremely power-efficient chips. This architecture was tailored for low-power mobile SoCs that are easily adaptable for very specific workloads. Intel can switch out individual tiles without having to rework other areas of the chip, reducing follow-up engineering costs. You can combine a 4-core CPU tile with a huge 192 execution unit GPU tile. You could design a huge CPU tile with an 8+16 configuration and maybe not add a GPU tile at all. We might not see any "F" branded chips from Intel in the future, because there won't be any defective iGPUs to bin. And in the future Intel will add other tiles into the mix, like dedicated AI tiles.

AMD is focusing on macro scalability while Intel is focusing on micro scalability. Intel scales CPU performance with its small E-cores, AMD scales CPU performance with more chiplets. Two very different approaches, but equally interesting.

After learning so much about Meteor Lake, it's even more of a bummer that, according to current rumors, we won't see a desktop release of Intel's tile-based design, at least not until Arrow Lake. But it does make sense if we look at the areas where Meteor Lake improves over Raptor Lake. Energy efficiency, and especially the new low-power modes, clearly targets the mobile segment and is not that useful in a desktop environment. At the same time, with only 6 P- and 8 E-cores in the CPU tile, disregarding the new SoC cores, it's a clear regression in CPU performance, even if Redwood Cove and Crestmont do provide some form of IPC gain. Intel 4 might also initially struggle to achieve the same high clock speeds as Intel 7. That's why this year Intel will release a Raptor Lake refresh for desktop.

But not to worry, Intel's tile-based architecture will find its way into the desktop with Arrow Lake. Intel's next-gen architecture will add new CPU cores with a focus on more IPC. Once Intel has its tile architecture and packaging dialed in, monolithic CPUs will be a thing of the past, even on the desktop.

Intel is on the right path. For the first time in many years I'm actually truly excited about Intel's innovation, and I can't wait to see how Meteor Lake and Arrow Lake turn out.

As in every video, I want to know your thoughts and opinions. What do you think about Intel's fundamental change towards a tile-based design? Do you think Meteor Lake will meet our expectations? And what is your favorite feature? Leave a comment down below; I'm looking forward to reading what you have to say, including any crazy ideas to explain the extra SoC cores, the huge L4 cache and the tiny I/O tile. You know what to do if you found this video interesting, and see you in the next one!
Info
Channel: High Yield
Views: 75,300
Keywords: intel meteor lake, intel 14th gen, intel chiplet, adamantine cache, adamantine l4, tile based design, intel foveros, intel 3d stacking, advanced packaging, intel 2024 cpu, semiconductor, intel tsmc
Id: 3QWpHjz-k7Y
Length: 24min 40sec (1480 seconds)
Published: Sun May 28 2023