Meteor Lake is shaping up to be Intel's most
important product launch in recent memory, with a big impact on the future
of Intel's design direction.
There are many ways to improve a CPU.
Most commonly we talk about changes to the cores inside the CPU, like improved Floating
Point or Integer units, higher clock speeds, more cache or just more CPU cores in
general. All of these aspects are what I would classify as "architectural" changes.
And while Meteor Lake does tackle some of these areas, its true focus and what makes it so
special aren't its architectural changes, but changes that are a level above -
or to be more precise - below that: Meteor Lake will completely alter the very
foundation of how Intel designs CPUs.
With Meteor Lake, Intel is finally entering the
chiplet era, which was started by AMD back in 2019 with Zen 2. But Intel isn't copying
AMD, nor are they trying to play catch up. Meteor Lake is a bold attempt to switch from a
100% monolithic architecture like Raptor Lake to a fully 3D-stacked chiplet design, which
Intel calls a tile-based architecture. And as we will find out later in this video,
the term "tile" is actually very fitting, as Intel not only uses different technology
but is also executing a very different product strategy compared to AMD.
Let's see if Intel is actually able to leapfrog AMD, and let's start our deep dive
into the technology of Meteor Lake.
In order to cover all aspects of Meteor
Lake, this video is structured in three parts. First we will talk about the foundation
for Intel's tile-based architecture and discuss how interconnect and packaging technology
enable Intel's vision of a modular future.
Then we will take a look at what's hidden inside
the tiles that combine to create Meteor Lake, and discover new features, like additional CPU
cores, where you would never expect them to be.
And finally we will compare Intel's tile approach with AMD's chiplet architecture
and discuss the differences in strategy.
The switch from a monolithic to a fully disaggregated design comes with
many challenges. A major one is how to physically connect all the parts that were
previously located on a single piece of silicon.
The simplest method is to use an available interconnect protocol, for example PCI-Express,
and, so to speak, just run copper wires through the substrate to transfer data between the
physically separated chips. This method doesn't require
advanced packaging, it's relatively easy to design and implement and it's low cost. The
downsides are low bandwidth, high latency and high energy costs, as transporting data
off-silicon is very energy intensive.
Then there is 3D stacking, where a large silicon
interposer is placed onto the packaging substrate and the chiplets are placed on top of the silicon
interposer. All data and power connections have to run through the silicon interposer, which
on the one hand increases design and packaging complexity, requiring the use of so-called
through-silicon vias, but on the other hand offers the highest bandwidth, lowest latency
and lowest energy cost, because transferring data through silicon is a lot more efficient.
It's a trade-off between complexity and thus cost on one side and great physical
properties, like fast data transfer and energy efficiency on the other side.
AMD's chiplet architecture is using the less complex approach. The individual chiplets are
clearly physically separated and are connected via the substrate using AMD's Infinity Fabric,
which is a serial interconnect conceptually similar to PCI-Express. It's faster, more
efficient and specialized for AMD's use case, but thinking of AMD's chiplets as being connected
via PCI-Express isn't too far from reality, at least in terms of understanding the concept.
This picture of a Zen 2 package substrate shows all the data paths AMD is using to connect
the individual chiplets. It's a highly scalable and cost-effective design, but we all know
about its drawbacks in the form of latency and bandwidth penalties for chiplet-to-chiplet
communication. That's why
AMD CPUs with a single chiplet are usually still better when it comes to gaming.
If we compare AMD's approach to pictures of Intel's Meteor Lake we can already see
a clear difference. Instead of individual and physically separated chiplets, it almost
looks like a monolithic chip, if it weren't for the thin lines that reveal that Meteor Lake
is in fact four different silicon chips sitting on a large interposer. And just by looking at
it, the name "tiles" does start to make sense, as it actually looks a lot like tiles.
Intel is using its Foveros technology, which is a die-to-die interconnect method using
so-called micro-bumps and through-silicon vias.
During the packaging process, the individual tiles are first
placed and bonded to the silicon interposer using a
method called chip-to-wafer-bonding. Next, the bottom of the silicon interposer is thinned
until the through-silicon-vias are revealed. Then the silicon interposer, with tiles on
top and TSV connection points on the bottom, is placed onto the package substrate. This order
is chosen because the complex advanced packaging steps are performed first, in this example
bonding the individual tiles to the interposer. Only once this step is successfully completed are
the TSVs revealed and the assembly placed onto the substrate.
The version of Foveros Intel is using for Meteor
Lake achieves a 36 micrometer microbump pitch, and the die-to-die interconnect operates at about
0.15 to 0.3 picojoules per bit, which is up to an order of magnitude lower than AMD's Infinity
Fabric at about 1.5 picojoules per bit. This big improvement in data transfer energy efficiency
is the most important factor for Intel. In fact, the potential bandwidth and latency benefits
of using an interposer are not really Intel's focus with this design; it's all about energy
efficiency, and that's where Foveros truly shines.
With Foveros, Intel has created a modern 3D
stacking method that enables flexible multi-tile chips without sacrificing efficiency when
transporting data between each individual tile.
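To put those picojoule-per-bit figures into perspective, here is a quick back-of-the-envelope calculation. Only the energy-per-bit values come from above; the 50 GB/s of die-to-die traffic is just a number I picked for illustration, not an Intel spec.

```python
# Back-of-the-envelope comparison of die-to-die transfer energy.
# Energy-per-bit values as quoted above; the 50 GB/s traffic figure
# is purely an illustrative assumption, not an official spec.

FOVEROS_PJ_PER_BIT = 0.3          # upper end of the 0.15-0.3 pJ/bit range
INFINITY_FABRIC_PJ_PER_BIT = 1.5

traffic_gb_per_s = 50             # assumed sustained die-to-die traffic
bits_per_s = traffic_gb_per_s * 8e9

def interconnect_power_watts(pj_per_bit: float) -> float:
    """Power spent only on moving data between dies."""
    return bits_per_s * pj_per_bit * 1e-12

print(f"Foveros:         {interconnect_power_watts(FOVEROS_PJ_PER_BIT):.2f} W")
print(f"Infinity Fabric: {interconnect_power_watts(INFINITY_FABRIC_PJ_PER_BIT):.2f} W")
# Foveros:         0.12 W
# Infinity Fabric: 0.60 W
```

In a 15 to 28 watt mobile chip, saving roughly half a watt purely on moving data between tiles is a big deal.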
Now that we know what interconnect and packaging
technology Intel is using to enable Meteor Lake, let's take a look at the overall layout of
the chip and then dive into the silicon level, to uncover what's hidden inside the tiles.
With a quick top-down analysis we can identify four different tiles: one really large one,
one medium-sized one, a slim one and a really tiny one. My first impression was that
the large one had to be the CPU tile, with the other three housing GPU, I/O and probably
a machine learning accelerator. But as so often, my first impression was completely off.
The biggest tile, located in the middle of the chip, is actually the SoC tile, housing functions
you would usually find on a chipset. The slim tile to its right is the GPU tile, where the
Alchemist-based Xe iGPU is located. The medium-sized tile to the left is the CPU tile, housing
the CPU cores, and below that we have a tiny I/O tile, most likely providing DisplayPort or other
physical outputs. And of course we can't forget the interposer, also called the base tile, which
is located below the tiles sitting on top. Due to the asymmetrical design, a small part of the
base tile is visible, right below the I/O tile.
And if that layout isn't crazy enough for you,
it gets even wilder when you consider the different process nodes used to produce the
individual tiles. The CPU tile is using the new Intel 4 node, previously known as Intel 7nm
and the first Intel EUV-based process. The GPU tile is based on TSMC's next-gen 3nm process node,
N3B to be precise if SemiAnalysis is correct. The SoC tile utilizes TSMC's N6 process node;
the same is most likely true for the I/O tile. The base tile could either be an older Intel
node, for example 22FFL, now called Intel 16, but there are also rumors it's using an optimized
Intel 7 node, which is the current node used for Alder and Raptor Lake. Combined, there are
four different process nodes being used to produce the five different tiles that make up
Meteor Lake, and the majority of them are not from Intel. Just like AMD, Intel is starting to
contract TSMC for a large portion of its chips.
Now that we have an overview of all the tiles, let's
take a closer look at each of them individually.
I'll start with what most of you are here for: the CPU tile. As you would expect,
it contains the CPU cores; in the case of this specific Meteor Lake SKU it's a 6+8 design,
with six large Redwood Cove performance cores and eight small Crestmont efficiency cores,
as visible in this die shot Intel provided. It's a reduction from the 8+16 setup in Raptor Lake,
but remember that we are looking at a mobile chip; a potential desktop CPU would come with
a larger CPU configuration, made possible by Meteor Lake's flexible tile design.
At the beginning of the video I talked about architectural and foundational changes;
in the case of Redwood Cove and Crestmont we
are in the realm of architectural improvements. Alder Lake combined Golden Cove with Gracemont,
Raptor Lake used Raptor Cove and Gracemont, and with Meteor Lake Intel switches to Redwood
Cove and Crestmont, both with potential IPC and efficiency improvements. Since the
new cores are manufactured on Intel 4, they have to be adapted to the new process node.
SemiAnalysis and Locuza did a great low-level die-shot analysis, but found very few changes.
Redwood Cove comes with 2 megabytes of L2 cache, an upgrade already introduced with Raptor
Cove. Visible changes to the performance cores are very minimal and most likely related
to the re-design required for the new Intel 4 process node. The same is true for the Crestmont
efficiency cores, which seem to be a shrink of the Intel 7-based Gracemont. All in all, while new
CPU core names come with the promise of architectural improvements, it looks like the actual
changes on Meteor Lake's CPU side are very minimal, aside from the process node shrink. And it
does make sense: implementing a new tile-based design and combining it with a node shrink at the
same time is already a lot of engineering work; it would be very unlike Intel to increase the
probability of failure by adding complex architectural changes on top.
To recap, Meteor Lake will most likely offer
very limited IPC improvements on the CPU side, as it looks like all we are getting is a process
node shrink of the Raptor Lake architecture with only minor architectural improvements.
Something that has a lot more potential to be exciting is the GPU tile, which will utilize
Intel's Alchemist architecture and is one of the first chips produced on TSMC's next-gen 3nm node.
With up to 192 Execution Units, the Intel Xe iGPU has the potential for a large performance
uplift, especially if TSMC's N3B performs well. Currently AMD's Phoenix reigns supreme;
I'm hopeful Intel can strike back.
This version of Meteor Lake isn't equipped
with the largest GPU tile and most likely only has somewhere between 64 and 96 Execution
Units, which will still be plenty fast. Integrated GPUs could very soon completely replace
low-end dedicated GPUs in the laptop space, something that not only saves energy but also
allows for thinner and lighter laptop designs.
Even though Alchemist discrete GPUs didn't perform as well as hoped, iGPUs are a whole different
story; just look at AMD's Vega-based iGPUs, which performed much better than their desktop
counterparts. And with Alchemist, Meteor Lake also inherits Intel's excellent encode and decode
unit, supporting all modern codecs, including AV1.
The GPU side of Meteor Lake is definitely something to look out for, with much higher
performance potential than its CPU side. The modular nature of Meteor Lake means Intel can
prepare a number of GPU tiles with different numbers of GPU cores and thus scale its
products, unlike AMD with a fixed number of GPU cores on its monolithic mobile chips
like Rembrandt and Phoenix. AMD has to watch out.
Before we look at the huge SoC tile,
which does come with some cool surprises, let's quickly cover the tiny I/O tile. It's only
around 10 square millimeters in size and possibly houses DisplayPort and/or Thunderbolt ports. It
could also be used for the memory controller, though that remains to be seen. A lot of mystery
for a chip that small. I'm really curious why Intel chose to move some ports out into such a small
separate chip, adding packaging complexity. It might have something to do with power efficiency
and the ability to basically shut off individual tiles completely, but until Intel releases more
information, there's not much to say about it.
But there's a lot to say about the absolutely
massive SoC tile. Why is it so large?
First of all, all modern CPUs are also Systems
on a Chip, meaning they provide much more than just CPU cores. AMD's Zen 4, Intel's Raptor
Lake or Apple's M2 all provide graphics, internal and external I/O, physical ports,
management engines and so on. Just take a look at this die shot of Alder Lake. Yes,
the CPU cores do take up a lot of space, but about 40% of the die is used for other functions.
And Alder Lake is a desktop chip that connects to a chipset located on the motherboard, which
enhances its connectivity options. Meteor Lake, at least on mobile, won't use an external chipset,
meaning all connectivity has to be provided by the SoC tile, including built-in Wi-Fi 6E support.
Then, just as AMD's Phoenix introduced a dedicated on-die AI engine, Meteor Lake will follow. The
new Vision Processing Unit, or VPU for short, is specifically designed to accelerate AI
workloads. Just a year ago AI was this abstract thing of the future; now it seems to have
manifested itself into our reality in the blink of an eye. All future SoCs will have dedicated
AI and machine learning accelerators; Phoenix and Meteor Lake are only the beginning. And they
will take up an increasing amount of die space.
Aside from housing a lot of connectivity and the new AI VPU, there's another reason the SoC tile
is as big as it is. And it's something I never would have expected: there are two additional
Crestmont efficiency cores located within the SoC tile. So when we talk about Meteor Lake
being a 6+8 design, it's actually more of a 6+8+2 design.
Intel calls them "low power E-cores", which is funny because the "E" in E-cores already
stands for efficiency and implies low power, at least compared to the large Performance-cores.
It's a really interesting choice, but something I suspect has been done for a single reason:
to further increase energy efficiency, especially during sleep or low power states.
We have extensively talked about the modular design of Meteor Lake and the energy efficiency
focus of its Foveros interconnect technology. But even with its incredible efficiency, it's
always more efficient to disable large parts of the chip and shut them off completely.
I think that during low power states, for example when you have been away from
your laptop for a while or put it to sleep, the entire CPU tile will be disabled in
order to save energy. During this state, the additional low-power E-cores located inside
the SoC tile will handle all CPU tasks. This is just me guessing, but with Intel's clear
focus on power efficiency for Meteor Lake, which is visible in every aspect of its
design, it's a very plausible use case.
Intel's hybrid CPU is getting more complex over time; now the E-cores have their own E-cores.
I'm wondering if we will get another layer in the future, like one or two high-power P-cores
or a third tier of E-cores. It's the complete opposite of AMD's "one core fits all" design.
To recap, the SoC tile not only contains a lot of
I/O and system functionality, but also a new AI accelerator and two additional efficiency CPU
cores. And that's why Meteor Lake engineering samples show up with these strange core readings,
because yes, Meteor Lake can have 16 physical CPU cores, 6 performance and 10 efficiency
cores, just not in the way we expected.
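As a quick sanity check of those readings, here is the core and thread math, assuming the Redwood Cove P-cores keep Hyper-Threading while the E-cores and low-power E-cores don't, the same split as on Alder and Raptor Lake; treat it as my reading of the leaks, not a confirmed spec.

```python
# Core/thread count for the 6+8+2 Meteor Lake configuration.
# Assumes Hyper-Threading on the P-cores only, as on Alder/Raptor Lake;
# this is a sketch based on leaks, not a confirmed spec.

p_cores = 6        # Redwood Cove, on the CPU tile
e_cores = 8        # Crestmont, on the CPU tile
lp_e_cores = 2     # Crestmont "low power E-cores", on the SoC tile

physical_cores = p_cores + e_cores + lp_e_cores
threads = p_cores * 2 + e_cores + lp_e_cores   # HT only on the P-cores

print(f"{physical_cores} cores / {threads} threads")   # 16 cores / 22 threads
```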
If you think the SoC tile must be the
most interesting part of Meteor Lake, now is the right time to reconsider, because
we have one last tile to talk about: the base tile, which also serves as the interposer.
If it were just a passive silicon interposer for the sole purpose of connecting the tiles via
Foveros, we would call the stacking method 2.5D, as it's still vertical, but not actually active
chip-on-chip. But Meteor Lake is true 3D stacking, because the interposer contains active transistors.
It's quite an ingenious combination of different functions. For one, the interposer contains
the metal layers for I/O, power delivery and die-to-die Foveros routing. In addition, the base
tile holds active silicon for memory and logic, most likely a rather large amount of
cache; almost, but not exactly, like AMD's 3D V-Cache, only placed below the tiles.
This last-level L4 cache, supposedly called Adamantine, made the rounds just a few weeks ago,
creating some buzz in the hardware community. And while getting confirmation of its actual name
and early size indications is super interesting (Moore's Law is Dead is talking about 128 to 512
megabytes of cache), the fact that Meteor Lake could include such a cache system is nothing new.
At Hot Chips in August of last year, Intel revealed architectural details for Meteor
and Arrow Lake. Slide 23 of the presentation clearly states that the base tile contains active
silicon for memory and logic. It was right there, for everyone to see. In addition, a patent filed
in March of 2021 also explained the Adamantine cache, including a detailed overview of Meteor
Lake's layout. It's interesting to note that the interposer does not need to contain cache or
logic; it can potentially also function as a plain 2.5D interposer for lower-priced SKUs.
But back to the fact that the base layer will contain a large amount of L4 cache: how
will it affect Meteor Lake? Can we expect a 3D V-Cache-like performance jump? Maybe it will act
like an Infinity Cache for the GPU too? Of course we can't know for sure until Intel reveals more
information or we get hands-on with Meteor Lake, but I suspect Intel is using the Adamantine
cache for different reasons than what AMD is doing with the caches on its CPUs and GPUs.
Adamantine doesn't directly integrate with and expand the L3 cache of the CPU tile, like 3D
V-Cache does on AMD's X3D CPUs, and it's also not acting as a buffer between the GPU tile and
the memory controller like AMD's Infinity Cache, at least from what I can tell. This means it
won't be able to deliver the same performance improvements. Adamantine won't be Intel's X3D
counter. Of course it won't hurt CPU and GPU performance; more cache is always beneficial, as
we have seen in the past with Intel's Broadwell, which also used an L4 cache. But being faster
than going off-chip to the LPDDR5 memory still isn't nearly as fast as AMD's L3 cache extension.
I think that the actual goal for the L4 cache on Meteor Lake is power efficiency, which seems to be
the main theme in Intel's design choices. For one, a larger cache means more data is
on-die, or in this case on-package, and thus energy-intensive memory accesses will be
reduced, decreasing power draw and increasing efficiency. But I expect Intel is going a
step further. Remember the two low-power E-cores inside the SoC tile and the outsourced
display controller inside the tiny I/O tile?
The whole setup looks like it's made for a low-power always-on display mode. Intel can
completely disable the entire CPU tile, with requests handled by the SoC tile's CPU cores.
The GPU state is then dumped into the large Adamantine cache, while display updates are handled
by the I/O and SoC tiles, which means the GPU tile can also be deactivated. Plus, memory access
isn't required either, since there's a large on-package L4 cache in the form of Adamantine.
As a result, Meteor Lake should be able to shut
down two of its four top tiles and the LPDDR5 memory at the same time, entering an extremely
low-power state while still being responsive and able to react to incoming data. Intel could call
it something like "14th gen sentinel always on mode" or some other BS marketing name and create
a whole line of premium laptops that advertise this "never turn off your laptop" feature. I could
be wrong on this one, but it just fits a bit too well. I'd love to get your thoughts on this idea.
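To make the idea a bit more concrete, here is a toy model of what such an always-on display state could look like. The tile names match the ones discussed above, but which tiles actually get power-gated, and whether the LPDDR5 can really be shut off, is pure speculation on my part.

```python
# Toy model of the speculated always-on display state.
# Tile names follow the breakdown above; which tiles actually power down,
# and whether LPDDR5 can be fully gated, is not confirmed by Intel.

POWER_STATES = {
    "active": {
        "cpu_tile": "on", "gpu_tile": "on",
        "soc_tile": "on", "io_tile": "on",
        "lpddr5": "on",
    },
    "always_on_display": {
        "cpu_tile": "off",   # work shifts to the SoC tile's low-power E-cores
        "gpu_tile": "off",   # last GPU state parked in the Adamantine L4 cache
        "soc_tile": "on",    # LP E-cores and system logic keep running
        "io_tile": "on",     # drives the panel output
        "lpddr5": "off",     # data served from the on-package L4 cache instead
    },
}

def powered_parts(state: str) -> list[str]:
    """Return the parts of the package that stay powered in a given state."""
    return [part for part, mode in POWER_STATES[state].items() if mode == "on"]

print(powered_parts("always_on_display"))   # ['soc_tile', 'io_tile']
```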
In a nutshell, the base layer is the heart of Meteor Lake. It enables the efficient Foveros
3D stacking and supports the entire chip with a large amount of cache. A rather elegant and
smart solution. I'm really happy to see Intel innovating like this, after it felt like
AMD was always the one in front over the last few years.
With Meteor Lake's foundation
and its tiles uncovered, how does it compare to AMD's chiplet approach?
Meteor Lake is clearly a lot more complex, so is Intel actually leapfrogging AMD? The answer is a
clear no, although Intel is finally on a path of innovation again. AMD and Intel use very different
approaches for their modular architectures because they have very different goals.
We know that AMD is able to execute much more complex chiplet and packaging technologies,
like 3D V-Cache and especially MI300, an insane combination of multiple chips,
process nodes and stacking methods. MI300 actually looks a lot more like tiles,
a name that is growing on me.
AMD's Zen chiplet architecture isn't the way it is because AMD can't implement more complex
designs, but because it was designed with a single goal: cost-efficient scalability. With only
three individual tape-outs, an 8-core CPU die, a desktop I/O die and a server I/O die, AMD is
able to scale its entire desktop and server line-up, from entry-level Ryzen CPUs to high-end
Epyc server chips. Just choose the right I/O die for the platform you want and connect any
number of CPU chiplets. No other architecture even comes close, and it quite literally made
AMD into the company it is today.
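To illustrate how far those three tape-outs stretch, here is a small sketch using Zen 2 era parts; the SKU examples are from memory and only meant to show the core-count math, not an exhaustive list.

```python
# Sketch of AMD's "three tape-outs, one line-up" scaling (Zen 2 era).
# Only the core-count math matters here; the SKU pairings are illustrative.

CORES_PER_CCD = 8   # one 8-core Zen 2 CPU chiplet (CCD)

configs = {
    # name: (number of CCDs, I/O die used)
    "Ryzen 7 3700X": (1, "client I/O die"),
    "Ryzen 9 3950X": (2, "client I/O die"),
    "Epyc 7742":     (8, "server I/O die"),
}

for name, (ccds, io_die) in configs.items():
    cores = ccds * CORES_PER_CCD
    print(f"{name}: {ccds} CCD(s) + {io_die} = up to {cores} cores")
```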
Intel's tile approach has a completely
different focus. It's not about low-cost scalability across an entire client-to-server
line-up; it's about creating highly flexible and extremely power-efficient chips. This
architecture was tailored for low-power mobile SoCs that are easily adaptable for very specific
workloads. Intel can swap out different tiles without having to rework other areas of the
chip, reducing follow-up engineering costs. You can combine a 4-core CPU tile with a huge
192-execution-unit GPU tile. You could design a huge CPU tile with an 8+16 configuration and
maybe not add a GPU tile at all. We might not see any "F"-branded chips from Intel in the
future, because there won't be any chips with defective iGPUs to bin. And in the future Intel
will add other tiles into the mix, like dedicated AI tiles.
AMD is focusing on macro scalability while Intel
is focusing on micro scalability. Intel scales CPU performance with its small E-cores, while AMD
scales CPU performance with more chiplets. Two very different approaches, but equally interesting.
After learning so much about Meteor Lake, it's even more of a bummer that, according to current
rumors, we won't see a desktop release of Intel's tile-based design, at least not until Arrow Lake.
But it does make sense if we take a look at what areas Meteor Lake improves over Raptor Lake.
Energy efficiency improvements, especially the new low-power modes, clearly target the mobile
segment and are not that useful in a desktop environment. At the same time, with only 6 P- and
8 E-cores in the CPU tile, disregarding the new SoC cores, it's a clear regression in CPU
performance, even if Redwood Cove and Crestmont do provide some form of IPC gain. Intel 4 might
also initially struggle to achieve the same high clock speeds as Intel 7. That's why Intel will
release a Raptor Lake refresh for desktop this year.
But not to worry, Intel's tile-based architecture
will find its way to the desktop with Arrow Lake. Intel's next-gen architecture will
add new CPU cores with a focus on more IPC. Once Intel has its tile architecture and
packaging dialed in, monolithic CPUs will be a thing of the past, even on the desktop.
Intel is on the right path. For the first time in many years I'm actually truly excited
about Intel's innovation, and I can't wait to see how Meteor Lake and Arrow Lake turn out.
As in every video, I want to know your thoughts and opinions. What do you think about Intel's
fundamental shift towards a tile-based design? Do you think Meteor Lake will meet our
expectations? And what is your favorite feature? Leave a comment down below; I'm looking
forward to reading what you have to say, including any crazy theories to explain the extra SoC
cores, the huge L4 cache and the tiny I/O tile.
You know what to do if you found this video
interesting and see you in the next one!