- Hi, and welcome to "Talking Tech." I'm your host, Alejandro Hoyos, and today we're gonna be talking about Lunar Lake's Thread Director. (upbeat music begins) With me, we have Rajshree. Hey Rajshree, how you doing today? - Hi Alex, how are you? - It's good to see you
again, it's been a while. - It's great to see you again, yeah. - For those who are joining
us for the first time, why don't you tell us about
what you do here at Intel and what you did for Lunar Lake? - Oh, absolutely, happy to share that. My name is Rajshree Chabukswar, and I am a Fellow in Intel's client computing group. I look at ISV software analysis, see what direction hardware should be going in, and provide that feedback back on redesigning some of our technologies. And especially for Lunar Lake, I'm really excited about this product. I worked with the software side of the teams and then with the IP, core, and hardware designers on where software trends are going and what adjustments we needed to make to what we have been doing in Thread Director, so that the power and performance benefits come out to end users. This also included work with our OS vendor, which is Microsoft, right? So we'll talk a little bit about that. - So again, let's start
from the beginning. What is Thread Director? I want to set a baseline for our viewers so they know what it is. - Absolutely, yeah. So if we go back three generations, when we launched Alder
Lake, our performance hybrid architecture, we basically took two different microarchitectures, P-core and E-core, and put them on the same SoC. Now, from a functional capabilities perspective, instruction sets, various functionalities, et cetera, they're equivalent: they run the same ISA; there is no different instruction set between the two. But because they are two different microarchitectures, we need the ability to say: if I'm executing a particular sequence of instructions, is it more performant on one type of core or the other? We needed that. And the operating system is not going to get that level of information on its own; we need something in the SoC to provide it. This is where Thread Director was born. It provides guidance to the operating system: if you are doing this type of work, or you're running into power and thermal constraints, then use the P-cores for this work or the E-cores for that work. We provide an ordered efficiency list of the cores on the SoC and an ordered performance list of the cores on the SoC, and the OS reads that feedback. We don't move any threads behind the OS's back; this is a hardware-guided hint that goes to the operating system, and the operating system consumes it and directs work accordingly. That's where Thread Director comes in.
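The hand-off described here can be sketched as a toy model. Everything below, the core IDs, the orderings, and the scheduling loop, is invented for illustration; the real mechanism is a hardware table read by the Windows scheduler, not Python:

```python
# Toy model of the hand-off: hardware publishes ordered core lists,
# and the OS -- not the hardware -- makes the final placement call.
# Core IDs and orderings are invented for illustration.

perf_ordered = ["P0", "P1", "E0", "E1", "E2", "E3"]  # most performant first
eff_ordered = ["E0", "E1", "E2", "E3", "P0", "P1"]   # most efficient first

def schedule(os_wants_performance, busy_cores):
    """The OS combines the hardware hint with its own knowledge
    (here, simply which cores are busy) and picks a core."""
    hint = perf_ordered if os_wants_performance else eff_ordered
    for core in hint:
        if core not in busy_cores:
            return core
    return hint[0]  # everything busy: fall back to the best-ranked core

print(schedule(os_wants_performance=False, busy_cores=set()))   # E0
print(schedule(os_wants_performance=True, busy_cores={"P0"}))   # P1
```

The point of the sketch is the division of labor: the hardware only ranks cores; the final decision stays with the OS.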
- So what you're saying, pretty much, is that we're providing a hint, a suggestion, to the OS, but the OS at the end of the day is the one that decides where the thread lands? - Absolutely, absolutely. So if you look at it, the OS has
a lot of knowledge about the software that's running, such as priority, quality of service (QoS), foreground versus background, and whether threads have been waiting in the ready queue for a long time. All of that is in its domain; it has that information and we don't. But we have information such as what type of instructions are executing, what the power and thermal constraints on the platform are, and whether another IP such as the NPU or GPU is taking power budget so that less is left for the cores; in that case, which are the most performant cores? So we take all of this information, aggregate it in the format of the Thread Director table, and expose it to the operating system. The OS then consumes that information, combines it with its own knowledge, and comes up with a scheduling decision. That's where Thread Director comes in. - Okay, so now that we
know about Thread Director, and we know it has been through several generations of our hybrid architectures, what is new for Lunar Lake, at a high level? - Oh, absolutely, yeah. There are four major new innovations that we have put in Lunar Lake. One is innovation in the hardware itself: what do we do for classification, and how often do we do it? Remember, from the Alder Lake, Raptor Lake, and Meteor Lake timeframe, we have four classes in the Thread Director table: class 0, 1, 2, and 3. They depend on the IPC (instructions per cycle) ratio between P-cores and E-cores, to say which one is going to be more performant and which one is going to be more efficient. As the microarchitectures have evolved for Lunar Lake, both E-core and P-core, we had to reclassify; we had to adjust the classification boundaries. What makes class 0 now? What makes class 1, what makes class 2, et cetera? So we made all of those changes; that's definitely new in Lunar Lake. Then there's the granularity at which classification happens, which is at the millisecond level; we changed some of that granularity. We have new power management
enhancements that are going in. This is yet another innovation in Lunar Lake: we look at the type of workload that's running on the SoC and then apply some internal power management optimizations on the performance and efficiency cores, with the decisions communicated via the Thread Director table. Then we have a feature we created with Microsoft called OS Containment, where low-power scenarios or work, such as a Teams video call or video playback, are able to run contained on our low power island. This needed some inputs from Thread Director that are new in Lunar Lake, as well as Microsoft consuming those and creating something called the containment zone. So that's new. And last but not least, we
have enhanced our ability to consume the system intent into hardware so that power management can make appropriate decisions. This is tied to our platform system software, such as Dynamic Tuning, which OEMs use. They have something called gears: they dial in gear zero or gear one for maximum performance and gear seven for maximum power savings. Those gears did not previously have a connection to the SoC's internal power management algorithms. We started that work in Meteor Lake, and now we are enhancing it as we go to Lunar Lake as well. So these are some of the changes that we have made for Lunar Lake. - Those are pretty
cool changes. So I think what we're gonna do is dive a little bit into each one of those. For the first one, you mentioned that as the P-cores and E-cores have evolved throughout the generations of different processors, you have to take another look at how Thread Director prioritizes them. - [Rajshree] Yes. - So can you give us a quick example of how Thread Director did the scheduling when it came to, say, Raptor Lake, Meteor Lake, and then Lunar Lake? - Absolutely, yeah. So if you look at what we did
in Alder Lake and Raptor Lake, those were our performance hybrid architectures. We went hybrid, but our goal was clear: multithreaded performance. So we always started scheduling from the most performant cores. We also didn't have disaggregation; we had a shared uncore, a shared ring, and everything. Going to the E-cores didn't make sense if you were going to keep that shared uncore awake anyway. So we started from the P-cores, then, if we needed more multithreaded performance, we scaled to the E-cores, and then we used the hyper-threaded siblings after that. When we moved to Meteor Lake, in some configurations we used the SoC cores first. We had two SoC cores on the SoC tile, so we started there; if the work fit, we stayed there for power efficiency, but if it didn't, we moved the work to the compute tile, because it wasn't efficient to keep both tiles up in the Meteor Lake timeframe. So there were some changes there. Now, in Lunar Lake, for most configurations we start on the low power island, on the Skymont cores we have there, the E-cores. Then, if the work exceeds what those cores can supply, or we have concurrency, we jump to the P-cores. So we still make use of the P-cores; we get great performance out of them, the snappy responsiveness that we need. But the E-cores are also evolving; they're getting more performant, and some of the user experience conditions we're able to meet in the E-core space. So we try to stay there, and if we can't meet the demand, we move to the P-cores. OS scheduling has evolved a lot along with Thread Director to bring all this goodness to end users.
- Right, that's actually a great point about the E-cores. The E-cores have evolved throughout the different generations and architectures, and for Lunar Lake, which we'll cover in another video, in another deep dive, they've definitely evolved a lot and bring a lot of performance- - Absolutely, great enhancements. As well as energy efficiency, right? - Yeah, and that's why Thread Director now essentially says, all right, we start here, on our E-cores, and go from there; I mean, they're still very good cores. - Absolutely. - All right, so we also talked
about more intelligent feedback in the algorithms, and also the power and thermal hints that come with that. Can you tell us a little
bit more about that? - Absolutely. As I said, we have four classes that Thread Director exposes to the operating system: class 0, 1, 2, and 3. Depending on how the IPC ratio looks between P-core and E-core, we put instructions into different classes. As we add more and more IPC improvements and enhancements to both P-cores and E-cores, those deltas change, so we have adjusted what gets classified as 0, 1, 2, and 3, and we expose it accordingly. We also have finer-grained workload classification in the SoC itself; we'll talk about that a little in the power management section, but it's new in Lunar Lake. And last but not least, we have a special hint that goes to the operating system when we run into really low power and thermal scenarios. This is to give continuity in the user experience so that we don't see any sudden changes in system operation. So some of these things are pretty new, added in the Lunar Lake timeframe.
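The class mechanism can be sketched as follows. The four classes are real, but the IPC values, the band edges, and the direction of the mapping below are all invented for this example; the actual boundaries are tuned per generation, which is exactly why they had to be re-drawn for Lunar Lake:

```python
# Illustrative sketch only: bucket running code by its P-core/E-core IPC
# ratio. The band edges are made up, and mapping a higher ratio to a
# higher class number is an arbitrary choice for the example.

def classify(p_ipc, e_ipc, bands=(1.05, 1.3, 1.6)):
    """Return a class 0..3 from the P-core vs E-core IPC ratio."""
    ratio = p_ipc / e_ipc
    return sum(ratio >= edge for edge in bands)

# As E-cores improve, the same code's ratio shrinks, so with fixed bands
# it would drift into a different class -- hence the bands themselves
# get adjusted each generation.
print(classify(2.0, 1.0))  # ratio 2.0 -> class 3 under these bands
print(classify(1.2, 1.0))  # ratio 1.2 -> class 1 under these bands
```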
- So nowadays there's obviously a lot of talk about AI and AI workloads. When an AI workload gets assigned to the compute tile, to the P-cores and the E-cores, how does Thread Director help there? - Yeah, so we can talk
about two things, right? There's AI that runs on the CPU, which can use VNNI, our vector neural network instructions, and there's AI that runs on the GPU or NPU; we are designing those IPs to handle some of the AI operations. So let's take the example of AI running on the GPU or NPU. Say it's a very heavy AI load; those IPs need more power and frequency, et cetera, so less is left for the cores, because the SoC is going to say, this demand is high, I'm giving more power to those IPs. Thread Director can then consume that feedback and say: because less is left for the cores, if I'm executing, for example, class 0 instructions, an E-core can be more performant than a P-core. It updates the table and provides that guidance to the operating system. Now, if we are doing AI on the CPU itself, then the power distribution between the different IPs doesn't come into the picture. Then we look at the Thread Director classification feedback and say, this is a class 2 instruction, and class 2 instructions still see some benefit on P-cores, so let's run them there. Now, that delta is collapsing; we have data showing around a 7-8% performance delta, but when you're running a mix of different workloads, that 7-8% delta matters, and Thread Director helps prioritize the right work for the right core.
- Okay, so the other thing you talked about, among the four major improvements in Lunar Lake, and that you already touched on a little, was the OS containment zone. Can you explain what that is and how it works? - Absolutely, yeah. So this is a great feature that we were really excited
to partner with Microsoft on. As Thread Director initializes, the hardware populates the table, and the OS reads it and looks at which ones are the performance cores, which ones are the efficient cores, the different tiles, et cetera. Then it creates zones, called containment zones. The cores marked as the efficiency complex are put in the efficiency zone. Then we have something called the hybrid zone. Now, Lunar Lake is a little special because, given the TDP and the power and thermal targets we were going for, we don't have any E-cores on the compute tile. But if you look at Meteor Lake, we do have E-cores on the compute tile as well, so there the whole compute tile is set up as a hybrid zone and you get the best of both worlds, performance as well as efficiency, because the E-cores are there. And then there is a zone called the none zone, which basically means all zones are taken away and you get all the cores to play with. The great thing about this feature Microsoft introduced is that we get the ability to customize it through the PPM parameters they provide. If we're running on battery power, with KPIs like a realistic IT workload or a Teams 3x3 video conferencing call, which runs great in the efficiency zone, we can constrain work toward that zone and make it harder to spill out of it. Or, if we're running in performance mode and want that snappy response or maximum performance, we can let work spill over quickly. We are able to customize all of that. So this is a really great feature for bringing the goodness of the product to market, and we were very excited to partner with Microsoft on this. - So I think you just touched
on the different examples for the containment zone, with, I think, Teams and Office? - Sure. - Can you go into a little more detail on those? - Yeah, absolutely. If you look at what happened in Meteor Lake, we had the low power island on the SoC die, but we had only two cores on it. Take a workload like Teams: sometimes the concurrency, how many threads are active at the same time, goes to four. Two cores can't contain that, so we had limited usages we could run there, and then we had to quickly jump to the compute complex and run on its E-cores. We still saw great data on that front with that architecture. Now Lunar Lake takes it a notch forward, because we have four cores on our low power island, the four E-cores, and they're performant as well as efficient. So when the containment zones are set and we run something like Teams, which in some phases needs four threads active at the same time, we are able to keep the work there. There is less spillage to the P-cores, and that's how we keep the compute tile down as much as possible and still get the responsiveness and performance, the 30 FPS or 15 FPS that Teams needs depending on which version is being used, and at the same time get great power savings.
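The containment behavior described here can be sketched as a toy spill model. The core counts (two SoC-tile cores on Meteor Lake, four low-power-island E-cores on Lunar Lake) and the four-thread Teams phase come from the conversation; the `placement` function itself is invented for illustration:

```python
# Toy model of OS containment: work stays on the low power island until
# its concurrency exceeds the island's core count, then spills to the
# compute cores, which forces the compute tile to power up.

def placement(active_threads, island_cores):
    """Return (threads contained on the island, threads that spill)."""
    contained = min(active_threads, island_cores)
    return contained, active_threads - contained

teams_phase_threads = 4  # Teams concurrency peaks at four active threads

for product, island in [("Meteor Lake", 2), ("Lunar Lake", 4)]:
    contained, spilled = placement(teams_phase_threads, island)
    print(f"{product}: {contained} contained, {spilled} spill to compute")
# Meteor Lake: 2 contained, 2 spill; Lunar Lake: 4 contained, 0 spill,
# so the compute tile can stay down.
```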
- So the next thing we wanna talk about is consuming the system intent, which has been enhanced for Lunar Lake. Let us know a little more about that. - Sure. When we created Thread Director, we always got requests from our customers: "Hey, how can I give inputs into Thread Director?" Because we didn't have any direct input. And if you think about it, it doesn't make sense for external software to say, run me on a P-core or an E-core. Because when we think about P-cores and E-cores in terms of functionality, the performance core can be the efficient core. As in the example I gave, if an IP like the graphics or the NPU is taking power budget and you want performance, you are likely better off running on an E-core, because if there is less power budget left for the CPU, the E-core is probably going to be more performant. But an end user or a third-party application doesn't know that, because they don't know the power and thermal constraints. So we told our customers: you probably don't want a direct input there; you want to specify your intent. My intent right now is to maximize performance, or my intent is to maximize energy efficiency. So what we did was take our Dynamic Tuning utility, our value-added platform software that a lot of OEMs use in their mobile designs, and say: hey, you use these gears anyway, why don't we pass those gears as an input to the SoC? We started this work in Meteor Lake; it's not new in Lunar Lake, but in Lunar Lake we have enhanced it so that we consume this input in internal SoC optimizations, such as workload detection, to provide the right input to the operating system via Thread Director: am I meeting my intent of performance, or am I meeting my intent of power? In some cases, when the user wants to maximize power savings, the choice of which cores are the performant and efficient cores is different from when they want to maximize performance, and the frequency operating points are different. So we use the intent information, the SoC makes some of those decisions, and we expose that to the OS as guidance via the Thread Director table.
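As a hedged sketch, the gear-to-intent plumbing might look like the following. The gear endpoints (gear 0 or 1 for maximum performance, gear 7 for maximum power savings) come from the conversation; the function, the names, and the intermediate "balanced" band are invented:

```python
# Hypothetical mapping of OEM "gears" to a coarse intent hint for the SoC.
# Only the endpoints reflect the discussion; everything else is made up.

def gear_to_intent(gear: int) -> str:
    """Translate an OEM gear (0..7) into a coarse system-intent hint."""
    if not 0 <= gear <= 7:
        raise ValueError("gear must be 0..7")
    if gear <= 1:
        return "maximize_performance"
    if gear >= 6:
        return "maximize_power_savings"
    return "balanced"

# The SoC could then bias its decisions off this hint, e.g. which ordered
# core list to favor and which frequency operating points to pick.
print(gear_to_intent(0))   # maximize_performance
print(gear_to_intent(7))   # maximize_power_savings
```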
- Yeah, that makes complete sense. Because as an application you might just want to run on P-cores, but Thread Director has a global view of everything that's going on. - Exactly, yeah. - So that's an extra hint from the PC, from the OEMs, from our partners: hey, like you said, I want a little more performance, but I don't want to tell you exactly what to do. - And this is where we tell our ISV partners, too: don't affinitize your software. If you think you're going to get the best performance by running only on P-cores, in some cases that may not be true. If you create hard affinities, then the hardware and the OS cannot work together to give you the best performance. So that's where it's going.
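The pitfall Rajshree warns about can be illustrated with a toy model. The throughput numbers below are invented, but they capture the scenario she described, where the GPU or NPU takes the power budget and a thread pinned to a P-core loses to one the OS is free to move:

```python
# Toy illustration of why hard-affinitizing to P-cores can backfire.
# Numbers are invented: when other IPs take most of the power budget,
# the P-core's achievable throughput drops below the E-core's.

THROUGHPUT = {  # achievable throughput (arbitrary units) by power state
    "full_budget":    {"P-core": 10, "E-core": 7},
    "gpu_npu_loaded": {"P-core": 4,  "E-core": 6},
}

def best_core(power_state):
    """What the OS + Thread Director would pick, given the power state."""
    scores = THROUGHPUT[power_state]
    return max(scores, key=scores.get)

pinned = "P-core"  # a hard affinity never re-evaluates this choice
for state, scores in THROUGHPUT.items():
    print(f"{state}: pinned gets {scores[pinned]}, "
          f"unpinned gets {scores[best_core(state)]} on {best_core(state)}")
```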
- Oh, that's great. So at the end of the day, Thread Director is really a power management story? - Both power and performance. - Yes, correct, a power and performance story. And we also have the SoC doing power and performance management. So what has changed there for Lunar Lake? - So we talked about the system intent, right? Do I want maximum power savings or maximum performance? Some of that plumbing is going in for power management generally, not just restricted to Thread Director: the internal power management algorithms consume that input, and we try to detect the type of workload that's running on the system. Is it a battery-life type of workload? Is it bursty? Is it something sustained, like Cinebench? We try to detect it not by application, but just by the load that runs on the SoC. Then, using the intent hint that's coming in and the detection of the workload, we make some of the frequency decisions on the cores to match that intent. That's new in Lunar Lake. One example I can give you: look at Teams. Many times our customers tell us, you're familiar with the Windows power sliders, where there's max performance, balanced, and best efficiency, right? If I run Teams on best efficiency, I get great power. I'm still running at 30 FPS or 15 FPS, I meet my quality of service, et cetera, but my power is low. The moment I move to balanced or performance mode, I still get the same quality of service, because that's how my app is designed, but my power goes up. Why is that happening? Why can't we close that gap? Some of the innovations put in Lunar Lake are done exactly for that purpose: if you're running the same type of workload, irrespective of which mode you run it in, you get similar efficiency. Are we fully there yet? No, this is the start of the journey; we're going to have more enhancements in the future. But this is where we think that consuming the intent hint and knowing the type of workload lets the SoC add a lot more value. And that's really exciting for me.
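A hedged sketch of application-agnostic load detection: the utilization-window heuristic and every threshold below are invented, purely to illustrate classifying sustained versus bursty versus battery-life loads from SoC load alone, without knowing the application's name:

```python
# Invented heuristic: classify a window of per-interval SoC utilization
# samples (0..1) by average level and how often the load swings.

def detect_workload(util_samples, high=0.75, low=0.25):
    """Label a utilization window without looking at application names."""
    avg = sum(util_samples) / len(util_samples)
    swings = sum(1 for a, b in zip(util_samples, util_samples[1:])
                 if abs(a - b) > 0.4)
    if avg >= high and swings <= 1:
        return "sustained"      # e.g. a Cinebench-style render
    if avg <= low:
        return "battery_life"   # mostly idle, light activity
    if swings >= 2:
        return "bursty"         # spiky, interactive load
    return "moderate"

print(detect_workload([0.9, 0.95, 0.9, 0.92]))      # sustained
print(detect_workload([0.1, 0.05, 0.1, 0.08]))      # battery_life
print(detect_workload([0.1, 0.9, 0.1, 0.8, 0.15]))  # bursty
```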
- Yeah, so we were talking about the different power management modes we're used to seeing, that Windows provides depending on whether you're plugged in or not, or, like you said, what you're looking for. So it's great that those are now all integrated to provide more savings. Actually, there's one more thing I wanted to ask: you started talking about the future. How do you see Thread Director in the future? What can we expect from it? - Oh yeah, that's a great question. And we are on this journey. We started this in Alder Lake
and Raptor Lake timeframe, and we are continuing it three, four products down the line. We have roadmap plans for future products that come out next year, the year after that, et cetera. So definitely. There are enhancements to how we do classification and to how we consume the detection of the type of workload running on the system, et cetera. And last but not least, the thing I'm really excited about: we talk about the AI PC. AI is everywhere; it's gonna run on CPU, GPU, and NPU. We are looking at how we could use something like Thread Director to figure out the best IP to run the work on. I'll leave you at that; there are more things coming. - That's what you can tell us so far without getting us in trouble. I think that's pretty neat. Because right now it's only set for the cores, but in the future it's like, okay, now we're gonna start... - We are looking in that direction, definitely. - [Alejandro] Yeah, oh
that's great. One more thing: what is the definition, on Lunar Lake, of the low power island? Because it has changed, or it has probably changed, from what we had previously in Meteor Lake. - So the intent of the low power island is the same: provide the best efficiency possible. Because remember, we have put it on a separate complex; it doesn't share the uncore and everything with the performance complex. So if you're able to keep the performance complex down, you get a lot of savings from that uncore and the other portions of it. Now, from an SoC construction perspective, what sits on the low power island, the SoC tile on Meteor Lake, has changed for Lunar Lake: some of the graphics components and other things that we have in there. We also provide localized caching on the low power island in the Lunar Lake timeframe, so that is different as well. And of course the core count: we have four cores on the low power island in Lunar Lake, while Meteor Lake had two on the SoC tile. So things have evolved for the good in terms of providing better performance, or the moderate performance you need, from the low power island. In Lunar Lake's case, what the E-core team has done and provided there is simply amazing, and we also get efficiency. So we are striking this fine balance where we get to use it for multithreaded performance as well as lean on efficiency at the same time, by uncoupling some of the uncore and other components. So that's what's different. - Oh, that's pretty interesting. And yeah, we can definitely see all the changes, as we have moved a lot of things around compared to what we had previously. - Absolutely, absolutely. - Rajshree, thank you
so much, appreciate it. This has been great. - Thank you for inviting me here. Always a pleasure to talk to you. Thanks. (upbeat music begins) (chirpy tune chimes)