- Hi, and welcome to "Talking Tech." I'm your host, Alejandro Hoyos, and today we're gonna be talking about Lunar Lake's Thread Director. (upbeat music begins) With me, we have Rajshree. Hey Rajshree, how you doing today? - Hi Alex, how are you? - It's good to see you
again, it's been a while. - It's great to see you again, yeah. - For those who are joining
us for the first time, why don't you tell us about
what you do here at Intel and what you did for Lunar Lake? - Oh, absolutely, happy to share that. My name is Rajshree Chabukswar, and I am a Fellow in Intel's client computing group. I look at ISV software analysis, see what direction hardware should be going in, and provide that feedback back on redesigning some of our technologies. And especially for Lunar Lake, I'm really excited about this product. I worked with the software side of the teams and then with the IP, core, and hardware designers on where software trends are going and what adjustments we needed to make to what we have been doing in Thread Director, so that the power and performance benefits come out to end users. This also included work with our OS vendor, which is Microsoft, right? So we'll talk a little bit about that. - So again, let's start
from the beginning. What is Thread Director? I want to set a baseline for our viewers so they know what it is. - Absolutely, yeah. So if we go back three generations, when we launched Alder
Lake, our performance hybrid architecture, we basically took two different microarchitectures, P-core and E-core, and put them on the same SoC. Now, from a functional capabilities perspective, instruction sets, various functionalities, et cetera, they're equivalent: they run the same ISA; there is no different instruction set between the two. But because they are two different microarchitectures, we need the ability to say: if I'm executing a particular sequence of instructions, is it more performant on one type of core or the other? We needed that. And the operating system is not going to get that level of information on its own; we need something in the SoC to provide it. This is where Thread Director was born. It provides guidance to the operating system: if you are doing this type of work, or you're running into power and thermal constraints, then use the P-cores for this work or the E-cores for that work. We provide an ordered efficiency list of the cores on the SoC and an ordered performance list of the cores on the SoC, and the OS reads that feedback. We don't move any threads behind the OS's back; this is a hardware-guided hint that goes to the operating system, and the operating system consumes it and directs work accordingly. That's where Thread Director comes in.
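The hand-off described here can be sketched as a toy model. Everything below, the core IDs, the orderings, and the scheduling loop, is invented for illustration; the real mechanism is a hardware table read by the Windows scheduler, not Python:

```python
# Toy model of the hand-off: hardware publishes ordered core lists,
# and the OS -- not the hardware -- makes the final placement call.
# Core IDs and orderings are invented for illustration.

perf_ordered = ["P0", "P1", "E0", "E1", "E2", "E3"]  # most performant first
eff_ordered = ["E0", "E1", "E2", "E3", "P0", "P1"]   # most efficient first

def schedule(os_wants_performance, busy_cores):
    """The OS combines the hardware hint with its own knowledge
    (here, simply which cores are busy) and picks a core."""
    hint = perf_ordered if os_wants_performance else eff_ordered
    for core in hint:
        if core not in busy_cores:
            return core
    return hint[0]  # everything busy: fall back to the best-ranked core

print(schedule(os_wants_performance=False, busy_cores=set()))   # E0
print(schedule(os_wants_performance=True, busy_cores={"P0"}))   # P1
```

The point of the sketch is the division of labor: the hardware only ranks cores; the final decision stays with the OS.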
- So what you're saying, pretty much, is that we're providing a hint, a suggestion, to the OS, but the OS at the end of the day is the one that decides where the thread lands? - Absolutely, absolutely. So if you look at it, the OS has
a lot of knowledge about the software that's running, such as priority, quality of service (QoS), foreground versus background, and whether threads have been waiting in the ready queue for a long time. All of that is in its domain; it has that information and we don't. But we have information such as what type of instructions are executing, what the power and thermal constraints on the platform are, and whether another IP such as the NPU or GPU is taking power budget so that less is left for the cores; in that case, which are the most performant cores? So we take all of this information, aggregate it in the format of the Thread Director table, and expose it to the operating system. The OS then consumes that information, combines it with its own knowledge, and comes up with a scheduling decision. That's where Thread Director comes in. - Okay, so now that we
know about Thread Director, and we know it has been through several generations of our hybrid architectures, what is new for Lunar Lake, at a high level? - Oh, absolutely, yeah. There are four major new innovations that we have put in Lunar Lake. One is innovation in the hardware itself: what do we do for classification, and how often do we do it? Remember, from the Alder Lake, Raptor Lake, and Meteor Lake timeframe, we have four classes in the Thread Director table: class 0, 1, 2, and 3. They depend on the IPC (instructions per cycle) ratio between P-cores and E-cores, to say which one is going to be more performant and which one is going to be more efficient. As the microarchitectures have evolved for Lunar Lake, both E-core and P-core, we had to reclassify; we had to adjust the classification boundaries. What makes class 0 now? What makes class 1, what makes class 2, et cetera? So we made all of those changes; that's definitely new in Lunar Lake. Then there's the granularity at which classification happens, which is at the millisecond level; we changed some of that granularity. We have new power management
enhancements that are going in. This is yet another innovation in Lunar Lake: we look at the type of workload that's running on the SoC and then apply some internal power management optimizations on the performance and efficiency cores, with the decisions communicated via the Thread Director table. Then we have a feature we created with Microsoft called OS Containment, where low-power scenarios or work, such as a Teams video call or video playback, are able to run contained on our low power island. This needed some inputs from Thread Director that are new in Lunar Lake, as well as Microsoft consuming those and creating something called the containment zone. So that's new. And last but not least, we
have enhanced our ability to consume the system intent into hardware so that power management can make appropriate decisions. This is tied to our platform system software, such as Dynamic Tuning, which OEMs use. They have something called gears: they dial in gear zero or gear one for maximum performance and gear seven for maximum power savings. Those gears did not previously have a connection to the SoC's internal power management algorithms. We started that work in Meteor Lake, and now we are enhancing it as we go to Lunar Lake as well. So these are some of the changes that we have made for Lunar Lake. - Those are pretty
cool changes. So I think what we're gonna do is dive a little bit into each one of those. For the first one, you mentioned that as the P-cores and E-cores have evolved throughout the generations of different processors, you have to take another look at how Thread Director prioritizes them. - [Rajshree] Yes. - So can you give us a quick example of how Thread Director did the scheduling when it came to, say, Raptor Lake, Meteor Lake, and then Lunar Lake? - Absolutely, yeah. So if you look at what we did
in Alder Lake and Raptor Lake, those were our performance hybrid architectures. We went hybrid, but our goal was clear: multithreaded performance. So we always started scheduling from the most performant cores. We also didn't have disaggregation; we had a shared uncore, a shared ring, and everything. Going to the E-cores didn't make sense if you were going to keep that shared uncore awake anyway. So we started from the P-cores, then, if we needed more multithreaded performance, we scaled to the E-cores, and then we used the hyper-threaded siblings after that. When we moved to Meteor Lake, in some configurations we used the SoC cores first. We had two SoC cores on the SoC tile, so we started there; if the work fit, we stayed there for power efficiency, but if it didn't, we moved the work to the compute tile, because it wasn't efficient to keep both tiles up in the Meteor Lake timeframe. So there were some changes there. Now, in Lunar Lake, for most configurations we start on the low power island, on the Skymont cores we have there, the E-cores. Then, if the work exceeds what those cores can supply, or we have concurrency, we jump to the P-cores. So we still make use of the P-cores; we get great performance out of them, the snappy responsiveness that we need. But the E-cores are also evolving; they're getting more performant, and some of the user experience conditions we're able to meet in the E-core space. So we try to stay there, and if we can't meet the demand, we move to the P-cores. OS scheduling has evolved a lot along with Thread Director to bring all this goodness to end users.
- Right, that's actually a great point about the E-cores. The E-cores have evolved throughout the different generations and architectures, and for Lunar Lake, which we'll cover in another video, in another deep dive, they've definitely evolved a lot and bring a lot of performance- - Absolutely, great enhancements. As well as energy efficiency, right? - Yeah, and that's why Thread Director now essentially says, all right, we start here, on our E-cores, and go from there; I mean, they're still very good cores. - Absolutely. - All right, so we also talked
about more intelligent feedback in the algorithms, and also the power and thermal hints that come with that. Can you tell us a little
bit more about that? - Absolutely. As I said, we have four classes that Thread Director exposes to the operating system: class 0, 1, 2, and 3. Depending on how the IPC ratio looks between P-core and E-core, we put instructions into different classes. As we add more and more IPC improvements and enhancements to both P-cores and E-cores, those deltas change, so we have adjusted what gets classified as 0, 1, 2, and 3, and we expose it accordingly. We also have finer-grained workload classification in the SoC itself; we'll talk about that a little in the power management section, but it's new in Lunar Lake. And last but not least, we have a special hint that goes to the operating system when we run into really low power and thermal scenarios. This is to give continuity in the user experience so that we don't see any sudden changes in system operation. So some of these things are pretty new, added in the Lunar Lake timeframe.
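The class mechanism can be sketched as follows. The four classes are real, but the IPC values, the band edges, and the direction of the mapping below are all invented for this example; the actual boundaries are tuned per generation, which is exactly why they had to be re-drawn for Lunar Lake:

```python
# Illustrative sketch only: bucket running code by its P-core/E-core IPC
# ratio. The band edges are made up, and mapping a higher ratio to a
# higher class number is an arbitrary choice for the example.

def classify(p_ipc, e_ipc, bands=(1.05, 1.3, 1.6)):
    """Return a class 0..3 from the P-core vs E-core IPC ratio."""
    ratio = p_ipc / e_ipc
    return sum(ratio >= edge for edge in bands)

# As E-cores improve, the same code's ratio shrinks, so with fixed bands
# it would drift into a different class -- hence the bands themselves
# get adjusted each generation.
print(classify(2.0, 1.0))  # ratio 2.0 -> class 3 under these bands
print(classify(1.2, 1.0))  # ratio 1.2 -> class 1 under these bands
```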
- So nowadays there's obviously a lot of talk about AI and AI workloads. When an AI workload gets assigned to the compute tile, to the P-cores and the E-cores, how does Thread Director help there? - Yeah, so we can talk
about two things, right? There's AI that runs on the CPU, which can use VNNI, our vector neural network instructions, and there's AI that runs on the GPU or NPU; we are designing those IPs to handle some of the AI operations. So let's take the example of AI running on the GPU or NPU. Say it's a very heavy AI load; those IPs need more power and frequency, et cetera, so less is left for the cores, because the SoC is going to say, this demand is high, I'm giving more power to those IPs. Thread Director can then consume that feedback and say: because less is left for the cores, if I'm executing, for example, class 0 instructions, an E-core can be more performant than a P-core. It updates the table and provides that guidance to the operating system. Now, if we are doing AI on the CPU itself, then the power distribution between the different IPs doesn't come into the picture. Then we look at the Thread Director classification feedback and say, this is a class 2 instruction, and class 2 instructions still see some benefit on P-cores, so let's run them there. Now, that delta is collapsing; we have data showing around a 7-8% performance delta, but when you're running a mix of different workloads, that 7-8% delta matters, and Thread Director helps prioritize the right work for the right core.
- Okay, so the other thing you talked about, among the four major improvements in Lunar Lake, and that you already touched on a little, was the OS containment zone. Can you explain what that is and how it works? - Absolutely, yeah. So this is a great feature that we were really excited
to partner with Microsoft on. As Thread Director initializes, the hardware populates the table, and the OS reads it and looks at which ones are the performance cores, which ones are the efficient cores, the different tiles, et cetera. Then it creates zones, called containment zones. The cores marked as the efficiency complex are put in the efficiency zone. Then we have something called the hybrid zone. Now, Lunar Lake is a little special because, given the TDP and the power and thermal targets we were going for, we don't have any E-cores on the compute tile. But if you look at Meteor Lake, we do have E-cores on the compute tile as well, so there the whole compute tile is set up as a hybrid zone and you get the best of both worlds, performance as well as efficiency, because the E-cores are there. And then there is a zone called the none zone, which basically means all zones are taken away and you get all the cores to play with. The great thing about this feature Microsoft introduced is that we get the ability to customize it through the PPM parameters they provide. If we're running on battery power, with KPIs like a realistic IT workload or a Teams 3x3 video conferencing call, which runs great in the efficiency zone, we can constrain work toward that zone and make it harder to spill out of it. Or, if we're running in performance mode and want that snappy response or maximum performance, we can let work spill over quickly. We are able to customize all of that. So this is a really great feature for bringing the goodness of the product to market, and we were very excited to partner with Microsoft on this. - So I think you just touched
on the different examples for the containment zone, with, I think, Teams and Office? - Sure. - Can you go into a little more detail on those? - Yeah, absolutely. If you look at what happened in Meteor Lake, we had the low power island on the SoC die, but we had only two cores on it. Take a workload like Teams: sometimes the concurrency, how many threads are active at the same time, goes to four. Two cores can't contain that, so we had limited usages we could run there, and then we had to quickly jump to the compute complex and run on its E-cores. We still saw great data on that front with that architecture. Now Lunar Lake takes it a notch forward, because we have four cores on our low power island, the four E-cores, and they're performant as well as efficient. So when the containment zones are set and we run something like Teams, which in some phases needs four threads active at the same time, we are able to keep the work there. There is less spillage to the P-cores, and that's how we keep the compute tile down as much as possible and still get the responsiveness and performance, the 30 FPS or 15 FPS that Teams needs depending on which version is being used, and at the same time get great power savings.
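The containment behavior described here can be sketched as a toy spill model. The core counts (two SoC-tile cores on Meteor Lake, four low-power-island E-cores on Lunar Lake) and the four-thread Teams phase come from the conversation; the `placement` function itself is invented for illustration:

```python
# Toy model of OS containment: work stays on the low power island until
# its concurrency exceeds the island's core count, then spills to the
# compute cores, which forces the compute tile to power up.

def placement(active_threads, island_cores):
    """Return (threads contained on the island, threads that spill)."""
    contained = min(active_threads, island_cores)
    return contained, active_threads - contained

teams_phase_threads = 4  # Teams concurrency peaks at four active threads

for product, island in [("Meteor Lake", 2), ("Lunar Lake", 4)]:
    contained, spilled = placement(teams_phase_threads, island)
    print(f"{product}: {contained} contained, {spilled} spill to compute")
# Meteor Lake: 2 contained, 2 spill; Lunar Lake: 4 contained, 0 spill,
# so the compute tile can stay down.
```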
- So the next thing we wanna talk about is consuming the system intent, which has been enhanced for Lunar Lake. Let us know a little more about that. - Sure. When we created Thread Director, we always got requests from our customers: "Hey, how can I give inputs into Thread Director?" Because we didn't have any direct input. And if you think about it, it doesn't make sense for external software to say, run me on a P-core or an E-core. Because when we think about P-cores and E-cores in terms of functionality, the performance core can be the efficient core. As in the example I gave, if an IP like the graphics or the NPU is taking power budget and you want performance, you are likely better off running on an E-core, because if there is less power budget left for the CPU, the E-core is probably going to be more performant. But an end user or a third-party application doesn't know that, because they don't know the power and thermal constraints. So we told our customers: you probably don't want a direct input there; you want to specify your intent. My intent right now is to maximize performance, or my intent is to maximize energy efficiency. So what we did was take our Dynamic Tuning utility, our value-added platform software that a lot of OEMs use in their mobile designs, and say: hey, you use these gears anyway, why don't we pass those gears as an input to the SoC? We started this work in Meteor Lake; it's not new in Lunar Lake, but in Lunar Lake we have enhanced it so that we consume this input in internal SoC optimizations, such as workload detection, to provide the right input to the operating system via Thread Director: am I meeting my intent of performance, or am I meeting my intent of power? In some cases, when the user wants to maximize power savings, the choice of which cores are the performant and efficient cores is different from when they want to maximize performance, and the frequency operating points are different. So we use the intent information, the SoC makes some of those decisions, and we expose that to the OS as guidance via the Thread Director table.
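As a hedged sketch, the gear-to-intent plumbing might look like the following. The gear endpoints (gear 0 or 1 for maximum performance, gear 7 for maximum power savings) come from the conversation; the function, the names, and the intermediate "balanced" band are invented:

```python
# Hypothetical mapping of OEM "gears" to a coarse intent hint for the SoC.
# Only the endpoints reflect the discussion; everything else is made up.

def gear_to_intent(gear: int) -> str:
    """Translate an OEM gear (0..7) into a coarse system-intent hint."""
    if not 0 <= gear <= 7:
        raise ValueError("gear must be 0..7")
    if gear <= 1:
        return "maximize_performance"
    if gear >= 6:
        return "maximize_power_savings"
    return "balanced"

# The SoC could then bias its decisions off this hint, e.g. which ordered
# core list to favor and which frequency operating points to pick.
print(gear_to_intent(0))   # maximize_performance
print(gear_to_intent(7))   # maximize_power_savings
```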
- Yeah, that makes complete sense. Because as an application you might just want to run on P-cores, but Thread Director has a global view of everything that's going on. - Exactly, yeah. - So that's an extra hint from the PC, from the OEMs, from our partners: hey, like you said, I want a little more performance, but I don't want to tell you exactly what to do. - And this is where we tell our ISV partners, too: don't affinitize your software. If you think you're going to get the best performance by running only on P-cores, in some cases that may not be true. If you create hard affinities, then the hardware and the OS cannot work together to give you the best performance. So that's where it's going.
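The pitfall Rajshree warns about can be illustrated with a toy model. The throughput numbers below are invented, but they capture the scenario she described, where the GPU or NPU takes the power budget and a thread pinned to a P-core loses to one the OS is free to move:

```python
# Toy illustration of why hard-affinitizing to P-cores can backfire.
# Numbers are invented: when other IPs take most of the power budget,
# the P-core's achievable throughput drops below the E-core's.

THROUGHPUT = {  # achievable throughput (arbitrary units) by power state
    "full_budget":    {"P-core": 10, "E-core": 7},
    "gpu_npu_loaded": {"P-core": 4,  "E-core": 6},
}

def best_core(power_state):
    """What the OS + Thread Director would pick, given the power state."""
    scores = THROUGHPUT[power_state]
    return max(scores, key=scores.get)

pinned = "P-core"  # a hard affinity never re-evaluates this choice
for state, scores in THROUGHPUT.items():
    print(f"{state}: pinned gets {scores[pinned]}, "
          f"unpinned gets {scores[best_core(state)]} on {best_core(state)}")
```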
- Oh, that's great. So at the end of the day, Thread Director is really a power management story? - Both power and performance. - Yes, correct, a power and performance story. And we also have the SoC doing power and performance management. So what has changed there for Lunar Lake? - So we talked about the system intent, right? Do I want maximum power savings or maximum performance? Some of that plumbing is going in for power management generally, not just restricted to Thread Director: the internal power management algorithms consume that input, and we try to detect the type of workload that's running on the system. Is it a battery-life type of workload? Is it bursty? Is it something sustained, like Cinebench? We try to detect it not by application, but just by the load that runs on the SoC. Then, using the intent hint that's coming in and the detection of the workload, we make some of the frequency decisions on the cores to match that intent. That's new in Lunar Lake. One example I can give you: look at Teams. Many times our customers tell us, you're familiar with the Windows power sliders, where there's max performance, balanced, and best efficiency, right? If I run Teams on best efficiency, I get great power. I'm still running at 30 FPS or 15 FPS, I meet my quality of service, et cetera, but my power is low. The moment I move to balanced or performance mode, I still get the same quality of service, because that's how my app is designed, but my power goes up. Why is that happening? Why can't we close that gap? Some of the innovations put in Lunar Lake are done exactly for that purpose: if you're running the same type of workload, irrespective of which mode you run it in, you get similar efficiency. Are we fully there yet? No, this is the start of the journey; we're going to have more enhancements in the future. But this is where we think that consuming the intent hint and knowing the type of workload lets the SoC add a lot more value. And that's really exciting for me.
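A hedged sketch of application-agnostic load detection: the utilization-window heuristic and every threshold below are invented, purely to illustrate classifying sustained versus bursty versus battery-life loads from SoC load alone, without knowing the application's name:

```python
# Invented heuristic: classify a window of per-interval SoC utilization
# samples (0..1) by average level and how often the load swings.

def detect_workload(util_samples, high=0.75, low=0.25):
    """Label a utilization window without looking at application names."""
    avg = sum(util_samples) / len(util_samples)
    swings = sum(1 for a, b in zip(util_samples, util_samples[1:])
                 if abs(a - b) > 0.4)
    if avg >= high and swings <= 1:
        return "sustained"      # e.g. a Cinebench-style render
    if avg <= low:
        return "battery_life"   # mostly idle, light activity
    if swings >= 2:
        return "bursty"         # spiky, interactive load
    return "moderate"

print(detect_workload([0.9, 0.95, 0.9, 0.92]))      # sustained
print(detect_workload([0.1, 0.05, 0.1, 0.08]))      # battery_life
print(detect_workload([0.1, 0.9, 0.1, 0.8, 0.15]))  # bursty
```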
- Yeah, so we were talking about the different power management modes we're used to seeing, that Windows provides depending on whether you're plugged in or not, or, like you said, what you're looking for. So it's great that those are now all integrated to provide more savings. Actually, there's one more thing I wanted to ask: you started talking about the future. How do you see Thread Director in the future? What can we expect from it? - Oh yeah, that's a great question. And we are on this journey. We started this in Alder Lake
and Raptor Lake timeframe, and we are continuing it three, four products down the line. We have roadmap plans for future products that come out next year, the year after that, et cetera. So definitely. There are enhancements to how we do classification and to how we consume the detection of the type of workload running on the system, et cetera. And last but not least, the thing I'm really excited about: we talk about the AI PC. AI is everywhere; it's gonna run on CPU, GPU, and NPU. We are looking at how we could use something like Thread Director to figure out the best IP to run the work on. I'll leave you at that; there are more things coming. - That's what you can tell us so far without getting us in trouble. I think that's pretty neat. Because right now it's only set for the cores, but in the future it's like, okay, now we're gonna start... - We are looking in that direction, definitely. - [Alejandro] Yeah, oh
that's great. One more thing: what is the definition, on Lunar Lake, of the low power island? Because it has changed, or it has probably changed, from what we had previously in Meteor Lake. - So the intent of the low power island is the same: provide the best efficiency possible. Because remember, we have put it on a separate complex; it doesn't share the uncore and everything with the performance complex. So if you're able to keep the performance complex down, you get a lot of savings from that uncore and the other portions of it. Now, from an SoC construction perspective, what sits on the low power island, the SoC tile on Meteor Lake, has changed for Lunar Lake: some of the graphics components and other things that we have in there. We also provide localized caching on the low power island in the Lunar Lake timeframe, so that is different as well. And of course the core count: we have four cores on the low power island in Lunar Lake, while Meteor Lake had two on the SoC tile. So things have evolved for the good in terms of providing better performance, or the moderate performance you need, from the low power island. In Lunar Lake's case, what the E-core team has done and provided there is simply amazing, and we also get efficiency. So we are striking this fine balance where we get to use it for multithreaded performance as well as lean on efficiency at the same time, by uncoupling some of the uncore and other components. So that's what's different. - Oh, that's pretty interesting. And yeah, we can definitely see all the changes, as we have moved a lot of things around compared to what we had previously. - Absolutely, absolutely. - Rajshree, thank you
so much, appreciate it. This has been great. - Thank you for inviting me here. Always a pleasure to talk to you. Thanks. (upbeat music begins) (chirpy tune chimes)