iOS Kernel PAC, One Year Later

Captions
Hello, and welcome. Today I'm going to be talking about pointer authentication in the iOS kernel, and in particular how pointer authentication is used to build kernel control flow integrity. Unfortunately this time slot is not long enough for me to talk about everything I would like to share, so if you're interested I highly recommend checking out either my previous talk on PAC or the Project Zero blog.

About a year and a half ago, at the beginning of 2019, I had written a kernel exploit for iOS 12 on the iPhone XS and achieved a kernel read/write primitive. At that point I wanted to find a way to use kernel read/write to execute arbitrary kernel code, that is, to call arbitrary kernel functions in the iOS kernel. Apple had added this new PAC security feature, and I wanted to find a way to bypass the control flow integrity that it provided. So I put pointer authentication under the microscope and eventually ended up finding five different ways to bypass pointer authentication in iOS 12, which I then presented in "A Study in PAC".

Once I had found the fifth bypass in my original work in 2019, I decided that it was indicative of systematic problems with the original design of pointer authentication. I wanted to give Apple some time to improve the design, to give iOS 13 a chance to fix some of the problems I had uncovered, and then revisit it at a later point. Now, one year later, this talk is my revisiting of pointer authentication in iOS 13.
So what is pointer authentication? It's a security feature from ARMv8.3. The basic idea is this: here you have a kernel pointer, and if you look you'll notice that the upper bits of this pointer are basically all ones. They're unused, storing redundant information. The idea with PAC is to replace those unused bits with a cryptographic signature over the lower bits of the pointer. That ensures these pointers can't be tampered with during the operation of the kernel, even by an attacker with a kernel read/write primitive: if an attacker modifies one of these pointers, using it should cause a kernel panic. The ARM architecture provides a number of instructions for manipulating pointer authentication codes; I'm not going to go into them in detail.

So how has Apple used this security feature to implement control flow integrity in the iOS kernel? There are a number of different uses of the pointer authentication keys, and I'm not going to go into all of them here. The basic idea is that different use cases are segmented out into different architectural keys, so that pointers signed with one key can't be substituted for pointers signed with another. The only use we're going to focus on in this talk is the thread saved state.

Imagine that you have a kernel thread and all of a sudden a timer interrupt fires. When that happens, execution jumps from the kernel thread over to the exception vector. The exception vector spills all of the kernel thread's registers out to memory and runs the interrupt handler in place, and when the interrupt handler has finished it pops all of those registers back and resumes executing the kernel thread. But while
those registers are spilled to memory, it's possible that they might be modified by an attacker with kernel read/write trying to subvert control flow integrity. So any registers that could be used to influence kernel control flow need to be protected to ensure that they aren't modified by an attacker, and that's what this function right here is used for. sign_thread_state takes the address of the saved state blob where all the registers are stored, along with the pc, cpsr and lr registers. pc is your program counter, the address you were executing from; cpsr is your current program status register, which records the exception level you were previously running at, i.e. whether you were in the kernel or in user mode; and lr is your return address register. sign_thread_state entangles all of these into a PAC-protected signature, which then gets stored in the saved state blob.

Correspondingly, there's a function that verifies these signatures, verify_thread_state, for when you want to pop the registers back out. It regenerates the signature and checks that none of the registers have been tampered with; if there has been tampering, it panics.

A very important place for this to be used is during exception return. As I mentioned before, if you are running a kernel thread and a timer interrupt fires, you jump to the exception vector, and after running your interrupt handler to completion you want to resume executing the original kernel thread. That's what this exception_return function does: it returns from the exception that was just handled. The end of this function pops all of the registers that have been stored
during the exception vector back into the architectural registers, which means that this is pretty much the best ROP gadget you could ever hope for: it gives you control over every single register. So it's very important that this is protected so that you cannot get it called with arbitrary arguments. You want to ensure that when you do an exception return, pc, cpsr and lr have not been modified during the execution of the interrupt handler, and so Apple has inserted this call to verify the thread state before executing the exception return instruction.

All right, that's an overview of what pointer authentication control flow integrity looked like in iOS 12. Now I'm going to briefly summarize just two of the bypasses that I reported back in 2019.

Jumping all the way to bypass number four: this was a very interesting PAC bypass from my perspective because it didn't show up if you looked at the C code; it only showed up when you started looking at the assembly. Here's the kernel function ipc_kmsg_clean_body, which has a switch statement in it. You'll notice that it loads the jump table for this switch statement into register x25, and if you trace the flow of this register, it eventually loads the jump target into register x9 and then executes an unprotected branch through x9. I thought this was problematic because, if you remember, when an interrupt is delivered only the pc, cpsr and lr registers are protected when your thread state gets spilled to memory. In particular, neither x25 nor x9 is protected, so they are vulnerable to modification by an attacker with a concurrent kernel write primitive. In this particular case, even though the idea was to find things that were vulnerable to interrupts, I found that there's a function call which
actually directly spills register x25 to the stack, so in the proof of concept that I submitted there was no need to leverage preemption in order to take advantage of this switch statement and get arbitrary kernel code execution.

The next bypass I want to talk about was more of a fundamental issue that I realized existed with signing thread states. Here we have a kernel function which creates a new thread, and you can see that it has a call to the sign_thread_state function from before. What I noticed is that interrupts are enabled during the whole execution of this function. This is problematic because, as you recall, sign_thread_state only protects the pc, cpsr and lr registers, and yet the four arguments to sign_thread_state are passed in registers x0 through x3, which are not protected and could therefore be modified if an interrupt is delivered directly before the signing operation. But once again, in the end I found that interrupts weren't actually necessary to trigger this issue, because it turns out this function reads the parameters to sign directly from memory. If you use your kernel write primitive to swap out this user state pointer for some address you control, you can directly get a signature on the state without having to mess with interrupts at all.

So those were two of the bypasses that I found in iOS 12. After this point I realized that signing thread states this way is fundamentally insecure; I decided to give Apple some time to fix these issues and to revisit PAC at a later point. So let's look at what has happened in the intervening year with iOS 13.
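Before moving on, the iOS 12 scheme described so far can be sketched as a toy model. This is only an illustration under stated assumptions: an HMAC stands in for the hardware PAC computation, and the key, the register values, and the helper names (`toy_sign`, `spill`, `restore`, `tamper_and_restore`) are all hypothetical rather than taken from the kernel. The one property it borrows from the talk is that the signature covers the saved state address plus pc, cpsr and lr, and nothing else.

```python
import hashlib
import hmac

KEY = b"toy-pac-key"  # hypothetical stand-in for a hardware PAC key


def toy_sign(blob_addr, pc, cpsr, lr):
    # Toy signature over exactly the inputs the talk names for iOS 12.
    msg = repr((blob_addr, pc, cpsr, lr)).encode()
    return hmac.new(KEY, msg, hashlib.sha256).hexdigest()


def spill(regs, blob_addr):
    # Exception entry: copy every register to memory, then sign pc/cpsr/lr.
    saved = dict(regs)
    saved["sig"] = toy_sign(blob_addr, regs["pc"], regs["cpsr"], regs["lr"])
    return saved


def restore(saved, blob_addr):
    # Exception return: recompute the signature and "panic" on mismatch.
    expected = toy_sign(blob_addr, saved["pc"], saved["cpsr"], saved["lr"])
    if not hmac.compare_digest(expected, saved["sig"]):
        raise RuntimeError("panic: saved state signature mismatch")
    return {name: value for name, value in saved.items() if name != "sig"}


def tamper_and_restore(reg, value):
    # Spill a thread, overwrite one saved register, then try to restore it.
    regs = {"pc": 0x1000, "cpsr": 0x4, "lr": 0x2000, "x9": 0x3000, "x25": 0x4000}
    saved = spill(regs, blob_addr=0x8000)
    saved[reg] = value  # models the attacker's kernel write primitive
    try:
        return restore(saved, blob_addr=0x8000)[reg]
    except RuntimeError:
        return "panic"


if __name__ == "__main__":
    print("lr :", tamper_and_restore("lr", 0x41414141))
    print("x25:", tamper_and_restore("x25", 0x41414141))
```

In this model, overwriting the saved lr is caught on restore, while overwriting the saved x25 goes through unnoticed, which is the property that bypass number four relied on.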
In iOS 13, the uses of pointer authentication throughout the kernel are fundamentally the same as in iOS 12. There's no data PAC, for example, meaning using PAC to protect additional data pointers; I was hoping to see that, but it didn't end up making it into iOS 13. There is one change worth pointing out, however, which is that there are two new protected registers. If we look at the disassembly of the sign_thread_state function, you'll observe that rather than having four parameters as in iOS 12, it now has six, and the two additional parameters are used to sign the values of architectural registers x16 and x17. That means x16 and x17 are now considered interrupt safe, in the sense that they can't be modified during preemption, because any attempt to modify them will invalidate the PAC signature. This has allowed Apple to harden their implementation of switch statements. Here you can see the original switch statement that I reported in PAC bypass number four, and on the right the hardened version in iOS 13, where the switch statement uses only registers x16 and x17 to conduct the indirect branch. So theoretically this should be safe from concurrent modification during interrupts.

All right, that's a high-level overview of the changes to pointer authentication in iOS 13. Now let's get to the bulk of this year's research: are there still ways to bypass this hardened pointer-authentication-based control flow integrity in iOS 13? The place I wanted to start was revisiting the fifth bypass that I had found the first time around. If you recall, the fifth bypass was the issue with sign_thread_state being fundamentally insecure due to the use of interrupts, so I was very curious how Apple had actually addressed this in iOS 13.
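The effect of adding x16 and x17 to the signature can be sketched by comparing which saved registers a tamper check would notice under each scheme. This is a toy comparison, not the kernel's code: the HMAC, the key, the register values and the names `IOS12_SIGNED`, `IOS13_SIGNED` and `tamper_detected` are assumptions for illustration, and the saved state address is left out of the signature for brevity.

```python
import hashlib
import hmac

KEY = b"toy-pac-key"  # hypothetical stand-in for a hardware PAC key

# Register sets covered by the thread state signature, per the talk.
IOS12_SIGNED = ("pc", "cpsr", "lr")
IOS13_SIGNED = ("pc", "cpsr", "lr", "x16", "x17")


def toy_sign(saved, signed_regs):
    # Toy signature over only the registers listed in signed_regs.
    msg = repr([(name, saved[name]) for name in signed_regs]).encode()
    return hmac.new(KEY, msg, hashlib.sha256).digest()


def tamper_detected(signed_regs, target):
    # Sign a saved state, flip bits in one saved register, and re-check.
    saved = {"pc": 1, "cpsr": 2, "lr": 3, "x0": 4, "x8": 5, "x16": 6, "x17": 7}
    sig = toy_sign(saved, signed_regs)
    saved[target] ^= 0xFF  # models a write by the attacker
    return not hmac.compare_digest(sig, toy_sign(saved, signed_regs))


if __name__ == "__main__":
    for target in ("lr", "x16", "x17", "x0", "x8"):
        print(target,
              "iOS 12 model:", tamper_detected(IOS12_SIGNED, target),
              "iOS 13 model:", tamper_detected(IOS13_SIGNED, target))
```

Under these assumptions only x16 and x17 change status between the two models; every other general-purpose register, such as x0 or x8, remains unprotected while spilled.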
It seems that Apple's solution to the fifth bypass was to replace the reading of parameters from memory with hard-coding all of the parameters to zero. This is certainly an improvement, because it completely nullifies the technique I used in my PoC, but the fundamental issue is still there: interrupts are still enabled for the duration of this function call, meaning that we are still vulnerable to preemption. This is a lot easier to see if you look at the assembly. Imagine that we're executing the instructions sequentially, and right before we get to this sign_thread_state call a timer interrupt fires and we jump directly to the timer interrupt exception vector. As you can see, the exception vector spills all of the architectural registers to memory: x0, x1, x2, x3, x4, x5, all of them. Because none of those are protected by the PAC signature, an attacker can race in and overwrite them while the interrupt handler is running. Once those registers get popped back out, the attacker has control over all of the parameters to the sign_thread_state function. So once again this is a full thread state forgery PAC bypass.

For a couple of complicated reasons it wasn't actually easy to trigger the issue right here, so I started searching for other places where sign_thread_state is called where it might be easier, and I eventually ran across the function thread_state64_to_saved_state. This function is the implementation of the user system call thread_set_state, which is responsible for setting the registers in a user space thread. Imagine that a user space process has created a new thread and wants to set the values of all of the registers in that thread; this is the actual implementation of that in the kernel. What you can see if you look at the assembly is that it does
something kind of interesting here. First, it returns from this function with an unprotected ret instruction. Second, the return address is stored in register x8 for the duration of this verify_thread_state then sign_thread_state sequence of operations. This is problematic because x8 is not a protected register: it is vulnerable to being modified if an interrupt gets delivered somewhere during this sequence.

So how would you actually use this to bypass pointer authentication? Let's walk through an example with two threads. Thread A is going to run on CPU 4; that is important because, for whatever architectural reason, CPU 4 seems to get a lot more interrupts than the other cores. Thread A just calls thread_set_state in a loop, and thread B checks to see when CPU 4 gets interrupted. Each time thread_set_state gets called, it jumps into the kernel and calls machine_thread_set_state, which calls the vulnerable function. As we iterate through the instructions of this function, at some point an interrupt arrives and causes us to jump to the exception vector, spilling all of our registers to memory, including register x8, which currently holds the return address. Once all these registers have been spilled, execution continues in the interrupt handler, and while this is happening thread B comes in, sees that CPU 4 is indeed interrupted, and overwrites the saved value of register x8, which used to hold the return address, with an attacker-controlled value. At this point we have control over the return address. When the interrupt handler finishes running and is about to do an exception return, all of those registers are
now going to be popped back out of the saved state blob and into the registers, giving us control over register x8 when normal execution resumes. This means that this mov instruction gives us control over the return address register, and then this ret instruction gives us pc control.

So let's see a demo. Here we have an iPhone 11 Pro running iOS 13.3. I'm going to run a kernel exploit on it which gives me a kernel read/write primitive, and then use that to call the kernel function IOMalloc, which allocates some memory from the kernel. This demonstrates the ability to call arbitrary kernel functions from user space in spite of the presence of pointer authentication. If I run this, it bypasses PAC, and immediately I get the ability to call IOMalloc, which returns this pointer that does indeed look like a kernel heap pointer. So this demonstrates the ability to bypass the control flow integrity mechanism using preemption with thread_set_state.

We've demonstrated bypassing control flow integrity with this interrupt-based technique, but as I thought about the issue a little more, it really boils down to this: any time interrupts are enabled during the thread state signing operation, that is fundamentally unsafe. And the more I thought about it, the more I realized that this doesn't apply only to the sign_thread_state function. Any time a PAC signature is being generated, it needs to be the case that either interrupts are disabled or only interrupt-safe registers are used. This led me to search for other instances of the same pattern, variants of the same bypass. Here you can see an example in the function bcopyin, where we have an unsafe PACIA operating on registers x3 and x11.
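The preemption race walked through above, with thread A interrupted and thread B overwriting the spilled register, can be sketched as a deterministic toy simulation. Everything here is an assumption made for illustration: the HMAC stands in for the PAC signature, the race is collapsed into a single `attacker_write` argument, and `interrupt` and `run_function` are hypothetical names rather than kernel routines. The signed register set follows what the talk describes for iOS 13.

```python
import hashlib
import hmac

KEY = b"toy-pac-key"  # hypothetical stand-in for a hardware PAC key
SIGNED = ("pc", "cpsr", "lr", "x16", "x17")  # signed in iOS 13, per the talk


def toy_sig(saved):
    # Toy signature over only the protected registers.
    msg = repr([saved[name] for name in SIGNED]).encode()
    return hmac.new(KEY, msg, hashlib.sha256).digest()


def interrupt(regs, attacker_write=None):
    # Exception vector: spill all registers and sign the protected ones.
    saved = dict(regs)
    sig = toy_sig(saved)
    # The interrupt handler runs here; another core may write to `saved`.
    if attacker_write is not None:
        name, value = attacker_write
        saved[name] = value
    # Exception return: verify the signature, then pop every register back.
    if not hmac.compare_digest(sig, toy_sig(saved)):
        raise RuntimeError("panic: saved state signature mismatch")
    return saved


def run_function(scratch_reg, attacker_value):
    # Models a function that parks its return address in `scratch_reg`,
    # is interrupted, and then executes `mov lr, <scratch_reg>; ret`.
    regs = {"pc": 0x1000, "cpsr": 0x4, "lr": 0, "x8": 0, "x16": 0, "x17": 0}
    regs[scratch_reg] = 0x2000                 # legitimate return address
    regs = interrupt(regs, (scratch_reg, attacker_value))
    regs["lr"] = regs[scratch_reg]             # mov lr, <scratch_reg>
    return regs["lr"]                          # ret: execution continues at lr


if __name__ == "__main__":
    print(hex(run_function("x8", 0x41414141)))   # return address hijacked
    try:
        run_function("x16", 0x41414141)
    except RuntimeError as err:
        print(err)                               # tampering detected
```

With the return address parked in x8 the attacker's value survives the exception return and becomes the return target; with it parked in x16 the same write invalidates the signature, which matches the rule stated above that signing must happen either with interrupts disabled or using only interrupt-safe registers.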
I'm not going to go into the bcopyin case in detail, but it's fundamentally the same bypass as before, just with a raw PACIA instruction rather than thread_set_state.

All right. While I was looking at the bypass involving thread_set_state and interrupts, one of the things I had to look at was exactly how the registers are spilled to memory during execution of the exception vector, and I saw something interesting. I wasn't expecting it, but it ended up being another PAC bypass in the exception vector itself. If you look closely, you'll see that the return address register x30 is spilled to memory and then, just a few instructions later, read back from memory right before the call to sign_thread_state. This is problematic because it gives the attacker a window of time to modify the return address while it is spilled to memory, before it has been protected by generating the PAC signature on the saved state. So this is another PAC bypass, right in the exception vector itself.

To actually exploit this, you'd need to find some sort of gadget which, for example, spins while some memory location is zero and then returns with, again, an unprotected ret instruction. While this gadget is executing, eventually an interrupt will fire and you'll jump to the exception vector. Right at the beginning of the exception vector you need to race from another thread: the interrupted thread stores the return address register to the saved state blob at the very beginning, and then we need to come in on another CPU core and immediately overwrite it with an attacker-controlled value. Just a few instructions later, the exception vector reads the now attacker-controlled return
address value back in and then calls sign_thread_state, meaning that we have controlled the return address that gets incorporated into the PAC signature, which gives us control of where this spin-while-zero gadget will return to at a later point.

So this was another interesting PAC bypass; I hadn't expected to find something sitting right there in the exception vector. But it also got me thinking about what the generalization of this issue is, and what I eventually settled on was that the problem is reading parameters from memory before calling sign_thread_state. Doing this is fundamentally insecure, since it always gives an attacker a window of time to modify the parameters. So I started looking for other places where sign_thread_state is called with parameters read from memory, and to my surprise I found another function that does exactly this: switch_context, which is used during voluntary kernel context switches. Imagine a kernel thread blocking on a mutex and yielding execution to another thread. When this happens, all of the callee-saved registers are spilled to your saved state, and in particular that includes the return address, which means you have to protect it, which means a call to sign_thread_state. When switch_context calls sign_thread_state, it reads the values of pc and cpsr that were originally in the saved state blob from memory before the signing operation. So once again this is a way to directly get control of the pc and cpsr registers in the saved state blob before the signature gets generated.

Once again, switch_context is responsible for managing thread states for voluntary kernel context switches, and because this is really doing voluntary context switches between kernel
threads, the pc and cpsr registers aren't needed, which is why this isn't fundamentally broken. But what it enables is a really straightforward PAC bypass. You wait for some kernel thread to be active, and while it's running and using all of its registers you overwrite the pc and cpsr values in its saved state blob. Eventually this kernel thread is going to block and call switch_context, and switch_context is going to read pc and cpsr, which are now the attacker-controlled values, from memory into registers as parameters to the call to sign_thread_state, so they get signed into the PAC signature. Because you then have a valid signature on an attacker-controlled saved state blob, you can reuse that saved state for an exception return operation with arbitrary pc and cpsr: you'd set cpsr to exception level 1, or kernel mode, and pc to some hijacking gadget of your choosing.

So this is an interesting PAC bypass as well; I wasn't expecting to find an issue in the context switching code, which is called all the time. But once again I wanted to take a step back and think about the fundamental issue, and pretty soon it dawned on me that there is something much bigger going on: there is a design issue with how these thread states are managed. Fundamentally, there are two different ways in which signed thread states are used in the kernel. First, there's the one we are already very familiar with, during an exception return: an interrupt gets delivered, you run your interrupt handler, and then you call exception_return to resume execution of the interrupted thread. But then there's the other way in which signed thread states are used, which is via switch_context during
voluntary kernel context switches, and as it turns out these two uses of signed thread states have very different security requirements. For an exception return back into kernel mode, you really do care about all three of the registers pc, cpsr and lr, since they all affect what kernel code gets executed, so they all need to be protected to ensure control flow integrity. For exception returns to user mode, imagine returning from the end of a system call, you really only care about the cpsr register: all that matters is that when you return from the system call you are indeed jumping back into user space, and that nobody has tampered with cpsr to make you return into kernel mode instead. And finally, for switch_context we only care about the return address register; pc and cpsr just don't have meaning in this context.

So since thread states can be used in these two really different ways, to ensure integrity we want thread states signed for use by switch_context not to be usable by exception return, and vice versa. Unfortunately, there's only one function, sign_thread_state, which means that unless additional care is taken in the implementation, thread states signed for one purpose can always be swapped out and used for the other purpose instead. This gives us a more fundamental lens on what is happening in this bypass: thread states signed by switch_context for context switching, which does not care about pc and cpsr, can instead be used for exception returns, which do care about pc and cpsr. So this is
cool, but it also raises the obvious question: what about the inverse? Can thread states signed for use by exception return be used for switch_context instead? This brought me to what I think is the coolest PAC bypass of all, because it was the one staring me in the face the whole time: swapping user and kernel thread states. If you remember from before, there is a system call, thread_set_state, which allows a process to set the registers in a user space thread. It's implemented in the kernel by the function we saw in the very first PAC bypass, thread_state64_to_saved_state, which verifies the old user space registers, sets the new registers, and re-signs the thread state so that it holds the new registers with the signature intact. For thread_set_state we only really care about cpsr being restricted: we want to make sure cpsr is set so that we return to user mode. We don't care about the return address register, because if you set a kernel pointer in your return address register while executing in user mode, that will just cause a segfault when you try to return there; it's not going to violate kernel control flow integrity. So thread_set_state is fully secure against exception return: you can't use it to sign thread states that are then usable by exception return to violate kernel control flow integrity. Unfortunately, this is not at all the case if you reuse thread states signed via thread_set_state with switch_context, because switch_context really cares about the return address but completely ignores cpsr. This gives us a really lovely logical PAC bypass which is very reliable, 100% deterministic.

Once again we're going to create two threads, thread A and thread B, and we're going to take a close look
at thread A, which has two signed thread states, one for user mode execution and one for kernel execution. Thread A calls some system call which blocks, and eventually that reaches switch_context. When switch_context is called, it saves all of the registers, including the return address register x30, signs that state, and causes the thread to block. While thread A is blocked, we come in on thread B and swap out the pointer to the user state so that it points to the kernel state instead. This leaves us in a situation where the pointer to thread A's user state blob now points to its kernel state blob, and at this point we can call thread_set_state on thread A to set the registers in its kernel state. Of course thread_set_state, as I mentioned, restricts the value of cpsr, but it does not restrict the value of the return address at all. So if we now unblock thread A, some other kernel thread is going to context switch to it, and when it does it loads this completely arbitrary return address, verifies the signature, which is correct, and finally moves the value into the return address register and returns, once again giving us pc control.

So let's see a demonstration of this PAC bypass. Here again we have the same iPhone, and we are going to run this PAC bypass to demonstrate how to hijack control flow using the thread_set_state technique just discussed. As I said, because it's a logical bypass it's really elegant and short: the implementation starts on line 19 and ends on line 92, and that's the entirety of the code for this bypass. It's very short and 100% deterministic, and all it's going to do is set the
value of the return address register to some controlled value, causing the phone to panic. We'll run that now, and you can see that the device immediately panics. If we check the panic log, it does indeed have the return address register set to 4242...42, demonstrating that we have broken the control flow integrity.

So what do I want you to take away from this talk? I know it's been a whirlwind of PAC bypasses. When I originally gave the presentation demonstrating the first five PAC bypasses in 2019, one of my conclusions was that more thorough analysis could have helped in the design of pointer authentication. While my views now are a little more nuanced, I still stand by that conclusion. Even in iOS 13, PAC still feels quite ad hoc. I don't get a sense of the formal underlying security model for PAC that governs all of these design decisions, and as a result, even though I'm not aware of any, I wouldn't be surprised if it were later revealed that there are other fundamental design issues like the one we just discussed.

Another thing worth pointing out is that when I initially reported the proofs of concept for the iOS 12 bypasses, Apple was able to fix the specific PoCs that I reported, but they did not address the underlying issue for the fifth bypass, which is that interrupts were enabled. I find this a little disconcerting, because I explicitly called out in my initial report that interrupts were dangerous during thread state signing operations, and even one year later, despite this being something I had talked about publicly, it was still right there as a technique that worked. It was a little disconcerting to see that it has taken so long and yet it still wasn't addressed
off the bat.

One other thing definitely worth pointing out is that it's very important to look at the output of your compiler. Many of these issues with pointer authentication are not visible if you're only looking at the C code, so it is crucially important that you pop the kernel into a disassembler, take it apart, and look at the register allocation to understand the low-level characteristics of your code.

All that being said, I do still think that PAC is a good mitigation. I see PAC as having two different faces: there's PAC as an exploit mitigation, preventing you from getting kernel read/write to begin with, and there's PAC as control flow integrity, making sure that you can't call arbitrary kernel functions once you have gotten a read/write primitive. Everything I've talked about here addresses PAC as CFI and does not in any way diminish PAC as an exploit mitigation. I think PAC has been quite successful at eliminating the exploitability of certain bug classes, and any time you can force attackers to use better bugs, that's a win for the long-term security of the platform. I also think PAC is promising: there's a lot of untapped potential in it, in particular with regard to protecting data pointers, and I'm looking forward to seeing some improvements in this specific regard in iOS 14.
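To make the thread state confusion from earlier in the talk concrete, here is a toy sketch of a single signing routine shared by two consumers with different requirements. It is a heavily simplified model and not the kernel's logic: the HMAC, the key, the constants and the function bodies are assumptions, and the `purpose` tag is a hypothetical domain-separation idea added purely for contrast, not a description of any fix Apple shipped.

```python
import hashlib
import hmac

KEY = b"toy-pac-key"  # hypothetical stand-in for a hardware PAC key
USER_MODE, KERNEL_MODE = 0, 1


def sign(state, purpose=b""):
    # One signing routine shared by every caller, as described in the talk.
    # `purpose` is a hypothetical tag; the empty default means no separation.
    msg = purpose + repr((state["pc"], state["cpsr"], state["lr"])).encode()
    return hmac.new(KEY, msg, hashlib.sha256).digest()


def thread_set_state(new_lr, purpose=b""):
    # Toy version: forces cpsr to user mode but accepts any return address.
    state = {"pc": 0x1000, "cpsr": USER_MODE, "lr": new_lr}
    state["sig"] = sign(state, purpose)
    return state


def switch_context_resume(state, purpose=b""):
    # Toy version: verifies the signature, ignores pc and cpsr, returns to lr.
    if not hmac.compare_digest(state["sig"], sign(state, purpose)):
        raise RuntimeError("panic: saved state signature mismatch")
    return state["lr"]


if __name__ == "__main__":
    # Shared signing: a state signed for user mode resumes in the kernel path.
    forged = thread_set_state(0x42424242)
    print(hex(switch_context_resume(forged)))

    # Hypothetical per-purpose tags: the same swap now fails verification.
    tagged = thread_set_state(0x42424242, purpose=b"user")
    try:
        switch_context_resume(tagged, purpose=b"kernel-switch")
    except RuntimeError as err:
        print(err)
```

The first case mirrors the logical bypass: the state is validly signed, its cpsr says user mode, and the context switch path never looks at cpsr. The second case only illustrates why thread states signed for one purpose should not verify for the other.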
now the last thing that i want you to take away from all of this is that as much fun as all this research was to conduct pac bypasses just aren't all that important in the grand scheme of things like if i'm writing a kernel exploit and i've obtained kernel read write i really don't see pac as like the last step that needs to be achieved it's more like the cherry on top of a really nice exploit i could see perhaps pac bypasses might make an expensive upcharge when you're selling an exploit for example there may be some threat actors out there who have legacy implants that rely on kernel function calling to accomplish their goals and so for these actors it may be they would prefer to buy a pac bypass rather than re-implement this implant but kernel cfi just fundamentally is not the last line of defense keeping your device safe hardening the kernel is always going to be more important for end user security because it's going to prevent the attacker from getting read write to begin with and once you have read write i mean it's pretty much game over at that point so i'm excited to see kernel cfi i think it's a really cool mitigation but just fundamentally i think it is much more important that this kernel hardening work is happening so i don't see these pac bypasses as all that important in the grand scheme of things so that's all i have for you today i hope you enjoyed watching and thank you very much all right i hope you enjoyed my presentation i'll try to address some of the questions that were raised in the chat right now so first off there was a question are these types of flaws in programming likely to continue or become reintroduced in the future so in my opinion just based on how i've understood pac to have changed over time it doesn't seem like there is a comprehensive strategy for pac it feels somewhat more ad hoc so i wouldn't be surprised if these types of pac bypasses do persist into 
the future that being said this isn't a certainty i could certainly see pac being hardened enough to make these things become very very rare but there isn't anything that i've seen yet which demonstrates to me comprehensively that pac is robust as a cfi mitigation and then finally kind of the last thing that i want to leave you with just to drive home the last point that i made in the talk is that pac bypasses just aren't comparable to local privilege escalation to begin with if you have a bug which gets kernel read write on the system your device is pretty much toast anyway so i don't see these pac bypasses as all that important in the scheme of things i also see one new question that came up could apple disable interrupts on the sign thread state to alleviate some of these issues yeah so that was one of the things that is definitely going to be required in order to have a secure implementation it's not sufficient to just disable them during the function itself it has to also include the point at which the arguments to the function are loaded into registers but yeah that's definitely something which needs to occur in order for the implementation of sign thread state to be safe so i believe that's all the time that i have thank you so much everyone for listening enjoy the rest of black hat
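editor's sketch (not part of the talk): the last answer, that interrupts have to be masked before the arguments are loaded into registers and not only inside the signing function, can be illustrated with a small toy model in python. everything here is an assumption made for illustration: the names ToyCpu, sign, sign_thread_state and overwrite_lr are hypothetical, an hmac stands in for the hardware pac computation, the spilled registers are modeled as completely unprotected memory, and the value 0x4242424242424242 is arbitrary. it is not the xnu implementation.

```python
import hashlib
import hmac

# Stand-in for the PAC key: a secret the attacker cannot read (assumption).
KEY = b"per-boot secret"


def sign(pc: int, lr: int) -> bytes:
    """Toy signature over two register values (HMAC replaces the real PAC)."""
    msg = pc.to_bytes(8, "little") + lr.to_bytes(8, "little")
    return hmac.new(KEY, msg, hashlib.sha256).digest()[:4]


class ToyCpu:
    """Registers plus a spill area that an attacker with read/write can edit."""

    def __init__(self):
        self.regs = {}
        self.spill = {}
        self.interrupts_enabled = True

    def maybe_interrupt(self, attacker=None):
        """Model an interrupt: spill registers, let the attacker act, restore."""
        if not self.interrupts_enabled:
            return
        self.spill = dict(self.regs)
        if attacker is not None:
            attacker(self.spill)
        self.regs = dict(self.spill)


def sign_thread_state(cpu, pc, lr, attacker=None, mask_early=False):
    """Sign (pc, lr); mask_early controls when interrupts are masked."""
    if mask_early:
        cpu.interrupts_enabled = False   # masked before the arguments load
    cpu.regs = {"pc": pc, "lr": lr}      # arguments loaded into registers
    cpu.maybe_interrupt(attacker)        # window between load and signing
    cpu.interrupts_enabled = False       # masking only inside the function
    sig = sign(cpu.regs["pc"], cpu.regs["lr"])
    cpu.interrupts_enabled = True
    return sig


def overwrite_lr(spill):
    """Hypothetical attacker: rewrite the spilled return address."""
    spill["lr"] = 0x4242424242424242


# Masking too late: the signature ends up covering the attacker's value.
forged = sign_thread_state(ToyCpu(), 0x1000, 0x2000, attacker=overwrite_lr)
# Masking before the arguments are loaded: the window is closed.
safe = sign_thread_state(ToyCpu(), 0x1000, 0x2000, attacker=overwrite_lr,
                         mask_early=True)
print(forged == sign(0x1000, 0x4242424242424242))
print(safe == sign(0x1000, 0x2000))
```

in this model the only difference between the two calls is when the interrupt mask is applied, which is the point the answer makes: once the values are sitting in registers with interrupts still enabled, a spill and restore can swap them before they are signed.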
Info
Channel: Black Hat
Views: 3,068
Id: 7zCBOFxATFs
Length: 40min 47sec (2447 seconds)
Published: Fri Feb 26 2021