Making C Less Dangerous - Kees Cook, Google

Captions

Hi, my name is Kees Cook. This is a little bit about a specific area of the Kernel Self Protection Project: looking at the C language generally, why it causes us so many problems, and what sorts of things we can do to improve that. If you want to follow any of the links or read some of the very small text I have in here, you can download the slides there, or, once I get that linked, from the Linux Foundation website as well.

This is specifically about the Linux kernel, obviously. The agenda is a quick background on KSPP, then talking about C as a language and how it's really just a fancy assembler, and then looking at some specific issues that we can hopefully solve, or at least minimize.

The Kernel Self Protection Project was started a couple of years ago to focus on bringing kernel protections into the kernel. Over the years we've had a lot of protections that the kernel supports for defending user space from user space, but there hadn't been as much focus in the upstream kernel on protecting the kernel from user space. This is a pretty wide project: we've got about 12 organizations with maybe 10 individuals working on a bunch of stuff. It is an upstream project, not a fork or anything, so it follows the upstream development model, and "slow and steady" is the way I like to think about it.

This brings us to one of the main problems we've had to deal with: C gets treated mostly like machine code, even though it's trying to be an abstract version of it. The kernel does this because it's trying to be as fast and as small as possible, and there are a lot of things the kernel does for which there is no C API: setting up page tables, switching to 64-bit mode. Those are machine-specific issues; they're not about the C language at a higher level. So that's as close as we can get to machine code without all the pain. But this comes with some really fun things with the
language itself: a lot of undefined behaviors in the C language, which come from its history, and some problems associated with having a weak standard library that has old problems.

Some quick examples that I'll get into in more detail in a bit. There's the idea of an uninitialized variable: from the C language perspective we just say we don't know what's happening, it's fine, we'll throw a warning maybe. But in actual machine code it obviously does have a value: whatever was in memory before. Then in C we start to forget that it's supposed to be a language and think about it as machine code again, and we call function pointers without any regard to what the actual type of the function is, because when it boils down to it you're running machine code, and the machine code says we're just jumping to a location in memory and running. But that's not actually what you were trying to say with the C language, so there isn't as tight a binding between those things. And then you also get things out of the API like memcpy, where you say, well, I have an address and I'm just going to copy as much as I want to it. That doesn't really help anyone using that library. Normally you see people who are trying to build up a series of copies: they'll have a size and they're tracking how much they've copied, but they're not really paying attention to how much is left in the destination. So why don't we have better APIs in that regard?

Of course this is a tiny fraction of all the undefined behaviors in C. There was a great blog post recently on this, "With Undefined Behavior, Anything Is Possible", and I bought the shirt, because you have to have the shirt. But this is a huge topic, and I'm trying to focus on specific areas where we can improve the kernel itself, or at least deal with the problems that have been created.

One of those is variable-length arrays. When you define a local
variable, it ends up on the stack, and in C you can just say, well, I want the size of this to be however large, based on an input variable from the function. This creates problems because the stack is a fixed size, and you can have a linear overflow that runs right past the end of the stack and writes over things next to it. But this is a valid stack frame, so things like the stack protector (the stack canary, the stack cookie) are not going to stop this, because C, when mapping this down into assembly, basically says, oh, this is fine, it's just a huge stack. But of course we've gone way past it. And then there are cases where, even if you had a guard page, which is now possible, you could potentially still jump past the guard page and create problems as well. Again, as far as the C language was concerned, it was perfectly happy with this. The nice thing is that this is easy to find: turn on the VLA warning, -Wvla.

From a security perspective, the main thing I'm looking at is that they're bad, but it turns out they are also slow. When we went to remove these, one of the driver authors actually did a micro-benchmark of the code, because it was, I think, checksum code or something where he could actually do that and he had all the instrumentation, and he saw that a fixed-size stack array gave him a 13% speed-up. I was like, great, I can now justify the security improvement with improved speed. But I had to know why it is so bad. If you can read this: having a fixed-size array generates this tiny chunk of assembly, and having a variable-sized array did all of that. I haven't bothered to read all of that, but it seems impossibly bizarre that it's that bad. Apparently it is. So just don't use VLAs.

Another case is switch fall-through. C specifies break to stop a switch case, but there isn't anything to say "please move on to the next one". It is simply the absence of a statement that
says move on to the next one, but the absence of a statement could also mean you forgot to put in a break. This weakness in C actually has its own Common Weakness Enumeration item, the omitted break statement in switch. So is this actually a bug? We don't know; we have to look at every single case. Static analyzers have had this problem for a while, so they flag them, and to whitelist the cases where you do want a fall-through, the static analyzers started recognizing a comment that says "fall through". So the compilers are following the static analyzers, which added parsing of a comment as a statement to indicate "I do want to fall through here". That's where we are. With this added to your compiler, you can now say "warn on implicit fall-through", and if it doesn't find the comment it will yell at you. We've been going through the kernel adding these, looking at every place where it's missing and trying to decide whether it was an accident, and we've had a lot of bugs found this way.

Another one, back to the stack, is getting rid of the uninitialized variable case. Right now, with most compilers, if you try to use a variable that you've declared locally and didn't initialize first, you get a warning about using an uninitialized variable. However, this warning gets silenced if you pass the variable into a function by reference. The compiler just forgets: well, I assume that since you passed it into a function, it's now initialized. But of course there's no reason to believe that it actually got initialized. There are some plugins in the kernel for doing various versions of this: one for force-initializing any structure that has __user pointers in it, which was expanded to all things that are passed by reference. And there are still some leftover cases, especially with structure padding, where you still want to initialize them
for sure. In some discussions we actually encountered Linus praising the idea of always initializing all variables all the time, so that's what we're trying to work towards. There was a patch for GCC to do this; it's not upstream. There's a patch for Clang to do this; it is also not upstream. We're looking at building a plugin to do this as well. This gets rid of the C problem of "what's in the memory?": we just declare that everything is zero-initialized no matter what, and you can depend on that as a feature of the new Linux kernel version of the C language. That makes things easier to think about.

One interesting side effect of moving from C to machine language, which I thought was adorable: once I force-initialized variables, I got a warning out of GCC that said, well, you have unrunnable code. I went looking, and it was because there were initializers before the first case statement in a switch, which never get executed because nothing will ever actually go there. A variable declaration, in assembly, is making room for it on the stack, but initialization requires running something to write to that area of the stack. So when the initialization was forced in that area of the switch statement, where I didn't even know you could put declarations, it would never run and the variable would never be initialized. I went through and lifted the declarations out of all the places where this occurred. There weren't a lot, but this was yet another surprising side effect of C that I just didn't know.

Another case is dealing with integer overflows. GCC has support for checking for signed integer overflows; this is one of the many things that gets enabled with CONFIG_UBSAN right now. The good news is it's very, very fast, because it's just checking an existing hardware flag. I couldn't actually measure the difference; I need to do
better micro-benchmarks to really figure out how many cycles of difference it is, but I think it's going to be very, very small. If you just want the kernel to abort immediately, it grows the kernel image by 0.1 percent, which is good. The downside is that if you want warnings about this, it grows the kernel image by 6 percent, because there are thousands and thousands of integer calculations being made, as you might imagine. In the meantime, we can do explicit single-operation tests where we say "I want to know for sure, in this code flow, whether or not I overflowed", so we now have a set of arithmetic overflow detection helpers in the kernel.

Clang can do unsigned integer overflow detection specifically. Signed overflow is considered undefined behavior for a variety of reasons, but unsigned is considered well defined, except that it is usually unexpected. There are, however, a lot of cases in the kernel where we do intentionally perform unsigned overflow, so we'd have to go through and mark those and deal with it. This is one difference in implementations between GCC and Clang. Clang gives you quite a variety of ways to handle it, which I've shown in this slide: you can have it abort, you can have it warn but continue, you can have it warn and give up; you can do a bunch of different things. So plumbing that into the kernel would be nice.

And then, generally, bounds checking. This remains a big area of vulnerabilities in the kernel: having strcpy or memcpy wander past the end of an allocation and keep writing into whatever memory is next. In the kernel we have hardened usercopy, which checks the places where we're explicitly copying to and from user space, in copy_to_user and copy_from_user, and this is under a 1% performance hit. I tried to extend this to the str* family and mem* family of functions, and they're about a two percent performance hit each. I still need to look at this a little bit more. Pre-Meltdown, this was a totally unacceptable performance hit for security; post-Meltdown, it's under five percent, so maybe I have a chance to land this stuff too. We'll see. But it would be nice, because we keep getting vulnerabilities where the memcpy is just wrong and we could have easily detected it: we know how big the allocation is, and we know how big everything around it is anyway.

This moves on to: can we just get better APIs and get rid of the old, bad APIs that came from the standard C library? This tends to be quite a political problem as well, because in trying to bring developers into the Linux kernel community we don't want to have to teach them an entirely new C API. However, we're already doing that, because we said, well, strcpy was no good, let's use strncpy. Except that strncpy doesn't always NUL-terminate if the source is too long, and if it's too short it NUL-pads the entire length that you specified, so that's not good. So we made strlcpy, but that reads the source string beyond the max length. So how about strscpy? Seems okay so far. So maybe we can improve memcpy too; that would be great.

So yes, the point was that this is slow, but there is hopefully some future world where we're going to have hardware-supported memory allocation tagging. The example here is your allocator, in this case kmalloc. You say, I want 128 bytes, and the allocator says, okay, this blue area is 128 bytes and I've given it tag 5. That tag lives in the high byte of the pointer value that comes back from the allocator. So you say, great, I'm going to write at an offset from that pointer, and the hardware looks at that and says, okay, you have the right tag for your offset, it's within the 128-byte range, we're good. Then later on you say, well, I want an offset slightly beyond that, and it says, well, the memory region past that has a different tag, so you're going to fail because you're outside
of what you were expecting that pointer to actually point to. Stuff like this exists already in SPARC with their Application Data Integrity extension; in ARM this is coming; and supposedly we might have this on Intel at some point.

That moves on to CFI, control flow integrity. With decent control over memory not being both writable and executable, attackers have moved on to using the existing code that's in the kernel, taking advantage of indirect calls, where you have saved a function pointer somewhere and you eventually turn around and actually run it. For the forward edge, calling out, you've got a function pointer saved in the heap; you go fetch it and you just call it. Then on return, you return to where you came from, and that's effectively an indirect call off of what was stored on the stack. But this is all implementation detail. In C, you specified "I want to make this function call and then come back from it". Without CFI, I can change what I'm calling; I can just tell C "pay no attention to this" and we'll go ahead and call, say, call one versus call two, which have completely different function prototypes, violating what we'd asked this function to be. But again, when mapped down into machine code, we're like, sure, it's a function pointer, whatever; just go there and run whatever happens to be there, we don't care.

Doing forward-edge checking, like with Clang's CFI, this will actually blow up, because it tries to execute and says, "I was expecting to call this type of function, but I arrived at a different type of function; I'm going to freak out now." This isn't perfect. It's based on the function prototype pattern, and right now in the kernel there are still plenty of functions that return an unsigned long and take an unsigned long as their one argument, so that's not great. But for a lot of other routines it does narrow the
attack surface for indirect calls.

Of course, this is the forward edge. For the backward edge, the return, there are things like splitting up stacks, where you say, okay, we're going to push all of our weird variables, our local buffers and by-reference variables, into this unsafe area, because we don't know what's going to happen; bad stuff might happen. But things we can prove are safe to use, register spills and safe accesses and the return address, we'll split into a different stack. This is one approach to solving that, because if the attacker doesn't know where the safe stack is, it's harder to deal with. Similar to this, but with less logic, is a shadow call stack, where the only thing that you put on the other stack is the return address. It's harder to get at this one because you can keep a dedicated register for this entire stack: there's the regular stack register for the unsafe stack, and then another separate register, effectively, for the call stack. This works in Clang right now.

There is also hardware support for dealing with backward-edge CFI. Intel CET deals with one aspect of this: doing it in software leaves that second stack writable, which means that if it can be found by an attacker and written to, they've taken over your return path. With CET, this is effectively a read-only area of memory that is writable only during the call and return instructions, which do this implicit read and write to that area. A different version is pointer authentication in ARMv8.3-A, which adds new instructions to effectively add a sort of encrypted tag to what you're writing out to the stack, and then when you pull it back you can re-verify it. The difference there is pretty simple: you enter a function, you sign where you're coming from, and then when you're about to leave you double-check that what you have is what you wanted.

So where are we now? With VLAs, it's been about four releases of the kernel. We went through a little over a hundred of these, each a little bit different, so it's taken quite a bit of time to get rid of them, but we're down to only a handful remaining in crypto. I'm hoping that will be completely finished by 4.20, or whatever is next after 4.19.

The explicit switch case fall-through: I knew that Gustavo had been sending patches slowly over quite a while, and I thought, well, how many has he sent? I saw that he had sent 745 patches, and I was like, I wonder how many we had started with. I'm sorry, he's sent more than 700; he's at, like, over a thousand. So now we're down to only about 700 of these remaining. But again, for each one of these you have to look at it and decide what the author meant, and whether there's a comment to describe whether or not the fall-through was intentional. Once we get through those, that entire class can go away as well.

The always-initialized automatic variables: a lot of this is available through the plugins, but we don't have complete coverage, and it's not quite the way we think we want it in the kernel yet. We'll see. It would be nice to get more complete support from the compilers on this, so upstreaming those existing patches would be great.

On overflow detection, it would be nice to have GCC grow unsigned overflow protection. But this does work right now; we just need to specifically tear it out of CONFIG_UBSAN, and we should have this. It would be nice.

Bounds checking: mainly it's crying about performance and waiting for hardware. That's okay.

And CFI: this actually works right now in Android. There's a talk later on this; it's pretty impressive. And again, waiting for hardware.

So that's where we are; how do we get there? It's about trying to get people involved. We have a lot of cultural challenges in getting things into upstream. There's a lot of conservatism in not wanting to make
changes to code, and in accepting responsibility for the overhead and sacrificing one's time to make that happen. Obviously there's the technical piece: there is a lot of complexity here, but we can solve that. And of course there's just getting people to help with doing it, reviewing it, testing it, and, in cases where you're not running the latest kernel, actually backporting it to your releases, since traditionally the LTS kernels only have bug fixes; they haven't normally backported features. The reason for that, as you could see with the hundreds of patches to fix VLAs and switch statements and other things, is that it's actually a huge number of patches, so backporting that is somewhat prohibitive.

So that's it. You can reach me at these places; there's the link to the Kernel Self Protection Project, and these slides again. I caught us back up on time. Any questions or other things? Casey? Yes. Oh, here's a microphone for you.

I've been doing C programming since 1977, and the comment was always "no break".

Uh-huh.

Where did they come up with this "fall through"?

It was the static analyzers.

So why that comment? Why not the one that's been in use for 50 years?

[Laughter] Probably because the static analyzer folks hadn't been writing C since 1977.

Well, okay.

I guess it was just, I mean, reading the feature requests for "here's the support for parsing a comment as a C statement", there was great anger about the fact that the compilers got painted into a corner, because the static analyzers were like, well, this is what we're doing, this is what we're checking, and here are all of the giant numbers of code bases that we've instrumented; we've actually updated all the code to have "fall through" as a comment. The compiler people were just kind of like, "But we could give you a statement!" Too late now; it's a comment.

Anything else? Back there. You want a microphone?

Once upon a time there was an effort made to
try and enforce things with the string APIs for them to be more secure, say, no more strcpy, use strncpy, and it resulted in what I would call strncpy anti-patterns, where people were just doing things like calling strlen on fixed-size strings, or other things like that. So what's the plan to try and make sure we don't turn these supposedly more secure APIs into perhaps still insecure APIs?

I think it's mostly us designing it right, and actually getting people who have strong opinions about this, and looking at the past anti-patterns and saying: what do we need to have? What is actually a helpful API for the author that provides the defensive characteristics we want without getting in their way? In the past we've just kept doing tiny band-aid fixes, like, "well, strncpy, we're good, just ship it". I think the other problem we've had is with the evolution of APIs in the Linux kernel. We've had a long history of saying, here is a new API, I will use it in this one place, and it's everyone else's problem to fix all the old APIs. I have tried, in some of the conversions we've made, to look at past APIs and remove them: first move the ancient API up to the old API, then move all of the old API to the bad API, and then move the bad API to the good API, in the process wiping out the availability of all the others. I think that's part of the cost associated with this: actually getting rid of the old APIs and not allowing them to exist anywhere and get misused in the future. [Laughter]

Just a further bit to Laura's comment: when we've found anti-patterns in the past, we've added Coccinelle scripts and checkpatch checks. Do we perhaps need to proactively figure out, when we're adding an API, "here is a way people might misuse it", and add checks for those kinds of things in advance, before we start seeing them?

That would be nice. We have some sets of the Coccinelle
scripts already in the kernel, but they are effectively disjoint from a regular compile. In some places where kernels get built, for vendors, they will actually do two-stage compiles. They'll say, first we're going to do the static checker compile, which includes Coccinelle and some other things, and, since those tend to be so noisy, really it's: if that does not produce a difference in the output from before and after, then continue and do the build for real. But that's actually something that's bothered me for a while: we don't include that in the common build, so there's no warning that something bad has happened. That's why I've pushed to just eliminate the API from the kernel, because if it's gone, the code won't even build. But we're forced into some cases where we span multiple releases with APIs we have to continue to support, and then people get distracted by other things. So it's just a matter of doing it as completely as we can.

Oh, the question was: is there any way we can mark APIs as obsolete? We did have __deprecated, but Linus deprecated it. So yeah. Well, I mean, Linus's argument was effectively the same, which was: if you're removing an API, remove it; don't make it someone else's problem. Which is agonizing, right? I have done this, and yeah, there isn't a particularly good solution here. Without having some form of developmental mandate, where someone can say "I am removing this API; it is your problem to fix, or your code gets left out", I don't know. We could add things to checkpatch; that's happened in the past. There's a potpourri of various mechanisms that people have tried. So just killing the API appears to be best, but it is extremely time-consuming. However, I've kept this as a "hey, we'd like to get rid of an API" kind of thing: it's a bit tedious, but it's usually pretty mechanical, and that works as decent kernel
newbie types of stuff. So if there's a list of "here, we'd like to get rid of this timer interface or this string interface", things like that, keeping that list in one place is another idea. And now I'm over time. Anything else?

There was also mention of hardware support for bounds checking on x86, and I am pretty sure that there are already instructions for that: there is an instruction called BOUND, I think, in x86 assembly. So what's the problem with it? I've heard about it, I've read about it, but I'm pretty sure nobody's using it, not even compilers.

My understanding is that BOUND is separate. I came across that a while back, but it doesn't provide the protections we want, because even if we have that, it requires an explicit check: you would say "am I in bounds?" and then you'd do the access. That instruction still needs to understand what the bounds were, and that information may be totally separate from the execution path. So having the support in the MMU, where it's working when it's actually trying to dereference pointers and do other things, attaching that at the hardware level, will actually get us what we want. Otherwise we can just do it in software, and maybe we get those instructions used. So plumbing access to the allocation information is the slow way, in software, to do it, but in hardware, if we can just associate it with the memory region, then we get it fast, for free.

Anyway, I think that's it. Come ask me questions in person if you want to, or email me. Thanks. [Applause]
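
Two ideas from the talk, the explicit single-operation overflow tests and the complaint that copy loops rarely track how much room is left in the destination, can be illustrated with a short sketch. This is a user-space illustration only, not kernel code and not the speaker's implementation: checked_alloc_size, struct bounded_buf, and bounded_append are hypothetical names invented here, and the sketch assumes a GCC or Clang compiler that provides the __builtin_mul_overflow and __builtin_add_overflow builtins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: compute count * elem_size + header without
 * silently wrapping around. Returns true on success; *out is only
 * meaningful when true is returned. The two builtins are GCC/Clang
 * extensions that report whether the exact result fit in the target.
 */
static bool checked_alloc_size(size_t count, size_t elem_size,
                               size_t header, size_t *out)
{
    size_t bytes;

    assert(out != NULL);
    if (__builtin_mul_overflow(count, elem_size, &bytes))
        return false;   /* multiplication wrapped */
    if (__builtin_add_overflow(bytes, header, out))
        return false;   /* addition wrapped */
    return true;
}

/*
 * Hypothetical destination buffer that remembers its capacity, so
 * every append is checked against the space that is actually left.
 * Invariant: used <= capacity.
 */
struct bounded_buf {
    unsigned char *data;
    size_t capacity;
    size_t used;
};

/* Append len bytes, or refuse and leave the buffer untouched. */
static bool bounded_append(struct bounded_buf *buf,
                           const void *src, size_t len)
{
    size_t remaining = buf->capacity - buf->used;

    if (len > remaining)
        return false;   /* would write past the end of the destination */
    memcpy(buf->data + buf->used, src, len);
    buf->used += len;
    return true;
}
```

A caller would check both return values and fail the operation rather than continue with a wrapped size or a truncated copy. The design point is that the destination's remaining space travels with the buffer, instead of being recomputed (or forgotten) at each call site.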

Info

Channel: The Linux Foundation

Views: 13,836

Id: XfNt6MsLj0E

Length: 33min 54sec (2034 seconds)

Published: Sat Sep 01 2018