CppCon 2019: Greg Law “Modern Linux C++ debugging tools - under the covers”

Captions
Alright, shall we get going? Thanks for coming, everyone. I'm Greg, co-founder and CTO at Undo, and I'm going to be presenting today with Dewang, who is a Solutions Architect at Synopsys working on Coverity. We're going to talk about what goes on under the covers of advanced C++ debugging tools. I'll go first and talk about the stuff I know about, and then Dewang will talk about what he knows about. (Is it just me, or is there a terrible echo? Can everyone else hear it? Both mics on, maybe? Just the room? Alright, I'll try to get past it.)

I think debugging is our dirty secret as programmers: most programmers spend most of their time finding and fixing bugs, and if you dispute that, just think how often your code works the first time. I like to fancy myself as, let's be honest, a better-than-average programmer. What's the longest program I can write that is completely correct straight off the bat? Ten lines? Thirty, if I'm really careful and pushing it? And that's a new program. If I'm changing an existing, complex piece of code, then that number is, I think, depressingly close to zero lines written without a bug. So really the whole process is about trying to understand what happened: my program did something different from what I was expecting it to do. Now, if you're a C++ programmer, as I guess most of us here are, you've deliberately chosen a language that's very close to the metal. You must have a reason for choosing C++ over something higher-level (hopefully a better reason than "it makes me feel clever"); usually it's performance or something like that. But being deliberately close to the metal, you need to know what's going on, and when you're debugging these systems you need to use the tools that are available. You can't be a really good programmer if you're not good at debugging, and that means using the tools; but it also means knowing what the tools are doing underneath, because life gets really fun when you hit that delicious mystery of "the thing I'm observing cannot possibly happen; I have no model for what's going on here", and if you then misinterpret what the tools are doing, they can lead you astray and make it even worse. So I think a basic understanding of how these things work underneath is essential.

I'd like to divide these tools into a few categories. There's the debugger, which is what we all think of first, and I'll spend most of my time on that today: not how to use it (I've done other talks on doing cool stuff in GDB) but how it works underneath. There are record-and-replay tools, which are obviously close to my heart given where I'm from. And there are dynamic checkers and static analysis, which Dewang will cover. The debugger is really about answering "what is my program doing right now?" It's the ability to freeze time. It's a bit like that style of film direction that was popularized, I think, by The Matrix,
where they freeze time and the camera pans around and you can see everything frozen; that style then got copied by everybody for a few years, including every commercial. Well, that's what I think the debugger is like: you can see everything, then time runs again, then you stop again and have another look around. It's very powerful; it tells you what your program is doing right now. The record-and-replay tools, by contrast, tell you what your program did, what happened. There's some overlap, admittedly: debuggers can give you a backtrace, which is about the best they can do at telling you how you got here; it's limited, but it can be very useful. To stick with my Matrix analogy, with the replay systems you've got the ability to see everything from any angle you want and, like a Time Lord, move time back and forth to wherever you want to be and see any state at any point in time. The dynamic checkers are about "did a certain class of thing happen": a buffer overrun is the canonical example, but there are all kinds of others to do with race conditions, out-of-order lock acquisition and so on. You're looking for instances of "did this specific kind of thing happen when I ran my program". The static checkers ask "could this thing happen" — they don't even have to run the program. That's the big difference: a dynamic checker won't tell you whether a buffer overflow might happen; you have to actually cause the overflow in your testing in order to detect it. But it can tell you exactly where it happened, which is extremely useful.

So, on to GDB. I'm going to describe everything in terms of GDB, but for the purposes of this talk any Linux debugger works in basically the same way: LLDB or TotalView or whatever your debugger is, running on Linux or indeed any Unix-like system, does basically the same thing. You've got GDB, which is just a regular user-level process with no special rights, and it talks to your program — we'll call it hello_world, but whatever your program is — over a kernel API called ptrace. It's a really lousy, horribly designed interface, but it's pretty powerful, and it's what we have to use. Asynchronous notifications go back from the program to the debugger via signals.

So, a quick recap of how signals work. On a Unix system, when a process receives a signal, a lot of the time it's going to terminate; that's most often the default behaviour, depending on what the signal was and how the system is configured, and maybe it dumps core (SIGSEGV will, SIGINT won't). Maybe the program has been configured to ignore the signal, or maybe the default behaviour for that signal is to be ignored, as it is for SIGCHLD for example. Or maybe the default behaviour is to stop the program: signals like SIGTTOU, SIGTTIN, SIGTSTP and SIGSTOP stop the program by default. And if the program has set things up that way, it can run a signal handler on receipt of the signal.
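As a minimal sketch of those dispositions (my own example, not from the talk): installing a handler for SIGALRM with sigaction. SIGKILL and SIGSTOP are the two you can never catch, block or ignore.

    // Minimal sketch (not from the talk): installing a signal handler.
    // SIGKILL and SIGSTOP cannot be caught, blocked or ignored.
    #include <csignal>
    #include <unistd.h>

    static void on_alarm(int) {
        // Only async-signal-safe calls belong in a handler; write() is one of them.
        const char msg[] = "got SIGALRM\n";
        write(STDOUT_FILENO, msg, sizeof(msg) - 1);
    }

    int main() {
        struct sigaction sa {};
        sa.sa_handler = on_alarm;      // run our handler instead of the default action
        sigemptyset(&sa.sa_mask);
        sigaction(SIGALRM, &sa, nullptr);

        alarm(1);                      // the kernel delivers SIGALRM in about a second
        pause();                       // block until a signal arrives
        return 0;
    }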
The other thing that can happen is if your program is being ptraced: if it has a tracer, then on receipt of a signal it goes into a "tracing stop", which looks very like a regular stop. You can see this if you look in /proc/<pid>/status: a state of T means the process has a tracer and has received a signal. So in this picture, hello_world has received some kind of signal — a SIGALRM, say; it doesn't matter which — and it has gone into this stopped state (I've shown it in red). The debugger then continues it with PTRACE_CONT, which un-stops the target process and runs it forward. GDB calls the process being debugged "the inferior", which I think is a lousy name, but call it the target or the debuggee, however you like. So GDB calls ptrace with PTRACE_CONT and the process runs until something happens — until it receives a signal — and when it does, it goes into that tracing stop and the debugger is notified. In my little diagram I could have made GDB go red while the inferior was running, because that's typically what happens, but it doesn't have to: you can run GDB in non-stop mode, and nothing about ptrace requires the debugger to stop. Typically, though, it makes a waitpid call and blocks waiting for that notification to come back.

Signals only reach the tracee if they're passed in by the PTRACE_CONT call. I said ptrace was a lousy interface, and here's proof: the first argument of ptrace is an opcode. (By the way, if you ever find yourself writing a function whose first argument is some kind of opcode, please don't — it should be several functions.) You can also see it's completely non-typesafe, with those void* parameters. The last parameter is SIGALRM in this example: that actually delivers the SIGALRM to the tracee, and if there's a signal handler for it, the handler runs; if it's something like a SIGSEGV with no handler, the process terminates. But the tracee only ever sees the signal if it's passed in via PTRACE_CONT or one of its friends, PTRACE_SINGLESTEP or PTRACE_SYSCALL. If the signal is blocked, it's just marked pending and the handler won't run at that point; similarly, if it's set to ignore, it's simply ignored.
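To make that concrete, here is a minimal sketch of the tracer/tracee dance (my own, not code from the talk): the child asks to be traced with PTRACE_TRACEME and execs the target; the parent waits for each stop and resumes the tracee with PTRACE_CONT, optionally re-injecting the signal it observed via the last argument.

    // Minimal sketch (not from the talk) of how a debugger drives a tracee.
    // Most error handling is omitted for brevity.
    #include <sys/ptrace.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <csignal>
    #include <cstdio>

    int main(int argc, char** argv) {
        if (argc < 2) { std::fprintf(stderr, "usage: %s <program> [args]\n", argv[0]); return 1; }

        pid_t pid = fork();
        if (pid == 0) {
            ptrace(PTRACE_TRACEME, 0, nullptr, nullptr);   // ask to be traced by our parent
            execvp(argv[1], &argv[1]);                     // the exec itself causes a SIGTRAP stop
            _exit(127);
        }

        int status;
        while (waitpid(pid, &status, 0) > 0 && WIFSTOPPED(status)) {
            int sig = WSTOPSIG(status);
            std::printf("tracee stopped by signal %d\n", sig);
            // Re-inject the signal unless it is the SIGTRAP that ptrace itself generated.
            long inject = (sig == SIGTRAP) ? 0 : sig;
            ptrace(PTRACE_CONT, pid, nullptr, (void*)inject);  // last argument delivers the signal
        }
        return 0;
    }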
So, breakpoints: when your target process hits a breakpoint, that's just a signal too — a SIGTRAP. On x86 the debugger will have poked a special instruction, int3, into the text section (we'll come back to that in a bit); it generates a SIGTRAP and the debugger is notified in the normal way. Likewise Ctrl-C: if you run your program inside GDB and it's busy and you hit Ctrl-C, GDB isn't doing anything special (at least when it's not doing remote debugging). The program you're debugging owns a terminal, and when you hit Ctrl-C the TTY sends SIGINT to the foreground process group, so the process being debugged gets stopped with a SIGINT, GDB is notified, and it drops to the prompt so you can do what you will.

You can see this with "info signals" in GDB: for each signal there's whether GDB will stop, whether it will print (tell you, the user, that the signal arrived), and whether it will pass the signal on. It doesn't have to pass it on: when the tracee gets a signal, it's up to the tracer whether to feed that signal back in. Usually it does, but for things like Ctrl-C it won't, because most of the time that would just terminate the program you're debugging.

Here's my attempt at the Unix signal algorithm. I think this is roughly right, though I'm almost certain there are exceptions. SIGKILL and SIGSTOP are special: you cannot set a handler for either of them, you cannot block them, you cannot ignore them. If a process gets a SIGKILL — even if it's being ptraced, even if it's being debugged — that's it, game over: the program terminates and becomes a zombie until it's reaped. If it's a SIGSTOP, the process goes into a stopped state; again, no handler, no masking, no ignoring. Otherwise, if the process is being traced it goes into a tracing stop (actually that's not quite right as drawn: a SIGSTOP while traced also goes into a tracing stop, which makes SIGSTOP a useful signal for debuggers that want to stop their target, because no matter what the target has done about masking, handling or ignoring signals, it cannot mask or ignore SIGSTOP, so it will do what you want). Otherwise the signal is blocked, ignored, or handled by a handler; and the last thing we do is terminate the process.

ptrace has a whole bunch of operations. We've already seen PTRACE_CONT; PTRACE_SINGLESTEP runs forward by one instruction; you can run forward to a system call, get and set registers, peek and poke memory, and so on — I won't go through all of it, but it's quite rich. If you run with PTRACE_SYSCALL, that's like PTRACE_CONT but it stops when the process issues a system call — actually it stops twice, once on the way into the system call and once on the way out. On x86 there's a nasty hack for telling which is which: you inspect the rax register, and if it's -38 you're on the way in. -38 is ENOSYS, and there's no way a system call can actually return that at that point, which is why they picked that magic value. I believe on other architectures there's no way to tell whether you're on the way in or the way out; you just have to trace and count.
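Here's a sketch of that syscall-stop dance on x86-64 (my own, not from the talk), using the same fork-and-trace skeleton as above: the syscall number sits in orig_rax, and on entry the kernel has set rax to -ENOSYS (-38), which is how we guess entry versus exit.

    // Minimal sketch (not from the talk): observing syscall entry/exit with PTRACE_SYSCALL.
    #include <sys/ptrace.h>
    #include <sys/user.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cerrno>
    #include <cstdio>

    int main(int argc, char** argv) {
        if (argc < 2) return 1;
        pid_t pid = fork();
        if (pid == 0) {
            ptrace(PTRACE_TRACEME, 0, nullptr, nullptr);
            execvp(argv[1], &argv[1]);
            _exit(127);
        }
        int status;
        waitpid(pid, &status, 0);                           // initial exec stop
        while (true) {
            ptrace(PTRACE_SYSCALL, pid, nullptr, nullptr);  // run until syscall entry or exit
            waitpid(pid, &status, 0);
            if (!WIFSTOPPED(status)) break;
            user_regs_struct regs;
            ptrace(PTRACE_GETREGS, pid, nullptr, &regs);
            if ((long long)regs.rax == -ENOSYS)             // -38: we are on the way in
                std::printf("enter syscall %llu\n", (unsigned long long)regs.orig_rax);
            else
                std::printf("exit  syscall %llu -> %lld\n",
                            (unsigned long long)regs.orig_rax, (long long)regs.rax);
        }
        return 0;
    }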
And here's something I'd been writing to the ptrace interface for multiple years before I figured out: syscall restart. The kernel does a really good job of hiding this and making it transparent nearly all of the time. When the process gets stopped and goes into one of these tracing stops — say it's deep in a read system call, blocked reading data from the network, and you hit Ctrl-C — it doesn't actually stop deep inside the kernel. The system call percolates all the way back out and returns to user space, and it's at the point where the system call returns to user space, before executing any user-space code, that the process becomes stopped. If you think back to early versions of Linux that had the Big Kernel Lock, this was kind of necessary: you wouldn't be able to do anything if the process were stuck inside the kernel holding the lock. And that's still how it works today. The interrupted system call has a special magic return code, ERESTARTSYS and friends (values around -512 through -516), which tells the kernel that when the process next gets a PTRACE_CONT it should magically re-enter and restart the system call — and most of the time it does the right thing with timeouts, not resetting them. So user-space code should never see a system call return ERESTARTSYS. But there are bugs, especially in certain kernels; the most problematic, I think, are the RHEL 6 and CentOS 6 kernels. There was an attempt to replace ptrace with a better interface called utrace, with ptrace becoming a personality on top of it; Red Hat adopted it in the RHEL 6 kernels, but it never made it upstream, that attempt died, and the implementation was a bit iffy. So if you're debugging a process, particularly on RHEL 6, you might see a system call appear to return ERESTARTSYS. That's not what really happened; it's an artifact of the fact that you're tracing it.

Back to breakpoints, which I touched on earlier. On x86 at least, a breakpoint is the int3 instruction, a magic single-byte breakpoint instruction. "int" raises an interrupt, and usually an int instruction on x86 is two bytes long: the int opcode plus the 0-255 interrupt number you're raising. But a two-byte breakpoint instruction would be problematic if you wanted to place a breakpoint on a single-byte instruction, because execution could end up jumping into the middle of an instruction. So there's a magic one-byte opcode, 0xCC, and the debugger pokes that in (a sketch of how that's done follows below). You should never see it, because when you look at memory through the debugger it filters it back out; but if your program reads its own code for whatever reason, it might observe it while breakpoints are set. When an int3 is executed, the kernel stops the process with a SIGTRAP and everything proceeds in the usual way. There are also hardware watchpoints, which are obviously very useful: you can watch data and get a trap when that data changes. Those are set via the debug registers on x86 — I'm describing everything in x86 terms, but most architectures have an equivalent.
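Here's the promised sketch of planting a software breakpoint (my own code, not from the talk): save the original word, overwrite its low byte with the 0xCC opcode, and after the SIGTRAP restore the byte and rewind rip. These are fragments meant to slot into a tracer like the ones above.

    // Minimal sketch (not from the talk): software breakpoints via int3 (0xCC).
    #include <sys/ptrace.h>
    #include <sys/user.h>
    #include <sys/types.h>
    #include <cstdint>

    // Plant a breakpoint at 'addr' in a stopped tracee; returns the original word.
    uint64_t set_breakpoint(pid_t pid, uint64_t addr) {
        uint64_t orig = ptrace(PTRACE_PEEKTEXT, pid, (void*)addr, nullptr);
        uint64_t patched = (orig & ~0xffULL) | 0xCC;       // replace the low byte with int3
        ptrace(PTRACE_POKETEXT, pid, (void*)addr, (void*)patched);
        return orig;                                       // keep this to remove the breakpoint later
    }

    // After the SIGTRAP: put the original byte back and step rip back over the int3.
    void clear_breakpoint(pid_t pid, uint64_t addr, uint64_t orig) {
        ptrace(PTRACE_POKETEXT, pid, (void*)addr, (void*)orig);
        user_regs_struct regs;
        ptrace(PTRACE_GETREGS, pid, nullptr, &regs);
        regs.rip = addr;                                   // rip points one byte past the int3
        ptrace(PTRACE_SETREGS, pid, nullptr, &regs);
    }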
On x86 they're DR0 through DR7: four of them hold addresses, and the others are control and status registers, so you can set up to four at a time and configure them to trap on memory reads, writes or executes. You can even implement breakpoints using the debug registers if you really don't want to write into the code, but there aren't very many of them available. You don't write to them directly from your program — it's configurable on x86, but I think Linux gives you a fault if you try — so the debugger sets them with the PTRACE_POKEUSER operation.

Okay, that's the ptrace interface; let's move on to DWARF and how debug information is represented, because I suspect we've all fallen afoul of this at some point. DWARF stands for "Debugging With Attributed Record Formats", which is a strange acronym, and I was unsurprised to learn when I looked it up that it's in fact a backronym: it's called DWARF because it was made at the same time as the ELF format, and they thought that would be cool. Anyway, it contains the debug info: a description of your program that the debugger can read to give you symbolic information. At the simplest level it maps a program counter to a source line — when your program is stopped at program counter 0x1234, the debugger can look in the DWARF and say, that corresponds to foo.c line 42 — but it contains all sorts of other information as well: all the type information, classes, templates, everything the debugger might want to know.

Some useful options here. -g is the default, easiest way to get debug info, but it actually defaults to -g2: like -O, -g takes a number to specify the level, and the default is level 2. You can go to -g3 to get everything, which I generally recommend — it's really useful. It's completely orthogonal to the optimization level, but obviously the more aggressively the compiler has optimized your code, the quirkier your experience might be when you come to debug it: the code isn't laid out linearly, so you think you're going forwards and then you jump back. -Og is a useful optimization level: it does the sensible optimizations but nothing too crazy, so you get a sensible debug experience. But often you're trying to debug production code and you want full optimizations — and you can, though your experience will vary, and we'll talk about that now. (By the way, one of the nice things -g3 gives you is good information about inlined functions, so as you step through, it mostly looks as if inlined functions are not inlined, which is very helpful.) One of the most annoying things when debugging optimized code: you try to print a variable and you get "value optimized out". We've all seen that, I expect. So let's look at what it means. Here's a little program that I compile with optimizations; I load it up, I'm at the beginning of my program, line seven, and if I print foo it says the value has been optimized out.
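The demo program itself isn't in the captions, so here is a hypothetical stand-in with roughly the same behaviour: compiled with optimization, the local lives only in a register, so at the top of the function GDB may report it as optimized out, and after stepping past the increment it becomes visible again via the DWARF location list.

    // Hypothetical stand-in for the demo program (the original isn't in the captions).
    // Build with: g++ -O2 -g3 demo.cpp
    // At the top of compute(), "print foo" in GDB may show <optimized out>;
    // after stepping over the increment, the value is tracked in a register.
    #include <cstdio>

    __attribute__((noinline)) int compute(int foo) {
        foo++;             // foo is kept in a register and never spilled to memory
        return foo;        // returned in eax/rax, which the DWARF location list records
    }

    int main(int argc, char**) {
        std::printf("%d\n", compute(argc + 41));
        return 0;
    }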
Now, I think "optimized out" is a really unhelpful error message, because it implies — to me at least — that foo just doesn't exist any more, that the compiler has been able to get rid of it completely. That's not what it means here. What it means here is that foo isn't live at this point; it hasn't actually been optimized away. If I do "next" and then "print foo" — hey, there it is.

So let's look at how that works and what GDB is doing. I can run readelf --debug-dump, which shows us the DWARF information. DWARF is a fairly simple format, really: it's a tree of objects, with tags as the nodes of the tree, and the tree structure mirrors your program's layout — nested blocks of code, types, structures within structures, classes, templates — a little bit like an AST, but not quite. And of course this one has got everything I've included, all the type definitions and so on. My variable was called foo, so let's look at that. Here's foo, and it tells me where it is: it's in file number 1 (which is indexed elsewhere to give the file name), plus the line number and the type. And it has a location list, which defines where foo is live. I can get a slightly simpler view with --debug-dump=loc to see just the location lists; this program is small, so the list is small. It's a list of ranges where foo is available. If I look at the disassembly of the function, the compiler never actually writes the value of foo to memory — it doesn't need to; it keeps it in a register, eax, and returns it — and the DWARF can describe that. There seems to be a zero-size range in there, which I'm not quite sure about, but for the range of offsets covering that code it tells the debugger that the value is in register rax (and it actually hops around different registers). One entry even seems to be saying — I haven't actually confirmed this, but I think it means — that the value of foo is whatever is in rax plus one, because my code did foo++ and at the end the returned value is what's in rax. Modern compilers, and modern versions of GDB or whatever debugger you're using, do an okay job — a pretty good job, actually — of tracking values through registers, as long as they're live. Of course, if you're using reversible debugging it's even better, because you can go back to when the value was live; but even with a regular debugger you can often just step forward a bit and find out what the value is. So that's a nice little tour of DWARF. These are all GCC options, by the way: you can tell GCC quite a lot of specific things, but if you just pass -g3 it pretty much does everything — even macros, actually, which is quite cool.
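A quick sketch of that macro point (my own example; the macro name is hypothetical): with plain -g, GDB has no idea what a preprocessor macro was, but with -g3 the macro definitions go into the debug info, so you can print and expand them in the debugger.

    // Hypothetical example of debugging macros with -g3 (names are my own).
    // g++ -g  macro_demo.cpp  -> (gdb) print FLAGS_OF(s)  fails: GDB knows nothing about the macro
    // g++ -g3 macro_demo.cpp  -> (gdb) print FLAGS_OF(s)  works;
    //                            (gdb) info macro FLAGS_OF shows where it was defined
    #include <cstdint>
    #include <cstdio>

    struct Status { std::uint32_t raw; };

    // Bit-field extraction done with a macro, as in some older codebases.
    #define FLAGS_OF(st) (((st).raw >> 24) & 0xffu)

    int main() {
        Status s{0xAB123456u};
        std::printf("flags = %u\n", FLAGS_OF(s));
        return 0;
    }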
Let me quickly show that. If I compile with just -g, load it up, start, and print the value of the macro, GDB says it doesn't know what that is — preprocessor macros haven't gone through the compiler. But if I build with -g3 and repeat, it knows about macros. It's particularly useful for macros that expand to something, say to extract bit fields out of data structures, when that's done with macros — which it sometimes is. Obviously you shouldn't be using macros, but maybe you're debugging someone else's code.

Related to the DWARF format is the stack information: the CFA and CFI, the canonical frame address and call frame information respectively. Depending on how you compile, you may or may not have frame pointers, and in the modern world you most likely don't: with any optimization on 64-bit Intel, the compiler omits frame pointers so it can use rbp as a general-purpose register. That makes walking the stack slightly painful, but it's worth it for the extra register. With the CFA and CFI, what you get is essentially a mapping: at this program counter, or range of program counters, what offset from the stack pointer is the beginning of my stack frame? The debugger uses that to produce your backtrace, and the CFI gives you more, like locals. The C++ exception-unwinding mechanism uses some of this too, so it's related to but distinct from the rest of the DWARF info.

As this is a C++ conference, here's a little bit of C++. Catchpoints are pretty cool in GDB. Breakpoints obviously stop at a given line; catchpoints stop when interesting things happen. With my bit of C++ code built with -g3, I can do "catch throw", which will stop any time an exception is thrown; I can also do "catch catch" — a bit of a mouthful — which stops when an exception is caught. If I look at what it's done, it has actually just inserted a couple of breakpoints and handily given me the addresses; if I run "info line" on one of those addresses, sure enough it's at the __cxa_throw label, and unsurprisingly the other one is the catch. These are routines in the C++ runtime library that the exception mechanism jumps into when throwing and catching exceptions; GDB puts breakpoints there and can then filter on the exception type and all that good stuff. (There's a small example you could try these commands on below.) You can also do "catch syscall", which uses the PTRACE_SYSCALL mechanism: catch all system calls, or a particular one, and run forward until it happens.

Threads are interesting in the debugger world. This is all somewhat moot these days, because pretty much every threaded C++ program on Linux uses libpthread and it's all standard, but the design allows for different threading libraries, and in years gone by that was more common. The threading library provides libthread_db — DB for debugger — which exposes routines the debugger can call to look up things like thread-local storage. So when you print errno in a regular C program, that actually goes through libthread_db if it's a threaded program, because errno is thread-local; GDB does all of that for you automatically.
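Here is the small example mentioned above for trying catchpoints on (my own, not from the talk); the GDB commands in the comments are the real ones just described.

    // Small program to try catchpoints on (my own, not from the talk).
    // (gdb) catch throw          # stops in __cxa_throw when the exception is raised
    // (gdb) catch catch          # stops when the handler below is entered
    // (gdb) catch syscall write  # stops on a particular system call
    #include <stdexcept>
    #include <cstdio>

    static void might_throw(int x) {
        if (x < 0)
            throw std::runtime_error("negative input");
    }

    int main() {
        try {
            might_throw(-1);
        } catch (const std::exception& e) {
            std::printf("caught: %s\n", e.what());
        }
        return 0;
    }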
Just a little plug: I'm putting together a series of small, bite-sized five-to-ten minute tutorials on all kinds of GDB topics — a lot of content from previous talks, but a lot of new stuff as well — so you might want to look at our wittily named "GDB Watchpoint" series for more of that kind of thing.

Enough of GDB and the debugger; let's talk about dynamic checkers — AddressSanitizer and Valgrind here, though there are of course many others. Both AddressSanitizer and Valgrind work with this notion of shadow memory. To back up a bit: what we're trying to do is detect invalid memory accesses at a much finer grain than the hardware allows. On modern systems you've got pages, typically 4K (possibly much bigger), and that's the granularity at which you can mark memory as accessible or not. With both Valgrind and AddressSanitizer we want to do that at roughly per-object granularity, much finer than the hardware supports. So they create shadow maps: every chunk of memory maps down to a smaller shadow that holds a few bits of information for every eight-byte word — so the shadow is smaller than the memory it shadows — recording whether each address is valid, so that both AddressSanitizer and Valgrind can look up "is this address valid or not?".

Now, Valgrind works by binary JIT translation: you can take your program compiled however it was compiled — maybe you don't even have the source — and run it through Valgrind. The sanitizers work by changing the compiler. These have different trade-offs. It's nice that with Valgrind you don't have to change the application or the way you compile it, but it is quite slow and there are certain things it can't do. Because AddressSanitizer has the compiler generate the checks around each memory access, it can do more: one thing it can do that Valgrind can't is detect buffer overruns on the stack, and the runtime overhead is much less — they report something like a 50% to 100% slowdown, which is still very real and measurable, but much less than you'd get with Valgrind. What both techniques do is give every chunk of memory red zones at the beginning and the end. If you malloc, say, 32 bytes, the system actually hands out more than 32 bytes anyway, because bookkeeping information goes with the malloc'd chunk; but under Valgrind or AddressSanitizer — they intercept malloc with their own — the allocation is made bigger still, with red zones at either end, and via the shadow maps those red zones are marked bad. So if your program touches them, the tool notices there's a problem. One thing to be aware of: the smaller the red zone, the more likely it is that an overrun with a large stride steps right over the red zone without being noticed; bigger red zones catch more classes of error but of course consume more memory.
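As an illustration (my own snippet, not from the talk): a one-past-the-end heap write that lands in the red zone AddressSanitizer places after the allocation.

    // My own snippet, not from the talk: a heap buffer overrun caught by ASan's red zones.
    // Build with: g++ -O1 -g -fsanitize=address overflow.cpp
    // Running it reports a heap-buffer-overflow at the marked line.
    #include <cstdlib>

    int main() {
        int* p = static_cast<int*>(std::malloc(8 * sizeof(int)));
        for (int i = 0; i <= 8; ++i)   // off-by-one: the last iteration writes p[8]
            p[i] = i;                  // <- ASan flags this write into the red zone
        std::free(p);
        return 0;
    }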
So that's it for the sanitizers; let's get on now to record-and-replay systems, something close to my heart. As I said, it's this control over time: being able to see exactly what happened. I think we have time for a very quick demo, if you'll indulge me. (Is that readable at the back? Maybe not... that's not doing what I wanted; let me just switch to a normal view. I need to get on the VPN so I can get a licence — okay, let's get rid of that.) Apologies if you've seen this demo before, but I just want to show what we mean by record-and-replay, or reversible debugging. I run my program and it's crashed. All right, what's gone wrong? Let's have a look. I've called this function cache_calculate which, given a number, is supposed to return its square root; I've given it 255 and it's returned 0. Clearly cache_calculate has returned the wrong thing. I can see from my stack trace down here that I'm not actually in my code — the debugger has kindly put me there, but the program actually stopped in this raise function. So far, so normal: you can do that with a regular debugger, and as I said earlier, looking at the stack trace is about the closest most debuggers get you to "how did I get here". But I need to know what happened inside this function: cache_calculate returned the wrong thing, and I need to know why.

So what I'm going to do is hit this button, which is like an "un-call": it's like popping up the call stack, but rather than guessing, it's really moving back. My program has definitely been here, and all the globals and everything go back to what they were, and now I can start stepping back through time. This point here is right after cache_calculate returned, so I can go into the function and see what it returned and why. It returned the value from the cache — i here is 88 — and this looks like the typical programmer's worst nightmare: sure enough, my cache contains bad data. My cache tells me that the square root of 255 is 0. Someone has stomped on my cache, and I've got no idea how or when that happened. Is it a logic error? A memory error? I don't know. So all I'm going to do is add a watchpoint — one I'll make persist across stack frames — and where normally you'd set a watchpoint and run forward until the data changes, I'm going to run backwards until the data changes, which takes me straight to the line of code where it happened: the smoking gun. I've gone back in time to where the structure contains good data, and now I can step forwards and watch the data as I go — it's like watching an action replay in sports coverage — and step, step, step: that's the corruption happening right there. Let's back up a bit and see what's going on. It's writing the square root of the adjacent operand into my cache,
and I can see here that the adjacent operand is minus one. So I've tried to take the square root of negative one — you can't do that with integers, hence the garbage. Why has that happened? Let's add another watchpoint to that and go back again. Okay, it's being set here: the adjacent value is being set to operand minus one, and operand is zero. So the program tried to be smart: it tries to populate one entry either side in the cache. When it called the function with an argument of zero it returned zero, as it should have, but by being clever with locality of reference it left one entry in the cache in a bad state, and I didn't notice until some time later. So that's the kind of thing you can do with these replay systems.

Essentially what we're saying is that you can go back to any instruction that executed and see any piece of state — any value, any piece of memory, any register, for any instruction executed — which is clearly a huge amount of data for anything but the tiniest of programs. So clearly we can't store all of that; even if we tried to store just the delta of what changed at each instruction, that would still be far too much information — billions of instructions per second, an 8-, 16-, or 32-byte record roughly every nanosecond — it's just not practical. So what we do — and this is true both for rr and for LiveRecorder — is capture just the non-deterministic stimuli and exploit the natural determinism of the computer. When I'm stepping back in that demo, what's happening under the hood is that it goes back to a snapshot and plays forwards to where it needs to be: we're recomputing previous states rather than storing everything. For that to work we need to capture the sources of non-determinism. Computers are completely deterministic, except when they're not, and for a user-mode application there are a few sources of non-determinism: system calls; threads; asynchronous signals; some instructions (reading the timestamp counter with RDTSC is an example, and modern Intel chips also have a "give me a random number" instruction); and shared-memory accesses, shared with another process or with a device. We just need to capture those, which is typically a tiny subset of what the program does, and everything else can be recomputed.

There's another problem, though: we need to know where we are in time when replaying. When I step back or forward a line, it has to stop at the right time; if it's inside a loop you can't use the program counter, because loops. So you have to know how far through the program you are, not just so you can step nicely with reverse-step and friends, but so you can replay signals and other asynchronous non-deterministic events at exactly the right moment. The problem with this kind of technology is that if you get it even slightly wrong, everything unravels very quickly; you have to capture and replay everything perfectly, every single bit, at precisely the right time. rr is built on performance counters: modern CPUs — at least modern Intel CPUs — have sufficiently accurate performance counters to count retired conditional branches, which gives you an accurate measure of how far through the program you are.
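For flavour, here is a sketch (my own, not from the talk) of reading a hardware branch counter through the Linux perf_event_open interface; rr's actual setup is more precise (it counts a specific retired-conditional-branch event and uses counter overflow interrupts), so treat this only as an illustration of the underlying facility.

    // Minimal sketch (not from the talk): counting branch instructions with perf_event_open.
    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <cstring>
    #include <cstdio>

    static long perf_event_open(perf_event_attr* attr, pid_t pid, int cpu,
                                int group_fd, unsigned long flags) {
        return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
    }

    int main() {
        perf_event_attr pe;
        std::memset(&pe, 0, sizeof(pe));
        pe.type = PERF_TYPE_HARDWARE;
        pe.size = sizeof(pe);
        pe.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;  // a generic branch counter, not rr's exact event
        pe.disabled = 1;
        pe.exclude_kernel = 1;
        pe.exclude_hv = 1;

        int fd = perf_event_open(&pe, 0, -1, -1, 0);    // this process, any CPU
        if (fd == -1) { std::perror("perf_event_open"); return 1; }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

        volatile long sum = 0;                          // some branchy work to measure
        for (int i = 0; i < 1000000; ++i) sum += i;

        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
        long long count = 0;
        read(fd, &count, sizeof(count));
        std::printf("branch instructions: %lld\n", count);
        close(fd);
        return 0;
    }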
These counters can be exploited: they're nice and fast, the program runs without instrumentation, and you don't have to write much code to take advantage of them. The only thing is that they're not always available: they don't work on AMD CPUs, they're not always available in the cloud, and they don't work on Arm — but where they are available, they're very good. They also can't capture everything: things like shared-memory accesses and asynchronous operations are problematic. And they use Precise Event Based Sampling (PEBS) to generate an interrupt, because you don't only need to know exactly where you are in the program, you need to be able to go back to a precise point in the program; you can configure the Intel performance counters to generate an interrupt after a certain number of events have happened, and that's how they do it. With LiveRecorder, which is from Undo, we instead use a JIT and binary-translate the machine code as it runs, which imposes more overhead but has the advantage that you're not relying on facilities that aren't always there, and it can cover cases like shared memory and other gnarly stuff. Basically they're doing the same thing, just using different techniques to obtain the same information. So that's it for my very quick run-through of these various types of tools; I'd now like to hand over to Dewang, who's going to talk about Coverity static analysis.

[Dewang] All right, before I start, a quick question for you: what do you think is the coolest way for C++ programmers to make money? Inheritance, of course — it's free and efficient, right? Okay. This segment is about static analysis. As Greg said, I work for Synopsys and I support the product Coverity, which is a static analyzer. What do static analyzers do? They take source code — in the case of languages like C and C++ they compile it — and generate a semantic representation that includes an abstract syntax tree, a call graph, and various other elements; then they compute the different paths through the code, walk those paths, and find problems. All of this is computed statically, not at runtime. When the analyzer finds problems it presents them — I know this slide isn't readable, but the red marks are annotations of the steps that lead to the ultimate problem in your code — and the defects are consumable in various formats: run the analyzer on the command line and it prints there; use an IDE and the analyzer can be driven from the IDE, so your code is overlaid with the problems it finds; and it's pretty trivial to push the results into code reviews and pull requests.

There are quite a few static analyzers in our industry, and the way I like to think about them is: what is the main motivation behind the analyzer, and is it automatically your friend or not? There's one group whose aim is to find anything that could potentially go wrong with your code, where the false-positive rate can be as high as 80 to 90 percent. That class is used mostly by security researchers: they need to look at every possibility and find the needle in the haystack.
So if you're using a static analyzer today and you end up facing too many false positives, remember: that's not your friend, because if you're triaging everything in there, you're doing the work of a security researcher and you have no time left to code. The second class is what's already built in when you compile with your toolchain; I'd say those are your friend, but they're typically focused on giving you something quick, right there and then. The last class of analyzers tends to be much more conservative: they need very strong evidence before they report something as a defect. I'd say make that class your best friend.

I'll go through this quickly. I don't know if you've read this paper — this is back in 2002, on the Linux kernel version 2.2: research work at Stanford, which became the original version of Coverity, found several hundred very important defects in the Linux kernel. That was many years ago, and we've come a long way since; the Linux kernel team still uses Coverity today, pretty judiciously actually. Does anybody here work on the kernel? We have a hand — let's give them a round of applause, because that community are kind of my heroes; if you look at our planet, that's probably one of the most important code bases we have. And it's very public: if you go into the git log and do a keyword search on Coverity, you'll see thousands of references; they give credit to Coverity when they fix things based on what Coverity has found for them. You'll see there's a table here: the left column is a list of checkers, the top checkers in terms of the total number of issues the team has fixed; I pulled this data earlier this year, March 2019. Take the third one, dead code: that usually means you have an if with some logical expression and a block of code under it, and if that logical expression can never return true, the block becomes dead code — meaning the developer has made a mistake in their thinking. You can see that for dead code the kernel team has fixed 968 to date, dismissed 64, with 486 outstanding — outstanding usually means, from what I know, areas they don't care about that much. But the thing I want to highlight is the dismiss rate itself: it's really low. If the static analyzer you're using has that kind of rate, remember: that's your friend.

Now I want to get under the hood a little on static analysis. On the left-hand side is example code annotated with a problem, and on the right-hand side is an example checker, written in a domain-specific functional language. Coverity has recently started opening up the analysis engine — the AST and the call graph — to developers, so you can create your own custom analyses if you want. Let me walk you through it. On the left is a very simple problem: we have a malloc here, then an if that tests, somewhat arbitrarily, whether a number is odd or even; if it's odd we free the allocated memory, otherwise, when it goes out of scope, you have a leak.
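The slide's code isn't in the captions; here is a hypothetical reconstruction of the kind of pattern being described — an allocation that is released on only one branch.

    // Hypothetical reconstruction of the slide's example (the original code isn't in
    // the captions): the allocation leaks whenever n is even.
    #include <cstdlib>

    void maybe_leak(int n) {
        char* buf = static_cast<char*>(std::malloc(64));  // allocation the checker keys on
        if (n % 2 == 1)
            std::free(buf);        // freed only on the odd branch
        // on the even branch, buf goes out of scope without a free: a resource leak
    }

    int main(int argc, char**) {
        maybe_leak(argc);
        return 0;
    }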
So let's look at how the checker is written. The way to think about it is: I define a pattern that is a step in the call graph which is a function call whose identifier is malloc; and similarly a step in the call graph which is a call whose function identifier is free (ignore the rest). Now, if you think about it, we really need to match the allocation up with the free: you can't just take a random allocation and a random free if they don't reference the same object or memory. The trick is to line up the object you care about, and there's a feature in the checker language, CodeXM, where we can say that the call's expression — whatever the pointer is — becomes the key. Then here's the checker itself: I give the checker a name, and the trigger says: for all functions in the global set of functions, for all paths through the function, where the path matches this sequence — at one point there's a malloc, and this arrow-looking thing can be chained if there are multiple calls in the sequence you care about — and where there is no call to free. Easy enough, right?

Before I get to the demo, I want to give you a flavour of why or when you might use something like this. I was working with a customer, a really big cloud company, that has a security policy saying anything that goes into storage must be encrypted. You can imagine implementing a checker similar to this one — oh, five more minutes, okay — the use case is very simple: you track all the operations that can cause a write to storage, look up in the call graph where that function appears, and check whether there's an encryption operation; furthermore, you make sure the encryption happened to the same object, lined up as before. Every path that comes through without that encryption is then a problem. So if you're doing code reviews and you frequently see comments like "don't do this", "don't do that in this context", that's a good time to use something like this.

Okay, a really quick demo so you can see how this looks. This is the command line. Coverity has a build wrapper, cov-build, where you say where to put the captured code and then give it your build command — it's not on the screen? Oh, okay. So I run a command like this: the build wrapper monitors the compile, so anything that gets compiled it picks up, and Coverity understands all the compile flags, so it knows how to compile and generate a semantic representation that exactly matches your code. Like this — is this better? I apologise — yes, that's better. You could replace the compiler invocation here with cmake, maven, or whatever build you have; everything that gets touched is automatically captured. In this case I have one compilation unit, and you can see it there. Then we run cov-analyze, just like that: it computes the call graph and the AST and all of that, and runs the analysis. This RESOURCE_LEAK is a default checker that comes with Coverity, and it has already found the problem. So the question is: how do we
trigger the custom checker I showed you earlier? We do this: I disable all the default checkers that come with Coverity and enable just that particular custom checker. Then, if I look at the issue locally, you'll see that the checker name here came from the custom checker. And notice that you get several things automatically: you'll see that on this particular branch it's taking the false branch — that's when you go out of scope and get the leak — and notice I didn't write anything that had to do with this conditional branch here. That's because Coverity, in the background, has already pre-computed all these paths, so when I express the particular sequence I care about, I don't need to worry about the different paths; it knows how to evaluate things like "i % 2 == 1", for example. So that's static analysis. We have time for maybe one or two really quick questions — I'm seeing signs that the session is over — but in any case, thank you very much, everyone.
Info
Channel: CppCon
Views: 11,544
Rating: 4.830688 out of 5
Id: WoRmXjVxuFQ
Length: 56min 45sec (3405 seconds)
Published: Sun Sep 29 2019