RustConf 2021 - Fuzz Driven Development by Midas Lambrichts

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hello everyone i'm midaslamwise and i'm here today to talk to you about first driven development when you know that you don't know so first of all who am i as i said i'm mia zombies and by day i'm a senior software engineer for the refueling solutions and i'm a wrestling enthusiast i also have various social media which you can find me i'm not the most active person but if you have any questions or just want to chat you can reach me on these for the people attending westconf you can also reach me right now in the discord if you have any questions or if anything is unclear so for the subtitle when you know that you don't know this refers to one of the strengths of fus-driven development and even with a minimal amount of the main knowledge you'll still write decent software without any major mistakes uh because first driven development will guide you through it and you'll learn as you go without having to fear that you've just made a big whoopsie um you only need to be able to come up with one simple thing some invariant and the le the rest you'll learn as you go but to understand first driven development we first of course have to understand what fuzzing is so fuzzing according to the first book which is an excellent resource for anyone getting started with fuzzing first testing is a software testing technique uh used to find security and stability issues by providing pseudo-random data as input to the software so basically what it's gonna do is just gonna come up with random data and give that as input to your software and then it's going to monitor your software for some things most of the time it's going to just see if your software crashes if it crashes it'll report that back to you that for this input something crashed there are other variants sometimes it will keep fuzzing indefinitely and just collect crashes and then when you say okay you can stop running it will report them all in one go um if that's not clear yet we'll see also a bit of fuzzing later on once we go to an example and that should clarify it for you okay so we have fuzzing now first driven development so first driven development is a six step process which the first one is determining the variance so that was a seed i was referring to earlier the one thing that you need to know this doesn't always need to be correct you can start with an incorrect invariant you just need to be able to come up with a starting point second step you provide a first target that expresses that invariant for you fuzzing target is sits between the fuzzer and your software which can do a bit of manipulation on the data before passing it along uh to your software and we're gonna run the fuzzer until it gives us a failure and we're going to think about why did this fail our software why does this cause a crash or whatever then depending on our reflection we might just have to write some new unit tests and do some development and then step six the final step is a non-final step because it's iteration you're gonna restart again most of the time certainly with more mature projects or when you've been fastest first drift doing field driven development for a while sorry most of the time you're just going to return to step 3 run your physical again until it finds another failure another bug in your program sometimes also notice that this doesn't really make sense if you reflect on it and that might be because your fuzzing target doesn't really express the invariant that you want so you might have to change your fuzzing target sometimes you'll even after the reflection see that your invariant actually doesn't make sense or isn't very good so you have to change the invariant this these are the six steps of first driven development now let's try to map this on where we'll go well where in the wild you'll find most feathers today or fizzing footing targets today and that's on parsers or compilers so the first one is determining the variant for a parser most of the time this is going to be it shouldn't panic it should either return an error or on a cable it should not panic also goes for most compilers it should accept or reject but it shouldn't crash then a fuzzing target that expresses this invariant you just take whatever input the fuzzer comes up with and you pass it along to your parser you don't even care about okay error just that it doesn't crash then we're gonna run the feature until it fails which in this case means that our parser crashed somewhere panicked somewhere and we're gonna think about why most of the time this is just some path you missed or something like that but since we have a pretty simple fuzzing target and a pretty simple invariant it's very unlikely that it's gonna be one of those two so you're gonna write a unit test for this uh input which just gives us input so that your test coverage increase and then you develop such that the test passes and your trade again in this case most of the time back to step three so now that we know first driven development or i've seen it for a bit we're going to write a library which uses by using field driven development to develop it the library that we're going to write is a json patch squasher most of you will know what jason is but what is a json patch so assume we have this json document to start with two properties a name with the value mids and a ranking eight later on the user gains some popularity his ranking goes up he goes to rank five okay he gains some more popularity a bit more his ranking goes up to four these changes we can express them as a json document itself so the change from ranking five to four we can say another json array there is an operation there to replace slash ranking so that property from the root with value number with value 4. again if the user for example changes its name and by consequences ranking drops a lot there's another json patch for that change one to change the ranking to 950 and one to change the name those are two patch operations and this is one json patch that array is the json patch the entries in it are the patch operations so then again another change the ranking shoots up to one he did something amazing uh another json patch an array with one operation in it to change the ranking to the value 1. so those are json patches now we know what json patches are now the squashing part so if we have our operations from before our json patches we can express the same effect from that starting document to get the same end document shorter so in this case we can express the same thing from the same starting document to get to the same end document by just replacing the ranking with one and the name with meters and that has the same effect scoped out on these documents on the starting document and that's the squashing operation that we're gonna try to do so now we know what we're trying to write okay that's what our library is gonna do is gonna take this squashing it takes a lot a set of patches and it tries to minimize it a bit so the library is going to look like this we have a public function which takes the patches and that's going to forward basically to squash operations which takes an iterator over all the operations and all of the patches and for now we're just going to leave to do in there so we don't have any logic yet uh and this is our starting point now so this is what we're gonna use for first driven development so step one we're gonna write an invariant i'm gonna come up with an invariant and this is what i said before the effect should be the same so the effect of applying the squashed patch should be the same as applying all the patches that were squashed in their order that's the invariant this is it then if we do this it's still correct it's still the squashed is still doing the same thing so step two the first thing target this is what it looks like for cargo first first target exclamation mark then the data is what the further is giving you so we take what the fuzzer gives us the data and then we're gonna see if it matches an input script if we can deserialize it into our input struct if it can okay great we'll return it and we well we'll return that assign it to the input variable if it doesn't then we return from this function from this closure so such that the filter itself will mutate again and try the next iteration because we don't make we can make sense of what it has given us the input struct itself is two things one starting value so we can apply patches to it to see if our squash patch does the same and then the patches them themselves so if you have that then we're gonna squash the patches from the input that's where our library is gonna do and then we're gonna apply the patches in order that we've been given them that the filter has generated for us to get our normal result and now we're going to apply our patch to the starting value to get the squashed result and then we're going to see if they're the same they should because it's the same effect the end documents should be the same so this is our pheasant target in this case so step three running until failure nobody has ever seen it is what it looked like when you start a fuzzer i sped it up with a factor 100 because this took a while but i'd like to draw your attention again to that we still don't have any implementation we still have to do in there um and by showing this i'd like to show you that fuzzing can only show the presence of bugs and it can't show the absence so it can come up with counter examples of where we fail to uphold our invariant but it can't guarantee us that we're upholding the invariant and what i'm trying to show here is so that duration of the fusing before it fails does not equal value quality sorry because again we still have that to do in there so step four reflect well it took a while to get our first question it's difficult for the fuzzer to come up with the input stroke that we expect there are some other ways around this structural wear fuzzing which you can read about in the first book risk facebook you can seat the fuzzer with some data but that's outside of the scope of this dock and then this is for brevity normally this would come become obvious after a few iterations but i'm gonna already do it here is that the start state and the patches don't always make sense together for example after a while you might come up with something the fuzzer might give you something like this that you have a name and a ranking and then it's the operation the patch is remove a non-existing property if we do this in our fuzzing target it would fail because we cannot apply the patch to the document so it's not gonna fail because we do something wrong or our library can be 100 correct but purely the patch doesn't make sense on the document [Music] so that's what we reflected about this is a bit of a shortcut to get you to come along with the next steps because they are a bit more important now um so we're going to iterate so we saw that yeah we have some problems with our fuzzing target but it could also be that our invariant isn't that great so let's re let's take another look at our invariant so what we had was the effect of applying the squash but should be the same as applying all the patches that were squashed in their order this doesn't express anything about what we're actually trying to do the squashing i basically can return one patch that ex it's all the other patches together and that would do that would still have this invariant if we can come up with an invariant that also expresses besides the effect that it should minimize it then we'll have a better progression through our first driven development so as a keynote among you might have noticed earlier on the example is that if you take the initial document and the end document and you calculate the patch between those that you have the minimal div that can be that we can squash towards if we have all the others all the other ones the beginning and the end that's the minimal one you can't do a shorter expression of the difference between start and end then the difference between start and end so that's our goal that's what we're trying to achieve so that's what we're gonna compare against we're no longer looking at the effect but we're looking at the patch itself so to pour this into a fuzzing target step two we no longer try to deserialize into the input script now we're deserializing into a vector of consequent json values such that we ourselves can calculate consequent patches between them so again if if we can do this we continue if we can't we return feather will mute date again because we don't crash and gives us a next input that might be deserializable also again because we're gonna calculate patches if we only get one or zero documents we can't calculate subsequent patches so we're not going to be able to do anything with that same story just return such that the fizzer will mute it again then we're gonna do that actual calculation to get all the patches between all the documents and we're also gonna calculate the what i'm calling the global patch the one between the first one and the last one because that's the one that's minimal that's one that we're gonna compare against and then we're gonna squash the patches and then compare there's a bit more boilerplate here because our library if you have zero patch operations it will return none instead of a patch upper patch with zero operations so that's a bit more ball update but it's just comparing the minimal diff with what we squash into so now we're going to run until failure and then after a while this is what we're going to be greeted with that the threats are named panicked not yet implemented because still we have that to do in there and then we're also shown what input caused the failure and if we zoom in on that a bit we can see that it's two one an array of two and one new line tab new line but for us only this is important array of two and one so first document is two the second document is one and that the fuzzing target will calculate a bench between them and give that patch because there's only one well two documents so one patch give it a patch to our library and then we'll hit to do so let's reflect a bit might actually help to write something that does something instead of leaving to do so an actual implementation might help so we're gonna take whatever the failing input was in this case an array of two and one what i like to do is write an helper function that basically does the same as a fuzzing target such that i can write these very neat very tiny unit tests very quickly by just copy pasting that input from the fuzzer into here so this is our first unit test or logic now is just to do and minimal thing that we can do to make this pause is just return all of the patch operations as a vector and that would then result in the unit test succeeding and we can iterate again so we again run until failure this time we have a bit of a bigger uh error message at the end but the important parts are the search and failed message so the left forces are right with the global batch versus the squashed patch and at the bottom again we have the input that failed so now we have three consequent documents of number 44 and number four and the number seven uh and then if i tidy this up a bit to make it a bit more readable the global patch would be replace the root with number seven what our library right now returns is replace the root with four and then replace the root with seven so let's think about this a bit so i see that if the pot is the same that i only need to care about the last one and we still have two operations but the first one has the same pulse and it's the second one that basically still has an effect at the end so when multiple operations are there for the same path we only need to keep the last one so we again pour this into a unit test again same thing only the input needs to change this for this and another name of course and then we develop so our implementation instead of just collecting into the vector now we're gonna use a hash map from the path to the patch operation look to all operations uh and insert them by their path and yeah if you insert something that already exists then it is going to be overwritten uh in the hashmap so then to get back into our vector of batch operations we just train the hashmap and only care about the patch operation the pop the pulse that we use as keys we can drop them we only need the patch operations as a vector so we collect it again so this will make our unit test pulse again so we can iterate again we can run it until failure this time this is what we'll end up with the failing input would be 4 an array with 2 inside it and an empty array the global patch here will be replace the root by an empty array wheel or squash operations will still return replace the root with an array with number 2 inside of it and then remove the first element of the root so here we see that our previous our previous assumption still holds that the same pot needs to be squashed because this one is talking about a different part so for a different part we still might have to squash so that's our reflection even if the part is different we might still have to do some squashing between them [Music] so we'll write a unit test again and then we develop this is way too much code to show you but the main idea is a user radix tree which can give you keys so the pods which are parents or prefixes of the part you're currently looking at or all the keys which start with the prefix that you give it so basically children potentially of what you're using and then you have all sorts of uh combinations you can get into i'm not gonna show it uh here for preview sake uh but after you've done that end unit test passes we're gonna iterate again i know fun times so we're going to run it until failure this time our failing input looks something like this an empty object zero an empty object the global patch is gonna be empty because you're gonna compare an empty object with an empty object you don't need to do anything our library will return that the squashed patch is a replace operation with the valve with the root part and replace it with an empty object so our previous assumption still hold this is all operating on the root path but if you think a bit about this is and more in terms of the patches is that the patches don't contain any information where you come from if this initial document this this empty object was anything else we would succeed and then the patch operation would be the squash patch and the global patch would be the same again but yeah the jason patch patches don't contain any information from where you started it only it only describes where you go to like you start from something and you replace it with and then that value is contained in the patch what value you started with is not contained in the patch so here actually we can realize that the patches themselves don't contain enough information to always come up with a minimal patch because we don't know what you start from so that's our reflection this time again sometimes it's impossible to get that minimal diff solely from the patches you'll need the starting document as well to be able to come up with that uh that minimal diff a minimal patch so we have two options now we can either change our libraries api so you'll have to pass in that starting document as well and that way we can guarantee that we'll be able to calculate the minimal div or we can allow for sub-optimal results if you find it more important that you don't have to give in the initial document um based on what you want to do you're going to develop some more unit tests and or not and you're going to iterate because if you choose one of those options you'll have to come up with a new invariant because for the second one the allowance of optimal results breaks our invariant so we'll have to re we'll come up with another one or do some conditions on our invariant so what we've been doing again the six steps of first driven development you determine an invariant you write a fuzzing target that expresses an invariant you run it until failure you reflect on that failure potentially develop and do some unit testing and then you iterate again those are the six steps of first driven development so why does this work why this works is because what we're doing on a broader scope than this the six steps so what we're doing is basically we are using a non-cognitive process fuzzing a sort of fuzzer software which is a program that to drive and guide our cognitive development of software and we just need to provide a simple seed and invariant so we start with a little bit of cognitive work and now we let the fuzzer guide us uh and show us where we're lacking in our invariant implementation expression of our invariant so why this is good or why you might want to use it is because the cognitive gaps that you might have and you have if you have limited domain knowledge they get less chance to manifest because you're not driving what you're gonna implement next is the fuzzer that's gonna show you the here there's still something missing and you need to take care of it so for example you might not have or i didn't think about the different parts still having to be combined together for a minimal patch or that you might have to provide the starting document to be able to give that minimal patch in some cases those are things that are important to know if you want to write a good library but i didn't know those and i'm not sure if any of you already saw those or could think of those so with a very limited domain knowledge we've still written a library we still it already does a lot and we don't have to be afraid that we've embedded some of our lack of knowledge in there which will come back to buy this mds uh later on so with that i hope that i've given you a new tool to put in your developer arsenal which is first driven development and i hope you have fun at the rest of the conference and if you still have questions on selling discord so maybe talk to you there bye everyone
Info
Channel: Rust
Views: 2,444
Rating: undefined out of 5
Keywords: rust-lang, rust, rustlang
Id: qUu1vJNg8yo
Channel Id: undefined
Length: 28min 39sec (1719 seconds)
Published: Wed Sep 15 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.