J. Z. Kolter and A. Madry: Adversarial Robustness - Theory and Practice (NeurIPS 2018 Tutorial)

Captions
Welcome to NeurIPS, and welcome back to Montreal if you have been here before or live here, or welcome if this is your first time at NeurIPS or in Montreal. This morning we are going to have a great tutorial on adversarial robustness, theory and practice, so let me introduce our two speakers. Aleksander Madry is a professor at MIT, at CSAIL, and his work focuses on optimization and robustness, in particular in deep learning. Zico Kolter is a professor at Carnegie Mellon and also the chief scientist of the Bosch Center for AI, and his work likewise focuses on optimization, and in particular on rigorous, explainable, and robust deep learning algorithms. Please welcome the two of them, and let's enjoy the tutorial.

Thank you. I'm Aleksander Madry, and together with Zico I will talk about robustness. This is something we sort of knew about, but it still caught us somehow off guard, and because of that we now have to struggle with it. We actually wanted to use this opportunity not only to talk about adversarial robustness, but also to sketch a somewhat broader picture of the field and of the challenges we face when we try to deploy machine learning in the real world. So let's get started.

I think it's safe to say that machine learning is a success story. If you work on machine learning, there is currently plenty of opportunity to feel proud of yourself: over the last decade we made progress on some flagship challenges that just a few years earlier we did not think we would reach, and now they are solved. That's all great, and it bred a lot of excitement: you can't open a newspaper without reading about all the great things that ML or AI will bring us, and these successes have inspired people to think about applying machine learning to anything they do. In a way, no matter what you do, if you don't put machine learning into it, you are doing it wrong. This is all very exciting, and it definitely makes this a very exciting field to be in. However, it also comes with some responsibilities, because as exciting as deploying ML everywhere is, I think it behooves us as a community to stop for a moment and ask: is ML technology, the way it is now, actually truly ready for real-world deployment? That is the question we would like to tackle. There are many ways to unpack it, and many in the community are trying; the angle we want to focus on in this talk is what we phrase as: can we truly rely on machine learning as we have it today? Even that question has many interpretations. One of them is the fear that AI will bring doom to us all; that is a legitimate concern, but it is not what we want to cover here. What we want to focus on is something much more mundane, and for that reason much more acute and dangerous: the fact that we do not yet really understand how machine learning interacts with the other parts of our systems' pipelines. In particular, whenever machine learning is embedded into a system, it turns out to give adversaries a way to manipulate that system in a variety of ways, and we have seen plenty of examples of that already.
The point is to not allow this to happen going forward. That is the case when there are bad guys trying to manipulate our system, which is already a problem for machine learning; but even if there are no bad guys in the loop, there is still nature, and when we want to go from 95 percent reliability to 99 percent reliability, things become quite difficult. Safety is still very much an issue for applications of ML. So this is the general theme we want to tackle, and it behooves us to think about where these problems come from.

Why would you worry about all these questions in the first place? After all, we know that ML seems to work really, really well on all the benchmarks on which we deploy it. Here we have one of the key home runs of machine learning, namely the ImageNet challenge. As most of you probably know, this is a computer vision object recognition challenge where you train on roughly 1.5 million high-resolution images and your goal is to correctly classify the corresponding test set, to place every photo in the correct category. Here we have the progress over the years on this task. In particular, we see a marked drop in error in 2012; that is exactly when the AlexNet paper showed up, demonstrated how deep learning can be applied extremely successfully in computer vision, and started the deep learning revolution in vision. Then in 2015 something very interesting happened: the best-performing deep learning system on this challenge actually outperformed humans, or to be more precise, a human named Andrej Karpathy. I think Andrej did a pretty good job, and the fact that existing computer vision solutions can do better than that should give us pause; it is definitely impressive.

So that's great, but what I would like you to ponder is: what does this result actually mean? It clearly means that computer vision tools, as we deploy them now, can outperform a human on this particular benchmark. That is true, and it is exactly what this test measures. But, as is human, we tend to generalize from the things we see to things that maybe are not there but that we would very much like to see, and that is definitely happening in computer vision. There is a good reason for it, and it has to do with the conceptual framework we use, something we are vaguely aware of but rarely think about. What am I referring to? Let's look back at the supervised ML framework and think about how we train a classifier and how we evaluate it. We usually imagine there is some distribution in the sky, the distribution of all the objects we could ever encounter and would like to classify. The way we train is that we sample a bunch of examples from this distribution and plug them into our training algorithm, which gives us the trained model. Then the way we check whether we did a good job is that we sample some more, independent examples from the same distribution and test how well our classifier does on them. In particular, our measure of performance is the fraction of mistakes made during this test, and it is safe to say that we have made really beautiful progress at optimizing this measure.
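In symbols (this notation is an editorial shorthand, loosely in the spirit of the tutorial's accompanying notes), the standard setup is:

\[
\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell(h_\theta(x),\,y)\big],
\qquad
\text{test error} \;\approx\; \Pr_{(x,y)\sim\mathcal{D}}\big[h_\theta(x)\neq y\big],
\]

where h_theta is the trained classifier, ell is the loss, D is the "distribution in the sky", and the test error is estimated from fresh samples drawn from that same D.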
So everything is great so far, except there is one crucial assumption here that is not reflected in practice: in reality it is almost never the case that the distribution we use to train our machine learning model is exactly the distribution the model will encounter when deployed in the real world. There are various forms of covariate shift, and essentially the distributions are never the same. You may say: well, that is how we make sense of things, we make assumptions that are not exactly reflected in practice but still lead us to good solutions. It turns out, however, that in this case the assumption may be somewhat misleading.

So what can go wrong in the real world, where this assumption does not hold anymore? I think one of the key implications of its absence is a phenomenon we observe quite uniformly: the predictions of our machine learning models are extremely good on average, but they are also extremely brittle. One of the most publicized manifestations of this is something called adversarial examples. Here we have an image of a pig which a state-of-the-art classifier correctly recognizes as a pig, so everything is great. However, it turns out there is a way for me to add just a little bit of noise to this image. The noise is not random; it is chosen in a very specific way, and crucially it is essentially imperceptible. After adding it I get the image on the right, which is not much different from the one I started with; to us humans it is still very much a pig. For some reason, though, the classifier is now convinced that this is an airliner. My favorite joke here is that machine learning is truly a magical technology: it can make pigs fly.

By the way, if you have heard about this phenomenon before, it was probably from one of these two papers, but this is not a new thing in our field; there was a lot of earlier work that identified this kind of brittleness. It is just that until the last four years or so, machine learning struggled even without adversarial noise, so this was not much of a concern. Now that average-case performance is here, worst-case performance is what we realize really has to be addressed. So this is the example of the brittleness, but you might say: okay, this noise is not random, it is very carefully chosen, and in a lab setting you can of course figure it out, synthesize it, and make things fail.
But in the real world there is so much other noise, coming from sources you do not have such fine-grained control over, so maybe everything is fine in practice and this brittleness does not exhibit itself. Well, it turns out there are numerous demonstrations by now that this is not the case, and that you can actually deploy this kind of adversarial noise in practice. My favorite example comes from four MIT undergrads (they are no longer undergrads, but they were at the time): they 3D-printed a turtle that to us of course looks like a turtle, but which a state-of-the-art classifier identifies as a rifle from many different angles. This is something I have held in my hand, something very physical, and it really does fool state-of-the-art classifiers. So this is not just of academic interest; it can happen in the physical world.

In a sense it gets even worse than that. So far, the noise pattern I had to add was very intricate and complicated; each pixel had to take a specific value, and in particular the pattern on the turtle's shell had to be painted very carefully to achieve this adversarial effect. It turns out you do not even have to go to such lengths to expose the brittleness of our machine learning systems: just using rotations and translations is already enough. Here is an example: an image of a rifle that is correctly classified with high confidence as a rifle. Everything is great, but now I start rotating it, and for some angles the top high-confidence classification is some agricultural device, not a rifle. So whatever you do, if your object recognition system cannot reliably recognize an object once you rotate it, you are in trouble. And by the way, this is not just some quirk of the pipeline; you might of course think of using data augmentation to correct it, and we did try that and did evaluate it. The answer is that it helps somewhat, but it definitely does not solve the problem.

So I think we can conclude quite convincingly that the brittleness of ML is a thing, and what we should do now is wonder whether we should be worried by it or not. Needless to say, my stance, and I am sure Zico's as well, is that this is a problem, for a variety of reasons. The most obvious one is security: whenever I deploy a machine learning system in a context where someone else might have an incentive to manipulate it, I should be very worried, because if you can make my system see something different from what I see, that is exactly how security breaches happen. The security community has produced a number of very cool demos showing that these things can happen. There is work from CMU showing that you can 3D-print glasses such that, once you put them on, a state-of-the-art face recognition system thinks you are a completely different person.
So think about automated passport gates: now you can fool them just by putting on some weird glasses. And by the way, even though in this tutorial we focus mostly on image signals, the same brittleness applies to other types of signals as well. In particular, there is work from Berkeley by Carlini and Wagner showing you can do the same kind of manipulation to sound: you can synthesize clips that to you sound like music, but that to Alexa or other voice-controlled devices sound like a perfectly clear voice command. So you might be listening to music on the radio while someone is reprogramming your Alexa, and you will not even know about it.

That is the security story, when there are bad guys out to get us, but even safety and reliability on their own are already an issue. Recently Tesla released some data from their object and lane recognition pipeline as a car drives through Paris, and it does a pretty amazing job, except you can see that the predictions are quite jittery; they clearly struggle a bit. Indeed, if you look online you can find plenty of YouTube videos where Tesla drivers record situations like this one, where the driver-assistance system works great most of the time, but here it tries to drive straight into a divider and the driver has to take over to avoid hitting it. The system is not even reporting an error; it genuinely thinks that this is where the lane goes. So we have clearly achieved a lot, but there is more to go, and this is another reason to care about this brittleness.

There is one more reason we might want to care, which I usually call machine learning alignment: adversarial robustness exposes failure modes of ML that are quite different from our own failure modes, and this should drive home the point that the way machine learning solutions work and succeed is very different from the way we work and succeed. That is very important to keep in mind.

So this was all about the brittleness of predictions, and you may ask: is that the only problem we have in machine learning? The answer is emphatically no. As I tell my students, essentially every aspect of ML is currently broken if you look at it through the lens of robustness. So far we talked about inference and the problem of adversarial examples, but if you look at training there is also a problem looming there, called data poisoning. What is going on? As beautiful and great as ML is, it is infinitely data hungry: we always need more and more data to make it perform well. The implication is that if you need so much data, you cannot afford to be too picky about where it comes from, so we may end up training on data over which we have no full control and which we cannot really trust. The question, of course, is what can go wrong, and that is exactly the regime of data poisoning. In the classic setting, the goal of data poisoning is the following: you corrupt the training set by manipulating some small fraction of it, in a way that keeps the training error small but hampers generalization.
Here is a particular example: I have two distributions, green and blue, and here is a sample from each. That is all I see, and I fit a linear classifier; everything works great, it is a large-margin classifier and it generalizes perfectly. That is the success story of ML. But now, if I am the adversary and I also add this very weird point over here, and then ask someone to find the best classifier for this data, well, if they do it naively they will come up with a classifier like this one, and it will simply not generalize. This is one of the dangers of data poisoning: we can be manipulated in this way. It is a fundamental problem in classic ML, and by classic I mean pre-deep-learning; in particular, the whole field of robust statistics tries to tackle exactly this question. However, it turns out not to be so much of a problem in deep learning, because when a deep learning classifier is confronted with this kind of example, what it really does is memorize the weird point, so the classifier you end up with looks like this and actually generalizes just fine. For some magical reason we still need to understand better, this is not the problem for deep learning.

What is a problem is something a bit different. Instead of trying to hamper generalization across all of the examples, what if we just want to manipulate the training set so as to affect predictions on some particular set of inputs? When we look at that, things become much, much worse. In particular, it has been shown that by manipulating even a single training example you can manipulate predictions across whole classes of inputs later on. It gets even worse: you can use the ability to change just a tiny fraction of the data set to essentially take full control of the classifier that will be trained on it. Imagine that what I have as the adversary, once the model is deployed, is this: whenever there is an input that I want classified the way I want, rather than the way it should be, all I have to do is add some planted trigger, and whenever this trigger is present the classifier magically makes the prediction I wanted. This is actually quite scary, because as a user of the system you see nothing wrong until I exercise this power. I think this is something we should think about much more, but I will not talk about it further today; on Wednesday there will be a poster from my lab about data poisoning, and if you want to learn more you should just go there.

So that is data poisoning, but you might ask: are training and inference the only problems in machine learning? As I already told you, the answer is no. Imagine I train my amazing machine learning model and now I want to deploy it to the whole world, have people use it, maybe pay me for the usage.
Everything feels great, because my classifier sits securely on my server; everything that has to be encrypted is encrypted, so it sounds like nothing can harm my model. In particular, to synthesize the adversarial noise I discussed earlier, in principle I need full access to the weights of the model, so you might say that in this setting you are safe, because the adversary cannot access your model directly. So is this limited, input/output access to the model actually a problem? The answer is yes. There is a whole line of work on so-called black-box attacks showing that input/output access to your model is already enough, first of all to reverse-engineer it, and also to figure out what perturbations would mislead it. So this setting is not safe either. I will be talking about this on Friday in one of the workshops, if you are interested.

After seeing all of that, you might start wondering: we have seen some bad things, so can we try to distill the commandments of using ML in a responsible, safe, and secure fashion? First, you should never train on data you do not trust, because of data poisoning. Second, you should never let anyone use your model or observe its predictions, because they can reverse-engineer it and synthesize a black-box attack. And finally, you should not trust your model yourself, because of adversarial examples. I can assure you that as long as you follow these three commandments, you will have safe and secure machine learning, which is very relieving, except of course the catch is that if you follow them, machine learning is useless in any worst-case scenario.

This might make us wonder: are we doomed? Maybe there is something about ML that makes it work really nicely when we only care about average-case performance, but whenever worst-case performance comes into play it is fundamentally broken and there is nothing we can do. Of course I, and many people who work in this area, believe the answer is emphatically no: ML can still be extremely useful and extremely powerful. What needs to happen, though, is that we go back and revisit all the tools we use in the context of ML and rethink them with worst-case robustness guarantees in mind. And by the way, even if you do not buy into the story of bad guys out there manipulating our systems, you should still care about adversarial robustness, because you can view it as a way to stress-test your system: if there is an input that makes your system misbehave, you probably want to know about it.

So let's talk about this. The rest of this tutorial will be about exactly that: trying to come up with models that alleviate one of the problems we just identified, namely brittle predictions. We would like classifiers that, when confronted with any small perturbation of a pig, still know it is a pig and not an airliner. That is our goal.
To answer how we can go about training models that are robust to this kind of adversarial perturbation, we should ask where these adversarial examples even come from in the first place. To understand that, we have to go back to the basic tenets of our framework and think about the goal of training as an optimization problem. When we train, we are trying to find a setting of the parameters theta that minimizes the loss on our training examples; here I show just one example, but you can add a sum to generalize. What is nice, particularly in deep learning, is that our models are differentiable in the parameters theta, so we can use techniques like gradient descent to find a setting of parameters that makes this loss as small as we would like. There is still much to be understood here, but at a basic level that is what seems to be happening in practice, and it is very convenient that this is the problem we are solving and that we can solve it in a fairly principled way.

Unfortunately, the fact that deep networks are so convenient to optimize, in various shapes and forms, is also exactly what you can exploit to synthesize these adversarial perturbations. To get an adversarial example, all you have to do is look at this program again and reparametrize it: you freeze the parameters, say you trained good parameters and you are done, and now you want to find a perturbation of the input that makes the prediction bad. What that means is that we want to find the perturbation that makes the loss large, as opposed to small. The problem is that, just as our model was differentiable in theta, it is also differentiable in delta, so you can use gradient methods to find these perturbations, and that is exactly how the perturbations I showed you earlier were discovered.
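Written out (again as editorial shorthand rather than the speakers' exact slides), the two problems differ only in which variable is free and in which direction we optimize:

\[
\min_{\theta}\; \ell\big(h_\theta(x),\,y\big)
\qquad\text{(training: tune the parameters to make the loss small)}
\]
\[
\max_{\delta\in\Delta}\; \ell\big(h_\theta(x+\delta),\,y\big)
\qquad\text{(attack: freeze } \theta \text{ and tune the perturbation to make the same loss large)}
\]

Both lend themselves to gradient methods because the loss is differentiable in theta and in delta alike.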
There is, however, an important question to ask when we set up this optimization problem: what values of delta should we allow? This is an excellent question. In principle, if delta were simply the difference between my current example and some other example from a different class, we would not worry about it; that is exactly what we want our machine learning to do, to change its prediction when we change the input into an example of a different class. So this is only interesting if delta is imperceptible, or inconsequential in some way. How to formally capture what imperceptible or inconsequential means is something there have been a bunch of attempts at, but honestly it is a really hard problem; for vision tasks it would essentially require us to formalize human vision, which we are nowhere close to doing. So this is a very important question, what kinds of input perturbations our vision systems should be robust to, and we will not really go into it here. I just want to mention that in the rest of the talk we will focus on lp perturbations: we simply bound the lp norm of the change to the input (for l-infinity, the change to each individual pixel). This is a very simplistic metric, but it is a metric we definitely should be robust to before we aim for anything more complicated, which is why it is a good milestone to move towards when we work on adversarial robustness.

So this is our goal, and this is where adversarial examples come from. Now the question is: how do we actually go about getting models that are not so vulnerable to these perturbations? When adversarial examples became a thing, quite a few people claimed that they are evidence of a failure of our machine learning toolkit, that in some way we failed because we let these adversarial examples arise. I think the crucial thing to realize, if you want to understand how to get robust models, is that we did not fail at all. The lack of adversarial robustness is not our failure, because the existence of adversarial examples is completely not at odds with our current toolkit. What our models try to achieve, most of the time, is something we call standard generalization: if I sample an example from the distribution in the sky, I would like the loss of my classifier on that example to be small in expectation. An expectation is great for average-case performance, but adversarial examples are a worst-case notion; they are a set of measure zero, so this expectation is completely invariant to their existence or non-existence. In a sense, if the goal that our whole machine learning toolkit strives towards is invariant to the existence of adversarial examples, we should not be surprised that they exist. So if you want models that do not get into trouble with adversarial examples, you have to change the goal you are trying to achieve, and the goal you should arrive at is what we call adversarially robust generalization: you want to do well in expectation not on the sampled example itself, but on the worst-case perturbation of that example. Plugging this max inside the expectation is essentially the whole difference between what we were doing so far and what we need to do for adversarial robustness.
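Side by side, in the same shorthand as before, the change really is just one max inside the expectation:

\[
\mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\ell\big(h_\theta(x),\,y\big)\Big]
\qquad\text{(standard generalization)}
\]
\[
\mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\max_{\delta\in\Delta}\ \ell\big(h_\theta(x+\delta),\,y\big)\Big],
\qquad \Delta=\{\delta:\|\delta\|_p\le\varepsilon\}
\qquad\text{(adversarially robust generalization)}
\]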
So this is the general outlook on the problem we are trying to tackle and where we are going from here. Now that we know our goal is adversarial robustness, it is time for a deeper dive into the topic. Zico will now talk in more depth about adversarial examples, and about ways to figure out whether an adversarial example even exists for a given system, and then about the actual details of how you go about training models that are robust to them. After that I will come back and give you a broader perspective again on what we can do with these tools. Thank you. [Applause]

All right, is this on? Good. So, as Alex mentioned, after that great broad introduction I get the fun job of actually going into some depth about how you do this. One thing I do want to highlight, which Alex did not mention, is that we also have an accompanying website for this tutorial: adversarial-ml-tutorial.org has a set of notes that go through pretty much all the details I will describe here. The notes are written as Jupyter notebooks, so there is prose with examples but also code that does everything I am going to talk about; for all the examples I show, there is code and a walkthrough of how to actually do it, and you can download the notebooks or just read them on the web. So if anything I do here goes a little fast, which it might, please know you have this resource; we will show it again at the end, and it is also in our Twitter post about the tutorial.

Okay. As Alex said, he gave the broad introduction, and in my part I really want to focus on this problem of test-time adversarial attacks. Adversarial robustness is of course much broader than this, but it is the topic I am going to focus on, and particularly I am going to cover, in some sense, the attack and then the defense. The first part is about constructing adversarial examples, or a little more broadly also verifying whether or not such examples exist, and the second part is about training robust models: first constructing adversarial examples and verification, then adversarial training as well as provably robust training.

As Alex mentioned, the big picture is the following: we want to train a model that ideally generalizes well not just on a fixed data set, but on adversarial perturbations of that data set. That expectation is not something we can work with directly, so in practice we take some finite data set S and minimize this robust objective over it. What this portion of the talk is about is really these two pieces. Part one is the inner maximization: either finding adversarial examples, or somehow otherwise verifying that one cannot exist (I will explain what I mean by that in a moment). Part two is the outer minimization: training a robust classifier. Now that we have already broken machine learning, I will tell you how, at least in some cases, we might be able to fix it.
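Schematically (editorial shorthand again), the empirical objective and the split that organizes the two parts are:

\[
\min_{\theta}\ \frac{1}{|S|}\sum_{(x,y)\in S}\
\underbrace{\max_{\|\delta\|\le\epsilon}\ \ell\big(h_\theta(x+\delta),\,y\big)}_{\text{Part 1: inner maximization (attack / verify)}}
\]

with the outer minimization over theta being Part 2 (robust training).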
All right, so let's dive into part one. Here is a view of the loss landscape and of the problem we have: we have some perturbation region Delta, shown at the bottom, an initial point delta-zero, and a loss function defined over the different perturbations delta. If we want to maximize this quantity, what Alex described is: well, we can just run gradient descent. What I actually want to highlight is that there are other possibilities as well, and understanding them really helps you understand the nature of adversarial examples and verification in this setting.

We really have three options. The first is local search: we just search for an adversarial example. This is the most common approach in practice, and it is what people typically mean when they talk about adversarial examples. But there are other possibilities, and that is what I want to emphasize. It is also possible, maybe not for large models but for relatively small ones, to solve this problem exactly using techniques from combinatorial optimization, so instead of local search we can actually find the worst-case example for a given classifier. And finally, we can form an upper bound on this loss function over the region in question; if this upper bound has nice properties, for instance it is convex (really concave, since we are maximizing), we can solve the bounded problem exactly, which lets us verify whether or not an adversarial example can exist. The first part of this talk is about running through these three possibilities and checking what is possible and what is not.

One thing I do want to emphasize is that all of this work, though relatively new in machine learning, actually goes back a long time, to the topic of robust optimization, largely for linear models. The point I want to make is that for linear models it turns out you can solve this problem exactly, and the three cases I just described collapse into one. Without too much detail: consider a binary classification problem with a norm ball as the perturbation region. It is possible to find the worst-case perturbation in closed form; it is essentially the one that pushes you straight towards the decision boundary. Formally, you can push the max inside the loss: because the loss is monotonically decreasing in the margin, the max becomes a min, and that min can be solved analytically in terms of the dual norm of your perturbation region. I am not going to go into detail about how this works; there are of course many more details in the tutorial notes. I just want to highlight that this is not a new problem; in the context of linear models it goes back a very long time.
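Concretely, for a binary linear classifier with label y in {-1, +1} and a margin-based loss L that decreases in its argument, the identity being referred to is (notation paraphrased from the notes):

\[
\max_{\|\delta\|\le\epsilon}\ L\big(y\,(w^\top (x+\delta)+b)\big)
\;=\; L\big(y\,(w^\top x + b)\;-\;\epsilon\,\|w\|_*\big),
\]

where the dual norm ‖w‖_* appears because the worst-case perturbation simply subtracts epsilon times it from the margin (for an l-infinity ball, this is epsilon times the l1 norm of w).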
Okay, so let me get to the first of these three ways of thinking about adversarial examples: local search. Everyone knows that the loss landscape of deep classifiers is non-convex and nasty if you plot it, but the great thing about deep learning is that we just don't care about this anymore: we don't care that we are optimizing non-convex functions, we do it anyway, and we try to empirically find an attack within some norm bound. How do we do this? There are a lot of ways, but the main tool I want to highlight is what Alex mentioned, or variants of it: projected gradient descent. The idea is very simple. We want to optimize our perturbation within some region Delta, represented by the gray region in the figure. We take a step in the direction of the gradient of the loss function (a step in the positive direction, because we are trying to maximize the loss), and then we project back onto the set, i.e., we find the closest point in the set to where we stepped. We repeat this process and hopefully find our way to points of high cost. I would say that most attacks used in practice, not quite all of them, are based on some variant of this, though often on simplified variants.

For the most part I am not going to talk about specific named attacks that people have proposed in the past, because I want to follow this general framework of projected gradient descent, but I think it is worth mentioning one particular attack that ignited a lot of the interest in this field: the fast gradient sign method, from work by Ian Goodfellow and others in 2014. The way it arises from projected gradient descent, at least in our viewpoint, is that we take the perturbation region Delta to be the l-infinity ball: the region where the l-infinity norm of the perturbation is at most epsilon, i.e., each coordinate of the perturbation lies between negative epsilon and positive epsilon. In this case the projection is very easy, you just clip each entry to the ball, which is one reason this is a nice norm bound to work with. And if you think about what projected gradient ascent does here: if you take a big enough step in the direction of the gradient, you will always end up outside the ball, and when you project back you will always land at a corner of the ball. That corner is just epsilon times the sign of the gradient in each coordinate, and that is exactly the fast gradient sign method. It is very nice because it gives you as big a step as you can possibly take within your ball, very cheaply, since you only evaluate the gradient once, so it has become a very popular technique.
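A minimal PyTorch sketch of FGSM, in the spirit of the code in the accompanying notebooks; model, x, y, and epsilon are placeholders you would supply yourself:

import torch
import torch.nn as nn

def fgsm(model, x, y, epsilon):
    # One-shot L-infinity attack: epsilon times the sign of the input gradient.
    delta = torch.zeros_like(x, requires_grad=True)
    loss = nn.CrossEntropyLoss()(model(x + delta), y)
    loss.backward()
    # A large enough gradient step, projected back onto the L-infinity ball,
    # always lands on a corner of the ball: epsilon * sign(grad).
    return epsilon * delta.grad.detach().sign()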
So let's see how this looks in practice, and I am going to apologize to everyone now, because this is 2018, almost 2019, and you are going to see almost entirely MNIST in this tutorial; let me flag that in advance. The reason is not that these are the best examples to show attacks on — we know we can attack ImageNet, as Alex showed, and we actually have example code for an attack on a larger problem — but for some of the defense mechanisms, especially the combinatorial optimization methods, we are still at the scale of MNIST. I want to be very explicit about that: a lot of these defense techniques do not yet scale to really large problems. And so that you have code you can actually run, on one GPU, in a few minutes rather than days, we are using MNIST as the example, and in particular two networks, a two-layer fully connected network and a six-layer convolutional network, to illustrate a lot of these points.

Okay, so let's see what the fast gradient sign method actually does. Another nice thing is that the perturbations are a bit more apparent here than on ImageNet, where you cannot even see them; on MNIST you actually can. We take the fully connected network, basically a multilayer perceptron, and adjust the image with a step in the direction of the sign of the gradient using FGSM, and we get an image like this. For now I am not worrying about negative pixel values; you can do all of this the same way if you just clip the pixels to lie between 0 and 1, but these are just numbers as far as the classifier is concerned, so I will not worry too much about that. Do the same thing for the convolutional network — everything is differentiable, so you can do this whatever your classifier is — and you get something that looks similar, maybe a little less splotchy, slightly more speckled noise. If we look at the actual test error, with a perturbation of size 0.1, which is relatively small and is not going to change our perception of the image, then on clean data both networks do quite well, maybe 3 percent and 1 percent error (and these are not even particularly well-trained networks; we just ran something quickly). Unfortunately, when you run this attack you get a very large error for both of them. So this is maybe not good, but this is really a very simple method, and we can do even better than what I am showing you here.

The next step is to run the thing I showed you earlier, the more fine-grained projected gradient descent method. Rather than taking one big step — maybe at the initial point the gradient points towards the top-right corner, but the top-left corner actually has higher loss — you can imagine that taking small steps along the gradient will land you at a better point overall. This seems intuitive: once you reach a corner, you keep trying to escape but keep getting projected back onto the ball. This is the projected gradient descent method, or a variant of it; it is much slower than FGSM because you take multiple steps, but it typically finds a better optimum for this problem.

One thing I will only mention briefly, but which is really important in practice: if you run PGD as an attack, do not just run plain gradient ascent; run what is properly called projected normalized steepest ascent. The idea is that at the actual examples the gradient of the loss function is often very small, so you would need a really big step size to get anywhere; but once you leave that small region you start taking really big steps, you hit the corner, and you essentially reduce to the fast gradient sign method anyway. What we can do instead is normalize the steps: rather than stepping along the raw gradient, we step in the direction v that maximizes the inner product with the gradient subject to some norm constraint on v. This searches, within the box shown with the dotted line, for the point that maximizes the inner product with the gradient, and for the l-infinity box that is always one of the corners; the derivation is very similar to the one for FGSM. The upshot, and that is all I will say since there are more examples in the notes, is: if you want to optimize over a particular perturbation set, do normalized steepest ascent, not unnormalized gradient ascent, because the magnitudes of the gradients vary enormously over this region.
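Here is a corresponding sketch of the projected, normalized steepest-ascent loop for the l-infinity ball, again in the spirit of the notebooks; all arguments are placeholders, and alpha is the step size:

import torch
import torch.nn as nn

def pgd_linf(model, x, y, epsilon, alpha, num_iter):
    # Iterated L-infinity attack: normalized (sign) steps, clipped back into the ball.
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(num_iter):
        loss = nn.CrossEntropyLoss()(model(x + delta), y)
        loss.backward()
        # Steepest-ascent step for the L-infinity geometry, then projection.
        delta.data = (delta + alpha * delta.grad.detach().sign()).clamp(-epsilon, epsilon)
        delta.grad.zero_()
    return delta.detach()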
So let's see what this looks like, now that we have gone through it in some detail. We can apply FGSM and we can apply PGD; the perturbed images do not look that different, and both fool the classifier into thinking the 7 is actually a 3, but if you look at the test error there is quite a big difference. FGSM fools about 41 percent of the examples on this classifier, whereas PGD improves on that quite a bit. So there is a real difference in practice, and this will come back when we talk about training robust classifiers.

Two more quick notes. The first is that what I have described so far is what is called an untargeted attack: we just try to maximize the loss, no matter where that puts us. We can also run a targeted attack, where we try to simultaneously maximize the loss of the true class and minimize the loss of some target class, to fool the classifier into predicting that target class. A very nice thing happens here, which will also be the basis for the more complex relaxations later: if your loss function is something like the cross-entropy loss, then in the difference of the two loss terms the normalization term cancels, and the targeted attack becomes the same as maximizing the difference of the class logits — the outputs of the final linear layer of the classifier — for the two classes involved. So the last formulation here is equivalent to doing exactly that, and it is an important simplification.

Let's see what this looks like: we take this 7 and maximize the logit of the zero class minus the logit of the seven class, and we get a prediction of zero; if we do the same with a different target, maximizing the logit of the two class and minimizing the logit of the seven class, we get a prediction of two. One note: we might succeed in fooling the classifier even if we do not reach the class we want, because while pushing the target logit up and the true logit down, some other class's logit might end up highest. So we might not succeed at changing the prediction to our target class, but ideally we will still succeed at creating an adversarial example.
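In shorthand, the cancellation just mentioned is (writing h_theta(x+delta)_j for the j-th logit):

\[
\max_{\delta\in\Delta}\Big[\ell\big(h_\theta(x+\delta),\,y\big)-\ell\big(h_\theta(x+\delta),\,y_{\text{targ}}\big)\Big]
\;=\;
\max_{\delta\in\Delta}\Big[h_\theta(x+\delta)_{y_{\text{targ}}}-h_\theta(x+\delta)_{y}\Big],
\]

because the log-sum-exp normalization terms of the two cross-entropy losses are identical and cancel.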
The last thing I want to say about local search attacks is that we can do them in other norms too. I showed everything in terms of the l-infinity norm, but everything we have done — the steepest-ascent direction, the projection, all of it — can be derived just as easily for other norms: for the l2 norm, for the l1 norm (the l0 norm is a little trickier), and it all works. For example, l2 attacks tend to concentrate more of their mass in certain regions rather than spreading it out, just by the nature of the norm.

And since I am talking about attacks: you might be asking about all the named attacks you have heard of, proposed in a lot of papers — with apologies to the people who proposed them, some of whom I know are in the audience, so we will see how they feel about me afterwards. Those papers are of course really important, because they were the first to study these attacks in detail, but I think we are at a point in the field where it is more valuable to describe an attack in terms of the perturbation bound you allow and the optimization procedure you use to optimize over it. That is an opinion, but I think it is a clearer description than relying on a name or on the publication that proposed it, because it is not always clear from those what is actually being done.

Okay, moving on. This has been the most common way of solving the inner maximization problem, but I want to highlight that there are other ways as well. In particular, we can solve it exactly — still the targeted-attack version — using combinatorial optimization. To do this, let me dive into a bit of detail and throw up a little more mathematical notation, to pose the adversarial attack as an optimization problem. We can write our network, and I will now be explicit and say it is a multi-layer ReLU network, as follows: the input to the first layer, z1, is just the input x; after that, each layer is the ReLU of a linear function of the previous layer; and the class logits are a linear function of the last hidden layer (we do not apply a ReLU, or even a softmax, to the last layer; these are the logits, pre-softmax). Now we can write a targeted attack as an optimization problem. It is not an easy one to solve, but it is an optimization formulation of how we think about attacking classifiers. The idea is that we minimize the inner product of (the unit basis vector of the true class minus the unit basis vector of the target class) with the last layer, which holds the class logits; all we are really doing is minimizing the logit of the true class and maximizing the logit of the target class, this is just notation for that, and it is worth keeping in mind because I will stick with this notation. This is subject to the network constraints — each layer must be the ReLU of a linear function of the previous layer — and to a norm-bound constraint on the input.
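In the shorthand used above, the formulation being described looks roughly like this (indices are schematic; e_y denotes the y-th unit basis vector):

\[
\begin{aligned}
\min_{z_1,\dots,z_{d+1}}\quad & (e_y - e_{y_{\text{targ}}})^\top z_{d+1}\\
\text{s.t.}\quad & z_{i+1} = \max\{0,\ W_i z_i + b_i\}, \quad i=1,\dots,d-1,\\
& z_{d+1} = W_d z_d + b_d,\\
& \|z_1 - x\|\le \epsilon .
\end{aligned}
\]

A negative optimal value means the target logit can be pushed above the true logit; a positive one means it cannot, which is where the certification discussion below comes from.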
The problem is that this is not something that can be solved by any normal, off-the-shelf continuous optimizer: it has nonlinear equality constraints in it, which of course makes it a very hard problem. The key point, though, and this is one thing we go into in a lot of detail in the notes, is that we can rewrite the problem equivalently in a form that can at least be handled by standard combinatorial solvers: things like integer programming solvers, which has been done by us as well as by others in the past, and also SMT solvers; there is a lot of work on using SMT methods to solve these problems. In practice, off-the-shelf solvers like CPLEX or Gurobi, if you have heard of those, scale to maybe a hundred hidden units or so — we are talking about really, really small networks here, just to be clear — but you can take a reasonably performing MNIST network and actually verify it formally, which is kind of cool: for a given example you can check what the worst possible outcome is.

One of the key challenges here actually ends up being computing element-wise bounds on the activations: you need to be able to compute lower and upper bounds on the linear terms before they go into the ReLU. This will come up again later when we talk about convex methods, so let me mention it briefly here. It is actually quite easy to come up with simple bounds on the activations: if I have some activation z with known upper and lower bounds, and I want to upper- and lower-bound a linear function of it, I can just split my weight matrix W into its positive and negative parts and combine them with the appropriate bound. The figure shows what is really happening: we are propagating intervals through the network. If you run a single point through the network, it produces some output; if you instead consider a region around that point, that region similarly propagates through the network, typically growing in size as it goes. That is all we are doing: forming these interval bounds to characterize what sort of outputs the classifier can produce over the region, and this will return when we talk about convex relaxations.
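A sketch of that interval propagation for one linear layer followed by a ReLU (PyTorch again; shapes and the starting bounds are placeholders):

import torch

def interval_linear_relu(W, b, l, u):
    # Given elementwise bounds l <= z <= u on the previous layer, bound W z + b
    # by splitting W into its positive and negative parts, then pass the interval
    # through the (monotone) ReLU.
    W_pos, W_neg = W.clamp(min=0), W.clamp(max=0)
    lower = W_pos @ l + W_neg @ u + b   # smallest the pre-activation can be
    upper = W_pos @ u + W_neg @ l + b   # largest the pre-activation can be
    return lower.clamp(min=0), upper.clamp(min=0)

# Starting from an L-infinity ball around an input x, you would iterate:
#   l, u = x - epsilon, x + epsilon
#   for W, b in layers:
#       l, u = interval_linear_relu(W, b, l, u)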
All right, so the last thing I want to say here is that we can use this technique to certify the robustness of a classifier — and actually we don't even need to find the solution itself; all we need to know, to decide whether we can certify an example or not, is the objective value of that solution. Let me go through what I mean by that. Remember, the objective we're optimizing in the optimization formulation of these attacks is the following: we're trying to minimize the class logit of the true class and maximize the class logit of some target class. If I solve this problem and the result is negative, that means the target class's activation is now higher than the true class's, and so we have an adversarial example. On the other hand, if I solve the same thing and the objective value is positive, this means there is no adversarial example at all for that target: the score of the true class is still higher than that of the target class for every allowed perturbation, so no method can come up with a perturbation that changes the label to that target. So just to illustrate: if we have a 7 here, and we solve this problem for a targeted attack on class zero — minimizing the logit on unit seven and maximizing it on unit zero — and the result is negative, that means we have an adversarial example; it doesn't necessarily change the class to a zero, but it is an adversarial example. Whereas if we do the same thing where we're now trying to maximize the logit on class one, and the result is positive, then we know there is no adversarial example at all that can make this classifier predict class one. So those are the fundamental ideas, and we have some code you can play with where you can use a linear or integer programming solver to formally verify at least small networks — MNIST-sized networks, not ImageNet-sized networks. Okay, so lastly, before I move on to the next section, I want to talk about convex relaxations; this is the third point. We talked about local solutions, which as feasible points give upper bounds on the minimum of the objective; we talked about exact solutions; and finally we can talk about convex relaxations, which bound the minimum from the other side. So how do these convex relaxations work? There are several ways to do this; one of them is based on relaxing the integer programming formulation of the verification procedure. The idea is that the integer programming formulation had these hard constraints — the post-activation had to be exactly the ReLU of the previous layer's linear function, within some bounded range, because we had upper and lower bounds that these could attain. What we can do instead is replace that non-convex set with its convex relaxation, which means the next layer and the previous layer no longer lie exactly on the ReLU; they now lie within some relaxed linear set. The cool thing is that if you relax the integer program this way — and it's a very simple, straightforward relaxation — you get a linear program, and linear programs have efficient polynomial-time methods for solving them, so unlike combinatorial optimization we actually expect to be able to solve these sometimes. A quick note: another way of viewing this is that if you take the norm ball and feed it through the network, the first figure there shows some complex shape, and what we're doing is relaxing that shape — taking an outer bound on it — which gives us more freedom in choosing potential adversarial examples. What this means is that the objective of a targeted attack under the linear program is a lower bound on the objective of the true problem, and this will come in handy in a second when I talk about certification, because it can still sometimes be used to prove that no adversarial example exists.
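The certification logic being described can be summarized as follows (a paraphrase in my notation, with f_theta(x)_c denoting the logit of class c):

```latex
(x, y)\ \text{is certified robust} \iff
\min_{\|\delta\|_\infty \le \epsilon}
\Bigl( f_\theta(x+\delta)_{y} - f_\theta(x+\delta)_{y_{\mathrm{targ}}} \Bigr) > 0
\quad \text{for every target class } y_{\mathrm{targ}} \neq y ,
```

and any lower bound on this minimum — exact, LP-based, or interval-based — that comes out positive is already a valid certificate.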
Of course, solving the LP is still kind of slow, but fortunately there do exist fast methods, based on things like convex duality, for approximately solving that LP. What I actually want to highlight even more than that is an even faster method, which some recent work has shown can be competitive with these more complex relaxations, and that is just interval-based bounds. I alluded before to how we propagate interval bounds through the network; it turns out that if you take those interval bounds, you have a known region that the last layer of the network can lie in, and even if we throw away everything else about the network, we still get a bound on how much I can decrease, say, my targeted-attack objective. There's just an analytical solution to this — you can solve it exactly, and here it is if you really care — but the real point is that I can use these interval bounds to provide an even looser relaxation of the problem. Okay, so I've said a lot in the last two sections, and I want to pop back up for a second and highlight what the real value of all this is. If some of that was a little bit into the weeds for you, don't worry about it; we're going to pop back up a level right now, and when we talk about training we'll give some examples of how these things really work. But here's the high-level thing I want to show you. Say we have our trusty digit 7 — this is the first example of the MNIST test set, by the way, which is why it keeps appearing again and again — and suppose I solve my verification problem using the integer programming formulation of the exact verification procedure. If that result gives me a negative solution, then I know an adversarial example exists — in this case one trying to make the thing class zero, because I'm maximizing the class-zero logit and minimizing the true-class logit. If, on the other hand, I do the same thing with the convex constraints, I also get a negative number, because that value will always be lower than the first one; but importantly, the fact that I get a negative value here doesn't actually tell me anything — I don't know whether an adversarial example exists or not, because I've relaxed the set of allowable activations, and the fact that I can achieve an adversarial example in this relaxed set says nothing about the original one. So for finding adversarial examples these convex relaxations are not very useful; they don't provide much in that regard. Fortunately, though, we don't have that problem: we already know how to construct adversarial examples pretty well just using gradient-based techniques. What we really don't know how to do with those gradient techniques is verify that no such example can exist, and the nice thing about these convex relaxations is that if I solve the same problem — say, trying to make the class label look like class one — and get a positive number, in other words the class logit of the true class under this relaxed attack is still higher than the logit of the target class, then I know there is actually no adversarial example for that target class. So I've managed to verify that no adversarial example exists, and to do so in a way that only involves convex methods.
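For the interval-bound version of this certificate mentioned a moment ago, the "analytical solution" amounts to something like the following sketch, building on the interval-propagation helper above (the function name and interface are mine):

```python
import numpy as np

def certified_logit_gaps(W_out, b_out, l, u, y):
    """Given elementwise bounds [l, u] on the penultimate layer (e.g. from
    interval propagation), analytically lower-bound the logit gap f_y - f_t
    over that box, for every target class t != y."""
    gaps = {}
    for t in range(W_out.shape[0]):
        if t == y:
            continue
        c = W_out[y] - W_out[t]                     # row computing f_y - f_t
        gaps[t] = np.maximum(c, 0) @ l + np.minimum(c, 0) @ u + (b_out[y] - b_out[t])
    return gaps   # if every value is positive, no targeted attack exists in the box
```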
That's very nice, because these problems are efficiently solvable, so we can sometimes get not just attacks but actual proofs that no attack exists. Okay, so that covers the three main techniques we have for dealing with the inner maximization problem, and now I'm going to pop back out and start talking about training models, not just testing them. Let's step back to the big picture: part one was all about creating adversarial examples — how do I solve that inner maximization problem — and part two is going to be about how do I train a model that is somehow robust to these adversarial attacks. It turns out that the strategies from before lead exactly to methods for training; we're moving from the inner maximization to the outer minimization. The first technique, local search, naturally leads to a procedure called adversarial training — also advocated by Ian Goodfellow and others — and I'll describe what that means in a second. We can't really use combinatorial optimization to train models; it's just too slow — it takes on the order of minutes, or for big enough models until the heat death of the universe, just to verify them — so we're not going to train models, except maybe really small ones, with combinatorial optimization. We can, however, train models based on our convex relaxations, and these will lead to provably robust models: models where we can say not only that they seem to perform well, but where we can guarantee that no test-time attack is going to fool the classifier, at least within a certain norm ball. Okay, so let me start by talking about adversarial training. The question is: how do we go about optimizing this objective — training a model that minimizes, over the model parameters, this sum of adversarially robust losses? We would like to optimize it with gradient descent, the way we optimize everything in deep learning, but how do I take the gradient of that term when it involves a max? Very nicely, it turns out there's a really nice answer to this, and it's called Danskin's theorem. The idea is simple and intuitive, but it's actually a subtle point: if I want to take the gradient of this max term — the inner term I'm trying to optimize — what I can do is find the optimum over that set and take the gradient at that optimum point. This sounds obvious, and I thought it was obvious, but for general optimization problems it's a really subtle result; it takes many pages to prove (the convex case is not quite so bad). But it's really nice, because it means that in order to solve our optimization problem with gradient descent we don't need some fancy new procedure: all we really need to do is find adversarial examples and then take gradients of our parameters at those points. Now, I should put a big asterisk here, because this only strictly applies when we perform the inner maximization exactly — which brings us back to the combinatorial setting we don't want to deal with — but of course, with deep learning to the rescue, let's just do it anyway.
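Written out (in my notation), the statement being used is roughly: if delta* maximizes the inner problem, then

```latex
\nabla_\theta \max_{\|\delta\| \le \epsilon} \ell\bigl(f_\theta(x+\delta),\, y\bigr)
\;=\;
\nabla_\theta\, \ell\bigl(f_\theta(x+\delta^\star),\, y\bigr),
\qquad
\delta^\star \in \arg\max_{\|\delta\| \le \epsilon} \ell\bigl(f_\theta(x+\delta),\, y\bigr),
```

with the caveat just mentioned that the identity is only guaranteed when delta* is an exact maximizer (and under regularity conditions); in practice we plug in the approximate maximizer that PGD finds.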
So we solve our inner optimization problem empirically, not worrying too much about the fact that we're not really in the regime where Danskin's theorem strictly applies, and this leads to exactly the standard adversarial training procedure advocated, for example, by Ian Goodfellow in one of the original adversarial examples papers — one of the many papers that got people excited about adversarial examples. The idea is very simple, and all it really involves is that instead of training on the actual data points, you train on the worst-case perturbations of those data points: I select a mini-batch, for each example in the mini-batch I find the worst-case perturbation, and then I update the parameters based on the gradients at those worst-case perturbations. That's really all there is to the process. I should say it's also common to sometimes mix robust and non-robust updates — you might take some gradient steps on the original examples and some on the worst-case examples — but you can actually just take the worst-case ones too, and it still works.
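A minimal PyTorch sketch of that procedure, roughly as just described (this is an illustration under my own choices — the function names and hyperparameter values are placeholders, and things like random restarts and clipping to the valid pixel range are omitted; it is not the tutorial's reference code):

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps, alpha, n_steps):
    """Projected (normalized / steepest-descent) gradient ascent on the loss
    under an L-infinity constraint: the empirical inner maximization."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(n_steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # normalized steepest-ascent step
            delta.clamp_(-eps, eps)              # project back onto the eps ball
        delta.grad.zero_()
    return delta.detach()

def adversarial_training_epoch(model, loader, opt, eps=0.1, alpha=0.01, n_steps=40):
    """One epoch of adversarial training: update parameters at the (approximate)
    worst-case perturbation of each mini-batch, per Danskin's theorem."""
    for x, y in loader:
        delta = pgd_linf(model, x, y, eps, alpha, n_steps)
        opt.zero_grad()
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        opt.step()
```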
Okay, so let's see whether this works. Remember we had our old ConvNet, which we trained with normal training: it did really well on standard error, but much worse when we attack it with the fast gradient sign method, and with PGD — projected gradient descent — it does much worse still. The nice thing is that if I now train the same exact architecture using this method, we get a much more robust network: if I train it against this attack, it suffers only about 2.8 percent error under it. Now, this seems good — and I think Alex will touch on this a bit more later — but we have to be really careful about declaring success right now, because what I'm doing in this evaluation is evaluating the model on the exact same attack I trained it on, and we know deep learning is really good at optimizing exactly what you tell it to optimize; this has actually been the challenge for a lot of adversarial-example defenses in the past, so we have to be careful. However, here's what I will say: to the best of our empirical knowledge, models trained in this fashion with a strong PGD — basically projected steepest descent, a normalized type of PGD, with some random restarts in the PGD process — seem to be robust to any empirical attack we can throw at them. If I run the same thing with more iterations of PGD, with more randomization and so on, I don't increase the loss by much; and in fact Alex's group posted a model they trained — not this one, but a more complex model trained in the same fashion — and people really haven't been able to attack it much. The crux of the issue is that in this PGD-based training we are trying to do as good a job as we can of solving that inner maximization problem, and when we do that we tend to get classifiers that appear to be robust to anything we can throw at them — the error that PGD gets is about the same error that any other attack we can formulate gets, with maybe some minor improvements at the margins. So it looks good, but we should be really careful before declaring success. We have this model again in the notes, and you can try other attacks on it, but I actually think that for this small dataset it's probably a pretty robust model — again, for a pretty small epsilon: these are L-infinity perturbations of size 0.1. One thing I do want to emphasize is that what we shouldn't do — or what's not particularly informative — is to evaluate against different types of attacks, like taking an L-infinity-trained model and evaluating it against L2 or L1 attacks. Of course a model trained against one attack is not going to generalize to others; we already know that. If we want to defend against L1 or L2 attacks, we should just train with L1 or L2 PGD. If you want some notion of generalizing to genuinely new attack modes — new kinds of perturbation sets — we're not quite there yet as a field: you would need to define some set of allowable perturbation regions, sample from it, and try to generalize to new ones, and we're not there yet. So, as a final note: it's really not informative to evaluate these models against other threat models; if you want a model that defends well against L2 or L1 attacks, just train against L2 or L1 attacks with the same PGD method. Now, another natural question is what makes these models robust — why are they robust? One way to look at this is by looking at the loss surfaces of these models. What I'm showing here is the loss surface projected along two dimensions, including the adversarial direction. For standard training, that loss surface is really quite steep: very close to the point — this is all within an epsilon ball of 0.1 for MNIST — there are a lot of directions of really sharp increase. Whereas for the loss surface of the robust model — and the important point here is the scale on the right — the bumpiness you see is basically just numerical noise; what's really happening is that this is essentially a flat surface. So it seems that when we train these models with PGD we genuinely get loss surfaces that are quite smooth and flat; they don't have these big peaks, which of course means they're not going to be as susceptible to these adversarial attacks. Now, of course, we can't verify that along every dimension — that's the whole point: we can't check every one of the 2^d corners of the norm ball — but we can get some sense by looking at these figures and looking at the loss in this way. Okay, so this is, in some sense, the best way we know how to empirically train models against attacks like this, and while the jury is of course still out — as it always is when a method is only empirical — these models seem to hold up really well against the attacks we can come up with. So the natural question is: we have these seemingly robust models, and in the last section I described some tractable convex ways of verifying whether a classifier is robust — let's see if we can take these robust models and actually verify them. Let's use the convex bounds; in this case we're going to use the interval-based bounds, the simple ones I described at the end, to see what level of adversarial
performance we can guarantee for the robust model. Okay, so here we go — here's the big reveal. Clean error for both the ConvNet and the robustly trained ConvNet is about the same; of course FGSM attacks fool the normal model much more than the robust model, and PGD fools the normal model even more. The question is what this robust bound is going to tell us — it probably can't tell us anything about the normal ConvNet, but maybe it can tell us something about the robust model we trained. And unfortunately, the robust bounds you actually get if you run the convex procedure on both models are completely vacuous — and not just almost vacuous, really totally vacuous. This is at an epsilon of 0.1; I think to get a reasonable value you'd have to go down to an epsilon of 0.0001 or 0.0002, something like that — you're not going to get anywhere close with these kinds of bounds. So what's happening here? We had this great thing we can prove about models, but it doesn't give us anything useful. Of course, if we could solve the integer program for these models we might get more, but this model is too big for that — it would take, I don't know, about the heat death of the universe, or at least a really long time with my implementation. So what's going on? The key insight is that models that can be convexly verified are a small subset of what seem to be the actually robust models. If you think of the Venn diagram: there's the set of all models, within that some space of robust models, and within that some space of convexly verifiable models — well, we haven't actually proven whether that inner set is non-empty, it could be empty, but stay tuned. It turns out that these convex bounds — the tractable, provable bounds we have — are very loose unless the model is built specifically with them in mind. So it's great that we have these bounds, but in order to get them to say anything non-vacuous about a classifier, we actually need to train the model against them. And this brings me toward the end of what I'm going to say here, because it turns out this is possible. The convex bounds I described — specifically the interval bounds, those notions of propagating a box through the network — are, just like the normal network output, a differentiable function of the network parameters: it's a function of the weights and the biases and all that (it basically involves maxes of these things, zeros, the positive and negative parts, and so on, but it's just a differentiable function of the parameters). What that means is that my final convex bound is also a differentiable function of the model parameters, so what I can use gradient descent for is to minimize — not my actual robust loss, which is the combinatorially hard thing — but an upper bound, a strict, provable upper bound, on the robust loss of the classifier.
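A minimal sketch of what such a provable, interval-bound-based training loss could look like in PyTorch — this is my illustration of the general idea, not the exact objective from the tutorial notes; the "worst-case logits" construction (lower bound for the true class, upper bounds for the others) is one standard way to turn the interval bounds into a differentiable upper bound on the robust cross-entropy loss:

```python
import torch
import torch.nn.functional as F

def interval_logit_bounds(layers, x, eps):
    """Propagate the eps-box around x through a list of torch.nn.Linear layers
    (with ReLUs in between), returning elementwise bounds on the logits."""
    l, u = x - eps, x + eps
    for i, layer in enumerate(layers):
        W_pos, W_neg = layer.weight.clamp(min=0), layer.weight.clamp(max=0)
        new_l = l @ W_pos.t() + u @ W_neg.t() + layer.bias
        new_u = u @ W_pos.t() + l @ W_neg.t() + layer.bias
        if i < len(layers) - 1:                       # ReLU on all hidden layers
            new_l, new_u = new_l.clamp(min=0), new_u.clamp(min=0)
        l, u = new_l, new_u
    return l, u

def provable_robust_loss(layers, x, y, eps):
    """Differentiable upper bound on the robust cross-entropy loss: take the
    lower bound on the true-class logit and the upper bound on every other
    logit, then apply the ordinary cross-entropy to that worst-case vector."""
    l, u = interval_logit_bounds(layers, x, eps)
    worst = u.clone()
    worst.scatter_(1, y.unsqueeze(1), l.gather(1, y.unsqueeze(1)))
    return F.cross_entropy(worst, y)
```

Training then simply minimizes `provable_robust_loss` instead of the ordinary cross-entropy.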
Of course, the actual outer optimization problem is still non-convex, but at least we have an upper bound to optimize, and some recent work shows, kind of surprisingly, that interval bound propagation can sometimes work better for this than more complex relaxations — though I think the jury is still out on exactly what level of complexity you want for your training versus your verification. Okay, so here's the final thing I'm going to show before we pop back out and talk at a higher level — and there should be time for questions at the end too; I think we'll finish before 10:30. Here are the different settings, again showing test error under the different attacks as well as the provable bounds. Our normal ConvNet is bad in all regards except clean error; our robust ConvNet trained with PGD does well on all the empirical attacks, but its provable bound is completely vacuous; and finally, if I actually train the network to minimize this bound, we get the following. This is just a quick model that we have in the notes — you can do much better than this; you can get this down quite low, at least on MNIST, though I don't want to make it too low here because that would give the false impression that you can do this on any dataset, and on other datasets it's much harder. But at least on MNIST you can get these numbers pretty low, and what we have here — it's not going to win any awards by any means — is an MNIST model where, on the test set, I know that no matter what attack anyone comes up with, no matter how many more papers are written about attacks, no one is ever going to achieve a higher robust adversarial error — with a lot of qualifications here: with L-infinity-bounded perturbations under an epsilon of 0.1 — than 9.7 percent on this classifier. So that's nice; it's something we didn't know how to do, even a little bit, more than a year ago, and these techniques have been growing a lot within the machine learning literature, so we're coming up with many more techniques for verifying these models as well as for training verifiable models. Okay, one last note from me before I throw it back to Alex. These results seem good — it seems promising: I had a bad normal model, I trained a robust model and everything looked good, and I could even train a provably robust model with maybe a little worse, but still pretty good, accuracy. However, one thing I should emphasize is that these promising results are currently, to a large extent, more a function of MNIST than of our ability to scale these methods to really big problems. MNIST is actually quite easy to secure against these attacks, because it's almost a binary problem — pixels are mostly on or off, and adding a little noise here and there doesn't intuitively make things that much harder. But take CIFAR-10, for example, where the state-of-the-art clean error is around three percent, maybe even lower now: for a perturbation even smaller than the one we showed for MNIST, the best empirically robust models we know of still suffer about 53 percent error under the strongest attacks we know how to run.
And the best the provable bounds can currently do is to say that we can build a model with less than roughly 70 percent certified error on CIFAR — versus, again, a clean state of the art of around three percent. So for larger problems, scaling these things up and finding the architectures, the bounds, and the optimization procedures that will make them work is still very much an open question, and it's the topic of current and ongoing research, both on the empirical attack and defense side and on the provable attack and defense side. So next up, Alex is going to talk a bit more about this, as well as about the higher-level issues that arise from these adversarial examples. [Applause] Okay, welcome back — thanks, that was really nice; even I learned something. So far Zico provided this excellent view into the nuts and bolts of how you diagnose robustness, how you ensure robustness, and where we are, but something we both feel is very important is not to miss the big picture here: security, and ensuring security, is very important, but there is actually something a bit more high-level going on. Robustness may be an interesting notion by itself, even outside the context of security. So, just as a reminder, the conceptual picture that emerges is that we can not only focus on the applications of ML that require security; once we know there are applications where we care about these security aspects, we can take a step back, look at machine learning, and ask how machine learning looks through this lens of robustness. Specifically, if on one hand I take the standard machine learning we are doing now, which corresponds to standard generalization, how does it differ from the kind of robust machine learning we might try to build here? At the mathematical level, all we are comparing is two objectives: one that tries to achieve the first goal and one that tries to achieve the second. And one thing I really want to emphasize — which should already be clear from Zico's talk — is that even though we "discovered", in quotes, adversarial examples in the context of deep learning, it is really not only a deep learning problem, by far. Yes, in deep learning it's easier to find such examples, but every type of classifier we know of has them. Why? Because, again, we were training for standard generalization; the classifier never heard about robust generalization, so if we never design for it, there is no reason to expect it to happen magically. That's just a point to keep in mind. But this is the question now: given these two different notions of succeeding in machine learning, what changes when we move from one to the other?
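In symbols, the two objectives being contrasted are roughly the following (my notation; Delta is the set of allowed perturbations):

```latex
\underbrace{\min_{\theta}\; \mathbb{E}_{(x,y)\sim \mathcal{D}}\bigl[\ell(f_\theta(x),\, y)\bigr]}_{\text{standard generalization}}
\qquad \text{versus} \qquad
\underbrace{\min_{\theta}\; \mathbb{E}_{(x,y)\sim \mathcal{D}}\Bigl[\max_{\delta \in \Delta}\; \ell(f_\theta(x+\delta),\, y)\Bigr]}_{\text{robust generalization}} .
```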
So I will just give you some vignettes. The first one is about overfitting in robust deep learning. We all know the story of overfitting, especially for deep learning and non-convex optimization: over here we train some classifier — this is the training accuracy — and after pushing it for long enough we get a hundred percent training accuracy; that's what our tools do. Of course, the question is what happens when you test the solution on the unseen test set, and this is the usual performance we observe. What is still amazing, especially for deep networks, is that the gap between what we can ensure by training and what we actually get on unseen samples — the generalization gap — tends to be really small. So that's great in standard machine learning, but how does it look in the robust setting? In the robust setting it's much, much harder, as Zico explained, to actually get to a hundred percent robust training accuracy — sometimes even impossible, because the data is not perfectly separable — but here we can get there. In particular, this is the curve we get for CIFAR, exactly what Zico mentioned at the end: we can train a classifier that gets a hundred percent robust training accuracy, so there exists a classifier that is robust on the training set. However, if we look at what happens on unseen examples, that's the performance we observe — we have this really huge generalization gap: essentially we get under 50 percent robust performance even though the training performance is a hundred percent, or close to it. So what can we do? The obvious idea we know from standard machine learning is to use regularization, and maybe some regularization idea will work, but so far everything we tried — even things that seemed like they might work — unfortunately doesn't. This was quite puzzling for us, especially because we tried to push CIFAR quite a lot in our lab, and it makes you wonder whether there is some inherent reason why getting better accuracy — avoiding this overfitting — is a problem here. And indeed there is a phenomenon we discovered: it turns out that if we aim for adversarially robust generalization as opposed to standard generalization, it's not only that the optimization problem corresponding to training becomes more difficult; the sample complexity of the problem also becomes worse. In other words, we tend to need more data to even be able to get a robust classifier — a classifier that generalizes in a robust way. Here is a theorem you can actually prove: there exist, in d dimensions, some very simple distributions — extremely simple distributions — that already exhibit this effect: with just one sample from the distribution I am already able to get a classifier that performs extremely well in an average-case sense, but in order to have any hope of building a robust classifier I need to see many, many more samples. And the key thing is that this is an information-theoretic bound, so it doesn't matter what you do — you can do data augmentation, whatever — you will be stuck until you have seen enough samples, and until then you will not be able to get any meaningful robust classifier. These distributions are very simple: one of them is just a perturbed hypercube distribution, which essentially models what's happening on MNIST,
where linear classifiers require a lot of data for robust classification, but if you are able to use a nonlinear classifier — something called thresholding, which, if you think about how MNIST works, you can apply very easily — you get a robust classifier. But there are also other distributions: that first one was hard for linear classifiers and easy for nonlinear ones, but you can also construct distributions that are hard for any classifier — no matter what classifier you use, unless you have seen enough samples you will not be able to get a robust classifier, even though average-case, standard generalization is very easy, even from one sample. That's all I will say about this; there is actually a spotlight and a poster about it on Tuesday. So that's one effect: the sample complexity of the problem changes, often quite dramatically. But there is another issue I want to highlight, which is the question of how being robust affects standard generalization. Maybe, coming into working on this, my belief was that once you do all the work to get a robust classifier, it should be superior to a standard classifier in all respects. In particular, we know that one of the effective techniques to get better standard generalization is data augmentation — adding transformations of your training data during training to impose a stronger prior — and it works very well. In some ways, if you look at adversarial training, which is the current leading technique to get robustness, you can view it as the ultimate version of data augmentation: you are always training on the adversarial perturbation of the training data, so in some ways you are training on the most confusing version of the training set. So clearly you would expect this to be only a good thing, and in particular that being robust can only make your standard generalization even better, because you are implicitly using this ultimate form of data augmentation. Well, let's look at the data. Over here we have the standard test accuracy of a standard model trained in the standard way — this is just the reference point. However, if we look at the standard accuracy of a robust model, we see that there actually is a gap. It's not a big gap, but it is a gap, and it's a consistent gap; we tried it in different setups. By the way, there is a regime, at the very beginning, where robust training can actually help a little, but in the regime where we actually end up training it's always consistently lower. So what's going on — where is this consistent gap coming from, and why do we seem to be inferior in this way? In a sense, as with many things, in retrospect it's clear what's going on. First of all, if I'm trying to train a classifier, I somehow have to choose whether I want to maximize standard generalization performance or whether I want the classifier to be robust, and there is actually an inherent trade-off between the two.
And in the end the trade-off is pretty clear. The key idea is that if you look back at standard training, when your goal is just standard generalization, then any feature that has any correlation with the correct label is useful to you — you can always milk it a bit to get even better standard performance. However, in the robust setting, if there is some feature that is only very weakly correlated with the correct label but is easy for the adversary to perturb, then you have to make a call about whether you actually want to take advantage of it to improve your accuracy or not. This is the basic trade-off at play here, and this is how it manifests itself. You can imagine it with a simple toy picture: I have a dataset in which the first coordinate is strongly correlated with the correct label, but not perfectly, and then I have a bunch of weakly correlated, essentially independent features that make up the rest of my dataset. What happens is that these weakly correlated features, if I aggregate them, combine into a very meaningful and predictive meta-feature. The problem is that in standard training you will simply use this meta-feature to get essentially perfect classification, whereas in the robust setting you are not able to take advantage of all these features — even though they are very accurate on average, they are very easy for the adversary to manipulate, because each one of them is only weakly correlated with the correct label. And this is essentially the explanation; a little numerical caricature of the aggregation effect is below.
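(A toy simulation of that argument — the specific construction, numbers, and variable names here are mine, chosen only to illustrate that averaging many weakly correlated features is very accurate on clean data yet trivially flipped by a small L-infinity perturbation:)

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eta, eps = 10000, 1000, 0.1, 0.25   # samples, weak features, correlation, attack budget

y = rng.choice([-1, 1], size=n)
x_strong = y * rng.choice([1, -1], p=[0.9, 0.1], size=n)   # strong feature: right 90% of the time
x_weak = eta * y[:, None] + rng.normal(size=(n, d))        # each weak feature barely correlated

# "Meta-feature": the average of the weak features; very accurate on clean data...
meta = x_weak.mean(axis=1)
print("clean accuracy of the meta-feature:", np.mean(np.sign(meta) == y))

# ...but an L-infinity adversary with budget eps > eta can shift every weak feature
# against the label, flipping the aggregate prediction for almost every input.
meta_adv = (x_weak - eps * y[:, None]).mean(axis=1)
print("accuracy under attack:            ", np.mean(np.sign(meta_adv) == y))
```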
Okay, so let's try to summarize a little where we are when we think about this difference between robust machine learning and standard machine learning. It's very clear — and it is expected — that there is a price to being robust. In particular, as Zico already said, the optimization problems we need to solve during training are much more difficult, and one other thing we discovered is that the models have to be larger, which is clearly and undeniably a cost. We also just saw that you might need more data — so it's not only about training time or model size — and, as we said, you might also need to give up a bit of standard accuracy. This is what we observe, and there is more recent work by Bubeck et al. that tries to make this point more broadly. So there are all these costs, and that's to be expected: we are trying to get something stronger, so we have to pay for it. But it turns out there are also somewhat unexpected benefits — beyond just being robust, robust models have additional properties that might be desirable. One thing I want to highlight is that, in some ways, robust models tend to be more semantically meaningful. What does that mean? I will just give you an example, because honestly we still don't really understand what's going on there. It goes back to a fact all of you are probably aware of: as amazing as a technology like deep learning is, it is very black-box, and that's a big problem, because when I have a classifier and I plug in an input, I get an output; that output will mostly be correct, but it's very hard for us to make sense of why that output was produced for that input. There is a whole subfield of machine learning currently trying to help us with this problem, and one of the most basic ways of trying to make sense of why a deep learning classifier made a particular decision on a given input is something called saliency maps. Essentially, for a given input, I look at each pixel and ask how sensitive the prediction is to changing just that pixel, and then I construct a heat map where the color of a pixel is more intense the more influential it is. That's a great idea — but if I apply it to a standard model, right away I get pictures like this, which are somewhat informative but also show that the model is not exactly doing what you might expect: the prediction seems to also be influenced by things that are just artifacts, not things you would actually tie to the correct prediction. Of course, there is a whole industry of very interesting tools to make saliency maps better — to take them from these noisy maps to much nicer ones. But what we discovered is that if you do the same experiment on a robust model alone — you just impose L-infinity perturbations, just require that the image's classification should not move if you change each pixel by a tiny amount — then suddenly you get saliency maps like this right away; it's essentially three lines of code, and you just get this, and it clearly looks much closer to what we as humans would expect to see.
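(Roughly those "three lines of code", as a hypothetical PyTorch sketch — the function name and the way the channel dimension is collapsed are my own choices:)

```python
import torch

def saliency_map(model, x, y):
    """Gradient-based saliency: how sensitive is the score of class y to each
    input pixel? Brighter pixels are more influential."""
    x = x.clone().detach().requires_grad_(True)
    score = model(x)[0, y]                  # logit of the class of interest
    score.backward()
    return x.grad.abs().max(dim=1)[0]       # collapse the channel dimension
```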
And this actually gets even more fun. In some ways, what robustness does is align the loss landscape of the model more with our semantic understanding of the problem. So let's do another experiment. I have here a picture of an ape, which classifies as a primate in ImageNet, and what I will do now is follow the gradient to find the closest image that makes the model classify it as a bird — I will just morph this picture into the closest image that leads to a "bird" classification. Let me do that... done. The model is not robust, so we should not be surprised: if you look very, very closely there is a difference, but it is tiny — the model is not robust, so clearly we can manipulate it the way we want, with noise that is completely imperceptible. But let's try exactly the same experiment on a robust model. What will happen? Let's do this — and this is what you get. For your reference, this is not a bird; if you look very closely you will realize it's not a bird. Still, I think no one would blame the model for taking this image to be a bird, and all we did was train a robust model and follow the gradient of the loss to make the prediction be a bird, and we get these very interesting and very semantically meaningful pictures. Of course, being where I am, I'm sure you can draw the analogy to things you have seen: generative adversarial networks are a very cool technology, and in particular they allow you to come up with these interesting latent embeddings; you can then navigate those embeddings, and the pictures you get look somewhat like this. So in some ways it's really cool that we can get similar effects from robustness, and in some ways it should not be surprising, because in both cases — adversarial training and GAN training — we are solving some min-max problem, so you can view robustness as giving you some kind of restricted version of that latent-embedding behavior. There is still something to be understood here, but clearly we did nothing related to security or safety, and we still got a desirable property of the system just out of making it robust. Okay, so let me conclude and wrap up. First of all, I hope Zico and I gave you a bit of a view of where we are in adversarial robustness and how to get started. In particular, as Zico already said, the website has both a more expanded version of what we talked about — it's still a work in progress, but it will be there — and code, so you can start playing with these things and see where you get. And as Zico said, this is very much an open field; we have really just scratched the surface, so let me give you some general directions that I think are worth pursuing. One frontier is algorithms: one thing that is really hampering us is the inability to scale up all these techniques. Having to do robust training — adversarial training — or verification is a real difficulty, and we have to think about what has to change to be able to work with larger models; MNIST is not really the right test case to settle anything here. We also have to think about how to make our models smaller and more compact while still being robust, and, maybe more importantly, how to change the architecture to perhaps induce more robustness as a prior — maybe we should rethink how we even put our networks together and get some robustness just out of that. So that's one frontier. The other one is theory. In a sense, what I always found fascinating about standard ML was that the theory was very informative about what happens in practice: we had these theoretical bounds which, of course, were theoretical,
but still, the guidelines they gave were usually quite crisp and quite relevant to what happens in practice. Once deep learning came in we lost that a little bit — we are slowly recovering it — and the same is true for robustness: we still don't really have a good theoretical underpinning to even guide our intuition. In particular, we don't have strong, sufficiently meaningful, non-vacuous adversarially robust generalization bounds; we know that robust generalization is different from standard generalization, but we don't really understand it from a theoretical point of view. Also, as I said, maybe we need theory that will guide the choice of regularization techniques, because the regularization techniques for robust classification that we have now do not seem to be effective. And finally — I can't stress this enough — there is one more domain, and that is data. Machine learning is a field about data, and in order to make sure our tools are really meaningful and broadly applicable, we have to work with a varied set of datasets. This is not really happening currently in the robustness world — I'm guilty of this too; there is a reason: we are so stuck that we cannot even figure out the very few datasets we are currently working with — but that should not stop us from trying to explore how robustness manifests in many different scenarios. And there is the other extremely important question: we have to think very hard about what is even the right set of perturbations. When I say my model should be robust, what exactly is the class of transformations it should be robust to? As we said, the Lp perturbations we considered are clearly necessary, but they are by far not sufficient — rotations, for instance, are not captured by Lp perturbations, so you have to throw in rotations; then you can think of local deformations; there is a lot of stuff to think about, and I think we should think about it. And this is just about vision — there are many other contexts where this matters and where different perturbations have to be considered. So these three main directions — algorithms, theory, and data — are where I would point you as the key directions to push in this field. But there is also a point that Zico already mentioned and that I really want to emphasize: there is something different about robust ML that goes beyond just working with a different definition. Somehow this max in the definition should really make us think differently about the way we approach problems: we should go from average-case and best-case evaluations to a worst-case mindset. In particular, when we evaluate the quality of our models, we should always remember that it's a game: yes, I want my model to succeed, but in order to make sure it succeeds, I have to put on the hat of someone who wants to break it — someone who knows exactly how it works, knows all the potential weak spots, and tries hard to exploit them.
This is something that happens naturally in security, because that's how that field works; it didn't happen in machine learning, because until now we didn't have to worry about adversaries — but now we do, and we really have to learn from the security community how to do it. Nicholas Carlini is one of the great propagators of this kind of idea, so if you think your model is great, talk to him — and then we can even talk afterwards; he is really good at this. There are some useful resources here: there's CleverHans, which gives you a library of attacks — treat it as a starting point, not the end, of your evaluation; it will just seed your thinking about how you could go about breaking a model. There's also robust-ml, which tries to keep track, as people propose models, of which ones have some evidence of actually being robust and which ones have already been circumvented, so you can navigate the field better. But in general, it's all about the mindset: I think we have to be much more skeptical and much more careful in this field than we usually are, because failure here is quite catastrophic and we just can't afford it. Okay, so that's adversarial robustness, but more broadly I really want to convey that machine learning has clearly succeeded as a proof of concept; now it's time to go for the next milestone and make it truly reliable — something you can be confident deploying in the real world, where it will actually work. And in particular, I really believe that thinking about robustness does not only give us ML that is better suited for security and safety; I think it will also be better in other regards. I just mentioned the interpretability angle, and I think in the context of fairness and other things it will also be a very useful tool — I don't know yet exactly how, and that's the exciting question, but I'm pretty sure it will be. So this is all I have to say. I just want to emphasize again: if you got interested in this field, there is plenty of stuff to do — the cross-section of things to do is honestly just overwhelming. If you want to get started, we have the notes and code over here that you can just start playing with, and we will be adding to them over time; also, for some of the issues we discussed, there are blog posts on my group's website that go into a little more depth and may help you get started. Overall, it's a really fun field, I think it's a really critical field, and we need an influx of ideas and work, so I hope some of you will decide to join in — I think it will be fun. Thanks. [Applause] Can you hear me? Thank you. We have time for some questions; if you're leaving now, please leave quietly, and if you have questions there is just one microphone right here, so please come and join the line. Go ahead, please. [Audience] So, to summarize what you've done so far: the adversary puts a max around the loss, and then you learn to put a min around that max to combat it.
So let me ask the logical question, which is: what if I put a max around your min, and then you put another min around my max, and so on — more moves in the min-max game? In some ways I think there is really just one min and one max, because there are just two players, the adversary and us. But you are right that in some ways this will always be a cat-and-mouse game: whenever I get an idea of how to thwart the attacks in one threat model, there's always an extension of the threat model under which we are not secure again, and in that sense we will never have ultimate robustness. But this is also what we have in classical security: we never have perfect security; we only try to reason about where we can get security at this moment and what we are missing. It will never be perfect, but we just need to do better than we do now, and hopefully in time we will get there. There is no solve-it-all solution where suddenly everything is secure, period, and there is nothing else to be done — that will always be an open question — but currently things are actually quite bad. One point I'll make there is that I think the next regime for this, by the way, is the max over the perturbation set Delta, which would be generalization over different classes of attacks — I think that's the next natural extension, it would be really interesting to consider, and it really hasn't been considered very much. For example, being robust to multiple distance measures — you can take the intersection or even the union, you can do that — but to actually generalize, and have an adversary that can adapt their perturbation region, is a really open question that, as far as I'm aware, has not been addressed. [Audience] It seems to me that in adversarial training it is assumed there is no intrinsic error in the computation of the training, but people have been trying hard to apply approximate computing — reducing the precision — so that when you compute there is some computational error. Could you comment on how this robust machine learning would work with approximate computing? In the end these two things are in some ways orthogonal, because, as Zico explained, when we want robust models we just have to solve certain optimization problems, and whatever we understand about solving these optimization problems under reduced precision transfers over. So I don't think it's a fundamental obstacle, though we do have to think about it, and I think we still have a very clean interface through which to understand what the impact of limited computation will be: how well can we solve the optimization problem with limited precision? If we can do it well, everything will work; if we can't, it will not. [Audience] Hi — I think you talked about one approach being to find the exact worst-case adversarial example, and about using off-the-shelf binary optimizers like CPLEX. But zero-one binary programming problems are NP-hard, and when you look at what optimizers like CPLEX actually do, they basically run branch and bound: they take a linear relaxation, find an approximate solution, and then find another relaxation of the problem.
So, in terms of finding provably tight bounds, have you actually been able to do this successfully, based on provably optimal solutions to the combinatorial problem? Right — so the examples I gave there, for those networks, and the examples in the notebook, are all cases of solving branch and bound to optimality: you expand the tree entirely, to the point where you are guaranteed optimality because your upper and lower bounds match. Of course, you could also stop those solvers early to get upper and lower bounds that do not meet, and actually the relaxations I described are exactly like the first few relaxations that a method like CPLEX or Gurobi would solve. I should also say there has been work from the University of Oxford and from folks at DeepMind on adapting better branch-and-bound search methods for this problem — I think there might even be a paper here at NeurIPS on that — so there's a lot of work on this, and there's also a lot of work in the SMT-solver community on specialized solvers for these problems, which again compute upper and lower bounds. But to be clear, the solutions we have in our tutorial are all cases where you solve the branch and bound to completion, where the upper and lower bounds match. [Audience] And how many dimensions have you been able to handle? For the ones that we solve exactly, there are about 80 hidden units, which is the number of binary variables. [Audience] What are your thoughts on ideas like input-free adversarial examples? What does input-free mean, exactly? [Audience] That you don't look at the input data when you're constructing your adversarial example. Oh, I see — yes, this is a very interesting question. In some ways you would like to make your perturbations as oblivious as possible to where they will be applied. There is work on universal perturbations, which is not exactly what you are talking about but is close: it does look at the data, but you try to come up with one perturbation that works for most of the images. Honestly, for truly input-free perturbations, I think that ideally, if we do well — unless there is some unwelcome hidden bias in our machine learning models — they would have to be semantically meaningful, something that should also confuse humans, so it would be very interesting to study whether that's the case. It's a very interesting topic; we are still just starting, and I don't know much about it yet. So, in the case of universal perturbations in particular, which are getting at what you want, there's actually a very nice optimization formulation of that too, because you can think of the max over delta as being outside the sum — or outside the expectation — rather than inside it. It's still a min-max problem, but the max is outside the expectation instead of inside it, so the perturbation is independent of x; it has to be universal over all x. That's the universal perturbation formulation, which has been studied. What's been studied much less, though, is models that are robust to universal perturbations — which should be an easier problem to handle — there just hasn't been much work on robust optimization against universal perturbations.
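(In symbols — my notation — the contrast is between the per-example robust objective and a "universal" one, where a single delta must work in expectation over the data:)

```latex
\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\Bigl[\max_{\|\delta\|\le\epsilon} \ell\bigl(f_\theta(x+\delta),\,y\bigr)\Bigr]
\qquad \text{versus} \qquad
\min_{\theta}\; \max_{\|\delta\|\le\epsilon}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\Bigl[\ell\bigl(f_\theta(x+\delta),\,y\bigr)\Bigr].
```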
So yeah, universal perturbations are a really cool topic. Thank you.

Thank you for your excellent tutorial; the website is an especially nice touch. I was wondering: you claim robustness at a particular bound epsilon, but presumably if you were to increase the epsilon you would then be able to break it, right, and so forth. Has there been any research done, for example, on the detection of adversarial attacks, so that even if you were to increase epsilon further and further you would still realize that your input is now adversarial, and could just reject it or do something sensible?

Yeah, so we do have some work in progress that I can talk about, though I'm not sure quite what you can do with it. It is a simple binary search on an example-dependent epsilon that you can increase or decrease, to make each example as secure as it can be in some sense (a rough sketch of that kind of per-example epsilon search is included below). I'm not sure it is quite getting at what you want, but I have to say it is an interesting question.

I'm always a little bit skeptical of detecting adversarial examples, because it is very unclear: if you build a data-driven, machine-learning-based detector, then you can construct adversarial examples against the detector as well. So what looks like the most promising way to do it, I don't think it works; that doesn't mean it can't be done. But I think the interesting meta-question here is exactly how much our models overfit to the choice of epsilon. We definitely did these kinds of studies, where we took a model that is designed to be robust at, say, epsilon 0.1 and looked at how it fares when you go beyond that. Of course it gets broken, but depending on the dataset, sometimes it overfits really sharply and sometimes it degrades relatively gracefully, and sometimes you actually want to attack it in a stronger manner just to get security against smaller epsilons. So we don't understand this well yet; it is an interesting phenomenon, but we don't understand it. Okay, thank you.

Thanks again for the talk, it was excellent. So, only one or two more questions. My question is actually about the nature of the threat models and the inputs that attackers will choose. I think in most threat models you are going to find that the attacker is free to choose inputs to perturb that are outside the distribution of the task at hand. You gave a really good example of Carlini's work, where he used music to attack speech-to-text, and in most of the threat models I have looked at, this is perfectly available to the attacker. I'm curious whether you see any path forward for adversarial training and verification when you have to actually deal with the whole family of distributions that lie outside your task.

Yes, so I can definitely see it; I just don't know how to do it. Yeah, I agree, it is an actual problem. What prevents noise from being classified as something? Well, it could be any noise anywhere, right, and that is a great opening for the adversary; we don't know yet. I expect that this will be a big problem. I have some thoughts on how to approach it; essentially, part of it might be that we are asking for too much. I think we should not expect to robustly classify noise as well, but currently, if you start with complete noise, you can destroy these models, there is no question whatsoever.
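Here is the rough sketch referred to above of a per-example binary search over epsilon. This is a minimal illustration under my own assumptions, not the speakers' actual method: attack_succeeds is a hypothetical callable that runs some attack (for example PGD) on a single example at a given radius and reports whether it found an adversarial example, and the search assumes attack success is monotone in epsilon, which holds only approximately for heuristic attacks.

    def empirical_robust_radius(attack_succeeds, eps_hi=1.0, tol=1e-3):
        # attack_succeeds(eps) -> bool is a hypothetical callable: True if some
        # attack (e.g. PGD) finds an adversarial example within an eps-ball
        # around this particular input. The returned radius is empirical, not
        # a certified bound.
        lo, hi = 0.0, eps_hi
        if attack_succeeds(lo):
            return 0.0            # already misclassified with no perturbation
        if not attack_succeeds(hi):
            return eps_hi         # attack never succeeds up to eps_hi
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if attack_succeeds(mid):
                hi = mid          # attack works at mid, so the radius is below mid
            else:
                lo = mid          # attack fails at mid, so the radius is at least mid
        return lo

A per-example radius like this can be used either to report how security degrades as epsilon grows past the value used during training, or, by increasing and decreasing an example-dependent epsilon, to push each example to be as secure as it can be, in the spirit of the answer above.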
Okay. Yeah, to your point about asking too much, I think this is where the engagement with the security community, which is where I actually come from, really has to start in earnest: not on the ML itself, but on defining the threat models and defining what is too much.

Yeah, I'm happy to chat; that is exactly what I think. Thank you.

Okay, so most of the tutorial was about images and object detection; I just wonder if there are any results on natural language processing, like question answering and those areas.

So yes, there are; I know of a few people who each independently have some work on that, and Nicholas Carlini, as was mentioned in the previous answer, has work on speech. Language is different, in particular because it is much more discrete than images: you change a whole word, you don't just perturb a word a little bit. There is work on that; it is not as well developed, but it definitely exists. At the latest EMNLP there were several papers on this.

Yes, and in NLP we are still not quite there yet even with average-case performance, so it obviously doesn't work well when you do adversarial stuff. But there is work on it; Percy Liang, I think, is one of the lead people on that. But yes, it should apply to any domain; it is just a matter of working it out.

That is the end of the tutorial, so let's thank Zico and Alex again. Thank you very much for a great tutorial. [Applause]
Info
Channel: Steven Van Vaerenbergh
Views: 9,190
Rating: 4.9779005 out of 5
Keywords: NIPS, NeurIPS, 2018, presentation
Id: TwP-gKBQyic
Length: 120min 3sec (7203 seconds)
Published: Fri Dec 07 2018