#52 - Adversarial Examples Beyond Security (Hadi Salman, MIT)

Captions
Performing reliably on unseen or shifting data distributions is a difficult challenge for modern vision systems. Even slight corruptions or transformations of images are enough to slash the accuracy of state-of-the-art classifiers, and when an adversary is allowed to modify an image directly, models can be manipulated into predicting anything, even when there is no perceptible change. This is known as an adversarial example. The ideal definition of an adversarial example is a pair of pictures that humans consistently say are the same but that a machine disagrees about. We can't operationalize that definition, so many researchers use the L2 or L-infinity distance between two images as a proxy for human evaluation.

Hadi Salman is a PhD student in the Madry Lab at MIT. He cut his teeth at the Robotics Institute at Carnegie Mellon, at Uber, and at Microsoft Research. During his undergrad at the American University of Beirut, Hadi became fascinated with robotics and autonomous systems; what excited him most was that building such intelligent systems could make people's lives much easier. He was captivated by the potential for artificial intelligence to make robots behave increasingly like living things, and he realized that robots without a high level of intelligence would always be missing something critical. After several years of research in robotics, he recognized that his passion went beyond applying artificial intelligence to robots. His experience with robots and autonomous vehicles made him aware of fundamental issues that hinder AI from being used reliably in safety-critical applications, chief of which is the brittleness of AI models in general: their sensitivity to distribution shifts, random corruptions, and adversarial examples, not to mention the safety and security concerns that follow. Hadi became immersed in these problems above all else, and he did some pivotal work with world-class researchers at Microsoft Research.

So what does it mean for a model to be vulnerable to adversarial perturbation? Let's just say that an adversarial perturbation is an imperceptible change to an image that flips its predicted class. The robustness of a model has traditionally meant its susceptibility to noise, but the term has also been used synonymously with generalization itself. When Hadi was at Microsoft he came across a couple of papers from his current lab at MIT. The first was "Adversarial Examples Are Not Bugs, They Are Features". It presented an entirely new perspective for understanding our data, referred to as the robust features model. A robust feature is one which is anthropocentric, i.e. recognizable by humans, and which remains predictive under small perturbations. In this model, the brittleness of neural networks is explained by their sensitivity to highly generalizing, so-called non-robust features of the data, which are imperceptible and unrecognizable to humans. The supervised paradigm of machine learning maximizes accuracy alone, so it is little wonder that classifiers use all of the information available, even if it is incomprehensible to humans. The MIT researchers showed that it is possible to disentangle the robust features from the non-robust features by doing adversarial training, so-called robustification. During robustification, adversarial examples are found using projected gradient descent; because the algorithm is constrained to a small radius around the original sample, by definition the features it exploits are the low-magnitude, non-robust ones.
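As a concrete illustration of that constrained search, here is a minimal projected gradient descent (PGD) sketch for crafting an L-infinity adversarial example. The pretrained PyTorch classifier `model` and the labelled batch `(x, y)` with pixels in [0, 1] are assumed placeholders, and the radius and step size are illustrative, not taken from any of the papers discussed.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, step_size=2/255, steps=10):
    """Search a small L-infinity ball around x for a perturbation that maximizes the loss."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += step_size * delta.grad.sign()      # ascend the loss
            delta.clamp_(-eps, eps)                     # project back into the eps-ball
            delta.copy_((x + delta).clamp(0, 1) - x)    # keep pixels in a valid range
        delta.grad.zero_()
    return (x + delta).detach()
```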
Just because you do adversarial training, it does not necessarily mean you will block all adversarial examples, so robustness will always be a matter of degree. Normally classifiers learn both sets of features in tandem, but when you isolate them into separate datasets and train a model on each, you get a good classifier in both cases. This demonstrates that both of these projections of your data are simply two different types of features. It is only when you train on the robust features that you get a robust classifier, which is to say a classifier that is primarily learning human-recognizable features. Previously, adversarial examples were seen as aberrations arising from the high-dimensional nature of the input space or from statistical fluctuations of the training data. This new frame of reference suddenly explained why adversarial examples transfer between models: they were in the data all along. The conception does not explain all adversarial examples; it is still possible to find ones which do not generalize outside of the training data. But why not leave the non-robust features in there, you might ask? They still carry predictive information, even if it is imperceptible to humans. Herein lies the tragedy: if you remove all of the non-robust features, your classifier's accuracy will drop, but at least there will be an alignment between what humans recognize and what your model is learning.

The second paper from the MIT lab which inspired Hadi was "Adversarial Robustness as a Prior for Learned Representations", which demonstrated that adversarial robustness leads to feature representations that are more aligned with human perception. Robust optimization can actually be viewed as inducing a human prior over the features that models are able to learn. In that paper there was a really interesting experiment: they trained an adversarially robust network and a standard one, then took either an input image or just a random seed image and ran projected gradient descent to maximally activate a given output class back into that image. On the robust network the shapes that emerge are quite high level and human recognizable; there is something that looks quite frog-like in the top left and something quite crab-like, whereas on the non-robust network we just see noise that is imperceptible to humans, and when a dog was used as a seed image you can still see the dog bleeding through into the optimized image. These look like two completely orthogonal sets of features, but annoyingly for us the non-robust features are highly generalizing and are learned by neural networks in their default configuration. The method used to ascertain whether an image is human recognizable, or anthropocentric, is fairly empirical, but it is quite clear that the more robust the network, the more it produces human-interpretable visualizations: larger shapes and configurations of objects rather than fine-grained, ostensibly noisy patterns. Do you remember Wieland Brendel, who we had on the show recently? He was involved in the texture-bias paper showing that CNNs are strongly biased towards recognizing textures rather than shapes, and he claimed that if you have a robust network it is no longer sensitive to textures and is more likely to be aligned to the features that humans pay attention to.
So all of this work inspired Hadi to start thinking about how the fruits of the adversarial robustness community could be leveraged beyond security. Hadi published work showing that using adversarial robustness as a prior when training ImageNet models actually leads to better transfer learning on a wide range of downstream tasks. A few months later, Hadi had a bigger realization: the phenomenon of adversarial examples can be turned upside down to produce more robust models instead of breaking them. Hadi utilized the brittleness of neural networks to design unadversarial examples, or robust objects, which are objects designed specifically to be robustly recognized by neural networks. This led to the paper "Unadversarial Examples: Designing Objects for Robust Vision", which essentially answers the question: why are we trying so hard to make neural networks robustly recognize objects? Why don't we turn the problem upside down and instead optimize our objects to be robustly recognized by neural networks? How can we build objects that are easily detectable by machine learning models? Instead of optimizing images to mislead the models, as in traditional adversarial examples, we can alter the inputs to reinforce the correct behavior, yielding unadversarial examples. Consider how adversarial examples work: we maximize the loss of a machine learning model, given the correct label, with respect to the input image, by solving a simple projected gradient descent formulation that finds permissible but small perturbations of the input. To make this work for unadversarial examples, all we need to do is change the max to a min. We can do this on entire images, but to make it more realistic we can work with patches, or even with textures on the objects inside the image. This works brilliantly in settings where the user has control not only over the models they are deploying but also over the objects of interest that they are trying to recognize, track, or detect. We already do this in the real world: just think about helicopter landing pads, runways, or stop signs; these are things designed to activate our own human perception as strongly as possible, and it turns out we can do the same thing for machines.
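As a rough sketch of that "change the max to a min" recipe, the loop below optimizes a patch so that a classifier's loss for the intended class goes down rather than up when the patch is pasted onto images of the object. The classifier `model`, the `loader` of object images, and the fixed top-left placement are illustrative assumptions; the actual work also optimizes textures rendered onto the objects themselves under varying conditions.

```python
import torch
import torch.nn.functional as F
from itertools import cycle

def train_unadversarial_patch(model, loader, target_class, patch_size=64, steps=1000, lr=0.1):
    """Minimize (not maximize) the classification loss w.r.t. a patch pasted on the images."""
    patch = torch.rand(3, patch_size, patch_size, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    batches = cycle(loader)
    for _ in range(steps):
        x, _ = next(batches)
        # Paste the patch into the top-left corner; a real pipeline would randomize
        # location, scale and lighting so the patch stays useful in the wild.
        p = patch.clamp(0, 1)
        padded = F.pad(p, (0, x.size(3) - patch_size, 0, x.size(2) - patch_size))
        mask = torch.zeros_like(x)
        mask[:, :, :patch_size, :patch_size] = 1.0
        patched = x * (1 - mask) + padded * mask
        labels = torch.full((x.size(0),), target_class, dtype=torch.long)
        loss = F.cross_entropy(model(patched), labels)   # min instead of max
        opt.zero_grad()
        loss.backward()
        opt.step()
    return patch.detach().clamp(0, 1)
```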
Anyway, I really hope you enjoyed the episode today; we had so much fun making it. Remember to like, comment and subscribe, we love reading your comments, and we'll see you back for some quantum natural language processing next week. Check out my hat! What the hell is that? Oh nice, look, it has my pronouns. Excellent, conflating pronouns with editors. There is a secret: it has the tweets that got me into trouble.

Welcome back to the Machine Learning Street Talk YouTube channel and podcast, with my two compadres, Sayak "the neural network pruner" Paul and Dr. Yannic "Lightspeed" Kilcher. In today's show we are joined by Hadi Salman. Hadi pursued a double major in mathematics and mechanical engineering at the American University of Beirut. He obtained a master's in robotics from Carnegie Mellon under the tutelage of the legendary Professor Howie Choset, where he applied machine learning to tumor localization in surgical robots, and he also applied deep reinforcement learning to robot navigation. Hadi interned at Uber, where he worked at the intersection of machine learning and uncertainty quantification for autonomous driving, and he was accepted into Microsoft's AI residency program in 2018, which is really prestigious: I think only about ten people were accepted from literally thousands of applicants. He then spent two and a half years at MSR as a research engineer working on adversarial robustness for neural networks. While he was there he published "A Convex Relaxation Barrier to Tight Robustness Verification of Neural Networks", a layer-wise convex relaxation framework that unified the previous LP-relaxed neural network verifiers (LP means linear programming, by the way) in both primal and dual spaces. He also published "Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers", which improved the performance of randomized smoothing, now one of the few scalable, certified defenses against Lp-norm-bounded adversarial examples; this work achieved state-of-the-art certified accuracy on ImageNet and CIFAR-10. He also released the paper "Denoised Smoothing: A Provable Defense for Pretrained Classifiers", in which he developed a method for provably defending any pretrained image classifier against norm-bounded adversarial attacks. At the end of last year, Hadi published "Unadversarial Examples: Designing Objects for Robust Vision": he realized that the phenomenon of adversarial examples can be reversed to lead to more robust models instead of breaking them, and he utilized the brittleness of neural networks to design robust objects, or unadversarial examples. He also published "Do Adversarially Robust ImageNet Models Transfer Better?" at the end of last year: robust base models for transfer learning might have slightly lower accuracy, but they transfer better to downstream tasks, because robustness leads to better feature representations. Recently, Hadi joined the MIT Madry Lab as a PhD student, where he is developing techniques for the efficient, reliable and safe deployment of machine learning models in the real world. Today, Hadi is going to tell us all about his work on adversarial robustness beyond security. Anyway, Hadi, it's an absolute honor to have you on the show. Tell us about your eureka moment with unadversarial examples.

Thanks, guys, for having me on the show. It's a great show and I'm happy to be here, and thanks for the great introduction. So yeah, adversarial examples are amazing as a phenomenon; they have always surprised me and got me really excited. I remember attending a talk given at CMU by one of the Facebook AI researchers, whose name I forget to be honest, where he introduced adversarial examples, and I was really impressed. I thought, oh my god, this is great: how can such easily crafted examples break really well-working models? These models work in a very high-dimensional space; you just perturb your image slightly and you suddenly break the model. It's really impressive, and it made me realize that our current models work really well in the average case, but in the worst case they just fall apart; it's hard to overstate how easy it is to break them. So it's clear that our current machine learning models are far from being as robust and reliable as we want them to be when we deploy them, especially in safety-critical applications.
So for me, adversarial examples are an important problem from a security perspective. Some people think there are worse problems than adversarial examples that we should focus on right now, which I agree with, but adversarial examples are definitely something we will have to deal with, both now and in the future. From a security perspective I've done some work on adversarial examples, as you mentioned, and the nice thing about the area is that it's really challenging: it's easy to attack, there are many, many attacks and all of them work amazingly well, but it is very, very difficult to defend against them. Even on small datasets like CIFAR-10 we cannot achieve much above 60 to 67 percent accuracy against even small Lp perturbations; it's just insane how difficult it is to defend, and certifiable defense, with guarantees, is harder still. And whenever you try to defend against these things, you lose standard accuracy as well. Every single defense we have right now falls short of the robustness level that we want and, in addition, suffers from reduced accuracy, and sometimes it is practically difficult too: with randomized smoothing, for example, you need something like a hundred thousand samples at inference time to get a certificate of robustness, which is clearly not very practical. Requiring many samples because of the randomized nature of the algorithm, and deteriorating standard performance: all of these things together mean we are far from something practically usable at this point, in addition to not being as robust as we want. But it's still an open problem and I'm super excited; I'm working on it all the time. In fact, in a couple of months we will have what is probably the first really good defense against adversarial patches which is practical, and to be honest I feel this is the first time people will actually think, yes, let's deploy this model. I'm very excited about this work; in our lab we're working on it and I can't wait to open source it and tell people about it. It's specifically for adversarial patches: it's very fast, it doesn't degrade standard performance, and it achieves remarkable robustness against patches at ImageNet scale. So I'm very optimistic; we are working on it and preparing it really well, and I'm very excited about that.
But yes, most of the work on adversarial examples has been on security applications, which makes sense: it's the first obvious, direct setting where adversarial examples hurt a system, so it makes sense to work on it from a security perspective. But there has also been a bunch of work that looks at adversarial robustness beyond security. The nice thing about adversarial examples, beyond being an impressive phenomenon in which small perturbations break your model, is that they have given us insights about our models. It turns out that if you do adversarial training to block these adversarial examples, you are adding a prior to your model which makes it learn a really nice, robust representation that it hadn't learned before: a more human-aligned representation, one that aligns more with how humans perceive the world, roughly speaking. So it's a really nice prior, and in addition to being something that hurts our models, adversarial examples can benefit them: the insights we get from adversarial robustness are essential for learning representations and for other topics we care about in machine learning. I think in this chat we're going to focus on two applications where insights brought from adversarial examples actually improve the performance of real systems in the real world. We look at what we did for adversarial examples, learn from it, and use those insights to solve other tasks and applications, which I find really exciting. As you mentioned, those applications include designing unadversarial examples, which basically reverses the story of adversarial examples: instead of breaking a model by generating adversarial examples, you design the world in a way that neural networks can perceive much better. You add features to your objects, features that neural networks like and really grasp onto, and I guess we will get into those details later in the chat. But adversarial examples were key for me to get fundamentally into machine learning and to start doing really good work. When I joined Microsoft's AI residency I hadn't done any research on adversarial examples before. In the residency you have to choose a project and a set of people to work with, which is really exciting: you can work on anything you want, with Microsoft and all of Microsoft's resources, and you can choose whoever you want to work with.
I really wanted to take that opportunity to get into something impactful that people care about, something genuinely challenging. It was the middle of the hype around adversarial robustness: every day there was a new attack, every day a new defense, so it was really exciting. I decided that's it, I want to get into this; like everyone who starts out, I wanted to build the best defense and stop adversarial examples. So I started reading a lot of papers; I subscribed to Google Scholar alerts to read every single attack and defense, which is impossible, so I stopped doing that. But I dug hard into what problems current empirical defenses and certified defenses have, and then, as you mentioned, we published the first paper, on convex relaxation, which is about neural network verification, or certification. So there are adversarial examples, and people try to defend against them either empirically or in a certified way. Empirically, there are a billion things people have tried, but nothing works much better than adversarial training; that's probably the most effective approach, just train on adversarial examples, and it works decently, though not as well as you would want. There are lots of variants of it under different names, but they are all essentially doing similar things. Then there are certified defenses, which give you guarantees that your model is robust. What we found is that there were around twenty papers on certified defenses which do convex relaxations of neural networks, specifically ReLU networks, and try to push the bounds, the certificates they can get. They are also essentially doing similar things, with a bit of heuristics to do the relaxation better. So what we did was unify all of these into one framework and solve that framework to optimality: we actually found the best bounds this layer-wise relaxation problem can achieve. That was hard because it requires so much compute, something like 150 CPU-years, which is why people hadn't actually solved the optimal problem before. We unified everything in one framework, solved the optimal problem, and showed that even the optimal solution is actually really bad: it doesn't close the gap with exact verifiers, such as mixed-integer-programming verifiers, even on simple problems. So we basically had one simple message: if you want to do relaxation, do more than layer-wise convex relaxation, because even the tightest layer-wise relaxation doesn't gain you much over cruder approximations.
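For intuition about what "layer-wise relaxation" means here, below is a hedged sketch of the crudest member of that family, interval bound propagation through a tiny fully connected ReLU network. The LP relaxations the paper unifies are tighter, but they follow the same propagate-bounds-layer-by-layer recipe; the toy weights and epsilon are made up for illustration.

```python
import torch

def interval_bounds(layers, x, eps):
    """Propagate elementwise lower/upper output bounds for all inputs within an L-infinity ball."""
    lb, ub = x - eps, x + eps
    for i, (W, b) in enumerate(layers):
        W_pos, W_neg = W.clamp(min=0), W.clamp(max=0)
        lb, ub = lb @ W_pos.T + ub @ W_neg.T + b, ub @ W_pos.T + lb @ W_neg.T + b
        if i < len(layers) - 1:               # ReLU is monotone, so bounds pass straight through
            lb, ub = lb.clamp(min=0), ub.clamp(min=0)
    return lb, ub

# Toy usage with random weights: if the lower bound of the true class's logit exceeds
# every other logit's upper bound, the point is certified robust within the ball.
layers = [(torch.randn(8, 4), torch.zeros(8)), (torch.randn(3, 8), torch.zeros(3))]
lb, ub = interval_bounds(layers, torch.randn(1, 4), eps=0.1)
```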
That was an interesting paper and people liked it, apparently, so I was very excited and happy: people started talking about it. You put it on arXiv, you open source the code, you tweet about it, and then people start interacting; this is the best moment in the life of a student or researcher, at least for me. I get very excited the day before I announce and open source a piece of work. So that was really nice, people liked it, and it was a first step for me; I had done nothing in this field before, and this felt like my way in. Immediately after we published that paper, randomized smoothing came out; it had appeared something like two weeks before we published. It looked like a very scalable method and a promising direction for certified defenses against adversarial examples: it is probably the first certified defense that actually scales to ImageNet, and it is theoretically well motivated, with nice theoretical foundations. So I worked with a bunch of theoretical computer scientists and researchers at Microsoft; it was a really nice problem to work on. What we did there was try to improve not the certification itself, but the training of the models used by the certification method. Essentially, randomized smoothing works like this: you train a model to classify well under Gaussian noise; then, to certify an image, you replicate it many times, say a hundred or a thousand, add random Gaussian noise to each copy, pass them all through the network, and take the majority vote. From the same statistics you can also compute a certificate of robustness: the radius of the L2 ball in which the prediction is guaranteed not to change. The thing you end up certifying is not the base classifier itself but a new classifier wrapped around it, the smoothed classifier, which is defined by this whole noise-and-vote procedure.
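A minimal sketch of the noise-and-vote prediction step just described, assuming a base classifier `model` trained under Gaussian noise and a single image `x` of shape (1, C, H, W); a real certification pipeline, as in the randomized smoothing papers, additionally converts the vote counts into a statistically sound L2 radius.

```python
import torch

@torch.no_grad()
def smoothed_predict(model, x, sigma=0.25, n_samples=1000, batch_size=100, num_classes=1000):
    """Majority vote of the base classifier over Gaussian-noised copies of x."""
    counts = torch.zeros(num_classes, dtype=torch.long)
    remaining = n_samples
    while remaining > 0:
        n = min(batch_size, remaining)
        noisy = x.repeat(n, 1, 1, 1)
        noisy = noisy + sigma * torch.randn_like(noisy)   # perturb each copy
        counts += torch.bincount(model(noisy).argmax(dim=1), minlength=num_classes)
        remaining -= n
    return counts.argmax().item(), counts
```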
So what we did was figure out a way to train the base model used for randomized smoothing in a better way: we employed adversarial training, but on the right objective. The cool thing is that we approximated the actual objective of the smoothed classifier. Remember, you are not certifying the base classifier; there is another classifier wrapped around it, the one defined by this process of randomized smoothing, which passes many noisy copies of your image through the network and takes the majority vote. Essentially we wanted to find the right perturbations to train on so that we get the best possible model for that process, so we did adversarial training on the objective of the smoothed classifier: we attacked the smoothed classifier to find adversarial examples for the smoothed classifier, trained on those, and it boosted the results by a lot. That was the contribution there.
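A rough sketch of that idea, under the assumption of a PyTorch base classifier `model`: estimate the smoothed classifier's loss by averaging softmax outputs over a few Gaussian-noised copies, then run L2 PGD against that estimate. Training on the resulting examples is the gist of the approach described above; the actual paper is more careful about the estimator and its hyperparameters.

```python
import torch
import torch.nn.functional as F

def smoothed_ce_loss(model, x, y, sigma, m=4):
    """Monte Carlo estimate of the cross-entropy of the smoothed (noise-averaged) classifier."""
    probs = torch.stack(
        [F.softmax(model(x + sigma * torch.randn_like(x)), dim=1) for _ in range(m)]
    ).mean(0)
    return F.nll_loss(torch.log(probs + 1e-12), y)

def smoothadv_example(model, x, y, sigma=0.25, eps=0.5, step_size=0.1, steps=10, m=4):
    """L2 PGD against the smoothed classifier's estimated objective."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = smoothed_ce_loss(model, x + delta, y, sigma, m)
        loss.backward()
        with torch.no_grad():
            g = delta.grad
            delta += step_size * g / (g.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-12)
            n = delta.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
            delta *= (eps / (n + 1e-12)).clamp(max=1.0)     # project into the L2 ball
        delta.grad.zero_()
    return (x + delta).detach()
```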
People liked that one too, so I was very excited: two papers, one after the other, that people liked. I think those were the two papers that gave me more confidence and made me more excited to keep working, to keep searching and doing more on how to improve the robustness of neural networks. I did those first two papers during the AI residency, and then I switched to being a research engineer at Microsoft and continued working on these topics. I collaborated with a bunch of people: with Zico Kolter from CMU and with Aleksander Madry from MIT, who is my advisor right now. They were two of the leaders in this field, so I just reached out to them, said I have this good idea, do you want to collaborate, and they were really open to it. It's really exciting how easily you can reach people and start collaborations; I like doing that a lot. They are really nice people; at least Zico and Aleksander are amazing, both of them, and easily approachable. So I reached out, started discussing with them, and met them at NeurIPS, I remember, when I presented these two works; it was really exciting. I then started working with Zico on the denoised smoothing paper. And let me know, guys, if I should stop; I'm just telling you the story of my life here. No, it's fine, carry on. Sounds good. All right, let's do it.

So yeah, I remember at NeurIPS 2019 I met Zico after a tutorial that he gave, or that I thought he gave, and we were trying to think of ways to improve randomized smoothing. The original randomized smoothing paper came from his lab, and then I and some researchers at Microsoft had improved it, presented this work on how to train it better, and we were thinking about how to push it further, because it seemed like probably the only method that can scale. Our models for randomized smoothing really depend on being able to classify well under Gaussian noise, so why not add a preprocessing step to the inputs before passing them to the classifier? We tried appending a denoiser to a classifier and training the whole thing end to end to see whether it improves randomized smoothing, and it turned out it doesn't help at all: you're just enlarging your model, it isn't really helping much. But while doing that we thought, what if we just fix the classifier and only train the denoiser? What if we have a classifier that is fixed, pretrained, a black box, and we prepend a pretrained denoiser in front of it, and then do randomized smoothing on this whole new classifier: does that actually work? It turns out it works really well. Why is this interesting? Randomized smoothing is one of the only certified methods that work, even if it's not yet where we want it to be, so obviously we have to think about how to apply it to real-world problems. Imagine you are using Google's or Microsoft's or any off-the-shelf image classification API, and instead of just querying it for predictions, you want to obtain certificates from these models; let's say Google and Microsoft don't bother to deploy their own certified defenses because nothing works that well so far. You can literally wrap these models, which are black boxes you know nothing about: you prepend a denoiser, and then for the image you want to certify you generate a bunch of noisy copies, pass each one through your denoiser and then through the API, collect the predictions, and take the majority vote. That gives you a prediction that comes with a certificate as well: you take the majority vote as the prediction and you can also calculate the actual certificate, that is, how robust the model is around this data point. That was really exciting: we certified Google's API, Microsoft Azure's API, the Clarifai API, and the AWS API. It was really cool to see how robust these models are without this procedure and how much more robust they become with it. We published this at NeurIPS 2020, I think, and people liked it as well, so I was excited.
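A hedged sketch of that wrapping step: a locally run denoiser in front of an arbitrary black-box classifier, followed by the same noise-and-vote procedure. The `denoiser`, the `query_api` callable standing in for a vision API client, and `num_classes` are placeholders, not a real API integration.

```python
import torch

@torch.no_grad()
def denoised_smoothing_predict(denoiser, query_api, x, sigma=0.25, n_samples=100, num_classes=1000):
    """Noise -> denoise locally -> query the black-box classifier -> majority vote."""
    counts = torch.zeros(num_classes, dtype=torch.long)
    for _ in range(n_samples):
        noisy = x + sigma * torch.randn_like(x)   # perturb the query image
        denoised = denoiser(noisy).clamp(0, 1)    # custom denoiser runs on our side
        label = query_api(denoised)               # black-box API returns a class index
        counts[label] += 1
    return counts.argmax().item(), counts
```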
I really invest a lot of time in writing papers, both at Microsoft previously and especially in our current lab. We spend a lot of time making sure the code is clean, runs, and is replicable, and on writing a blog post; we don't open source anything without code and a blog post, so that it's easy for people to get into the work. There is so much noise, a billion papers a day in machine learning, so if your paper isn't well done, if the problem isn't well motivated and the writing isn't really good, people won't engage. People are lazy by nature: if I start reading a paper and I don't understand what's going on because it isn't well written, I move on. Unless you feed it to them very easily, with clear writing and a blog post, they will say, ah, let me get back to my own code, whatever I'm doing. That's my understanding: when you explain something to people and they understand it, they like it, they get motivated, they start following your work and hopefully building on top of it, and that's how we improve the state of the art of AI and of science in general. That's something nice about machine learning: people try to make things applicable and easily usable, and you rarely see needlessly complicated papers. So in short, I spend a lot of time making the papers really well written, and of course I feel a bit sad if, after all that work, people don't engage with it or it doesn't get much attention, but that's fine: at least you feel you have done your part, and then it's up to people to appreciate it or not. I also really spend time choosing which problems to work on, because there are many, many problems you could work on. Any problem you get into, tiny or big, is going to take you a few months to write the paper, write the code, and so on, if you want to do it well, so you're better off spending two weeks just thinking, and then actually starting on the problem, rather than jumping immediately on the first tiny idea you get, which might not be that impactful. Of course, if you don't want to do it well, it's easy, you can write it overnight. But it's exciting when people like the work, appreciate it, interact with it, and build on top of it; it's really satisfying, at least for me.

So after this, I don't know exactly how things happened, but I started collaborating, me and some researchers at Microsoft, with Aleksander Madry as well. We started discussing where we could go with robustness, and this was after the "Adversarial Examples Are Not Bugs, They Are Features" paper, which is really inspiring and which came out alongside two other papers showing that you can actually use robustness for other tasks. That was really inspiring for me: it was amazingly well written, with really nice, motivating experiments, and I think the crucial thing was that the set of experiments delivered the idea really, really nicely.
So I started thinking about how we could push this forward. I don't know exactly how I go about thinking about these things, but I started asking: is this it? What does it mean that there are robust and non-robust features? What can we do with them? What does it mean that if you train with adversarial training you get a nice representation? So what happens next? We got this idea: when we do adversarial training, we are essentially blocking the non-robust features and keeping the robust ones. In the robust features model, every feature representation is divided into robust and non-robust features, so by doing adversarial training you are masking the non-robust features and keeping only the robust ones, and when you do that you get nice representations. A direct application is transfer learning, because that is a place where we really care about feature representations. We have source models trained on large datasets, and we have small tasks where we do not have much data or compute, and we want to transfer the knowledge learned on those huge tasks, such as ImageNet or other, even larger ones, so that it works quickly on the small tasks. Imagine you work at a company doing grasping or something, and you have your own dataset of, I don't know, a thousand images, and you want to train a model on that: good luck training from scratch. It's better to bootstrap your model by starting from a pretrained model and then fine-tuning it on your small dataset. One thing that decides how well you are going to do is the quality of the features you are transferring over: the features your source model learns are crucial, because you are transferring them and then fine-tuning on the target task. So we were thinking: what decides what these features are? In general, they are dictated by the priors we use, or add, during training. This includes the losses we use and the regularizers we use: if you add a regularizer, you are adding an inductive bias, advising the model to avoid some weird things it doesn't have to learn and to learn other things instead. It also includes any data augmentation you do and the optimizers you use; basically every choice you make during training adds some bias and shapes your features, so the features you learn depend on what priors you have. So we were thinking: what happens if we add adversarial robustness as an additional prior? You are forcing the model, given two images that are visually very similar, to output the same prediction.
Those two images differ only by some non-robust features, the adversarial perturbation, but you want both of them to be predicted the same way by the network, which is something that makes sense: we want our models, given two visually similar images, to output the same thing. So we asked what happens when we add this robustness prior. We trained a bunch of robust ImageNet models and standard ImageNet models, then transferred the learned representations to twelve different image classification tasks. Before getting there, though, there was some conflicting evidence for why this might not work: whenever we train with robust training, we deteriorate the standard performance of the model, and there is a paper from Google by Kornblith et al. showing that to get better transfer learning your source models have to be better, i.e. better ImageNet models transfer better. They showed a correlation between how well your ImageNet model does on ImageNet and how well it transfers to a wide range of downstream tasks. So we thought, damn, this might not work, because our robust models have lower accuracy, but let's try it anyway; maybe we'll find something interesting. And indeed it was interesting: when we transferred those robust models, although they have lower accuracy than the standard models, they transferred better, consistently, on a wide range of tasks, not only image classification but also object detection and instance segmentation. It was really surprising, really clean and direct; I love it when the idea is very simple and the results just flow immediately, without any hacks; it's really satisfying. So we trained a bunch of robust models and a bunch of standard models, and the robust models transfer better. Then we started thinking about why this is the case, about that apparently conflicting evidence: our models have lower accuracy, so what about the Google paper, what did they find there? We dug deeper and found that, of course, the Google paper is correct; we never said it wasn't, but we wanted to understand what was going on. We found that both robustness and accuracy matter: at a fixed robustness level, if you increase the accuracy of the model, you improve transfer learning. What we got is essentially a correlation curve between ImageNet accuracy and transfer accuracy which, for robust models, sits higher than the curve for standard models. In other words, if you improve robustness, the whole curve relating how well a model performs on ImageNet to how well it transfers shifts up, which was really exciting.
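A minimal sketch of the transfer setup being described: take a (possibly adversarially robust) ImageNet backbone, swap the head, and fine-tune on a small downstream task. The robust checkpoint path, the ten-class head, and `train_loader` are placeholders; in practice the pretrained robust models released with the paper would be loaded instead of the fresh `resnet50()` used here.

```python
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet50()
# model.load_state_dict(torch.load("robust_resnet50_imagenet.pt"))  # hypothetical robust checkpoint
model.fc = nn.Linear(model.fc.in_features, 10)   # new head for a 10-class downstream task

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def finetune_epoch(model, train_loader):
    """One epoch of standard fine-tuning on the small target dataset."""
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```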
So for the best transfer learning performance you need both robustness and accuracy: if you fix the robustness level and increase accuracy, you increase transfer performance. What the Google paper found holds at a robustness level of zero; its finding should really read something like "better ImageNet models transfer better at a fixed robustness level". If you fix the robustness level, then improving the ImageNet model does make it transfer better, but that is for a fixed robustness; if you improve robustness, the model might transfer better even if its accuracy is lower. And that paper actually got an oral at NeurIPS. I was super excited: one of my previous papers, the randomized smoothing one, got a spotlight, which was my second paper in this field, and I was already very excited then, but the oral, I was like, damn, that's amazing. I was in Redmond, of course, at home, enjoying the pandemic from home, and suddenly I received this email; it was really amazing. Not my goal exactly, but I would like to eventually get some best paper award or something, and getting an oral before starting my PhD was amazing. Some people don't care about awards and such, but for me, at least as a young researcher just starting out, it motivates me a lot; in addition to how much people appreciate the work and how impactful it is, getting some nice recognition at a conference is really nice. I know it maybe doesn't matter a lot, but it motivates me, and that's what's important for me. So yes, people liked that one as well; super exciting.

Can I maybe jump in here for a moment and talk about this? Because it's very interesting, right, the fact that you get more robust models and they transfer better. On one hand you can make the argument that the representation of robust models is better simply because of how we train them, but isn't it also the case that this might be an effect of how we define what transfer learning is? Because in transfer learning, we define the objectives: we as humans think these two tasks are somewhat close, right, these are medical images, these are ImageNet images. To make the determination of which tasks are even considered transfer learning tasks, we use our human assessment of the data, which is exactly what you would call robust features. So to even decide which tasks are transfer learning tasks we use robust features, and when you train robust models you train them precisely to be responsive to robust features. So what I would offer as a bit of a challenge hypothesis is that maybe these models don't actually transfer better to any transfer learning task, but only to the transfer learning tasks that we as humans consider to be transfer learning tasks. There might be other tasks to which the non-robust models transfer much better, but we just don't know, because we never try.
Like, okay, here I have my malware detection dataset: how am I supposed to know that my ImageNet model is going to transfer really well to it? I just don't get the idea, because I'm a human. So isn't this a bit of a circular loop in how we define the task and then how we solve it? I mean, it's cool, but yeah.

Is it not an anti-pattern, though? Why would I use an ImageNet model for a malware classification task? It's effectively going to result in negative transfer, right, because the overlap between my source dataset and my target dataset is minimal at best. But that's what you're saying, right? That's what you think I mean?

No, no, of course you're right. I'm making more of an esoteric point. And I agree with you, Hadi: this paper, the features-not-bugs paper, I consider a landmark paper in the literature on adversarial examples. The experiments are very convincing about the fact that the data contains features that to us humans seem like features, shape and whatnot, but that it also contains features that we don't care about as humans, yet they are still features. What I'm saying is that when we as humans come up with a transfer learning task, we say, look, these two things have a lot of overlap, and we do this on the basis of robust features. However, there might be two tasks that we as humans think have no overlap, like the malware thing, where as a human we think, what do they have in common? But it could be that they share the same non-robust features.

No, I actually totally agree; this is a great point, to be honest. There are a bunch of things here I want to discuss; it's actually a really subtle point. In fact, very recently a paper came out showing that this correlation doesn't hold if you transfer from ImageNet to something like CheXNet, medical images. There is probably no overlap as far as humans are concerned; maybe there are some non-robust features that do transfer, and that's probably why you see a little bit of improvement from ImageNet to CheXNet, but the correlation doesn't hold, and this hasn't been tested with robust models. I was actually trying to test that, to be honest, because after this someone tweeted about that paper on transferring from ImageNet to CheXNet; I'm bad with names, I forget who. They showed that the Google result doesn't hold there: if you improve your ImageNet model on ImageNet, it doesn't transfer better to CheXNet; they all get roughly the same performance, so you're only transferring some minimal knowledge. Personally, whenever people do medical imaging they just start from ImageNet pretraining, I don't know why; it helps a little bit, but that correlation doesn't hold. But I totally agree: you might try to transfer a robust model and find that it doesn't do better than standard models; maybe it's even worse, I don't know, but I was actually trying to test that.
But regarding this point of what exactly is transferring, robust or non-robust features, what's going on, I don't know the exact answer, but I have a hypothesis. You have these robust and non-robust features. When you do robust training, you are masking out, blocking, the non-robust features of ImageNet, which are kind of shortcuts that a model learns only on ImageNet, tiny things that are specific to ImageNet. The robust features are probably more transferable; they carry those shapes and so on. So by blocking the non-robust features, transferring the robust ones, and then fine-tuning on the new task, you are just filling in the gaps, the non-robust features of the new task. That's how I see it, to be honest.

Can I ask some dumb questions real quick? Yeah. The thing is, I feel like this is quite an advanced conversation and I need to dumb it down a little bit. When your lab released the features-not-bugs paper, it made the quite interesting finding, as I understand it, that the vulnerability to adversarial examples is a function of the dataset itself, and it spoke about this dichotomy between robust features and non-robust features. If I understand correctly, non-robust features are features that generalize quite well but are imperceptible to humans: a robust feature might be a pig's snout, and a non-robust feature might just be some weird collection of pixels that unfortunately generalizes very well, very well indeed. So it seemed to explain the transferability of these features: it's in the data, and that's why it affects so many different types of models. Now, the paper did an experiment, didn't it, to see what happens if we remove the non-robust features from the dataset. Why don't we start with that: how the hell did they actually do that?

So they did two experiments in that paper; I wasn't in the lab when they did them, but I know the paper. Let me remember them. The first one was this: for a dataset of dogs and cats, find adversarial examples for the dogs and adversarial examples for the cats, and then relabel them. The images labeled as cats carry the non-robust features of cats, but to a human the actual image is a dog, and the images labeled as dogs carry the non-robust features of dogs, but the actual image is a cat. You train on those, and you achieve really high accuracy on the original test set, which is what's crazy: it says you can generalize from those non-robust features alone. So you give the model a dataset of a bunch of dogs which are all labeled cat, but you've perturbed them to adversarially look to the model like a cat, and now a model trained on this dataset will be able to recognize a true cat image from the original test set, which unequivocally tells you that there is something in these images that it can use to actually recognize cats.
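A hedged sketch of that first experiment, assuming a standard pretrained classifier `model` and a two-class `loader` of (image, label) batches: each image is nudged toward the other class with a small targeted L2 perturbation and then stored under that target label, so the only reliable signal tying the new label to the image is non-robust.

```python
import torch
import torch.nn.functional as F

def targeted_pgd(model, x, target, eps=0.5, step_size=0.1, steps=20):
    """Small L2-constrained perturbation that pushes `model` toward predicting `target`."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), target)
        loss.backward()
        with torch.no_grad():
            g = delta.grad
            delta -= step_size * g / (g.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-12)
            n = delta.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
            delta *= (eps / (n + 1e-12)).clamp(max=1.0)   # project back into the L2 ball
        delta.grad.zero_()
    return (x + delta).detach()

def build_nonrobust_dataset(model, loader, num_classes=2):
    """Relabel every perturbed image with its adversarial target class."""
    xs, ys = [], []
    for x, y in loader:
        target = (y + 1) % num_classes            # e.g. dogs -> "cat", cats -> "dog"
        xs.append(targeted_pgd(model, x, target))
        ys.append(target)                         # the new label matches only non-robust features
    return torch.cat(xs), torch.cat(ys)
```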
Yeah, exactly. So that first experiment says that you can generalize from these non-robust features alone. The other experiment was basically to remove the non-robust features from the dataset and train with standard training on this new, robustified dataset, which has essentially no non-robust features, and then you achieve robust accuracy just by doing standard training. But philosophically, what's the difference? We've got robust features, which are human-perceptible, and we've got non-robust features; does it really matter? That's the first question. And the second: we were talking with François Chollet last week about the manifold hypothesis. He has this beautiful idea that all natural data falls on an interpretable manifold, and presumably all of these different types of features fall on different manifolds, do they? Yeah, I actually quite believe in that conjecture. This manifold is easily describable by humans but not rigorously formulated mathematically; the way we'd describe it is the manifold of images which humans actually recognize as correct images, and that makes sense. But those images of a cat that carry the non-robust features of a dog, I agree, are on a totally different manifold from this manifold of images that make sense to humans, whatever we call it, because when a human sees the image of a dog that has the non-robust features of a cat, they will say it's a dog; there's no way they will figure out it's a cat. So I totally agree with this; I don't know how to prove it, but I feel that's what's going on. In practice, nobody knows exactly what's happening; the robust features model is just a model, I'm not saying this is 100 percent the case. But I feel this separation into robust and non-robust features inspires a lot of cool downstream work and lets you try to justify what's going on, for example in transfer learning: why does it work, what is actually transferring, the non-robust features or the robust ones, and all of this interplay between what robust and non-robust features mean. In the "do adversarially robust models help in transfer learning" paper, I'm sorry if I forgot the exact name, you showed that better norm separation has some kind of relationship with fine-grained versus coarser-grained features and so on. Do you think maybe this might have been the case for non-robust as opposed to robust features? By what do you mean: the size, how robust your model is, basically the size of the lp ball that you're training with?
Yeah, maybe the robust features are actually what's helping with the norm separation, and that's why they're able to distinguish better, because this is one of the hypotheses you've already put forward in the paper; maybe you could take that route. We don't show anything like that; we just hypothesize, there are no proofs, only empirical experiments. We tried to avoid jumping to conclusions like "we are transferring the robust features and not the non-robust features," because I don't know if that's exactly the case; definitely more work has to be done to see what exactly is going on. But intuitively, the way I think about it is that we are blocking the non-robust features, which do generalize, but only to images from the same distribution; they are bad at generalizing to images from other distributions. That's my intuition, to be honest. So we block them so that they don't interfere during fine-tuning on the target task; we're essentially eliminating them. Could you spell that out for me? I still don't completely understand it. I was so confused about this the first time Yannic explained it to me, because he said there are robust features and non-robust features, but the definition doesn't seem to include generalization. We all accept that generalization in deep learning is a function of interpolation on a manifold, and non-robust features apparently do generalize, so they fall on another manifold, and there must be some differences between these manifolds, because we're talking about robustness and brittleness and accuracy and there's an interesting interplay between them. So are you saying that the non-robust features are less learnable, that their manifold is different, maybe has discontinuities, maybe is less interpretable? Are you saying the non-robust features generalize worse than the robust features? I can't say definitively, but from this transfer learning experiment I believe so, because if you keep them and then transfer... okay, there's one experiment that I wanted to do but never did; I'll tell you about it, because it might actually answer exactly that. The way I think about it: when we don't block these features and transfer both robust and non-robust features, we get some improvement on downstream tasks over training from scratch; but when we block them and keep only the robust features, we get better performance. So the really interesting experiment is to try to transfer just the non-robust features: create a dataset with only non-robust features, train a model on that, and transfer it; then create a dataset with only the robust features, a robustified ImageNet, and transfer that; and also an un-robustified ImageNet dataset whose images look wrong visually
but still carry the non-robust features, and try to transfer that. That might actually answer it; okay, I'm going to do this, let's see. That comes back to exactly the point I was making, that what we call transfer learning is a function of the robust features, even down to the way we select the tasks. So I think the paper's claim should be amended even further, to say that robust features help generalization for tasks that we, as robust-feature-using humans, consider to be transfer learning tasks. And then a follow-up question would be: can we find equally many tasks for which the non-robust features transfer, even though we wouldn't consider them transfer learning tasks, or are they actually useless? But, sorry, to Tim's point, I got asked an interesting question yesterday, in my PhD exam actually, that might clarify this a little. Think about language, and the question "how many legs does a snake have?" If you type that into Google, for questions like this Google will often give you an answer, some number it finds somewhere. Why is that? These question-answering systems have been trained on questions with marked answers. Now consider what the model does: it sees "how many," so it is super primed to find some number somewhere, because in all of its training data, whenever it saw "how many," the answer was a number. But now you add "snake," and in the training data it has never seen those two features together: it has seen "snake," it has seen "how many," but never correlated, never together. So it focuses on the one with the stronger signal. What you've essentially done is take two features that shouldn't be together and put them together. The same goes for these images: never, ever in the dataset is there a dog shape with the micro-structure of cat fur, yet that's exactly what you do when you create an adversarial example: a picture with a dog's shape and a cat's fur micro-structure. The model, being a CNN and very attuned to fine structure, to little features, sees that and is confused, but it goes with the one that gives it the stronger signal, which is what we call the non-robust features. And still, all the cats have that, so it is a good feature; we've just put it in a very weird constellation. There's an interesting thing there, which is: does non-robust imply low magnitude? Because the CNN has this weird entanglement. Wouldn't it be wonderful if it could say "snakes don't have fur, I've never seen that before"? But actually these things get entangled together, so you can't ask that question. What exactly do you mean by low magnitude? As in the pixel map: you don't need to change the pixels much to make that feature fire. But there's no good way to capture that mathematically. A lot of things have been tried, like "they're just high-frequency features, we'll just do a low-pass filter": nah, that doesn't work.
Yeah, or just try to denoise the image or something; no, that doesn't work either. Think about it like this: yes, they are tiny, high-frequency features, but they are hidden in very, very high dimensions. Imagine a hypercube with many, many boxes around you, and you're trying to remove the bombs from around you by adding random noise: you're just randomly choosing a bunch of those boxes, and it's very difficult to find the one that actually contains the small perturbation, because there are exponentially many boxes around you in this hypercube. It's really tough. The only ways we currently know how to do this reasonably well are adversarial training, where you try to follow what the attacker did to place such an example in the cube, or randomized smoothing, which basically tries to smooth away everything around you in all the cubes, but that deteriorates your performance, because you're smoothing so much that your function becomes less expressive. So I guess it's easy to say that models latch onto high-frequency spurious correlations, but it's still ill understood why that's the case. There is some Fourier analysis, the Fourier-perspective work on the robustness of vision models, but in my opinion it's still poorly understood. I guess that's also one of the reasons Hendrycks and his lab came out with the paper on natural adversarial examples, examples that are naturally adversarial: what can you do about those? They laid out several reasons in that paper; one is texture bias, and another is that in image classification problems multiple objects get mapped to discrete individual categories. So I don't know yet, but I guess there could be developments around how these relate to robustness and similar aspects of vision models; I'm looking forward to them, and maybe I can even work on them. Yeah, this is pretty cool. I don't know what the solution is; there are many things we can try, but it's tough. The only thing I'm a hundred percent sure about is that it's a tough, tough problem. But this question of non-robust versus robust features is always in my head: what's going on, what's transferring, what's not transferring, how can we use them, what exactly are they? We have some evidence that they are a characteristic of the data, but we're not a hundred percent sure about that either; the bugs paper gives no guarantee that this is the case, maybe it's something else. But this experiment of figuring out what exactly is transferring, by training on a robustified ImageNet and transferring that, versus training on
an un-robustified ImageNet, built basically by flipping the labels and adding adversarial perturbations, and transferring that, is something interesting. I feel like I will do it. I'd like experiments to support this and to be able to answer the question better, because right now I can only speculate; with experiments we can precisely get more intuition and answer these questions. So I think the question of robustness is that, on one hand, you want to become robust to adversarial examples, but adversarial examples are defined in a very specific way if you really want to work with them: I have to stay within some l-infinity ball or some l2 ball around a data point in my dataset, and that's how we measure closeness, because we know that to a human there's probably nothing in a tiny l-infinity ball that looks different. However, what we actually want to say is "these two things seem similar to a human," which is completely different from being close in l2 or l-infinity space. So there seems to be this mismatch between what's close to a human and what's close in the adversarial examples literature. My question is: do you think that if we somehow had an accurate model of the human visual system, that would help us battle these adversarial examples, or would we still be in the same trouble, just with a different distance metric? No, I think the key to solving deep learning is figuring out this closeness or distance metric that makes sense. If we knew how to measure the closeness of two images in the human sense, we could just use a k-NN or something to classify and we'd be done: you run a nearest-neighbour lookup, and whatever class the neighbours have, that's it. Learning is just a distance function at the end of the day, exactly. So if you find this distance, let's chat offline, publish the paper, and be legends. But yes, I think this is the question of the day, in my opinion, at least for computer vision; I'm most motivated by computer vision, and for it I think this notion of distance, or similarity, is the key to solving many things. The more we move away from crude distance-based metrics, the more we might be able to increase the transferability of robust models, because there's also the aspect of the transferability of these robust models to different kinds of attacks, and they do not transfer well; we have evidence of that. Then Soheil Feizi's lab came out with a cool technique based on LPIPS, Learned Perceptual Image Patch Similarity, which is based on the perceptibility of different features and doesn't rely that much on plain distance-based metrics, and they show better transferability across different attacks. So I guess this is suggestive of the fact that the more we move away from simple distance-based metrics, the better we might be able to solve the transferability problem across attacks, both lp-based and non-lp-based.
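To make the "just use a k-NN" remark concrete, here is a toy sketch of nearest-neighbour classification under a pluggable perceptual distance. The `distance_fn` argument is the whole point and the whole assumption: plain L2 is shown only as a placeholder, and something like an LPIPS-style learned perceptual metric would be dropped in instead. None of this is the speakers' code.

```python
import torch

def perceptual_knn_predict(query, ref_images, ref_labels, distance_fn, k=5):
    """Classify `query` by majority vote over its k nearest reference images,
    where "nearest" is measured by whatever perceptual distance we plug in."""
    dists = torch.stack([distance_fn(query, ref) for ref in ref_images])
    nearest = dists.topk(k, largest=False).indices   # indices of the k closest refs
    votes = ref_labels[nearest]
    return int(torch.mode(votes).values)             # majority class among neighbours

# Plain L2, used here only as a stand-in for a learned perceptual metric:
l2_distance = lambda a, b: (a - b).flatten().norm()
```

The classifier is only as human-aligned as the distance it is given, which is exactly the open problem being discussed.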
So the transferability of what, exactly: adversarial attacks? Say I have trained a model that is good at defending against lp-based attacks but then fails on non-lp attacks. This is probably happening because we are relying too much on distance-based metrics; maybe if we relied more on the perceptual aspects we could get away from this. So, yeah, that would be another notion of distance, but the current mathematical notions of distance are not sufficient, because we only try to go in certain directions on or off the image manifold, the space of images, and there are still perceptually similar images we miss; hopefully we should be able to define some notion of this. I just wanted to say that the conventional distance-based metrics might fail at this, but Feizi's lab again showed that perceptibility might enhance transferability. So, while we're on the topic of distances, you also worked quite a bit on certifiability. I don't know what that has to do with distances, it just sounded good in my head. Can you explain a little, for people who have never come across certified robustness, what it actually means? It seems quite daunting to say "here is a neural network, it's provably robust"; what does that even mean? I feel it's important that we have a notion here that's understandable. And just to add on to that question: you made an add-on, essentially, to Google's vision API and the Microsoft vision API and so on, doing randomized smoothing with a denoising model, and you didn't even need to change the base classifier. I was thinking to myself, if it's so easy to do, why haven't they done it already? So what does certification mean, and why haven't they already done it? Yeah, these are all great questions. Okay, certification. When we defend neural networks there are two kinds of defenses, empirical defenses and certified defenses, and certification is what we get from certified defenses. For an empirical defense, we attack the model and train on whatever we attacked, so we get empirical robustness, and what that means is that there is no guarantee that for a given image no attack can be found: our robustness is only as good as the best attack in the world. There might be some super-strong attack we haven't evaluated our model on that can break it; that's the problem with empirical defenses, you're always worried there's some new attack you don't know about. With certification and certified defenses, you actually find a proof that, around a given data point, any perturbation within some neighbourhood, whether an lp ball or some other mathematically described neighbourhood, will never be able to flip the prediction of the model.
There are many ways to do that. Some people do LP relaxation: you pass a box around the data point through the neural network, look at the polytope you get at the output, and check whether that polytope intersects the decision boundaries of your output. If it intersects, you are not certified; if it doesn't, you have a certificate. Then you try to see how much you can grow this ball in the input space while the shape you get at the output, which would otherwise be some very complicated shape but which people make convex via convex relaxation, still doesn't intersect the boundary. Just so I understand: you're dealing with an input space of, say, a hundred thousand dimensions, and inside there is some natural image manifold; how could you possibly cover all of that space and do any kind of analysis? In this setting you literally have a hypercube, say for l-infinity. Let's think in 2D: you have a square around the data point, and you pass its boundary through the network and see where that boundary goes, because for a convex mapping the boundary maps to the boundary, so you get a morphed convex shape at the output. That's how you cover everything: you're literally tracking how the shape deforms until you reach the output. The problem comes from the non-linearities, right? It's easy to pass a square through a linear layer; it becomes some deformed square, but it's still convex, and the nice part is that you can pass it through the next layer and only have to track the boundary of the shape, because everything that was inside before is still inside afterwards. With the non-linearities, exactly, you get the weird shapes, and you can no longer say that anything between two corners is also inside the shape. So what people do is pass the shape through the linear layer, then through the non-linearities, then ask what the worst case is and wrap a convex shape around the result, and that's where the looseness of most of these bounds comes from. If you do this layer by layer, the bounds get so loose that the amount of space you can certify, the region where you can say "around this image you cannot find any adversarial example," becomes tiny.
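The crudest version of this layer-by-layer relaxation is plain interval (box) propagation, which gives a flavour of why the bounds get loose. Below is a minimal NumPy sketch, assuming a fully connected ReLU network given as (W, b) pairs; the function names and the sufficient logit condition are illustrative, not any particular paper's method.

```python
import numpy as np

def interval_linear(lo, hi, W, b):
    """Propagate an axis-aligned box [lo, hi] through x -> W @ x + b.
    Standard interval arithmetic: split W into positive and negative parts."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    new_lo = W_pos @ lo + W_neg @ hi + b
    new_hi = W_pos @ hi + W_neg @ lo + b
    return new_lo, new_hi

def interval_relu(lo, hi):
    """ReLU is monotone, so the box is simply clipped at zero."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

def certify_linf(x, eps, hidden_layers, out_layer, true_class):
    """Sufficient check that every point in the l_inf ball of radius eps around x
    keeps the true class's logit above every other logit."""
    lo, hi = x - eps, x + eps
    for W, b in hidden_layers:
        lo, hi = interval_relu(*interval_linear(lo, hi, W, b))
    lo, hi = interval_linear(lo, hi, *out_layer)   # logit layer, no ReLU
    others = [i for i in range(len(lo)) if i != true_class]
    return bool(lo[true_class] > max(hi[i] for i in others))
```

Each layer replaces the true (possibly very non-convex) image of the box with the tightest axis-aligned box containing it, so the over-approximation compounds with depth; the LP and other convex relaxations discussed above are tighter versions of the same idea, and a binary search over eps gives the largest certified radius this check can prove.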
Exactly. This literature isn't really there yet, but with things like randomized smoothing I think we're making a lot of progress. Again, I'm asking the dumb questions here, but let me check I understand. For a given example in input space, you put it through the model, and, as Yannic was saying, because of the non-linearities what we want is that on the other side of the network it is still a convex, continuous mapping, so your square in the input has been transformed into some beautiful little region and you can check whether that region crosses any of the decision boundaries. But my question was: that's just one example in this hundred-thousand-dimensional space, so what are you going to do, randomly sample? It's similar to how we evaluate our models in general. If you train a model on a dataset and test it on a test set, you get some notion of accuracy, but you didn't test on every single example in the world, only on some held-out set, to see roughly how well your model works. It's exactly the same for certified defenses: you bring a test set and, around each of these points, you test how much larger you can grow the ball, and you get a notion of how well your model does; but the same caveats that apply when evaluating ordinary machine learning models carry over to this setting. You're getting a sense of how robust it is; you cannot guarantee that it's robust. You get local robustness around each data point, evaluated over a dataset, and then you either take the minimum ball, or you fix the ball and check how many of your test examples were certified: you say that, within this ball, 60, 70, 80 percent of my test set examples were certified.
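This "fix a ball, count the certified fraction" evaluation is easy to sketch. Below is a rough, simplified randomized-smoothing-style version, assuming `model_predict` maps a batch of noisy NumPy inputs to integer class indices; the original procedure uses a separate class-selection run and an exact Clopper-Pearson confidence bound, whereas this sketch uses a single run and a Hoeffding bound purely for brevity.

```python
import numpy as np
from scipy.stats import norm

def certified_radius(model_predict, x, sigma=0.25, n=1000, alpha=0.001):
    """Estimate the top class under Gaussian noise and turn a high-confidence
    lower bound on its probability into a certified l2 radius."""
    noisy = x[None] + sigma * np.random.randn(n, *x.shape)
    votes = np.bincount(model_predict(noisy))
    top = int(votes.argmax())
    p_hat = votes[top] / n
    p_lower = p_hat - np.sqrt(np.log(1.0 / alpha) / (2 * n))  # Hoeffding lower bound
    if p_lower <= 0.5:
        return top, 0.0                       # cannot certify anything, abstain
    return top, sigma * norm.ppf(p_lower)     # radius in the Cohen et al. form

def certified_accuracy(model_predict, xs, ys, radius, **kwargs):
    """Fraction of a test set certified correct at a fixed l2 radius."""
    hits = 0
    for x, y in zip(xs, ys):
        c, r = certified_radius(model_predict, x, **kwargs)
        hits += int(c == y and r >= radius)
    return hits / len(xs)
```

`certified_accuracy` then reports the fraction of test points whose certified radius exceeds the chosen threshold, which is the "60, 70, 80 percent within this ball" style of number being described.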
That makes sense. But I think the reason Microsoft hasn't certified their vision classifier is that certification tells you more about the test set you used than about the actual model. A whole bunch of my test images might be projected onto a manifold that is very interpretable and smooth, with quite good generalization and no transgression of the decision boundaries, whereas, say, 20 percent of the data goes somewhere else entirely, not even onto a manifold; maybe those examples were very difficult to classify and were simply memorized by the model. So you wouldn't be able to certify across the board. That is one explanation for why they didn't do it, but the main reason is that it isn't ready yet: you get a deterioration in standard accuracy. Randomized smoothing is not a certified defense we can just deploy right now; it's not at that stage. At some point hopefully it will be; I don't know, maybe, maybe not. But currently, and this is why I tried certifying Google's API, Microsoft Azure's, AWS's and so on, they don't do it because they don't want to deploy a model that is slightly robust but loses accuracy, because at this point attackers can just attack anyway: if someone wants to attack you, they will be able to. So you might as well deploy the best model you can, the best performer on the average case, and not worry about the worst case at this moment in time, unless you have a model that is robust and doesn't deteriorate your standard performance. Which is coming very, very soon, in the paper on adversarial patches that I'll publish shortly; hopefully I don't find a bug between now and then, but it's looking really, really good. And it's certified, which is why I'm confident; I'm not worried that Nicholas Carlini will come and butcher it. He will still try, but he'll have to fight with the mathematics. I wanted to come back to Carlini and Tramèr, whom we spoke to last time, because they were being incredibly downbeat about this. We had a good chat with them and I just felt demoralized afterwards, because Carlini has systematically dismantled every single adversarial defense going, and they also agreed with you: they said it's not worth losing ten percent accuracy for a bit more robustness, and his argument was that there's an infinitude of further adversarial attacks anyway, so it seems like a slightly pointless exercise. They said the main defense mechanisms were data augmentation, or adding more data, although the more data you add, the more adversarial examples you have, and adversarial training. I wanted to get your take on this, and also on randomized smoothing, which they agreed was a successful mechanism; but their general take was that we're on the road to nowhere, that we are just completely screwed when it comes to adversarial examples. Yeah, I kind of agree with that. It's so tough: our best models achieve, I don't know, 60 or 70 percent on CIFAR-10 within some lp ball. Even if we achieved 90 percent, if someone wants to attack the model they just have to repeat the attack ten times, or however many times it takes, to break it, unless it's a 100 percent certifiable defense, and then they won't be able to break it. But I think the key point is figuring out how to make models do things the way humans do; that's the key. I work on robustness and defenses, but I also try to think about what those representations mean, how we can make our models behave more like humans, have the priors that humans have; then they won't be fooled by these tiny perturbations. There is other stuff that fools us, of course, like illusions, and that's fine: I'm happy if a car gets fooled by an illusion, but not by a tiny marker on a stop sign, or by a random corruption, which is an even worse problem at this point: random corruption, snowy weather, distribution shift. That's a serious problem, which my other paper actually targets.
The unadversarial examples paper tries to tackle that by changing the game, basically, to achieve robustness to common corruptions, which is a more direct problem that we have right now. It's a problem of distribution shift, which in my opinion is more severe at this moment in time; we have to solve that first. I mean, it's good if we solve adversarial examples too, but I think this is the more prominent problem right now, because in the ideal world there will be no adversaries, but there will still be random noise and corruptions; those will happen eventually. Yeah, exactly. I think a lot of this distribution shift question comes back to generalization. You said we need better priors, and I like Chollet's conception that we need human priors in our models; most of the priors we have in our models at the moment are there to improve the information conversion ratio. There's this realization that most of the information out there in the universe, it's like the kaleidoscope effect Chollet describes, is really a very small amount of information represented in lots of different ways, so we need to improve the efficiency with which we extract it. But I wanted to come back quickly, because we've been talking about randomized smoothing and the noise models and so on, but your main recent contribution was actually this incredible concept of unadversarial examples, and we haven't properly spoken about it. It's the idea that you can have a patch, or even a texture (you did a version in 3D), such that if you create the right kind of patch you can actually reinforce the model making the correct prediction, even under a variety of background conditions and noise. Can you go through really clearly what that process is? Yeah, this is the work I'm most excited about recently; every time I see it or think about it I get super excited, and it's a very simple idea. We have these adversarial examples, these adversarial perturbations, that easily break our models: we can add small non-robust features which our models love. Whenever they see these non-robust features, they follow them, robustly, across rotations, transformations, everything, even if there's lots of other information in the image; there's a picture of a dog, but a little noise carrying the features of a cat, and the model says "this is a cat, amazing." So I thought: that's how our models currently perceive the world. We don't like it, we want them to perceive the way we do, but currently that's how they perceive. So why don't we design the world for them in that way? Why don't we just use those features: "you like this feature? fine, this is the picture of a cup." It doesn't make sense to us humans, but it makes sense to neural networks, so let's use it. So what we basically did is create those non-robust features
but for the right class. In a simple setting, we take an object of interest and color it, or paste a patch on it, or redesign it, so that whenever the neural network sees it, it sees the non-robust features of that object, along with the robust features of course; in addition, we amplify the non-robust features. In your example, it's like taking the cat and making the fur even furrier, for a neural network. So whenever the network sees it, it's "this is a cup, for sure," even under noise, distribution shift, or transformations; it actually works. A key part was testing this under all sorts of corruptions, not only test-time noise: all the corruptions ImageNet-C has. And it works really well: those non-robust features remain really salient even under the worst common corruptions, so when we paste them on our objects, the model sees the objects even under weird distribution shifts. The easy way to do this would be to say, "here's a cat, I'm going to put an unadversarial patch on the cat which has the non-robust features of a cat," but that would be cheating, wouldn't it, because I'm not supposed to know what the picture is before I put the patch on. Hold on, that's a great point. The key to this paper is that it changes the game for specific scenarios in the world. Of course this is not going to work if you want to train a model, go outside and detect dogs and cats, because we don't control the dogs and cats. But in many, many scenarios the system designer, the human, has control not only over the model but also over the objects of interest that they are trying to recognize, detect, or track. The way I actually came up with this: I was working at Microsoft with some researchers on a project involving helicopters, with some company, on how we could use machine learning to make the landing of helicopters or drones safer and more robust under severe weather conditions. We were discussing it, Aleksander Madry was there as well, along with Ashish Kapoor from Microsoft and Sai Vemprala, and I thought: why do we design the landing pad like that? Why don't we redesign it in a way that neural networks love to see? That's where the idea came from. For such a scenario we have control over the perception model we want to deploy on the helicopter, but we also have control over the object of interest. The same applies to stop signs for autonomous vehicles: we designed the stop sign the way it is because humans recognize it that way, so why not have another stop sign next to it designed for autonomous vehicles? Or in a warehouse, or in a kitchen: say you have a robot doing the dishes; why
would you color the dish the way we do? Why not color it so that it is much more robustly classified, recognized and detected by the model? And the idea is very simple: it's just a targeted adversarial attack, but towards the correct class. It's literally reversing the story: instead of training the model, train the object, train the input. You backpropagate and update the texture in input space, and for the 3D case we use a differentiable renderer and so on to do that. The demonstration we have on ImageNet and 2D images is a proof of concept and a systematic way to analyze those patches, but ideally you would actually create 3D models or 2D patches, print them, and paste them on objects; you're not going to modify the images themselves, because that's cheating. Of course. To clarify a bit more: there used to be this demonstration, which I found pretty cool when I got into computer science, by an engineering professor here at ETH, Raffaello D'Andrea. He had these drones, I think it was a TED talk, and they could do amazing things with balls that he threw up in the air, and the point of the story is that there were cameras in the room tracking the drones and the ball, and the balls were made deliberately bright red and round, because we can build good classical computer vision detectors, like a Hough transform, and the color is really distinct, so we can pinpoint where the ball is. In a way that's what we did before: we designed objects that our cameras and systems can track really well. But now we can do it even better, because we have backpropagation: we don't have to guess what our systems like, we can just ask them. Exactly. So my question is: given that you can take a network, see what it loves, put that somewhere and have it recognized very robustly, do you also modify the network itself? You could think of this as a two-player game where you create a patch-and-network combo, and I think the complete system would be determined by which corruptions you want to be robust to; you build a network-patch combo that just loves each other. Is that right? Yeah, this is a great point. It's a setting we did not discuss in our paper, but it's a natural one: joint training of your model and your patch. We did it, and it achieved similar results, but because it's not as clean we avoided talking about it and left it for future work. What we did is fix the pre-trained model, any pre-trained model, even a randomly initialized model, that's the crazy thing, and just optimize the patch with respect to this model: minimize the loss, but not with respect to the model's parameters as usual; minimize it with respect to a different set of parameters, namely the texture, your input, your object of interest. And it works really well, even on a randomly initialized model.
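Here is a rough PyTorch sketch of that reversal: the classifier is frozen and gradient descent runs on the patch pixels, pushing the prediction towards the correct class. Everything here is illustrative rather than the paper's code: the random pasting stands in for the transformations or differentiable renderer used for real objects, and the function and parameter names are made up.

```python
import torch
import torch.nn.functional as F

def train_unadversarial_patch(model, loader, target_class, patch_size=50,
                              steps=1000, lr=0.1, device="cpu"):
    """Optimize a patch (not the model) so that, wherever it is pasted,
    a frozen classifier becomes more confident in the correct class."""
    model.eval()
    patch = torch.rand(3, patch_size, patch_size, device=device, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    it = iter(loader)
    for _ in range(steps):
        try:
            x, _ = next(it)
        except StopIteration:
            it = iter(loader); x, _ = next(it)
        x = x.to(device)
        # paste the patch at a random location (a crude stand-in for the random
        # transformations / renderer applied to real objects)
        i = torch.randint(0, x.shape[2] - patch_size + 1, (1,)).item()
        j = torch.randint(0, x.shape[3] - patch_size + 1, (1,)).item()
        x_patched = x.clone()
        x_patched[:, :, i:i + patch_size, j:j + patch_size] = patch
        y = torch.full((x.shape[0],), target_class, dtype=torch.long, device=device)
        loss = F.cross_entropy(model(x_patched), y)   # descend toward the correct class
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            patch.clamp_(0, 1)                        # keep the texture a valid image
    return patch.detach()
```

Pointing the same machinery at a wrong class and ascending the loss would give an ordinary adversarial patch; here it is simply aimed at the true label instead.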
So you could take a randomly initialized model and deploy it; it's totally random, and maybe that even has applications in privacy or something, I don't know. If you don't want your model to be hacked, it's random, and then you design your objects of interest, so for anyone who looks at it or tries to hack it the model is random and won't do anything, but whenever it sees the object of interest it gets super excited and detects it. We focused on pre-trained models, but yes, you can achieve as good or even better results by jointly training them, because that also improves the model: you can alternate, minimize with respect to the input parameters, then with respect to the model parameters, or do all sorts of crazy things. But you touched on the exact point: we design objects because we think "a red or shiny color might be interesting for this system, might be better for this model," but doing it in a data-driven way is definitely better; you just collect data and do it, and it makes your life easier. We tried baselines of random colors and designs that make sense for humans, and they don't work as well; I have some examples in the paper; they don't work as well as designing the object in a data-driven way. So it's literally applying a data-driven way of designing objects for neural networks, and that's what excites me: from the perspective of neural networks, we are redesigning the world. This has wide-ranging applications. First of all, the jointly trained version would probably be extremely vulnerable to adversarial attacks, so maybe that blocks those setups from being deployed, but I think the applications are much more wide-ranging, because this comes very close to, for example, data-driven data augmentation: really understanding what kinds of things the neural network likes would probably also let you design data augmentation algorithms for classifiers, where you ask "what does this network really love?" and then augment the data either to reinforce that or to go completely counter to it, depending on what we see in the data. Yeah. Could I ask a couple of clarifying questions, because I still don't feel I completely understand this. It's a wonderful concept: the way we design stop signs by the side of the road, they're big and red so our brains light up; they light up like a Christmas tree for us, and I understand the concept is similar for a helipad or a runway: we make it light up like a Christmas tree for the model, so we can reinforce it. So, if I understand correctly, this is very similar to the work where they could make a turtle look like a rifle in 3D, or where you could put an adversarial patch on a human being and an existing vision model would think it was something else. This is very much the regime where you actually know what you want it to classify, so I'm going to make a patch which looks as much like an airplane
as possible. So it's not for any class, it's for a particular class, and then I'm going to stick that patch on an object. So you're taking a non-robust feature and increasing its magnitude, because normally it's quite low magnitude. Exactly. And I bet that works quite well in the setting where you have a pre-trained model, but why not throw the whole thing out and start again? Why even use a pre-trained model? Why not come up with the equivalent of a QR code and not use the original model at all, just change the whole pipeline? So the options are: you train the model, which is what we usually do, you bring a model, some objects and a dataset and train on them; or you fix the model and train the patch, the input, the object of interest, designing its texture; or, the third option, you train them jointly, which is possible. And joint training will do at least as well as just training the patch, because you can start not from a random initialization but from a pre-trained model, or any model you want, and keep improving it: you're minimizing your objective not only with respect to the model's parameters but also with respect to this other set of parameters, the patch or the texture. A crucial thing here: someone suggested that we would have to include corruptions in the training pipeline, but the nice part is that we don't. We cannot model every kind of corruption in the world, and this method didn't see any corruption, any random noise, any snow, anything, during training time, yet at test time it generalizes really well to them whenever it sees those salient features; that's the cool thing. Of course, we could probably pipe in common corruptions, add more corruption during training, and that would make it even more robust, but we wanted to show that it hadn't seen any corruption, because in the world it might encounter corruptions it has never seen before, so we wanted to demonstrate that it genuinely generalizes. But totally, if you know something about the corruptions, that data augmentation will help it even more: instead of taking the expectation only over rotations and transformations of the patch, you can also take the expectation over corruptions, so you train the patch in expectation over corruptions, over transformations, over data points, over everything you know about, and minimize the expected value of the loss. So it's really, really general. And yes, you can totally train both the model and the patch; we just didn't invest a lot in that here. Yeah, go ahead, what's up?
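That expectation-over-transformations idea is a small change to the earlier patch sketch: sample a transformation and a corruption at every step so the patch is optimized for the average case. The `random_transform` and `random_corruption` helpers below are placeholders for whatever augmentation family one actually cares about, not functions from any particular codebase.

```python
import torch
import torch.nn.functional as F

def eot_patch_loss(model, patch, images, target_class,
                   random_transform, random_corruption, samples=4):
    """Average the classification loss over randomly sampled transformations and
    corruptions of the patched images (expectation over transformations)."""
    y = torch.full((images.shape[0],), target_class, dtype=torch.long)
    total = 0.0
    for _ in range(samples):
        patched = random_transform(images, patch)   # paste + rotate / scale / shift
        corrupted = random_corruption(patched)      # e.g. noise, blur, fog
        total = total + F.cross_entropy(model(corrupted), y)
    return total / samples
```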
I think in this regard a paper that really shows a positive direction is the self-training with Noisy Student paper, where the student is made to learn from a teacher model. You have a clean set of images, you pass them to a well-trained teacher model, you take those predictions as the ground-truth labels for the student model, and when passing the images to the student you heavily augment them with transformations like RandAugment and ask it to be consistent across these different versions of the images. That way you enforce some sense of consistency, and at the same time the teacher has higher capacity than the student while you're training it. That has many nice properties, such as improved robustness to common corruptions and also some level of robustness to adversarial perturbations, in particular l2 attacks. They didn't prove anything, but they did show empirically that this yields better robustness to certain adversarial perturbations. So I think having some sense of consistency regularization might also be helpful, along with stronger data augmentation policies.
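A compressed sketch of that pseudo-labeling plus consistency step, as described here; the real Noisy Student recipe has more moving parts (iterating the process, noising the student with dropout and stochastic depth, mixing in the labeled data), and `rand_augment` is a placeholder for a strong augmentation policy such as RandAugment.

```python
import torch
import torch.nn.functional as F

def noisy_student_step(teacher, student, optimizer, unlabeled_images, rand_augment):
    """One self-training step: the teacher labels clean images, and the student is
    trained to match those labels on heavily augmented versions of them."""
    teacher.eval()
    with torch.no_grad():
        pseudo_labels = teacher(unlabeled_images).argmax(dim=1)  # hard pseudo-labels
    student.train()
    augmented = rand_augment(unlabeled_images)                   # noised student input
    loss = F.cross_entropy(student(augmented), pseudo_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```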
Yeah, you can do all sorts of stuff: anything you can do for your model you can now do for your input, any sort of data augmentation, any trick you use to optimize your model better. Think of it as having another set of parameters you're optimizing, with the one difference that you cannot see the parameters of the model, but you can see the parameters of the object you're optimizing, like its color. It's also fairly coherent with human perception to some extent, because inherently we train our own minds that "this is my image, and if it's a little darker, in some dim light, then it should still be my image." It's essentially adding more information to the model. It's like a friend telling me, "if you see me, I will always be wearing yellow": whenever I see yellow, even from far away, I have more confidence that it's this guy, because he gave me that extra piece of information in addition to his features and expressions, and it stays persistent. That is exactly what this kind of training tries to do, and that's the beauty of the method, I guess. I also wanted to ask about another point: with unadversarial objects, sorry, not randomized smoothing, I guess this is one of the cases where a network's sense of perception is not that coherent with how we perceive things, because we're trying to maximize the perceptual capability of a neural network, and that might not be coherent with how we perceive things. Don't you think this is another case where the two perceptual capabilities are very different, while elsewhere we're trying to constrain our models to be coherent with how we perceive things? It really depends on the situation, on the task. Some tasks we don't interact with at all, only robots do; hopefully at some point only the robots will do the dishes or wipe the house, and then we won't care how those things are designed. Maybe at some point we won't even be driving, so we won't care how the stop sign is designed. But for tasks that humans and robots are doing together, simultaneously, we can either leave things as they are, because humans detect shapes much better than networks do and networks care about textures, so give the textures to the machines and the shapes to the humans, or we can regularize how we paint the object. Right now it all looks noisy, similar to adversarial examples, very noisy, hurtful to the eye, but maybe we can optimize in some color space instead of pixel space, add some function and optimize through it, so you can totally optimize what the color or the texture should be, add regularizers, do literally whatever you want. That goes in the direction of the people who train GANs to produce adversarial examples, where you have a GAN that produces images and you search in its latent space. Yeah. I think a good example of unadversarial examples for humans is branding: you could teach a human to make the association with expensive, nice shoes that last a while and make me good at sports, or you could just put a little check mark on them, and it's a direct hack into the human brain; that's branding, and that's exactly what these unadversarial examples are for neural networks. Instead of learning all this complicated stuff, I just slap this piece of thing on here and you associate it with the correct class. Yeah, this is a great example, great motivation. Well, on that note I'm going to wrap things up. Hadi Salman, thank you so much for joining us today; it's been an absolute honour, and it was really, really cool having you on. Thank you for having me, it's been a pleasure; I really enjoyed the show, and you do a lot of hard work: you do the show every two weeks and it takes a lot of time to prepare, I can totally imagine, so it's really impressive. Thanks for having me on, and let's see where this channel goes and how much we can advance AI as well; you're doing a great job, guys, and I'm happy to be here. Thanks so much. It was actually every week, but we've recently gone down to every two weeks because, as you can imagine, it's quite a lot of work. Right, I can imagine. [Music]
Info
Channel: Machine Learning Street Talk
Views: 5,356
Id: _eHRICHlg1k
Length: 108min 26sec (6506 seconds)
Published: Fri Apr 30 2021