Attacking Malware with Adversarial Machine Learning, w/ Edward Raff - #529

Captions
All right everyone, I am here with Edward Raff. Edward is a chief scientist at Booz Allen Hamilton. Edward, welcome to the TWIML AI Podcast.

Thank you so much for having me. Really excited to be here.

Looking forward to jumping into our conversation. You lead a machine learning research group at Booz Allen Hamilton. What does a machine learning research group there focus on?

Yeah, so it's a little different for us compared to many other organizations that have research teams, because we are a consulting firm. Our business model is basically based on renting out people's brains: you have some hard problem you want to solve, so we need to have smart people who can work on these hard problems. We view research both as a way to let people know, hey, we actually work on some really cool things that require this level of difficulty and thought and challenge, and also as a way to train staff. One of the things we've been doing a lot of work around is adversarial machine learning, where you have your model that you've developed, you've put blood, sweat, and tears into it, it's your baby, you want it to work well and go into production, and there may be some nefarious actor out there who wants to subvert your model. A lot of my work is in malware analysis and malware detection, where the malware author actively wants to subvert the model. They don't want to be detected as malware; they want to be able to run uninhibited. So if there is someone who's going to try to mess with your model and make it produce errors, how do you prevent that, or how do you even accurately quantify what's happening? There's no course you can sign up for yet on adversarial machine learning in school; it's not part of anyone's curriculum. So if we want people with that skill set, we have to grow them organically, and research is one of the ways to do that and to build people up with these really deep technical skills that we need but that are not off the shelf yet.

Yeah, I'm glad you brought up adversarial ML, because I know that's an area of personal interest for you, and that's one of the things I wanted to dig into with you as a way to get the conversation going. It's actually been quite a while since I've had someone on the show focused on this intersection of AI and cybersecurity, which has continued to be of interest, particularly as we see more and more activity in the cybersecurity realm. Can you give us an overview? What's your temperature read on the space, where it is, and how it's evolved over the past couple of years?

Yeah, it's interesting. If you go back to the first applications of machine learning for malware detection, it goes back to something like 1985.
People have been looking at this intersection for a long time, and in many ways it's still nascent, despite having been something people were really looking at for a long time, in part because the data is so different from normal data. Convolutional neural networks and deep learning have come in and eaten everyone's lunch at image classification, natural language processing, signal analysis, all these different problems, but all of those problems share an underlying similarity: things near each other are related to each other. The words I'm saying right now have meaning based on their order, and the things I say tomorrow really have no relationship to what I'm saying today; there's a correlation in time. Pixels in an image: if you look at this pixel on my shirt and it's blue, the pixels around it are probably going to be a very similar shade of blue; there's a spatial correlation. That's true for signals as well. In malware that does exist, but it's so much more complex than that. You've got this arbitrary system designed by humans, instruction code and assembly, and how that gets converted into literal zeros and ones, how the compiler optimizes the layout of the code, which functions get put together or not, inlining, just a huge amount of complexity that is very different from what most people are working on.

It's also a very different scale. Take a single executable: if you go download a new browser, 30 megabytes would be a really small browser download, but if you look at that as one data point, that's a massive data point. For most people, the data point you're working with is maybe a kilobyte at most; here, 30-plus megabytes is a pretty normal thing to see.

So if you've got some data set of software programs and you've trained a malware detector on software programs, your browser would be a feature, a 30 megabyte feature, that you're trying to do inference or classification against? Is that what you said?

It's a variable-size feature, and maybe there's an image in it, maybe there are multiple images embedded inside of it, maybe there's a Word doc embedded inside of it. Every other file format could be embedded inside of this one file format, and it's describing code, this thing that could do anything of arbitrary complexity; that's what Turing completeness is, and we have proofs that you can't know what arbitrary code is going to do without running it. Well, you don't want to run it, because it might be malware. So there's this huge explosion of complexity as you dig into all these details, and machine learning in the broader communal sense has not really developed the tools to deal with data that's so weird and so different in all these unique ways. There are a lot of unsolved machine learning problems at this intersection, and that also creates lots of avenues for attack, because the malware author doesn't need to abide by the rules; that's part of the whole point, they're trying to break the rules. If there's a spec that says you don't set this flag in the executable because it'll behave poorly, well, if that helps the malware author, they're going to do it. They don't care.
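As a concrete illustration of the scale problem described here, the following is a minimal sketch, not from the episode, of one simple way to turn an arbitrary-size executable into a fixed-size input: a normalized byte-value histogram. The function name and approach are illustrative assumptions; real malware pipelines use far richer features (byte n-grams, PE header fields, disassembly, dynamic traces, and so on).

```python
# Toy sketch: map a binary of any size to a fixed-size feature vector using a
# normalized byte-value histogram. Illustrative only, not a real detector's
# feature pipeline.
import numpy as np

def byte_histogram(path: str) -> np.ndarray:
    """Read a file of any size and return a 256-dimensional normalized histogram."""
    with open(path, "rb") as f:
        data = np.frombuffer(f.read(), dtype=np.uint8)
    counts = np.bincount(data, minlength=256).astype(np.float64)
    return counts / max(counts.sum(), 1.0)  # guard against empty files

# A 30 MB browser download and a 4 KB dropper both map to the same 256-dim vector.
# features = byte_histogram("some_sample.exe")  # hypothetical path
```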
Yeah. So it's a very rich and interesting area, I think, both from the machine learning and math side, how do we get the math to work with these new kinds of complexities and relationships, and also from the low-level, really technical side. For example, I could use some undocumented instruction to try to make my malware work, so that it only runs on new CPUs that have this undocumented instruction, because it causes some weird side effect. I've seen research on Spectre-based malware where the malware only works in the prefetching of the CPU, the CPU trying to prefetch code and data, so that if you run it on the wrong type of CPU, or in a virtualized environment, it won't prefetch the same way and the malware won't run. You hide what's actually happening, and if you just look at the code, it isn't obvious what's going on, because the malicious intent is hidden in the prefetching logic. So there's just a huge richness of complexity that makes it interesting and fun.

And malware is just one element of this broader cybersecurity problem domain. Is it representative, do you think, of the other aspects? Do the other aspects of cybersecurity suffer from the same kinds of problems, or are they all unique in their own ways?

I'm definitely a machine learning man first, and I learned about cyber, or malware specifically, as an application area, so I don't want to speak too authoritatively, but I think a lot of these have their own unique snowflake problems, especially around data collection and building a data set. If you want to get your data labeled, it's really easy for images; a toddler can label images, literally. You could set up an app, put it on the iPad, and they would do it for hours, no special training required to say this is a cat, this is a dog. Okay, here's an arbitrary executable: tell me if it's malicious or not, or what kind of malware family it's from. That's a huge amount of work. Or, here's some pcap data, some network traffic data: is anything weird going on on this network? I would just stare at you. I don't know, maybe? What do you think the answer is? It's very technically deep and complex, and each avenue has its own unique problems.

One of the ways you've taken on this malware problem is a recent paper focused on adversarial transfer attacks. Can you talk a little bit about that paper and the problem it's seeking to address?

Yeah, this is a paper, "Adversarial Transfer Attacks With Unknown Data and Class Overlap," that a couple of people on our team worked on. Luke Richards was the lead author; some great work. We also had collaborators from NVIDIA and UMBC. In this paper we were looking at a core adversarial machine learning problem that's been motivated by a lot of recent work: you have some model you want to defend, and we'll call you the victim, because someone's going to try to attack you. There's some adversary that wants to attack the victim model. They know the model is there, they know what you're going to do, but they don't have access to the model specifically.

So a, quote unquote, black-box attack?

It is black box, and it's a specific type of black-box attack
that we generally call a transfer attack, because what the adversary does is build their own model that does the same task, attack their own model, and assume: well, I built a good model, you're doing the same thing, these attacks should probably work on your model too. And the research that's been done to date says, yeah, that does work.

We were thinking about this from the perspective of: if we were doing this in real life, does this match reality? Maybe sometimes, but oftentimes not, because a lot of this work has said, I know you're working on this problem, I know you have a model to detect malware, because you're the government and malware is coming at you, or you're a large bank; everyone has malware detectors, that's just part of the game. You can imagine many different tasks where you'd expect the government to be doing this. And if you're an adversary, you just want to mess with people, you're a bad actor, so you build your own model. But you're probably not going to have the exact same data that the victim does, and you might not know exactly how they've set things up. You can guess what they're going to do, but there could be reasonable design choices: what are my classes exactly, and where is that line? So if you're in the situation where you don't have access to their model, you're probably not going to have a perfect match for their data or for how they've designed the class structure.

And it sounds like a lot of the prior work assumed that the adversary was operating from the same playbook, so to speak: using the same data and/or the same classes?

The exact same data and the exact same classes, which is unrealistically optimistic. If you can defend against an omnipotent adversary, good for you, congrats, but I want to know what's going to happen in real life. So that was the motivation. I'm not claiming ours is perfect to real life, but I think it's closer. We started building tests to vary the data overlap from zero percent to 100 percent: how much is that data overlap really a factor in the transfer success rate? And also varying the number of classes in common, from having all the exact same classes down to having only two classes in common, so a very small attack surface.

We spoke a little earlier about the difficulty of defining the data in this space. Is a feature an executable, or is it something else? In this particular case, in the example you used in this paper, what was a piece of data, a feature, and what did the labels look like?

Here we were focusing just on the adversarial machine learning part, so we picked easy data that everyone has access to: we just did image data, CIFAR-10, CIFAR-100, Mini-ImageNet, and the classes were the normal classes, cat, dog, car, truck; deer and frog, I think, are also classes in there. We're just focusing on the adversarial part for this one; all the complexity of malware is too much extra complexity right now.

Right, so you're basically saying: let's step back, take a simple model, and just test this fundamental assumption about transferability with different data and classes.

Yeah.
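To make the threat model concrete, here is a minimal sketch, not taken from the paper, of a surrogate-based transfer attack in PyTorch. FGSM stands in for whatever attack the authors actually used, and `surrogate` and `victim` are assumed to be already-trained classifiers over the same kind of normalized image inputs.

```python
# Minimal transfer-attack sketch: craft adversarial examples against a surrogate
# model the attacker trained themselves, then measure how often they also fool
# the victim model, which is never queried during attack generation.
import torch
import torch.nn.functional as F

def fgsm_on_surrogate(surrogate, x, y, eps=8 / 255):
    """Single-step FGSM attack computed only with the surrogate's gradients."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(surrogate(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()        # move in the gradient-sign direction
    return x_adv.clamp(0, 1).detach()      # stay in the valid pixel range

@torch.no_grad()
def transfer_success_rate(victim, x_adv, y):
    """Fraction of attacked examples the victim now misclassifies."""
    preds = victim(x_adv).argmax(dim=1)
    return (preds != y).float().mean().item()

# x_adv = fgsm_on_surrogate(surrogate, images, labels)
# print(transfer_success_rate(victim, x_adv, labels))
```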
Probably one of the best skills I try to pass on to employees and students is: you have this complex goal, figure out some way to cut it down into chunks, because you can't eat the whole sandwich, it's too big. Forget about the elephant, you can't even eat the sandwich; you've got to cut something off. So we cut off this part and used just image data and convolutional neural networks that everyone feels comfortable with, so you're going to understand what the results mean.

When we set this up, we saw that both less class overlap and less data overlap hurt the attacker's success rate, which makes sense. But we also saw some odd behaviors: it wasn't as consistent as you would expect, it wasn't a smooth degradation. It would get worse, and then it would start to get better again, but randomly.

Some kind of generalization property kicking into effect, or something?

That's part of what I think it is. What became really interesting, from the base results of that behavior, is that we think once you reach a minimum threshold of dissimilarity, there's a lot more randomness that comes into play that the model just is not trained to expect, and now it's making errors maybe not because the adversary did such a good job crafting the example, but because the data is just so different from what the model itself understands, since the amount of overlap has decreased so much.

The really scary part is when it comes to defending your model: what defenses work? Currently, the overall best defense is adversarial training. It's been the best defense basically since it was introduced, around 2017 or 2018, and it's a pretty simple strategy: you have your model, you train it, and as you're training, you attack your own model, then feed the attacked inputs back into the training, continuously, so you're constantly training the model to be better at correctly classifying attacked data points. And our results showed that if you do this in this more realistic scenario, adversarial training actually weakens the defender.

Wait, what?

Yeah. The attack success rate increased on models that had been adversarially trained. What we believe is going on is that when you do adversarial training, you are in a way overfitting to a very specific adversary: an adversary that has the exact same classes and data you do. But when the adversary has different classes, their attacks are naturally going to go in directions your model has never optimized for, because it started from an initial condition, got better at that, and converged on whatever initial path worked best. So it has over-learned the unrealistic scenario.

And these are tightly coupled: the data is the same because it's baked into the process of creating the model.

Exactly. And so the attacks seem to transfer more successfully if the victim has done adversarial training in this imperfect-knowledge scenario, which, if you're actually trying to build a robust model for real-life production, is a huge concern.
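For reference, here is a condensed sketch of the adversarial-training loop described above, assuming a standard PyTorch model, optimizer, and data loader; real implementations typically use a multi-step PGD attack rather than the single-step one shown here.

```python
# Adversarial training as described: attack your own model during training and
# train on the attacked inputs. A sketch, not a production recipe.
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps=8 / 255):
    model.train()
    for x, y in loader:
        # Craft a single-step adversarial example against the current model.
        x_atk = x.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x_atk), y).backward()
        x_adv = (x_atk + eps * x_atk.grad.sign()).clamp(0, 1).detach()

        # Standard training step, but on the attacked inputs.
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```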
Because now you say, okay, I'm going to keep my model private, I'm going to mitigate how much information the adversary can acquire by keeping it close hold, and you want to do all the best things, so you say, okay, I'm going to do adversarial training. Well, actually, if you believe that's the correct threat model, you might not want to do that; it might actually make you more susceptible rather than less.

Interesting. Can we take a second to punch into the degree to which adversarial training, and a lot of these adversarial ML techniques generally, are practical concerns implemented by people building actual models, versus academic thought exercises that people really aren't thinking about when they put models into production? What's your read from where you sit?

I think on average it's more academic. Most real-world usage is more academic; people don't necessarily have a reason to really believe they're under attack. A lot of people motivate this with self-driving cars: someone's going to trick the self-driving car into plowing through a stop sign. I don't want that to be a thing that can happen to my future self-driving car, that's true, and I'm sure some horrible person out there tries to screw with them, but in general it's not a very well-motivated threat; who's going around trying to destroy every self-driving car? I think it's more important in the academic sense of figuring out how to make models robust to errors in general than because there's an actual adversary trying to trick you.

A lot of the research that gets done also becomes somewhat cartoonish in how much information it gives the adversary; if they were that powerful, why would they do it this way? I remember seeing work on adversarial attacks on medical imaging, where they adjust the medical image so the AI models give you the wrong prescriptions, the wrong drugs, the wrong medication, to hurt your health. If they have that much control, to get all this data and that kind of access, why not just flip a bit in the database and change your medical record? If they're that powerful, there are so many easier things they could have done.

So the goal with how we came up with this work was: in a realistic scenario, what's it actually going to look like? And for malware, it is a true-to-life, realistic problem that people deal with day to day, which is part of why I like doing work in this space. It's not academic; this is really happening.

Meaning not the broad existence of motivated adversaries, but the specific application of adversarial attacks in the malware world. Meaning: there are existing malware detectors out there based on machine learning models, there are people out there trying to deploy adversarial attacks against those models, and there are people in that world building adversarial robustness into those models. That's all real and extant today?

Yeah, that is all real, to varying degrees, today. It's more complex in the malware space, because an adversarial attack in the malware space can happen earlier in the process.
Because, again, executables are so complex, you can mess with the executable itself to screw up the way the antivirus processes the features to begin with. You can change the whole floor out from under the model you're trying to fool.

Meaning, like, obfuscation of the malware code within the broader executable, or something different?

That's one of many, many possible ways. There's a thing called packing, which is: let me put my executable inside of another executable, so you have to figure out how to peel the onion to see what's going on underneath. Or take dynamic analysis, which is when you actually try to run the malware in order to get features: a lot of malware's initial step is just to wait 24 hours, because you're probably not going to run dynamic analysis for 24 hours, you're going to run it for maybe three minutes tops, so they're just going to try to outwait you. You might have features like, does it call this API, does it call the crypto functions, that might be a good sign of ransomware; does it call the file-delete functions, okay, that's a really good sign of ransomware if it has both of those. If they just wait until the clock runs out, you never see the features to begin with. So there are more complex and interesting ways the malware adversary can mess with the model, and I can go on lots of fun tangents.

My favorite one I've seen, and I've never seen it actually used by malware, it's just fun that this is possible, is a project people released called the M/o/Vfuscator.

Mov, like the Intel move instruction?

Yeah, for people who aren't aware, that's the assembly instruction that moves data from one location in memory to another. So you might say, please move this data from memory into this register because I'm going to do some work on it, or from a register back to disk to save it, or whatever. That one instruction has so many side effects that it's actually Turing complete, so you can compile any program into one that contains only the move instruction. Ostensibly it looks like the program only moves data and never actually does anything with it, but it has so many weird side effects that it works: you can recompile any program to contain only one single instruction.

Wow. Because the program itself is just putting bits into memory locations and executing them?

Yep. There are enough special side-effect cases on this one instruction that it's the only instruction you technically need to do anything on an x86 computer.

Wow.

You see weird stuff like this with malware so often. Anything you can think of, I'm sure there's some way someone could get around it, and then it becomes a numbers game: how many more machines am I protecting, how many more cases am I covering, and am I opening any new gaping holes that I need to address? You're not going to get it all, but that doesn't mean you sit around and say, well, I can't solve it perfectly, so I might as well not do it. No, you build the best solution you can, you get it out there, you fix what you can, and you see how they adapt, what happens next, in this continual back and forth of you taking a step and the adversary taking a step.
So malware is a place where these kinds of things are happening today, and we got to that from talking about adversarial training and adversarial robustness being incorporated into training, and that having an adverse effect on robustness. Is there a solution to that?

We're working on it. I'm not out of a job yet.

It's sort of a double-edged sword. The gut reaction of everyone is that these adversarial attacks are definitely a bad thing, we don't want them to exist, we've got to get rid of them, and yes, it is definitely a bad thing, but it's in the context of how things are being used. If someone's using a machine learning model in a bad way, then being able to subvert that model actually becomes a defense, a good thing. If someone is using a machine learning model to create a surveillance state or something, okay, we're not comfortable with that, we don't like that. There was actually a recent paper, I had this idea years ago but didn't do it, so I get no credit for just having a nice idea, but it was a really interesting paper that got published, or at least put online, on using adversarial attacks to figure out how to put on makeup in such a way that you look normal but the machine learning model says, yeah, there's no one there. So it's giving you a way to get some of your privacy back if you don't want to be tracked or looked at. These things are complex from many perspectives, not just a technical perspective but an ethics perspective: what is and is not an okay use of machine learning, how do we want to deploy these things? Those are things we have to think about and be conscious of. But it also means that whether we ultimately can or cannot defend against these attacks also determines whether we ultimately can or cannot subvert people using machine learning for inappropriate and unethical use cases. So that's something I think about a lot.

So the main idea is that the research happening here wasn't really considering a real-world scenario. Your paper considered a real-world scenario, and you found that, in general, the less overlap, the more difficult an attack is, but in a weird, non-linear way.

In a weird, non-linear way. And in addition, from the attacker's perspective, we could restore some more predictable behavior for the attacker. We simulated attacking a model with unknown or imperfect classes by randomly masking classes out when we generate the attacks. Each time we attack, we randomly pretend some of these classes don't exist and don't count. So maybe the model's first gut instinct is to convert the prediction from truck to car; we say, well, we've masked out car, that's not an option, you don't get credit for that, you have to do something else. If we do that, we restore a lot of behavior that makes more sense; it removes the variance due to class overlap from the attacker's success rate.
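Here is a rough sketch of that masking idea, assuming the same PyTorch setup as in the earlier sketches; the mask rate and loss form are illustrative guesses, not the paper's exact recipe.

```python
# Masked-class attack sketch: each time we craft an attack on the surrogate,
# randomly pretend some classes don't exist, so the attack gets no credit for
# pushing the prediction into a class the victim may not even have.
import torch
import torch.nn.functional as F

def masked_fgsm(surrogate, x, y, num_classes, keep_prob=0.5, eps=8 / 255):
    # Randomly choose which classes "exist" for this attack; always keep the true class.
    keep = torch.rand(num_classes) < keep_prob
    keep[y] = True

    x = x.clone().detach().requires_grad_(True)
    logits = surrogate(x)
    # Masked classes get a very negative logit, so they can't be the attack's target.
    logits = logits.masked_fill(~keep.to(logits.device), -1e9)
    F.cross_entropy(logits, y).backward()   # maximize loss only over surviving classes
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```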
So rewind this for me: this is when you're training your target model, you are doing what?

This is when we're generating the attacks.

Oh, generating the attacks, okay.

To transfer to the victim. The adversary has their own model, what we call a surrogate model: they built their own model that they hope matches what you've done, and they're basically going to perturb their own surrogate model in a way that simulates not knowing exactly what you're doing. The attacker pays a penalty in attack success rate to do that, but what they gain is certainty about their attack success rate, if that makes sense. In the normal situation, maybe the average success rate of their attack would have been 40 percent, but they're not sure it's actually 40; maybe it's somewhere between 15 and 60 percent and they can't really tell. With this modified attack, they're pretty sure it's somewhere between 30 and 35 percent. It's lower, but they actually know what they're going to get.

So greater bias, less variance.

Yep, exactly.

And on the adversarial robustness training: are you, or are you aware of, folks trying to reformulate that problem in light of this real-world setting? I'm thinking, is there an analogy to dropout, like class dropout, where you forget about classes in the adversarial loop? Does that help? Are folks working on that specific problem?

Not the class-dropout thing specifically; we actually tried the class-dropout thing.

Okay, so you tried that one.

It didn't work, unfortunately. We were very sad about that. But no, I don't know what the answer is yet. Part of the problem is that these experiments were hugely expensive to run, because instead of attacking one model for one data set, we have to attack something like 50 models for one data set, since we vary the class overlap and the data overlap every single time, and then we want to run it multiple times for each combination, because you can pick different classes each time and that might bias the results. It turns into this computational explosion of model training and attacking.

Something I want to look at, though I think we might need to figure out something more intelligent first, is the approaches that try to build provably robust models from the onset. You can think of it like this: your model normally makes a prediction about a single data point; one point comes in and you get one answer. What these approaches try to do, and this is one approach anyway, there are multiple, is classify a region instead of one data point. The data point is the center, there's a sphere around it, and you're saying everything in this sphere gets the same answer. If you can do that, then you're provably robust to attacks on that data point up to a certain radius. As you train, you try to increase the size of the sphere: the spheres start out essentially infinitely small, really just the data point, and you push them wider and wider as you go, which means you're becoming more robust. Theoretically that would work better, but I'm not sure. In theory there's no difference between theory and practice, but in practice there is.
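One concrete instantiation of this "classify a sphere, not a point" idea is randomized smoothing. Below is a simplified sketch that only does the majority-vote prediction over noisy copies of the input; the certified radius in the real method comes from a statistical test on the vote counts (as in Cohen et al., 2019), which is omitted here.

```python
# Simplified randomized-smoothing prediction: classify many Gaussian-noised
# copies of a single input and return the majority vote. Certification of the
# robust radius is intentionally left out of this sketch.
import torch

@torch.no_grad()
def smoothed_predict(model, x, num_classes, sigma=0.25, n_samples=100):
    """x is assumed to be a single example with a batch dimension of one."""
    counts = torch.zeros(num_classes)
    for _ in range(n_samples):
        noisy = x + sigma * torch.randn_like(x)
        counts[model(noisy).argmax(dim=1)] += 1   # tally the vote for this noisy copy
    return counts.argmax().item()                 # majority class over all copies
```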
It sounds like that has strong implications for the type of model you're using, maybe even as far as not using a neural network. Or can you incorporate that kind of technique into a neural network formulation?

You can. For a neural network it's really expensive, so if I wanted to do that for these experiments, I'd increase the compute time by at least another factor of 10, and it took us four or five months to run all these experiments.

Wow.

So that's, what, 40 or 50 months? Okay, maybe the answer won't come right away.

But yeah, in terms of the real world, you really want to think through: do I need to be concerned about an adversarial attack, am I an entity that is likely to be attacked? If I'm a government, if I'm a bank, if I'm a large enough corporation, then yes, you are likely to be targeted and attacked. Then, which of the models I train are most likely to be attacked or subverted? If you're a credit card company, you have fraud detection models that people are trying to subvert all the time. So you identify the models at risk, and now let's really think through the specifics of the threat model for this particular thing, not the academic, abstract, apply-this-to-any-problem kind of thing, but: what do we know about the domain that we can use to build a robust model for this specific problem? That's what I've had the most success doing, anyway, and that's how I generally try to help our clients approach these kinds of problems: really focus in on that scope and narrow down where we actually need to do this. Let's not panic and think everything's under attack all the time, because that's not realistic and you're going to give yourself a heart attack. We've done some of that for malware, for specific kinds of malware models, and we've done some of that for computer vision before, and gotten a lot of success much more quickly with that approach.

And as you say, part of that can be: maybe you shouldn't use a neural network, maybe you should use a linear model and carefully craft your features to make a linear model work well, or maybe a shallow decision tree or a random forest would be better, because you can understand to what degree it can really be attacked, what that envelope is, and design around it.

Part of how we did that for some of our malware work: go back to the example of dynamic analysis, where you actually see the thing run. Let's assume it's not perfect; I don't claim perfection. But if you say, okay, I'm actually going to see the thing run, I'm going to see it read in the files, encrypt them, write them out, delete the original files, I know this is malware, and for it to be ransomware it has to actually have those steps. Normal models look for things that indicate one or the other: I look for things that tell me it's ransomware, and I look for things that tell me it's not ransomware. That's a design flaw in this scenario. I shouldn't be looking for things that tell me it's not ransomware, because if it does those finite things, that qualifies it as ransomware. What the malware can do is insert lots of random, benign activity and say, look at all these benign things I do, that outweighs all the malicious things I did, and a normal machine learning model would go, yeah, that's right, you did do more benign things than malicious things, you are benign. No, that doesn't make sense at all. If you do anything malicious, it's malicious, and by default everything is benign; it's benign until you do something bad. So we can incorporate that into the model: there are no features that contribute to a score of benign, that's just the default, and then you have to hit enough malicious indicators for the model to change its decision and say, no, you're actually malicious, you don't get to run anymore.
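A toy sketch of that constraint on a simple linear detector follows: every feature weight is forced non-negative, so adding extra benign-looking activity can never lower the maliciousness score. The feature semantics, threshold, and class name are illustrative assumptions, not any real product's detector.

```python
# "Nothing counts toward benign" as a modeling constraint: non-negative weights
# mean evidence can only accumulate toward malicious; benign is the default.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicDetector(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        self.raw_weight = nn.Parameter(torch.zeros(n_features))
        self.threshold = nn.Parameter(torch.tensor(1.0))

    def forward(self, x):
        # x: indicators of malicious-looking behaviors (e.g., "calls crypto API").
        w = F.softplus(self.raw_weight)   # weights are always >= 0
        score = x @ w                     # extra benign activity cannot reduce this
        return score - self.threshold     # > 0 means flag as malicious
```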
And is the implication, then, that the models that tend to be used in this space are heavily hierarchical, or ensembles, or something like that, where you've got modules identifying specific features or characteristics and then bubbling that up?

Microsoft has published some of their strategy for how they handle Windows Defender and trying to protect computers. They have this whole hierarchical strategy: here's the super-fast model that can handle a lot of things, and for the things it can't, you bump it up a stage to something more complex that's going to do something more sophisticated. I forget how many stages it had, but the end stage was basically: okay, we're actually going to run this in a dynamic environment in the cloud to try to figure out what it's doing. So yes, people definitely do that, building this hierarchy of speed-and-complexity trade-offs, because you can't afford to run everything through dynamic analysis all the time; there aren't enough computers to run all the programs we want to run just to make sure the computers can run them. It doesn't make sense.

Right. When you were talking about dynamic analysis, I imagined it as a design-time tool: we collect all these samples from the wild, run them through this dynamic analysis thing, and maybe we're creating labels or something. But it sounds like, from this Microsoft example, it's more like: you try to run this app and it says, hold on, I'm going to put this in the malware detector in the cloud and I'll be back to you shortly.

Yep. I remember one of my colleagues, a malware analyst with a lot of experience, telling me about how, for the analysts doing their job as part of a manual process to figure out what things are, they would have a big virtual machine cluster to run things and try to observe what they're doing. Sometimes there is malware that has what's called a VM escape, where there's a bug in the VM and it's possible for the code to recognize it's being run inside a virtual machine, leave the virtual machine, and infect the host it's on. Occasionally that would happen, and they would just say, all right, burn everything down, this is complex now, light everything on fire, we build again from scratch, that's the only way to make sure it's clean.

Right, right.

That might be a slightly hyperbolic description, but the complexity is just so great. It's never ending.

Awesome, awesome.
What kind of future directions, as if we haven't already talked about enough of them, are you excited about in this space?

I'm excited about graph neural networks. They've been getting more traction in the past two years, which I think is good. Some of it's been a little contentious, relating to a theme of reproducibility that's also been picking up: graph neural networks were very nascent a few years ago, then there was a small kickoff of people iterating and publishing, here's my new fancy this, and then a lot of people saying, actually, all of this is wrong and you just didn't tune things correctly. So there's still a lot of working things out, in part because of the complexity of dealing with graphs; connectivity is much more arbitrary, and a lot depends on the specific kinds of graphs you were looking at versus someone else's graphs.

But I think a graph structure really is going to be, at least from a machine learning modeling perspective, the most ideal way to model a lot of malware. You can create a graph of which code parts connect to which other code parts, with features on the nodes of the graph but also features on the edges. So if there's a chunk of the executable you couldn't disassemble, you couldn't figure out what's there, you could still have that as part of the model, with some features indicating that this is connected here somehow but we're not sure what's actually happening at that node, it failed to parse correctly. I think that's probably the right direction to be headed in, but it's not yet fast enough and scalable enough for the malware use case, where our data points are the size of other people's data sets. In one of the data sets I work with regularly, the largest file is about 200 megabytes, and all of CIFAR-100 is 200 megabytes; literally, this one data point is the size of the data set everyone else is using.

It's interesting that you provided that context about graph models and graph nets as applied to this space. I'm pretty sure one of my first interviews around cyber was on graph stuff, and I did not realize there was a bit of a graph-neural-networks-in-security winter that happened.

I'm not sure it's so much a winter as that graph neural networks for malware just haven't germinated yet in the first place.

Okay.

In machine learning land, people have done graph-based things for malware a lot before, but normal graphs, not neural network based.

Ah, got it. So I can see the room for worlds to collide, but they haven't collided yet. Very cool, very cool. Well, Edward, it was wonderful chatting and learning a bit about what you're up to in this space. Thanks so much for joining us.

Yeah, thank you again for having me. It was a really fun conversation and I really enjoyed it.

Absolutely.
Info
Channel: The TWIML AI Podcast with Sam Charrington
Views: 233
Keywords: TWiML & AI, Podcast, Tech, Technology, ML, AI, Machine Learning, Artificial Intelligence, Sam Charrington, data, science, computer science, deep learning, edward raff, adversarial machine learning, cybersecurity, security, graph neural networks, booz allen hamilton, transfer adversarial attacks, adversarial attacks, self driving cars, autonomous vehicles, black box, malware, malware analysis
Id: 5VPXo-MmVtk
Length: 50min 23sec (3023 seconds)
Published: Thu Oct 21 2021