The Deep Learning Revolution: A Cybersecurity Use Case

Video Statistics and Information

Captions
[Music]

Hello everyone. When I was in school, my teacher used to complain to my parents that I didn't come to school, and they were right. I was spending most of my time in chess tournaments and at the chess club; that was what I was doing. But when, by the age of 17, I hadn't become a grandmaster, the highest level in chess, I figured I was too old to ever have any realistic chance of becoming world chess champion. So what did I do? In Israel we have a saying: if you don't get accepted to become a pilot, you go to the anti-aircraft missile division. If I don't fly, nobody flies. So I said I would develop a chess program that would defeat the world champion. That's how my passion for AI started, and it has been the center of my life for the past twenty years.

Three years ago, when we thought of bringing AI to address a serious real-world problem, cybersecurity was an obvious case. We have more than 350,000 new malicious files every single day, a huge number of attacks proliferating at an exponential pace. Look at the kind of damage: data leaks. The most respectable companies in the world are victims of this kind of data leak, with tens of millions of records per attack. There is not a single person in this room whose data has not been leaked, multiple times at least. And is this something that the CEOs of the big companies don't know? Of course they know. Of course they are concerned about it. If you look at the list of things they are concerned about, cybersecurity ranks much higher than many other important things. So they are taking measures to prevent attacks and to try to mitigate these risks, but it's more difficult than that.

Let's look at the trajectory of the attacks. In the good old 1990s we were concerned about viruses. Then, in the early 2000s: wow, botnets and denial of service, what are we going to do? But look at what's happening today: exponential growth in the number of attacks, in the families of attacks, and in the sophistication of the attacks. And the currently existing solutions, the antiviruses most of us
are using, are ineffective. They don't detect most of these files; they're good for detecting existing files. So what do these new attacks look like? Let me show you a very vicious kind of attack so that you get a better intuitive understanding of that. Not so vicious: please meet my dog. His name is Tyrion, Tyrion Lannister. This is Tyrion the Cavalier. This is how he greets me every morning, this is how I wake up, and I instantaneously recognize him. I also recognize him when he's under the blanket, when he's cuddling with his toys and I see just some part of him. Now imagine a world in which I show a picture of a dog, then I modify just a few pixels, and nobody here would recognize the dog. Sounds ridiculous, right? This is what the world of cybersecurity looks like. Nearly all of those 350,000 malicious files every single day are very small mutations of existing ones, like a few pixels in an image, but existing solutions don't manage to detect them.

Of course we have more advanced solutions, but more or less all of them follow a reactive approach. There's a new infection, millions of data records leak, a huge amount of damage, and then we stumble upon the file, manually analyze it, extract features and heuristics, update the signatures, and then comes a small mutation and all of that starts all over again. The most advanced kind of cybersecurity solutions use the most advanced kind of AI: machine learning.

Well, you know, many people today use these terms interchangeably. Nearly every company is AI-first; I'm yet to meet a company which is AI-second. They're all AI-first, they're all doing machine learning and deep learning, to the extent that they're using these terms interchangeably. So let's pause for a moment and put the glossary in order: what do these words mean? AI, artificial intelligence, is a very wide umbrella term. Anything that exhibits intelligence is artificial intelligence; it doesn't have to do any learning. IBM's Deep Blue, which defeated Garry Kasparov in chess in 1997: that was a great
example of AI without any kind of learning, just one giant calculator. Under AI we have a certain subdomain, which is machine learning, the difference being that here the machine learns by itself rather than being explicitly programmed. Within machine learning we have many different subdomains. One of them happens to be neural networks, or deep neural networks, also known as deep learning. Until a few years ago, neural networks were considered a completely refuted field of research that no serious researcher would get into. That's why I got into it.

But there is a big conceptual difference between deep learning and the other, traditional forms of machine learning. Let's see what the difference is and how it applies to cybersecurity. Let's take as an example the problem of face recognition. Many of you would recognize the face here. Let's first use traditional machine learning to do that. In traditional machine learning we cannot feed the raw data, in this case the pixel values, directly into the machine learning module. We cannot do that. So what must we do instead? Bring in a domain expert, in this case an image processing expert. They would analyze the problem and tell us what the most important properties, the features, are. This phase is known as feature extraction. For example, here the most important features would be, let's say, the distance between the pupils, the distance between the nose and the mouth, the proportions of the face, texture, color, etc. Thus we convert the raw data into a vector of features and then feed that into our machine learning module. Wherever we use traditional machine learning, these are the steps. For cybersecurity: take an executable file, decide which are the important function calls, build a vector of features, and then feed the machine learning module.

But there are two big problems with this kind of machine learning. First, by converting the rich raw data, millions of pixels in this example, into a small list of features, as good as they may be, we're throwing away
most of the data that we have; we're looking at just partial data. But secondly, and even more importantly, we humans, even the best experts amongst us, are terrible at doing feature extraction. We're not good at articulating the features, the important properties. Let's take a much easier, more intuitive example to explain that. Everyone here, I guess, is a world-class expert in detecting cats and dogs in images; I see a few experts over there. If I show you a picture of a cat or a dog, in a few milliseconds you will tell me whether it's a cat or a dog, and you will have near 100% accuracy. So who can explain to me what the difference between a cat and a dog is? Well, I will not waste your time; that's why I have students at the university.

What people usually say is: well, a dog is bigger. Yeah, that's a pretty good feature. A dog is usually on the ground; the cat is on your table making a mess of your papers. Tell that to my dog: this is how I found him the other day on my desk, and he's smaller than a cat. He thinks he's a cat; he may very well be a cat. At some point, after I waste my students' time too much, they say: oh, there's a difference, the pupils are different. The pupils of a dog are more circular, those of a cat are more elliptic. Finally, a good feature. You recognize the cat and the dog here, I hope, and even though you see the picture, you still have difficulty explaining how. That's the thing about our brain: we learn something immediately and we're good at it, but we're not good at the process of converting that knowledge into features, at articulating those things.

So what's great about deep learning? It's the first, and currently the only, subfield within machine learning that can skip feature extraction. Using deep neural networks, we do not do any feature extraction; we just feed the raw pixel values directly into our deep neural network, and by creating a deep hierarchy of features that it learns by itself, it does much, much better. The results obtained by deep learning in the past few years have been the greatest leap in
performance in the history of artificial intelligence. For many kinds of benchmarks where we were used to seeing half a percent or 1% improvement, here we suddenly see 20 to 30 percent improvement. In vision, for many benchmarks on which we had 70 percent accuracy, we now stand at 97 or 98 percent accuracy, sometimes even surpassing human accuracy. It's the greatest revolution we have observed.

So if the results are so good, and they are amazing for vision, for speech, for text, why not use it for cybersecurity to detect the latest malware? We've got a few problems here. The screen is OK, it's working fine. I bet you're seeing random data here. This is not random: this is how it looks if you take an executable file and look at some part of it, the raw bytes, as if they were an image. It looks random to all of us here, but it's full of patterns. The patterns are just not local. We don't see local correlations like in images or in speech, so any kind of neural net that relies on local correlations cannot be applied here.

And we have additional challenges. For example, in addition to the fact that we cannot use standard deep learning models, the input varies substantially in size. What if we have two images of different sizes? We just resize them to a predetermined fixed size. What if we have a file which is 100 kilobytes, 100 megabytes, or 100 gigabytes? We cannot just resize that. And the file formats are different: you can be infected by an executable file, a PDF file, an Office file, different file formats. So these are really challenging problems; we really cannot apply traditional, standard deep learning models to them.

So what can we do? Many people look at deep learning like a magic lamp: you just rub the lamp, the genie comes out of the lamp and says, "Oh my master, what's your wish?" "Give me your training data, we'll feed it into TensorFlow, and then here is your model; it will work." Well, it's not really like that. Let's see what we need to make it work anyway. First of all, we need lots of data, actually a huge amount of data. In this case,
cybersecurity, luckily we have 300,000 new samples every day; we can quickly grow data sets of hundreds of millions of malicious and legitimate files. Very good on that front. We need deep learning experts, still a hard resource to find: the best deep learning people, who know the field intimately enough to be able to modify the algorithms for this domain. Let's say we have that covered as well. And the easiest part: we need a deep learning framework, a software library that allows us to write deep learning models, to program them and then run them. That is the easiest thing, right?

Let's analyze this for a moment. First of all, the good thing about deep learning frameworks. Until a few years ago, if you wanted to write a deep neural network, you needed to write it at a low level, usually in NVIDIA's CUDA code, the closest to the hardware, because today, for training deep learning, the GPU is the only relevant hardware. And if you look at the universe of people who have high-level research expertise and those who know low-level programming, these two worlds barely intersect. But today you don't need to do any low-level programming. You can use any of these great software libraries: TensorFlow by Google, PyTorch, Keras, and all the others. You just write your high-level code, and it is automatically translated to low-level code, and it works great. This amazing breakthrough is the good news.

But besides the good we also have the bad. These frameworks are developed for research, not productization, and there are many limitations for productization. They are mostly deployed in the cloud, due to the many dependencies that they have, and that is problematic. You put them in the cloud because of the specific hardware that they need and the large amount of memory that they need, even for inference, after you have trained them, for continuous prediction. Sometimes we would like an immediate reaction: by the time I send your suspicious file to the cloud and the result comes back, half your network is infected. Sometimes you don't have a continuous
connection to the cloud, the costs are too high, and it's very challenging to deploy on edge devices, on your mobile, laptop, or desktop, because of the need for a huge amount of processing power and a huge amount of memory that you typically don't have on your edge devices. These are some of the big challenges. That's the bad.

What comes after the good and the bad? The ugly. The ugliest thing about these deep learning frameworks is that they expose only high-level building blocks. You can imagine, for example, that TensorFlow, PyTorch, and all the other frameworks give you Lego bricks, building blocks that you can then put together in infinite ingenious ways and obtain amazing results for vision, for speech, for text. But for some problems you must modify the core algorithms, make inherent modifications inside these Lego bricks, and if you need to do that, you're completely out of luck. You cannot do that with these frameworks.

So, knowing this, how can we adapt deep learning algorithms for cybersecurity? First let's look at the research part; forget about the implementation. We need to modify the neural networks so that they can take a raw binary file as input, support different file formats, and automatically discover the patterns, whether it's an executable file, a PDF, or an Office file. We must modify the input layers of the neural networks so that they can absorb various file sizes, small or big, adapting automatically, and so that they can discover the non-local correlations in the data. It took us about a year to make these modifications.

The moment you do that, the next step is to implement it. All these publicly available frameworks don't help, so we must implement all of that at a low level, directly on the hardware: an efficient GPU implementation, written in CUDA and optimized for inference on the edge, to then protect the devices that we would like to protect. So all in all it's over a year's work. But let's see if it's worth it: if, after all of this long process, we make the deep learning modifications,
would it really help? Would it increase the detection rate? So we have the deep learning framework, we have the data, let's get to work. How do we train the deep neural net? Well, that's done in the laboratory. You have hundreds of millions of malicious files and hundreds of millions of legitimate files. We feed all of them to our deep neural network running on multiple GPUs, training over and over again until it gets better and better at distinguishing malicious from legitimate files, across different file formats and different operating systems. When the training is over, we can take this brain and use it in inference mode, in prediction. It no longer trains; it's frozen. For every file that you feed into the brain, it instantaneously provides you with a prediction of whether it's malicious or legitimate. It is instantaneous because you don't have any feature extraction or any dynamic analysis: you just feed in the raw byte values, and in a few milliseconds you have a prediction of whether the file is malicious or legitimate.

And the results? The same kind of improvement that we observed in vision, speech, or text: 20 to 30 percent improvement. In most benchmarks of new malicious files we see that deep learning obtains more than 99% accuracy in detecting the files, while traditional machine-learning-based solutions barely go over 60 or 70 percent detection for this kind of file.

But there's another problem that we need to solve here: explanation. Neural networks, as black boxes, are notorious for being inexplicable. So we say: well, this is a file, it is malicious. We're humans; what's the next question we ask? Why? Why is this file malicious? And the answer is: nobody knows. I don't know; it's a big neural net, it's a black box. But what can we do to provide an explanation? We can train a second neural network that this time provides classification: given a malicious file, classify whether it's ransomware, a backdoor, a dropper, or a virus. Again, another end-to-end neural net. And if it thinks it's 96% ransomware and 3% virus, we can expose that as an
explanation. It's very similar, it's analogous to having one neural network detect whether there is a dog in an image, and a second neural network, if the first one detected a dog, classify what breed that dog is.

So, to summarize the benefits of deep learning for cybersecurity: we see that it allows us to do real-time detection and prevention. Because it's instantaneous, it can stop the attack before it starts. It can perform real-time classification of what kind of malware it is. And the best and most important part is that it can detect new threats, those new mutations and new families that render all currently available solutions useless. And because deep learning doesn't rely on any features or on any specific characteristics of a file format or operating system, it can work across any operating system, any device, any file format. And finally, it can be connectionless: if you manage to make the neural net so compact that it resides on your mobile phone, on your laptop, on your desktop, then it will provide protection even when it's completely disconnected.

Let's take a concrete use case: NotPetya, the worst malware ever. Last year it infected some of the biggest companies in the world: Maersk, the largest shipping company in the world, FedEx, and many others. 20% of the world's shipping was dead in the water; one out of every five ships in the world was at a complete standstill due to this malware. More than a billion dollars of damages. And at the same time that the damage occurred, any device that was protected by the deep-learning-based model was not infected, because the deep learning model immediately recognized that this seemingly new attack was just an evolution of a pre-existing concept, and it stopped it.

Despite these amazing results, there are still many challenges that we have and that we're going to face. We need to use deep learning for unsupervised learning, for anomaly detection. In what we've done so far, we've taken data sets of files for which we do have labels, "this is malicious, this is legitimate", and
fed them into the brain. It's like giving it a data set and telling it "this is a cat, this is a dog" and having it try to separate them. But for many kinds of real-world problems we don't have training data. Anomaly detection: that's a very important thing that we need to do with unsupervised deep learning.

But what's really scary is advanced malware that will itself use deep learning. This is going to happen in the next few years; we already see it in research. This kind of malware will be much more difficult to detect. It will be self-evolving; it will choose its targets not randomly, as all malware does today, but much more intelligently, and the damage will be orders of magnitude more devastating than what we see today. This is not speculation; this is a definite future. Every technology will be used for good and for bad. And to prevent this scary future, we must accelerate the pace of adoption, to bring the amazing results we've seen in AI from research to productization.

What do we need for that? What are the main things, not only for cyber but in general for deep learning? My firm belief is that to bring the amazing results of deep learning from research to mass-scale productization, we must improve edge deployment, edge inference, exactly as Young Sohn mentioned before. All the devices that you have, all the IoT devices, drones, cameras, mobile, laptop: we need to implement deep learning and put it in inference on those kinds of devices. What do we need to do that? Not just for cybersecurity, by the way; for vision, speech, text, all these kinds of amazing results, and for healthcare, that's one of the most important things. To bring it to edge devices for inference we need two main components.

First of all, we need faster and more efficient processing. Not necessarily hardware: we're assuming that for better, faster, more efficient processing we need faster hardware. Not necessarily; there may be an algorithmic solution to this as well. If we look at our brain, for example, we're operating at 20
watts; a single GPU operates at over 300 watts. What does that mean? It means that most probably there are more efficient versions of the algorithms we're using, which would let us deploy them on slower hardware as well.

How about memory consumption? Today's neural networks are based on dense representations: a dense multidimensional matrix called a tensor; that's where the name TensorFlow comes from. These are all dense. Our brain is extremely sparse. In our neocortex, the cognitive part of the brain, we have 16 billion neurons, and each one of them is connected to only about 10,000 others on average. It's very complex and efficient. Maybe we can make deep learning algorithms much more compact and efficient, such that we could deploy them.

Well, I would like to show you the most important deep learning paper ever published: "Learning representations by back-propagating errors". In every deep learning algorithm you use, for vision, for speech, for text, for cybersecurity, you're using this algorithm, the backpropagation algorithm, for training the neural net, for gradually updating the synapses of the neural net until it is trained. This paper was co-authored by Geoffrey Hinton. For the past thirty years he has relentlessly focused on neural nets. His entire life was devoted to research on neural networks, when nearly everyone was convinced that it was a futile attempt and that nothing good would come out of neural networks. He is, by the way, the reason why I got into neural networks; I really believed in what he believed in. And he is a co-author of this most important paper. When do you think he wrote this paper? And by the way, he's also one of the main figures behind the current renaissance of neural networks, the current deep learning one. Did he write it two years ago? Five years ago? Three years ago? 32 years ago, in 1986. The most important deep learning algorithm, which all deep learning today is using, was written 32 years ago. Is it possible that nobody can improve upon this algorithm, that we cannot make it more efficient? Let's see what
Geoffrey himself thinks about that. When asked, he said: "My view is throw it all away and start again." Maybe not throw it all away and start again, but at least we must find some major improvements on top of it.

At the beginning I told you about my passion 20 years ago: it was chess and computer chess. This is my passion today. I spend my time focusing on this kind of core research into making neural networks much sparser and much more compact, to be able to use tens of megabytes of memory instead of hundreds of megabytes, so that instead of having large dedicated hardware for inference we can put them on edge devices and get closer to the vision that, years from now, we could employ neural networks on each and every device in the world. Thank you very much.

[Music]
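
Editor's illustration (not part of the talk): the speaker's point that converting raw data into a small list of features throws away most of the data can be sketched in a few lines of Python. The byte histogram used here is only an assumed stand-in for a hand-crafted feature (the talk mentions function calls as the typical feature for executables), and the two byte strings are made-up examples, not real files.

```python
from collections import Counter


def byte_histogram(data: bytes) -> list:
    """Assumed hand-crafted feature vector: count of each of the 256 byte values."""
    counts = Counter(data)
    return [counts.get(value, 0) for value in range(256)]


# Two hypothetical "files" holding the same bytes in a different order.
file_a = bytes([0x4D, 0x5A, 0x90, 0x00, 0x03, 0x00])
file_b = bytes([0x00, 0x03, 0x00, 0x4D, 0x5A, 0x90])

features_a = byte_histogram(file_a)
features_b = byte_histogram(file_b)

# The feature vectors cannot tell the two inputs apart,
# even though the raw byte sequences differ.
same_features = features_a == features_b
same_raw_bytes = file_a == file_b

print("identical feature vectors:", same_features)
print("identical raw bytes:", same_raw_bytes)
```

A model that sees only the histogram has no way to separate these two inputs, while a model fed the raw byte sequence at least has the ordering available to learn from. That is the contrast the talk draws between traditional feature extraction and end-to-end deep learning.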
Info
Channel: Samsung Catalyst
Views: 3,109
Rating: 4.8805971 out of 5
Keywords: artificial intelligence, AI, Technology, CEO Summit, Samsung, Start-up, Investments, Innovation
Id: pqkxyhhPZYw
Length: 24min 55sec (1495 seconds)
Published: Mon Oct 15 2018