What Cisco Machine Learning & Artificial Intelligence Can do for the Network with JP Vasseur

Video Statistics and Information

Captions
Good morning everyone. I'm JP Vasseur, Cisco Fellow, and I'm pretty excited to talk to you about what I think is going to be very important for the next few years: what we can do with artificial intelligence and machine learning in the network. Quick background: I've been at Cisco for 19 years. I worked on the Internet — QoS, MPLS — for a number of years, then I moved to IoT, and I ran the project where we applied machine learning to security for three years, which we shipped last year. I'm now driving a new project related to machine learning for the network in the cloud. So today's topic is not a product pitch; it's more about the technology, so feel free to interact and interrupt — people are usually very opinionated about machine learning, and that way we can make it entertaining.

Let me start quickly with where I think we stand. AI was born decades ago with the perceptron, a very simple linear classifier, and for quite some time we didn't see much progress — until really the last decade. Look at what is happening now: self-driving cars, with a lot of machine learning technology inside; machines outperforming humans at tasks like image recognition; and gaming. It's interesting to note that Deep Blue winning at chess was impressive but not so revolutionary at the time, because it was a brute-force approach. With the game of Go there are so many combinations that you cannot use brute force, so machines started playing against each other and improving over time, and that was a pretty interesting step forward. So let's try to capture how we can use these technologies in the context of networking.

I show this slide because the first question I usually get is: what kind of machine learning algorithm are you using? Interestingly enough — and we've been working on this for almost seven years now — that is not the most relevant question. First of all, we use a set of algorithms; there is no one-size-fits-all. There is a lot of buzz around deep learning these days, and it is super promising, but unfortunately you cannot use one algorithm to do many different things. What is relevant, though — and I just want to share some lessons we have learned over the years — is this. First, work on a clean data set: in networking there are usually a lot of noisy signals, and when the data gets too noisy it is extremely hard for the machine to interpret. Second, really understand the network, so that you can tune and architect the algorithm in the right fashion. If you provide too many signals to the machine, it is very unlikely you will get any interesting results, and what we have been doing for many years now is combining expertise in machine learning with expertise in networking to find the right signals to use — I'll give you some really cool examples of that. The next lesson is about false positives. That one is very interesting because, in the area of security for example, if you tend to raise too many false alarms, very quickly the user stops paying attention, so your worst enemy is the false positive. In other areas it might be the false negative, and there is always a tension between the two: if you make the system too sensitive you will detect pretty much everything, but with so much noise and so little accuracy that the system becomes pretty much irrelevant.
The next lesson is particularly interesting: when you train a machine to do a specific task using one data set, you may run into what we call overfitting. In other words, the system gets too close to that specific data set, and when you use the same algorithm on a different data set you lose generality — you cannot really apply it elsewhere. There are many ways to deal with that, and it is one rationale for being in the cloud: there we see a lot of data from very diverse networks, which makes the models much more general and powerful. Another lesson is about interpretability: some algorithms can produce an output without being able to tell you how they reached it. I could share many anecdotes, but here is one. At some point we developed an algorithm that modeled the behavior of hosts in the context of security, and while it was really strong at detecting anomalies, when it raised an anomaly it could not tell you what was wrong. There was no way to interpret the result, which made the system not very usable.

So here is where I think we stand. There are usually two camps: the pro-machine-learning camp, who think everything is about machine learning and it is the future, and another camp who think there is nothing good about machine learning. The truth is, of course, in the middle. I actually don't believe we should use machine learning for everything — there are many other approaches that are much better, depending on what you are trying to do — but there are things machines can learn, and I will try to give you examples of how we can use that in the network.

Here is how it all started. Those of you who know IoT know how difficult these environments are. The video you see on the screen, by the way, is real: a two-million-node network running IPv6 for a smart grid. We had to face a number of issues — the lack of link stability, the constraints on the nodes: no memory, no CPU — so we came up with a lot of new technologies: QoS for low-power and lossy networks, new routing protocols, and so on. The question we got from a few of our customers was: can you predict the delay in this network? That was extremely hard to do, first because we could not push the raw data to the cloud — there simply wasn't the bandwidth — and second because no closed-form mathematical model was capable of modeling the delay and predicting it. That is where we started to look at machine learning and AI and how we could use these technologies in the network. So this is how it started, almost seven years ago.

Now, before I go on — any comments, any feelings about machine learning so far?

Audience: We have to get there. With the explosion of all these devices, IT is being asked to do more with less. It's the only way you're going to get the kind of visibility that you need.

I agree with you.
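The overfitting lesson above is easy to demonstrate. Below is a minimal sketch — synthetic data and NumPy only, nothing from the talk itself: a high-degree polynomial hugs its training set but typically does worse on held-out data, which is exactly the "lost generality" the speaker describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple underlying trend (a synthetic stand-in
# for network telemetry such as delay vs. load).
x = rng.uniform(0.0, 1.0, 40)
y = 2.0 * x + rng.normal(0.0, 0.2, 40)
train_x, test_x, train_y, test_y = x[:30], x[30:], y[:30], y[30:]

def mse(coeffs, xs, ys):
    """Mean squared error of a fitted polynomial on one data set."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

for degree in (1, 12):
    coeffs = np.polyfit(train_x, train_y, degree)
    print(f"degree {degree:2d}: train MSE {mse(coeffs, train_x, train_y):.4f}  "
          f"test MSE {mse(coeffs, test_x, test_y):.4f}")
# The degree-12 fit scores better on the data it was trained on but
# generalizes worse to the held-out points: overfitting in a few lines.
```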
One of our top executives asked me, about the second machine learning project: what is the main learning, the main outcome? What I told him was: I think we have learned where not to use machine learning, because if you rush out and throw machine learning at everything, you get into trouble. We need to know what it can do and what it cannot do. But I agree that the massive amount of data makes it really relevant for many problems.

So let me give you a few examples. The first one is a product that we shipped last year — the name, I believe, is Stealthwatch Learning Networks now; it used to be called the Self-Learning Network. I drove this project for over three years, and it is about using machine learning to detect zero-day attacks in the network. The basic idea: we know how IPS and IDS technologies work, and they are very efficient, but they are based on signatures. What do you do with a zero-day attack, where by definition there are no labels and no signatures? We thought: what if we do the opposite — model what normal behavior looks like, and find anomalies against it? That could be an interesting approach.

Here is the architecture at a high level. Each node — which could be a router or a switch, by the way — runs the machine learning algorithm on-premises, on the box. First of all, we have access to a lot of data that we can consume locally: NetFlow, deep packet inspection on DNS requests for example, the logs — a ton of data, without exporting anything. Now, the way the machine works is that we look at hundreds of dimensions: the time of day, new flows between a given source and destination, DNS requests, name servers, and many things like that. Across all those dimensions we build a mathematical model online, and we try to make it robust enough that when we see something quite unusual — which could be a zero-day attack, or data exfiltration for example — we can detect it immediately on the router. So if you have a network with thousands of routers, you have thousands of models running, and each of them sends its anomalies to the data center, where we aggregate everything that has been raised. Upon receiving the anomalies we also have access to ISE and many other tools, so we can add a lot more context to them.
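The talk does not spell out the on-box algorithm, so what follows is only a minimal sketch of the pattern it describes: maintain an online statistical model over many dimensions locally, and export nothing but the anomalies. The z-score rule and the assumption of a fixed feature set per interval are my illustrative choices, not the shipped product.

```python
import math

class StreamingAnomalyDetector:
    """Online per-feature mean/variance (Welford's method) with a z-score
    alarm. A toy stand-in for the on-box modeling described above: raw
    telemetry never leaves the device, only anomaly records do."""

    def __init__(self, threshold=4.0, warmup=100):
        self.threshold = threshold  # how unusual before we raise an anomaly
        self.warmup = warmup        # observations to see before alarming
        self.n = 0
        self.mean = {}
        self.m2 = {}

    def observe(self, features):
        """Score one observation (dict of feature -> value, same keys each
        interval), then fold it into the running model."""
        self.n += 1
        worst = None
        for name, value in features.items():
            mean = self.mean.get(name, 0.0)
            m2 = self.m2.get(name, 0.0)
            if self.n > self.warmup:
                var = m2 / (self.n - 1)
                z = abs(value - mean) / math.sqrt(var) if var > 0 else 0.0
                if worst is None or z > worst[1]:
                    worst = (name, z)
            # Welford's online update of mean and sum of squared deviations.
            delta = value - mean
            mean += delta / self.n
            self.m2[name] = m2 + delta * (value - mean)
            self.mean[name] = mean
        if worst is not None and worst[1] > self.threshold:
            return {"feature": worst[0], "z": round(worst[1], 1)}
        return None
```

Each router would run one such model over its own traffic dimensions (new flows, DNS features, time of day, and so on) and forward only the returned anomaly records for aggregation — thousands of models, no raw export.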
Of course, the interesting aspect here is that you need anomaly detection, and the classic issue with anomaly detection is false positives: raise too many of them and in the end the user will not listen at all, and the system becomes very inefficient. So we took a very different approach. We use a form of semi-supervised learning, and the way it works is very simple: for each anomaly we give the user the context, and the user gives a thumbs up or a thumbs down, without having to indicate the reason why it was a good or a bad anomaly. The system then learns from its own mistakes so it does not repeat them. To give an example: if you keep saying that you don't want to see anomalies related to new flows, or to specific protocols, the system will learn that these are the common traits each time you give a thumbs down, and it will programmatically filter out those false positives. And it does not take long — that is really the key. If you needed a million thumbs up it would not work very well, but after a few thousand pieces of feedback you see the machine automatically adapt and learn from its own mistakes.

Question: Is that on a per-user basis?

That's a great point, and we do two things here. Because it is on-premises, it is on a per-customer basis, and for each user we also learn separately, because some users have different views than others. For example, we compared the results across multiple users and saw a lot of discrepancy: some users think a given kind of anomaly is great, others do not. So the system does this on a per-user basis, and we are also looking at ways to discount feedback — when one user keeps saying no, or yes, all the time, and we see a big discrepancy between that user and the rest of the community.

Question: Just to clarify, is that per individual user, or more at an organizational level?

Again, in that case it is on-premises, so there is one anomaly manager per enterprise, and then one per user within the enterprise — which is in contrast with the second use case I'll come to in a minute.
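Here is a sketch of how that thumbs-up/thumbs-down loop can programmatically suppress recurring false positives. Representing each anomaly by a set of trait tags and scoring per trait is my simplification of the "common traits" idea, not the actual mechanism:

```python
from collections import defaultdict

class FeedbackFilter:
    """Learn, per anomaly trait (e.g. 'new_flow', 'protocol:icmp'),
    how often users found anomalies carrying that trait relevant."""

    def __init__(self, min_votes=5, suppress_below=0.2):
        self.up = defaultdict(int)
        self.down = defaultdict(int)
        self.min_votes = min_votes            # don't judge on tiny samples
        self.suppress_below = suppress_below  # relevance score cut-off

    def record(self, traits, thumbs_up):
        """Fold one piece of user feedback into the per-trait counters."""
        for t in traits:
            (self.up if thumbs_up else self.down)[t] += 1

    def should_show(self, traits):
        """Suppress an anomaly only when every trait has a poor record."""
        scores = []
        for t in traits:
            votes = self.up[t] + self.down[t]
            if votes >= self.min_votes:
                scores.append(self.up[t] / votes)
        if not scores:
            return True  # not enough feedback yet: show it
        return max(scores) >= self.suppress_below

f = FeedbackFilter()
for _ in range(10):                          # ten thumbs-downs in a row...
    f.record({"new_flow", "protocol:icmp"}, thumbs_up=False)
print(f.should_show({"new_flow", "protocol:icmp"}))  # False: filtered out
print(f.should_show({"dns_entropy_spike"}))          # True: no record yet
```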
Okay, so let's have a look at the new project — we will be communicating more about this initiative in the coming months. Here it is not related to security at all; this is really for the network in general, and it is a cloud-based learning architecture. Let me show you the key use cases that I think are appealing. We are doing two things here: one is cognitive analytics, and the second is predictive analytics.

I'll start with cognitive. Last year, in 2016, we looked at the experience we had acquired applying machine learning to the network, and we asked: what is a really nice use case where we can apply this technology? Given the increasing complexity of networks, we thought a cloud-based approach was the way to go — why the cloud? Because that way you have access to a ton of data to train the models, and the models become much richer and more interesting. Take one aspect — and I could give you dozens of examples — throughput. With a rule-based approach, where would you put the threshold to say that the throughput for a given application is good or bad? Nobody knows what a good throughput is for a given application. So we thought we could use machine learning to learn what the expected throughput is. The data is provided to the machine, the machine starts doing anomaly detection, and that is how we find issues.

Here is how to read this heat map: access points are on the y-axis, and on the x-axis you have the device type. We asked the machine: find issues in this network for the Netflix application. What you can see is that the Netflix application is suffering, regardless of the type of device, on a few access points; conversely, the access point over there delivers bad Netflix performance only to Microsoft workstations. Of course we do that for all the applications, and you see that the heat maps are all different, which makes it very interesting. We also looked at a given application across multiple data sets, and they all vary as well — which comes back to the point that the amount of data is so vast that only a machine can slice and dice it in all directions. Now, what becomes really powerful is when the machine can find the anomaly and then do some root-causing, slicing in multiple directions and saying: we see that this is an issue with a given AP, in a given location, at a given time, and it is only related to this release on the controller and this type of client. Then you can root-cause it, and the fix becomes much more targeted. I could give you many other examples, but that's one of them.
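To make the "learned threshold" idea concrete, the sketch below derives a robust per-(application, device type) throughput baseline from observations and flags samples sitting far below it. The grouping keys and the median/MAD rule are illustrative assumptions on my part, not the product's model:

```python
from collections import defaultdict
import statistics

def build_baselines(samples):
    """samples: iterable of (app, device_type, throughput_mbps) tuples.
    Returns {(app, device): (median, mad)} -- a learned baseline instead
    of a hand-picked rule-based threshold."""
    groups = defaultdict(list)
    for app, device, tput in samples:
        groups[(app, device)].append(tput)
    baselines = {}
    for key, values in groups.items():
        med = statistics.median(values)
        # Median absolute deviation: a robust estimate of spread.
        mad = statistics.median(abs(v - med) for v in values) or 1e-9
        baselines[key] = (med, mad)
    return baselines

def is_suffering(baselines, app, device, tput, k=5.0):
    """Flag throughput far below what the network usually delivers
    for this application on this device type."""
    med, mad = baselines.get((app, device), (None, None))
    if med is None:
        return False  # no baseline learned yet for this cell
    return tput < med - k * mad
```

Nobody has to decide up front what "good Netflix throughput" is; the threshold falls out of what the network actually delivers per application and client population, which is what the heat map above visualizes.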
The second one, which to me is even more interesting for the future, is: can we start forecasting issues before they happen? To be frank, when we started looking at that, we wondered what we could really do. When an issue is random, there is no way to forecast anything. But in many cases you have early signs that something is going to happen a few hours ahead, and this is where machine learning shines: you feed massive amounts of data to the machine, and it starts learning early signs that are not detectable by a human and predicting events within a given horizon. I could give you two examples, but as of today, if I take the roaming failure rate in a wireless network, several hours in advance and with very high accuracy we can predict that there will be a failure event. And of course, once you can predict and you can root-cause — looking at the correlation between all the parameters — you can start thinking about fixing the issue even before it happens. So this is a very promising area.

We talked a bit about wireless networks; mobility patterns are another very fitting example, where a machine learning algorithm looks at all the paths within a network to find roaming patterns and their criticality. We can use this kind of algorithm in many places. We talked about application behaviors; the WAN, of course, is pretty interesting: can we find patterns around nodes with high error rates, link flaps, AP reboots, crashes, and so on? And we have been working quite a bit on the WAN side as well: can we model the behavior of the public Internet and of service providers? Can we do proactive routing and start offloading some links, knowing what their state is going to be? We are heavily working on these use cases too. We'll talk a bit more about closed-loop control in a few minutes.

Question: I noticed on that last slide I didn't see anything security-related, and if you look at what Talos is doing, they claim to be using predictive analytics. Is that machine learning, or is that a separate group doing the Talos predictive analytics?

They are also using machine learning in that case. When I talk about closed-loop control and prediction — and actually the term "predictive" is a little bit confusing; we should probably say forecasting of issues — I am referring to the ability to say: looking at these low-level signals, we see that an issue will happen in two hours, and we can fix it before that point.
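Under the same hedge, here is a minimal sketch of that forecasting idea: train a classifier to map the last few hours of a low-level counter (the "early signs") to whether a failure spike occurs two hours later, and expose a probability rather than a yes/no. Entirely synthetic data; scikit-learn assumed available:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
HOURS, LAG, HORIZON = 2000, 3, 2

# Synthetic leading indicator: retry counts drift upward before a spike.
retries = rng.poisson(5, HOURS).astype(float)
spike = np.zeros(HOURS, dtype=int)
for t in rng.choice(np.arange(LAG + HORIZON, HOURS), 60, replace=False):
    retries[t - HORIZON - LAG:t - HORIZON] += np.linspace(2, 8, LAG)
    spike[t] = 1  # the roaming-failure event we want to anticipate

# Features: the previous LAG hours of counters; label: spike HORIZON
# hours after the observation window closes.
X = np.array([retries[t - LAG:t] for t in range(LAG, HOURS - HORIZON)])
y = spike[LAG + HORIZON:]

model = LogisticRegression().fit(X[:1500], y[:1500])
risk = model.predict_proba(X[1500:])[:, 1]  # a probability, not a yes/no
print(f"max predicted risk over held-out hours: {risk.max():.2f}")
```

The point is the shape of the problem: low-level signals in, a probability of a future event out — which is also why the output needs to carry a confidence level, as discussed below.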
Okay, here is an example of cognitive analytics — we talked a bit about application throughput. The key takeaway, which I think is interesting, is that when you get bad throughput, the issue might of course be on the wireless side, but it may also be elsewhere on the path. What makes this approach very compelling is that we give end-to-end visibility to the algorithm: we look at the type of client, we can do dynamic profiling using ISE for example, or on the controller; we get a lot of data about the type of device; and of course we look at the wireless controller, the AP, and the WAN as well. Once you give all these control points to the machine, it starts building a model using all of them. What you see here looks very much like a histogram — this is, exactly, the model that has been built by the machine. To give an example: with clients connected and running different applications, what the machine does is learn the expected throughput we should see in this area of the network, for this time of day, for this type of client — and a big discrepancy against that is where we start detecting issues.

I could give another example, like the roaming failure rate. We did a lot of analysis on many data sets, and it is very interesting: you would expect the rate of roaming failures to be fairly low, and it turns out to be very high. Sometimes it is because of the network; sometimes it is the client. I remember one data set where I think about 45% of failures were because of the client, a large share because of the network, and the rest were pretty much unknown. So we looked at what you need to examine to first identify the anomaly and then the root cause: in that case you want to look at the client and its behavior, but also the network — a DHCP timeout, for example, is a classic root cause for failures when you roam. Giving the algorithm this end-to-end visibility is extremely powerful, and that's what we do with DNA.

Question: Is there a target for how long the cognitive analytics takes to build its data set and run? Is there a defined time you are targeting — like, 8 hours is actionable, 24 hours is actionable?

It's a great question. With some of these systems, what we do is one-shot learning, if you will, and in that case, depending on the objective and the algorithm, it may take a week or two before we have enough data. But in this case, because we are in the cloud, we gather data — telemetry, features, files, you name it — we analyze it in the cloud, and this is ongoing learning: it never stops.

Question: So it's going to be aggregated across all Cisco customers?

That's exactly the point, and it's amazing to see the numbers. If I take the example of access points, with just a handful of customers you easily reach a million access points. Let me give you one example: I was looking at how many data points we collect on a network with eleven thousand access points, which is already a fairly decent network. When you sum up all the data — the counters, the RF domain, and other things — it was about 150 million data points per hour that we were collecting. That volume of data is overwhelming for a human, but for machine learning it is a gold mine, because the machine is exposed to so much data — and then we do that across the whole customer base.

Question: So the learning is not only distributed within an organization but across all devices? As the system learns, is that learning distributed back out?

It depends on the use case. In the first case — the security one — the technology processes the attack data entirely locally, and nothing is sent to the cloud. In this second example it is the opposite: we gather the data, we learn in the cloud, and then we push the models down to the devices.

Okay, let me quickly show you another one, which is about predictive analytics — forecasting, as we said. By observing all these control points you can learn a model, and one thing to note here is that we want an algorithm that gives you a confidence level. It is often the case that you forecast an issue but have no idea about its probability. So it is a bit like the weather forecast — although that's probably not a good analogy, because the weather forecast is sometimes wrong. What we are going to tell you is: "I forecast this issue with confidence four out of five," and, to be honest, you just have to wait and see whether it was right or wrong. Any comments, any feedback on whether you think this is going to be a promising technology?

Audience: It seems like the only way forward — no one can look at all those data points by hand. Where I think it becomes super valuable: you talked about roaming issues. Take a healthcare customer — there is a lot of movement toward smartphone-based communications right now. You should be able to learn from Apple: Apple releases something new, and all of a sudden we see an anomaly where a bunch of roaming errors come out. You can make a decision as a business: we are going to stay on what we've got until that settles down, and then tackle the upgrade.

Exactly. And I was with a customer yesterday who told me they have two hundred and fifty thousand devices connecting when people come to work. When you have so many types of releases and devices, you need the ability to look into an ocean of data. Here, for example, I showed you one axis with the type of client and the release, against the access points; we can also slice it a different way — by OS version, for example. I can give you another one which was pretty interesting: we found an access point that used to perform really well, and the machine compared the performance of that AP between now and what it used to be; it can also compare between access points, or even between customers. That is another request we had from many customers, who said: look, my roaming failure rate is about twenty percent — is that good or bad? We have no idea. With this solution we will be able to compare networks together: we group them by similarity and give you an idea of where you stand compared to the others.
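That "is twenty percent good or bad?" comparison can be sketched as: describe each network with a feature vector, group by similarity, and report where a KPI sits among the peers in its group. k-means and the made-up feature values below are illustrative assumptions, not the actual method:

```python
import numpy as np
from sklearn.cluster import KMeans

# One row per customer network: [AP count, clients per AP, mean RSSI,
# roaming failure rate]. Values are invented for illustration.
networks = np.array([
    [11000, 42.0, -61.0, 0.21],
    [ 9500, 39.0, -63.0, 0.12],
    [  800,  8.0, -55.0, 0.04],
    [12500, 45.0, -60.0, 0.18],
    [  650,  7.0, -57.0, 0.05],
])

# Normalize, then cluster on the descriptive features only (not the KPI),
# so networks are benchmarked against structurally similar peers.
z = (networks - networks.mean(axis=0)) / networks.std(axis=0)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(z[:, :3])

def kpi_percentile(idx, kpi_col=3):
    """Where does network `idx`'s KPI sit within its similarity group?"""
    peers = networks[labels == labels[idx], kpi_col]
    return 100.0 * (peers <= networks[idx, kpi_col]).mean()

print(f"network 0 is at the {kpi_percentile(0):.0f}th percentile "
      f"of roaming failure rate among similar networks")
```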
Question: We see a lot of focus now on large cloud-based analytics, which I think is the right spot. How far are we from these sorts of things happening with edge-level compute and analysis, and then feeding into, say, a policy-driven architecture?

It's an interesting question, because when we talk about the architecture, we have to think hard about where we should gather the data, where we want to store it, and where we want the compute. For me it is a use-case-driven decision. In the IoT case there is no choice: when you have a 3G link and you look at how much data you would need to push to the cloud, there is no way to do it, so it has to be local, on the box — and that's why we started there. There is another rationale for local processing, by the way, which is privacy: as you know, some countries are very strict, so the fact that we process the data locally really matters. Now, when you look at wireless, or the WAN in general, it is very interesting to do this in a centralized fashion. So I don't think we are far — we can do both, actually, quite frankly. The solution I was referring to is already available, and in terms of footprint, to give you some numbers off the top of my head: the memory requirement was between 400 and 500 MB on the router, doing the processing in real time, and we took less than 20 percent of one core. So it is really light; there are no real constraints, and we can run it there today. The downside of keeping everything local is that you cannot leverage your knowledge across all the equipment, and that is where the cloud becomes really interesting.

There was a question that came in over Twitter, and I'm not quite sure I understand the full nuance, but basically he is wondering whether we can put seed data into the machine learning: if we have a custom enterprise application, can we hand in, say, 10, 20, 30, 40 samples of what this application's normal fingerprint looks like and say, keep an eye on things that vary from that?

Right — that is not exactly what we do, but to some extent, yes. Again, in the case of security, because we are running on-premises we have access to the traffic itself. We do DPI on the DNS requests, we look at the structure of a name, for example, and compute the entropy of the name, and start to see whether something looks funny. So we do some of that: we consume the data, and the knowledge is encapsulated in the model.
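The entropy computation mentioned here is standard Shannon entropy over the characters of the queried name: a high value is one weak hint that a label may be machine-generated rather than human-chosen. A minimal sketch (the example domains are made up):

```python
import math
from collections import Counter

def name_entropy(domain):
    """Shannon entropy, in bits per character, of a DNS name
    (dots ignored)."""
    chars = domain.lower().replace(".", "")
    counts = Counter(chars)
    total = len(chars)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

for domain in ("cisco.com", "x7f3kq9zp2vd8w.info"):
    print(f"{domain:22s} entropy = {name_entropy(domain):.2f} bits/char")
# High entropy alone proves nothing; as described above, it is one of
# many dimensions the model combines before raising an anomaly.
```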
So there is no need to keep a ton of data, because we don't have the resources locally; we learn, we improve the model, and then we can push the model to a centralized location. That's one way to do it.

Question: So essentially it's going to pick up the fingerprint of normal traffic itself?

Absolutely. Exactly.

Okay, so what's coming next? Many things. First of all, that new cloud-based project I was referring to is coming soon — we are running early field trials as we speak — and it is extremely promising; a lot of customers have expressed interest in testing it. To be honest with you, I'm not a big fan of slides and PowerPoint, so when we present this technology I say: give me your data, let's test it, and see what you can find. It is quite amazing when you plug this engine in. I wouldn't say it is always right and does everything — obviously not; there are cases where a rule-based system is actually much better, as I said — but it is super promising.

Another avenue I think is very promising is end-to-end device communication. We know that when you come in with your iPhone, the iPhone has a lot of intelligence locally, but it has limited visibility into which access point is best to connect to; it can only take into account a few local parameters, like signal strength. Wouldn't it be really smart if this cloud engine could communicate with the device and say: I know you are trying to do video — don't connect to this AP, it's better to roam to that one — especially once you can do forecasting and prediction? IoT is another one, because in that case there is little intelligence locally — well, some, obviously, but we cannot afford to run too many protocols — so relying on the network to provide this knowledge to the device is extremely interesting.

The last one is really about closed-loop control, and that may take some time, because it will take time before users trust the system enough to say: okay, you can detect issues, you can build a model, you can suggest changes, and now you can automatically close the loop. There will be multiple phases, obviously: first we give insight to the customer, then we propose changes, and then you can close the loop with an API and let the system make the changes. That will take time, but I believe it is going to be a very promising avenue.
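The phased trust model he describes can be sketched as a per-issue-type confidence gate that the user raises over time ("if you see it again, just do it"). Everything here is hypothetical — `apply_change` stands in for whatever API would actually push the change:

```python
from dataclasses import dataclass

@dataclass
class Remediation:
    issue_type: str    # e.g. "roaming_failure_spike"
    action: str        # e.g. "tune_ap_power"
    confidence: float  # the model's confidence in the diagnosis, 0..1

# Thresholds the user raises as trust builds; above 1.0 means "never
# automate, always ask" (e.g. anything that blocks traffic).
trust_policy = {
    "roaming_failure_spike": 0.80,   # user has approved automation here
    "suspected_exfiltration": 1.01,  # always ask before blocking traffic
}

def handle(rem, apply_change, ask_user):
    """Close the loop automatically only where policy and confidence allow."""
    threshold = trust_policy.get(rem.issue_type, 1.01)  # default: ask first
    if rem.confidence >= threshold:
        apply_change(rem.action)  # closed loop: change pushed via an API
    else:
        ask_user(rem)             # human stays in the loop
```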
I know we are running out of time, but we probably have time for one more question.

Question: It's similar to a question in a previous presentation about closing the loop and trusting the system. To be more direct: is it a matter of the customer trusting the system with the decisions, or a matter of Cisco getting enough customers piping data into the system to trust it and enable the feature?

It's a good question, and I think it's a little bit of both. We want to give control to the user, so that the user can say: first, let me see what you can do. I can give two examples. One is the security attack case: upon detecting the anomaly, the user may ask, what are you suggesting? If we suggest enabling some QoS, or starting to record the traffic, you say okay, you click on it, and you start to see the result. If you want to block traffic, at some point the user may say: I trust you — I see that for these kinds of issues you are pretty good at detecting them and doing the right thing — and so you increase the confidence level, to the point of saying: if you see it again next time, just do it. So we leave it to the user to decide when to automate the loop, and it depends on the processes in the organization as well: some organizations do not like to close the loop automatically at all; others want to react more quickly. In the case of wireless, there are lots of things we can do: if we see an issue we can tune up the power of an access point to make things a bit better, we can activate some QoS, we can use call admission control to throttle some users. On our end, I can tell you we are quite confident about the ability of such a system to do closed-loop control; we think it is going to be a question of adoption — of the user saying, okay, we trust the system to do this. And, to your point, it will be driven by policy as well: at some point you can author a policy and say, I've seen that, I trust you, go ahead. Does that make sense? Okay, cool.

All right, we're going to have to wrap. You can reach me — JP Vasseur — online or offline, and we can chat. Thank you very much.
Info
Channel: Tech Field Day
Views: 3,665
Rating: 4.6842103 out of 5
Keywords: Tech Field Day, TFD, Tech Field Day Extra, TFDx, Cisco Live US, Cisco Live, Cisco Live US 2017, CLUS17, Cisco, Machine learning, artificial intelligence, AI, 0-day attacks, cloud-based machine learning, JP Vasseur
Id: Jb8U1BrJlXo
Length: 35min 33sec (2133 seconds)
Published: Wed Jul 05 2017