Juniper Marvis: The Journey to an AI-Driven Enterprise

Captions
Good morning everyone, and welcome. My name is Sudheer Matta, I'm the VP of Products here at Juniper Mist, the AI-Driven Enterprise; literally, that is our name, and it is my absolute honor and pleasure to be here. We are stoked to be part of AI Field Day, and thank you, Stephen and the Tech Field Day team, for launching this; it is near and dear to our hearts. What you're going to hear today is, candidly, a lot of the same stuff we've told you over the last four years, and we're proud of that. In fact, some of these slides we presented three years ago, and we're super proud of that, because we had a clear vision and truly a conviction of purpose that we could bring AI to networking and deliver business outcomes. A lot of what you'll hear from Bob, Jisheng, myself, and others today is how AI is impacting our customers, with lots and lots of customer stories.

Without further ado, let me introduce truly our heart, our soul, and our visionary. On day one he drew this on the whiteboard and led us to being the leader in the enterprise networking space: Bob Friday, CTO and co-founder of Mist, and now the CTO of Juniper's AI-Driven Enterprise. Right after him I'll introduce Jisheng as well. Bob, do you want to say a quick hello?

Hello everyone, thank you. It's great to be back. As Sudheer says, this is one of my passions, so it's just great to be here.

Thank you, Bob. Let me also very quickly introduce our head of data science, Jisheng Wang. Jisheng is amazing, and he has a big team of data scientists behind him; they are intently focused on customer outcomes, and that's what we're going to talk a lot about today. Jisheng, a quick hello from you?

Good morning everyone, really glad to be here to share some of the real-world AI experience we have.

Awesome. Thank you, Bob and Jisheng; let's dive into it. This is a familiar slide for most of the Tech Field Day audience, and we've been at this for quite some time. Bob, I'm going to turn this over to you.

For those who know us: when Sujai and I started Mist back in 2014, one of the inspirations was really Watson playing Jeopardy. That's when we saw that this AI stuff was becoming real and could do useful things, and we thought that if they could build something that can play championship-level Jeopardy, we should be able to build something that can answer questions and manage networks on par with network domain experts. That journey really started with data, and that's one of the reasons we decided to build our own access point: the questions we were trying to answer were the hard ones, like why is someone not able to connect, why is someone having a poor internet experience, and we wanted to make sure we could actually get the data to answer those questions. So that's where the journey starts. As Sudheer mentioned, this slide has been up here for three or four years now, and the reason it hasn't changed is that the mission has not changed. What has changed is our expanse of data: as we joined the Juniper team, we started to expand our data ingestion across the whole enterprise portfolio. We're ingesting data from APs, switches, routers, even the SDK on the client itself.
We're starting to bring that client's view of the network back into Marvis. What does that really allow us to do? It allows Marvis to answer more questions with more granularity. When we bring the router into Marvis, that lets us start to answer questions about why an application is giving a user a poor experience. So that is where the journey started; the journey has not changed, it is just getting broader and deeper as we go along.

Now, moving to the AI primitives: I think what you first saw from us was our SLE metrics. What was unique about that, compared to what I did 20 years ago at Airespace, where we were trying to help enterprises manage access points, is that it's now all about the user. Twenty years ago we were not pulling data back about the user, we were pulling data back about the access point. What's changed is that we are actually pulling back every user minute from the access point's view of the world, and that data is aggregated over time and space, from AP to floor to site to building, and over five-minute, ten-minute, twenty-minute, and weekly granularity. What that primitive sets up is the ability for data science, like mutual information, to extract insights out of that data. Interestingly, it turns out that once you get the data into a format where you can apply data science, it's also the format an IT person appreciates: the same format Marvis needs to work with the data is the format your average IT person wants. That's what we learned: once they had that visibility, once they had that data in the cloud for the first time, they didn't have to go back to their access points or their network to get it; the data was at their fingertips. IT departments now had the data they needed to answer questions even if Marvis didn't have the answer.

The other key part is data science, which is changing the game here. We know what data science has done in other fields, whether it's driving cars or diagnosis in medicine. In our networking world, these same algorithms are letting us get to anomaly detection with very few false positives. So data science and deep learning are making a difference inside our industry right now.

Then, as you can see, we'll get into the virtual assistant and the conversational interface. Whether it's a person or a virtual AI assistant, you have to learn to trust this assistant, just as you would a new-hire employee. That's why the conversational interface is important: it is what allows you to extract data from the network and get access to that data quicker. That's what we're starting to see in support: Marvis, even if it doesn't have the correct answer, gets you to the data you need faster. It gives you an interface that lets you learn to trust your assistant, and it lets you give the assistant feedback and corrections, just as you would a person. When Marvis makes a mistake, you let Marvis know, and it will not make that mistake again. All of this leads to the beginnings of proactive operation, the ultimate dream we want to get to, where you can actually start trusting Marvis to take over some of the more mundane actions inside your network, whether it's finding a bad cable or proactively flagging an RMA. These are things your IT team should not be spending hours and hours on.
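As a rough illustration of the mutual-information primitive Bob mentions, here is a minimal sketch of ranking which client attributes (device type, OS, AP, and so on) are most informative about connection failures. The feature names and records are hypothetical, not Mist's actual schema.

```python
# Hypothetical sketch: rank which client attributes best "explain" connect failures
# using mutual information. Feature names and records are illustrative only.
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Each row is one client-minute: categorical attributes plus a failure flag.
records = pd.DataFrame({
    "device_type": ["barcode_scanner", "laptop", "barcode_scanner", "phone", "laptop"],
    "os_version":  ["os_6.1", "win10", "os_6.1", "ios14", "win10"],
    "ap":          ["ap-12", "ap-07", "ap-12", "ap-07", "ap-03"],
    "failed":      [1, 0, 1, 0, 0],
})

# Encode categoricals as integer codes; mutual_info_classif treats them as discrete.
X = records.drop(columns="failed").apply(lambda c: c.astype("category").cat.codes)
y = records["failed"]

scores = mutual_info_classif(X, y, discrete_features=True, random_state=0)
for feature, score in sorted(zip(X.columns, scores), key=lambda kv: -kv[1]):
    print(f"{feature}: {score:.3f}")   # higher = more informative about failures
```

A feature that scores high, such as a particular device type or OS version, becomes the candidate root cause to investigate.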
Quick question on this. I think this is a very cool picture, and you said that three or four years ago you were able to predict many of the things you've been doing. Have there been any surprises related to data, anything you couldn't predict during this journey?

If you look at Watson, it took about four or five years for Watson to play championship-level Jeopardy. So far on this journey we're probably at 60 to 70 percent of championship level. Every ticket that comes in right now, we basically let Marvis try to answer first. Compared to what I saw Watson do, I think we're on track to get to championship-level Jeopardy in the next year or two. To be honest, I haven't seen any real surprises; it's a long, multi-year journey, which is why you keep seeing the same slide. It's not something you get done in a year.

Katie, maybe if I could chime in: one surprise, which probably should not have been a surprise for us, is the impact AI had. Take a university where we've deployed about 7,000 APs. If you had asked me to guess what percentage of their frontline tickets AI and self-driving could take out, candidly I probably would have said maybe 20 or 30 percent. Consistently across all verticals, when we deploy this AI-driven system we're seeing north of a 60 to 70 percent reduction in tickets coming into the help desk, and that's not even counting how much faster it is to troubleshoot a ticket that does come in. It's obviously a positive outcome, but that impact, and its consistency across verticals, says that a lot of our customers suffer from pretty much the same problems. The network happens to you: iOS didn't update, or Android didn't update, and you think you run the network but you really don't, because the devices have a brain of their own. That learning, and the impact we're having on customers, is incredible.

I would say the other surprise, Katie, is that some people three or four years ago didn't believe in AI, and they're now starting to see the value of it.

Give us a little more context on this 60 to 70 percent number and the reduction of calls into the help desk. As an HR manager or people manager, you're looking at the skill needed to operate at that second and third level and to filter out those first two levels of calls. It's a completely different skill set from a first-level tech to a third-level tech, and our third-level techs are still getting first- and second-level calls, meaning I'm inefficiently using those resources.

Right. So Keith, I put it in three buckets. When we put a Mist Juniper network in, there are three things that unequivocally happen, and I'll speak specifically to your question about how we get that kind of efficiency in tickets coming in.
Number one, we are reducing tickets coming in. Number two, we are dramatically cutting the mean time to repair on tickets that do come in: one customer deploying our network in 130 countries saw their average support ticket resolution time go from 48 hours per ticket to two hours per ticket, roughly a 96 percent drop. So first, fewer tickets; second, faster time to resolution. The third is not necessarily AI, but it is part of the system: one of the top four service providers in the US did a deployment with us, about 12,000 APs and now up to 31,000. In the first 12,000 APs they found a 50x acceleration in how fast they could deploy. Nothing to do with AI, Keith, but they used to deploy one store every two nights because of controllers, images, and so on, and with us they went to 25 stores a night. They said it was the fastest deployment in their retail technology rollout history.

But let me answer your first question, how we get fewer tickets, because it's very important. I know I'm eating up time on this slide; Jeff is already twitching, saying hey, move on. Here's what we believe: we have a living, breathing, elastic, scalable, self-driving system running in the cloud, and that's why this is fundamentally different from anybody who took network management, put it in the cloud, renamed it, and suddenly "has AI." Our system is doing dynamic baselining and looking at anomalies, so a lot of the problems in the field that are intrinsic to these networks we fix by virtue of learning and anomaly detection, literally in real time, without IT intervention. If a software process gets stuck, a hardware chip gets stuck, your beacons stop going out, your multicast is getting dropped, or your AP stops forwarding packets to Ethernet or vice versa, all of those things we quickly detect. We're going to go into this throughout the day: I attribute fewer tickets to self-driving. Of course, part of it is that we've built a great network. And where it isn't self-driving, it's the ability to proactively find users having problems and get them to a better place. I'm going to show you a demo of that, Keith; one of our customers, ServiceNow, actually taught us a whole new schema for how to get there.

So this is the framework, five phases: data, AI primitives, algorithms to dynamically baseline and adapt and react, a conversational interface, and self-driving networks.

You talked about reducing MTTR. Obviously the classic AIOps MTTR has two parts to it. One is MTTI, the mean time to identification; identification consists of event correlation, noise reduction, and somehow figuring out, based on the incident, what happened.
Are you reducing the majority of that, or the second portion, MTTR proper, which is more about runbook automation and self-healing networks, or self-driving networks as you call them? Or do you do both? If so, I'd love to find out about that second portion.

Andy, there's a full 30-minute conversation coming up on the second portion, self-driving, but yes, we do both, absolutely, and we'll speak to that. Great question, thank you.

Good morning. I have a question regarding Marvis. I'm new to Juniper, so maybe this is a frequently asked question, but if a customer buys hundreds of devices, is there one Marvis or are there multiple Marvises? I'm trying to understand how the AI architecture works: is the device autonomous from an AI perspective, or does it rely on something in the cloud where Marvis is a single entity controlling all the devices?

I'll take a stab, and Bob, I'll let you chime in as well. Marvis is our AI engine, and we have a conversational, natural-language-processing interface to it, but it is looking at the entire organization. As Bob will explain in one slide, it is all about data; we're going to lay out specifically how we uniquely gather per-user, per-minute data, and that data coming into Marvis lets it assert the experience of each user. But Marvis is the AI engine looking at the entire picture, from a user to an AP to a site to a complete organization, across several countries if that's the case. Bob, anything you want to add?

I think the simple answer is that Marvis is your single AIOps solution across the Juniper enterprise, and depending on what question we're trying to answer, we ingest data from whatever source is needed to answer it.

OK, so what happens when a site is disconnected from the WAN and can't reach Marvis? How does the isolated network communicate its issues or answer questions for the end user?

Great question. We store a lot of that data transiently in the network systems themselves, and when the connection is restored we upload it. If it's a prolonged outage, yes, you will lose some data. But for the most part we don't consume a lot of bandwidth; we have a customer that runs on 512-kilobit lines, which is the entire WAN link to their stores, and we still provide all of this AI we're talking about. But yes, Marvis runs in the cloud, some data is held transiently in the device itself, and if there is a prolonged outage then there is a gap.

Interestingly, if an access point gets disconnected from the internet somehow, we have mechanisms like Bluetooth: that access point will transmit its status via Bluetooth to a neighbor to help get its status back to the cloud. So there is redundancy built into the system to overcome internet breaks, if that makes sense.

OK, thank you.

I'll do this one quickly, Sudheer, so we can move on.
For those who know me, I also make wine, about a barrel a year, so my famous line is: great wine starts with great grapes, and great AI starts with great data. When we started Mist, one thing that was unique compared to what I did in my Airespace days 20 years ago is that back then we sent synchronous data back, every minute or every five minutes or so. This time around we send both synchronous and asynchronous data, and the asynchronous data is really the user state: if there is ever a change in the user state, the access point sends that information back to the back end. This is the beginning of the paradigm shift from "I'm here to help you manage an access point, a router, or a switch" to "I'm here, at Mist, to help you manage the end-to-end client-to-cloud connectivity." Frederic, to your question: yes, I need to get the data back to the cloud. The first step in that mission is making sure the data is in the cloud, because for Marvis to work it cannot be stuck in an access point. So the point of this slide is user state: understanding the state of the user, or the state of the device connected to the network.

Hey Bob, do you have any stats on how much data is coming back to Marvis?

Yes. Right now every access point looks like a Skype call: depending on exactly how many clients are connected to that access point, it's maybe 6 to 10 kilobits per second of data coming back continuously to the cloud.

Bob, just a clarification: you mentioned 6 to 10 kilobits per access point. Is that per day, or per something else?

It's continuous, per second. On average we send two to three kilobits per second per access point, so if you have five or ten access points in a retail store, multiply that linearly; it's kilobits per second.

All right, thank you.

I don't know Juniper well from an architecture perspective, but you keep saying "the cloud." Is that a public cloud? Where is the data put into its data lake?

We are in the public cloud. Mist runs as a cloud SaaS service; we have clouds around the world, but yes, it is the public cloud. And just so you know, Gina, we are basically selling access points, switches, routers, and security appliances, all streaming data to our public cloud, and all of this AI comes out of that data lake we're building.

Makes sense, yeah.

So let's start with what this data means and how it fundamentally differs. There are vendors that will follow us here and say more data, more counters, more users, more of everything is all good. Our assertion is that it's actually the quality of data; there's a Harvard Business Review article on this. More garbage is not good for AI. It's the precision and quality of data that separates us from almost everybody in the industry.
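To put Bob's per-AP telemetry figures in perspective, here is a back-of-the-envelope check. The store sizes are hypothetical, and the 512 kbit/s WAN figure is the one Sudheer quoted earlier for a small retail link.

```python
# Rough sanity check of per-site telemetry load, using the figures quoted above.
PER_AP_KBPS = 3          # upper end of the 2-3 kbit/s average per access point
WAN_LINK_KBPS = 512      # the small retail WAN link mentioned earlier

for ap_count in (5, 10, 50):                      # hypothetical store sizes
    telemetry = ap_count * PER_AP_KBPS
    share = telemetry / WAN_LINK_KBPS * 100
    print(f"{ap_count:>3} APs -> ~{telemetry} kbit/s telemetry "
          f"(~{share:.0f}% of a {WAN_LINK_KBPS} kbit/s WAN link)")
```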
So what is the singular focus?

Sorry, if I can interject: what type of data do you gather?

I'm going to show you in just a minute, thank you. The singular focus for us when we say data is every user, every minute, Lisa. Literally, for every user, every minute, we're gathering data on whether that user minute was a good experience minute or a bad experience minute. Imagine you're connected at home right now. Our largest customer now has a hundred thousand access points being deployed, about 80,000 in production, and their average simultaneously connected client count in that one organization is about 400,000 devices. For every one of those 400,000 devices, on a minute-by-minute basis, we're asking: was your coverage experience good, was your capacity experience good, was your throughput experience good, and were you able to connect fine if you were connecting during that minute? So every user, every minute, we're carrying several hundred state vectors into the cloud for that user, and that is the foundation: the features the machine learning models use to classify whether that user is having a good or bad experience. It starts with this granularity and quality of data. Lisa, did I answer your question?

Yes, thank you.

So that's what we're using: every individual user's experience, every single minute, whether IT cares about that particular user's experience or not, is collected, collated, and curated, and this forms the basis of all the AI. A lot of other companies use SNMP polling and archaic old methods; they don't have this kind of every-user, every-minute data, and that's what separates us. Once you have this kind of data, you ask: what primitives can we lay down so we can measure whether we're meeting the user's experience expectations? Bob, I'll quickly have you chime in here.

Thanks. We'll go back to that primitive slide. This is an example of where we're expanding Marvis: we're starting to ingest data from switches and routers. When we look at the switch, we want to do exactly what we did on the access point: aggregate data up from the user experience. What this allows us to do on the switch is, for example, the VLAN use case. We can currently look at HPE and Cisco switches through LLDP, so we have visibility into those switches; what we have with the Juniper switch is the ability to actually do something with that data. So Marvis has more granularity, because I get data from switches and have more visibility if I'm having a switch or VLAN problem, but with the Juniper switch we can also take action on that data. This is an example of how the same primitive framework we used on the access point is being extended across our switches and routers, and when you get to Jisheng, we'll talk about how we take events from these different sources and put them into a graph for temporal correlation. That's the power of the framework, and it's why you see the same graph year after year: the same framework is being applied to switches, routers, and devices.
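A minimal sketch of what the "every user, every minute" primitive could look like in practice: a per-client-minute record classified against service-level thresholds and rolled up to a per-AP success rate. The field names and thresholds are hypothetical illustrations, not Mist's actual schema.

```python
# Hypothetical per-client-minute record, classified against SLE-style thresholds
# and rolled up into a success ratio per AP. Field names/thresholds are illustrative.
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class ClientMinute:
    client: str
    ap: str
    rssi_dbm: float         # coverage proxy
    throughput_mbps: float  # throughput proxy
    connect_ok: bool        # did attach/auth/DHCP succeed this minute

def is_good_minute(m: ClientMinute) -> bool:
    # A minute "passes" only if every expectation is met.
    return m.connect_ok and m.rssi_dbm >= -72 and m.throughput_mbps >= 5.0

def sle_by_ap(minutes):
    good, total = defaultdict(int), defaultdict(int)
    for m in minutes:
        total[m.ap] += 1
        good[m.ap] += is_good_minute(m)
    return {ap: good[ap] / total[ap] for ap in total}

minutes = [
    ClientMinute("phone-1", "ap-07", -65, 20.0, True),
    ClientMinute("scanner-9", "ap-12", -80, 1.0, True),   # poor coverage minute
    ClientMinute("laptop-3", "ap-12", -70, 12.0, False),  # failed to connect
]
print(sle_by_ap(minutes))   # e.g. {'ap-07': 1.0, 'ap-12': 0.0}
```

The same rollup can be repeated over floor, site, building, and over longer time windows, which is the aggregation over time and space described earlier.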
A curiosity question: I get that you're able to measure the network that's under your control, but with the work-from-home concept, almost the entire business network has moved to the home network. Are you able to granularly measure even the consumer side of the network?

I think you saw the news: Juniper just acquired 128 Technology, a software-defined SD-WAN routing company. The home is a good example; it is becoming the new micro-branch of the enterprise, and it's a good example of where IT now needs complete visibility, so AI is becoming even more important. Enterprises are starting to put enterprise-class devices in your home. When I'm at home right now, I have to worry whether my broadband internet will stay up and what happens if it suddenly goes down. I almost need an SD-WAN at my house; I keep my hotspot phone next to me for emergencies, but ideally I want my network to take care of that. So yes, we're starting to work with people on taking software-defined routers like 128's, putting them in the home, and ingesting data from that point of view, so that when something goes wrong it can be handled proactively. IT wants the same visibility into the house that they had in the office.

Got it, thank you.

And Andy, here's a very simple example; I know all of us have examples like this. We have a hospital that took all of the agents who call insurance companies and patients to collect payments, moved them out of the hospital, and converted that space into a caregiving facility. These people are never coming back; they're staying at home collecting insurance payments. The hospital quickly found out they couldn't support those users' experience on Comcast or whatever home broadband and home AP they have, so they're now handing out Mist APs for the home. First, it's an enterprise-grade Wi-Fi device, but most importantly they're able to bring Marvis home for all of these users, to answer questions like "when I joined the Zoom call I had this issue" or "my Teams calls keep dropping." Enterprise-at-home is a real thing, and as you saw in Jeff's numbers, even though corporate deployments have slowed down, our business has far exceeded pre-COVID projections because of this AI-at-home, enterprise-at-home movement.

I get that. When people were moved out of offices into their homes, they thought it was temporary, and it has become permanent. Every large enterprise I'm speaking with realizes it's a lot cheaper to set people up in a home office, so that becomes their office going forward. That was my reason for asking about visibility: by providing the equipment, which is a lot cheaper than providing office space, it's easier to deal with. Thank you.

Yes, it's far more supportable.
Sudheer, on that, though, let's talk about the opposite end: I don't necessarily want that corporate infrastructure in my home, either because of trust or because of policies that might be put on my home network, and I have more than just corporate data riding on that network. Two questions: are you finding that those work-from-home deployments are trying to deploy corporate policy, and how do you address that, or coach your customers to address those kinds of issues?

In general, as the world moves toward zero trust and ZTNA, the security perimeter is being shattered; the enterprise is coming home, Tim, that's a given. What every customer we've spoken to has asked about is how to get visibility and supportability at the same level, especially, as Andy said, if these folks are going to stay home permanently. There are an incredible number of companies that keep operating; we all think of work as going into an office, but there are healthcare institutions, hybrid institutions, robotic distribution facilities. Our business has gone through the roof, and deployments are accelerating everywhere other than the traditional carpeted enterprise.

One thing we found as we jumped into this is that there are literally only two types of users in any network: users that connect on the Wi-Fi network, and users that connect on the access switching network. That's it. We started our journey with the Wi-Fi piece, but now, and this is actually the deployment at Juniper's global corporate headquarters, we've added visibility to the wired network. Why is this critical? With IoT, with sensors, with everything coming at us, our customers said: we love the Wi-Fi visibility you brought; can you bring the same veracity and efficacy of data to the wired network? We launched this at Juniper about nine months ago, and this successful-connects number was 24 percent at our global corporate headquarters. The team said, no chance: you're telling me that 76 percent of the time, devices trying to connect to my wired network can't connect? We said, well, the data doesn't lie. It turns out they found three types of devices on the wired network. Number one, devices that never belonged on the network, which they could quickly isolate, identify, and remove. Number two, devices that were previously working but, because someone pushed a config change, had their port blocked; a thermostat doesn't open a support ticket, an IoT device doesn't scream at us, so that device had just been disconnected and left languishing.

I have a question, sorry. You talk about the APs; do you connect to other APs, from other vendors? Do you have centralized management for them as well?

Great question. Today Marvis does take data from third-party systems such as switches that are not connected to Mist; we learn a lot of the switching parameters via the APs using LLDP.
We do a lot of AI, and we'll talk about that, just from observing data. We do a lot of analytics on DHCP servers, RADIUS servers, and other servers adjacent to the network without ever talking to them; we make positive assertions about how they're working and whether they're functional. So yes, we do bring a system-wide view, and we take streaming data from a few third-party systems today. One of the things expanding within our AI engine is streaming data from more third-party systems.

OK, because I was thinking about the investment as well, if you wanted to get started.

Absolutely. Typically people start with one building, see the impact right away, and then it grows organically. But you're right: you don't have to upgrade everything on day one to get to this nirvana.

Can I control which data is sent to the cloud? From a security perspective, I might not want to send specific LLDP or RADIUS information or attributes to the cloud.

Peter, great question. Today, the data that goes to the cloud is 100 percent metadata only; not a single data packet ever goes to the cloud. It is truly the kind of thing that, as an IT engineer, if I had a sniffer sitting next to you and captured the air, whatever I captured would be the data we're capturing. From a security perspective, we have Fortune 10 companies running this, and we go through security audits and inspections around this constantly. It is all metadata; no actual data packets ever go to the cloud.

You mentioned that you deployed the Mist AI solution on Juniper's own network and found that only 24 percent of the devices were actually connecting to the wired network, and that over time you got that up to 97 percent or so. Is that what you said?

Actually, Ray, here's what it is: every connection attempt is ranked as a successful attempt or a failed attempt. Twenty-four percent of attempts were successful, which is scary; 76 percent were not. What was happening was that a ton of devices that were previously connected had dropped off due to config changes over time. It's that simple visibility that didn't exist. At Juniper, our own headquarters, we had SolarWinds and several other monitoring tools that never gave this kind of visibility. Today it's not even 97 percent; they brought it to 100 percent over the last nine months, and this is as of last night. By the way, this represents the health of 22,000 Ethernet ports on Juniper's Sunnyvale campus. AI doesn't have to be complex; it's our job to distill it and make it easy to consume, but a lot of AI goes into understanding and eliminating false positives. We'll go through this, Ray.

All right, thanks.

I wanted to follow up on Peter's question about security, because I don't think the data is necessarily the only thing to worry about. If you're collecting metadata and sending it through an AI engine, that means you can have an application that turns that metadata into information, perhaps connected with other information. So what metadata are you collecting?
I mean, what do people need to be worried about in terms of what you're pulling into your system to process?

Gina, great question. The metadata we collect is typically things like MAC addresses and IP addresses. And from day one, having previously run the enterprise business at a very large competitor and acquired another cloud company there, we had a bird's-eye view of what state-of-the-art security looked like six years ago, and we took that up a whole other notch. I'll give you one example: MAC addresses. Typically a vendor takes a MAC address and stores it in their database. We take the MAC address and obfuscate it before we store it. If somebody stole servers at one of our public cloud facilities, first of all everything is block-encrypted with organization-specific keys, but even if all of that were shattered and they got down to the data, they couldn't find an actual MAC address. We've taken security to a whole other notch; in fact, the federal government was one of the investors in Mist, and this was built into our bloodstream from day one. There's a whole security conversation coming up as well.

But that's the question I'm asking: could the federal government do what they do with the cloud providers and request that information, since it can be fed into applications that are able to track people?

I would say, Gina, security and privacy are always an issue with every customer we deal with. It's definitely an issue with the banks, but banks are now using Mist; they're moving to the cloud, and you definitely have to get through their security audits to get them there. The federal government is moving to the cloud too: DoD with FIPS, which we've always known about, and now FedRAMP, which the federal government is putting in place to move to cloud services. But it is an issue; GDPR is always an issue. Security and privacy are two sides of the same coin.

That's precisely the point I was asking about. It's not only about privacy and what data you're sending; you want control over what data you send to the cloud. As a customer, I want to know and control which data I'm sending to Mist, and if I hear that Mist is leveraging metadata to reconstruct my topology, which I might want to hide or obfuscate for a number of reasons, then that might be a bad thing. Controlling which data I send is a key component of trust and security in general.

And it's not just the data itself. One of the other issues, and I don't know if you'll have time to talk about it, is that when you create models using the data, that data becomes embedded in the model; the model is nothing but a distillation of the data it was trained on. So if you're going to use a model based on data that shouldn't be used, you need explainable AI to validate that model and show where its data came from, and you'd want to store a snapshot for audit purposes later. That's an even more complicated topic.
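Sudheer doesn't describe the exact mechanism, but keyed hashing is a common way to obfuscate identifiers like MAC addresses before storage. Here is a minimal sketch of that general idea, assuming a per-organization secret key; it is illustrative only, not Mist's actual implementation.

```python
# Illustrative only: obfuscating a MAC address with a per-organization keyed hash
# (HMAC-SHA256) so the stored value cannot be reversed to the raw MAC without the key.
import hmac
import hashlib

def obfuscate_mac(mac: str, org_key: bytes) -> str:
    canonical = mac.lower().replace("-", ":").encode()      # normalize formatting
    digest = hmac.new(org_key, canonical, hashlib.sha256).hexdigest()
    return digest[:32]   # stored token; same MAC + same org key -> same token

org_key = b"example-org-specific-secret"                    # hypothetical key
print(obfuscate_mac("AC:DE:48:00:11:22", org_key))
```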
Andy and Peter, as much as I love the passion around this, let me just say: we run very large Fortune 100 and Fortune 10 companies, and the principles we follow are to follow local compliance and to follow the customer's security and compliance requirements. Happy to have this conversation another day, maybe on a Security Field Day. In the interest of time, Bob, I'm going to have you take the AI primitives conversation.

I think the point here is that if you look at self-driving cars, or Watson playing Jeopardy, it is not one algorithm that gets the car to drive or gets Watson to play championship-level Jeopardy. The same is true of Marvis. When we started the journey, we began with what we call the SLE framework, which allowed us to apply what we call mutual information. One of our very largest e-commerce customers had a warehouse with a consistent problem with devices connecting to the network, and they could never figure it out. Once they got Marvis up and running, it was mutual information that narrowed it down to a problem with a particular barcode-reading device running a particular OS. That was the power of mutual information applied to data.

Here, what we're looking at is temporal correlation, and Jisheng and his team are working on what we call graph databases. Another case: one of our very largest retailers called us up and said, your stuff is not working. Sudheer was up at midnight about to get kicked out of the POC; how could this be when your competitor just got through it? It turned out someone had misconfigured a router on the other side of the network that could not pass large MTUs, so no devices on our network could connect. It had nothing to do with our network. This is where the temporal correlation graph starts to help: we're taking events across the network, whether it's a configuration event on a router on the other side of the network or a health event where a router, a switch, or a DHCP server goes down, and this is the next framework that will get us to that next level of championship Jeopardy. It is basically correlating user-experience events with some other event in the network; that's what we call temporal correlation.

And where this leads is that when we can stitch a user's experience across wired, wireless, and WAN, that's when the magic happens. Juniper Mist is in a unique place: we have one cloud that manages wired, wireless, and WAN assurance. Once you stream data like this and you ask why somebody is having a bad Teams or Zoom or Skype experience, we're looking at the user's experience on the wireless link, on the switch, on the WAN link, up into the cloud. Being able to put that entire client-to-cloud perspective together is very unique to us. Siloed SD-WAN players, that's all they do; traditional companies have different systems managing each of those domains.
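A very simplified sketch of the temporal-correlation idea Bob describes above: line up a burst of user-impacting failures with configuration or health events elsewhere in the network that happened shortly before. The event types, timestamps, and five-minute window are hypothetical illustrations, not the actual Marvis graph implementation.

```python
# Toy temporal correlation: which config/health events immediately precede a burst
# of user connection failures? Window size and events are illustrative only.
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)

config_events = [  # (timestamp, description)
    (datetime(2020, 11, 19, 0, 2), "router-edge-3: MTU changed to 1400"),
    (datetime(2020, 11, 18, 21, 0), "switch-12: firmware upgraded"),
]
failure_events = [  # timestamps of user-impacting connect failures
    datetime(2020, 11, 19, 0, 4),
    datetime(2020, 11, 19, 0, 5),
    datetime(2020, 11, 19, 0, 6),
]

for ts, desc in config_events:
    correlated = [f for f in failure_events if ts <= f <= ts + WINDOW]
    if correlated:
        print(f"{desc} precedes {len(correlated)} failures within {WINDOW}: candidate root cause")
```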
So we feel really good about this. It's what we call the client-to-cloud experience: bringing AI all the way from the user (we're going to talk about the Marvis SDK) into the cloud, and this is an example of that. Jisheng, maybe let's talk a little about how a lot of this results in finding actionable insight on the network. I'll turn it over to you.

Right. I'll quickly go through a couple of examples. Like Bob said, it's not just one model; it's really a mix of models and heuristics, a very complex system. I'll pick one example, anomaly detection, because it's probably the most widely applicable ML problem that almost every product needs to address, and it's the same for us. We have spent over four years continuously tuning and improving our homegrown anomaly detection model, and it finally reached very high accuracy with a neural network model called LSTM, long short-term memory. By training a multivariate LSTM model, we can capture both the temporal and the spatial correlation across all these user-centric features, like the ones Sudheer demoed. It's multivariate because there are so many features and parameters to track, and this guarantees high efficacy for the anomaly detection, because we only detect user-impacting network issues.

On the production side, ML is not just about the model; it's about productization. Rather than other vendors, which train one coarse model for all their customers or for a whole industry sector, we baseline and train one neural network model per site, not one per customer. Why? Walmart has thousands of stores; the store in Cupertino may have different characteristics than the store in Montgomery. As a result, we are training and serving over ten thousand neural network models simultaneously in Marvis. This really showcases the power of a cloud-native infrastructure.

Jisheng, you mentioned that you're training separate models for each customer site for anomaly detection. How many customer sites have you got? You must have thousands of customer sites, maybe millions.

So far I think we have tens of thousands, over 10,000 sites.

So you have 10,000 separate model-training activities and inferencing activities going on?

Yes. This is the manifestation of choosing a cloud-native infrastructure.

That's impressive, thank you.

And we're not just talking about models. Training a neural network model, as some of you may know, is a fairly low bar at this point: 10 or 20 lines of Python on top of TensorFlow. Serving neural networks at this scale while achieving very high accuracy is the top-level problem we've solved here; I don't think anybody else is even close.
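Jisheng doesn't spell out the model internals, so here is a minimal, hypothetical sketch of the approach he describes: a multivariate LSTM trained per site to forecast the next minute of site-level features, with a large forecast error flagged as an anomaly. The feature count, window sizes, and threshold are illustrative, and the toy series stands in for the roughly 30 days of per-minute data used as a baseline.

```python
# Minimal sketch (hypothetical, not Mist's model): a per-site multivariate LSTM that
# forecasts the next minute of site-level features; large forecast error => anomaly.
import numpy as np
import tensorflow as tf

def make_windows(series: np.ndarray, lookback: int = 30):
    """series: (minutes, features). Returns (X, y): lookback history and next step."""
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

def build_site_model(n_features: int, lookback: int = 30) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(lookback, n_features)),
        tf.keras.layers.LSTM(32),
        tf.keras.layers.Dense(n_features),   # forecast next minute's feature vector
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# Toy stand-in for a site's per-minute features (e.g. failure rate, latency, retries);
# a real baseline would use ~30 days (43,200 minutes) of data.
rng = np.random.default_rng(0)
history = rng.normal(size=(5000, 3)).astype("float32")

X, y = make_windows(history)
model = build_site_model(n_features=3)
model.fit(X, y, epochs=1, batch_size=256, verbose=0)        # baseline training

# Score recent data: flag minutes whose forecast error is far above typical error.
errors = np.mean((model.predict(X[:1000], verbose=0) - y[:1000]) ** 2, axis=1)
threshold = errors.mean() + 4 * errors.std()
anomalous_minutes = np.where(errors > threshold)[0]
print(f"{len(anomalous_minutes)} anomalous minutes flagged")
```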
Anomaly detection is a double-edged sword. There's the intelligence you can get from your own infrastructure, but there's also intelligence you can get from other environments. Can you talk about whether, and how, you bring both of those together? Understanding what is an anomaly within my infrastructure is one thing, and that's incredibly valuable; what would be even more valuable is if something seen at another customer is something I can then benefit from.

That's a very good question. We do something like the transfer-learning concept in this anomaly detection. Even without going across customers, just imagine different sites, like different stores of the same customer, each with slightly different behavior: the store setup and especially the client mix are different. When you bring up a new store, anomaly detection needs a baseline; we use 30 days of data to train a baseline, but on day one we don't have any data. What do we do? We take a model trained on other sites of the same customer with similar behavior and start from that, so it's not a cold start, it's a warm start. Then, based on the data we continuously collect from that store, we add the customization into the model. That's the transfer-learning concept, and it's exactly why we train one model per site: if you lump all the sites together, the model becomes too coarse. It's a balance of granularity and sensitivity.

OK, so I get that something that might be an anomaly for me might not be an anomaly for Ray in his infrastructure, but that's where the fine-tuning comes in, and the shared intelligence could benefit both of us.

I think the answer, Tim, is yes: we use global data for transfer learning, but ultimately each network is unique, so you have to get down to the site. A site in India is not going to be the same as a site in the US. Transfer learning lets you leverage the power of the global data, and then you train the models per site on top of it. The other thing I want to point out: when Sujai and I left Cisco, the thesis of Mist was that there was a fundamental architectural change between cloud and AI. We had just finished acquiring Meraki, and I could tell that if you want to do AI, you have to have a real-time cloud infrastructure to make it work. That's why I think you're seeing some of our competitors struggle here; it's really a blank-sheet-of-paper problem, and you have to build the right infrastructure to handle all the data coming in. You think doing ten thousand models is difficult? Not if you have the right architecture.

Exactly. Like Bob said, when people say they'll train one model that's good enough for similar customers, just ask whether that's because of a limitation in their capability to train the best model.
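Continuing the previous sketch (it assumes the hypothetical `build_site_model` and `make_windows` helpers from there), here is a small illustration of the warm start Jisheng describes: a new site's model starts from the weights of a similar existing site's model and is then fine-tuned on whatever data the new site has accumulated.

```python
# Hypothetical warm start for a brand-new site: copy weights from a similar site's
# model, then fine-tune briefly on the little data the new site has so far.
import numpy as np

rng = np.random.default_rng(1)
donor_model = build_site_model(n_features=3)           # stands in for a mature, similar site
donor_X, donor_y = make_windows(rng.normal(size=(2000, 3)).astype("float32"))
donor_model.fit(donor_X, donor_y, epochs=1, batch_size=128, verbose=0)

new_site_model = build_site_model(n_features=3)
new_site_model.set_weights(donor_model.get_weights())  # warm start instead of cold start

# Fine-tune on the new site's first few days of data as it arrives.
new_X, new_y = make_windows(rng.normal(size=(600, 3)).astype("float32"))
new_site_model.fit(new_X, new_y, epochs=1, batch_size=64, verbose=0)
```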
Anyway, the other challenge, and we picked anomaly detection because it's so common across different industries, is the difference between a statistical anomaly and a real issue the user actually cares about; this also goes to the question I was asked earlier. You cannot wake up the IT person at 2 a.m. just because your neural network model found some deviation. In the end you have to turn each anomaly into action. So for each anomaly we detect, before we send an alert to a user, we use mutual information to analyze the relevance of the different network attributes and find the root cause of the anomaly. In this example, the e-commerce customer Bob talked about, Windows devices were highly associated with the failures behind the anomaly. What had happened was that IT pushed a certificate update to all of the Windows devices for that site but forgot to enable it on the RADIUS server first, so suddenly all of the Windows devices started failing, which triggered the anomaly. The key is not just sending the raw model result to the user, which can be noisy or overwhelming; the key is finding the root cause behind each anomaly so it can be turned into action.

The model-per-site approach with updates is not all that different from the edge concept from my earlier life working with edge devices: every edge is treated in isolation and the model is customized for it. The issues you face when you do that: first, once you keep customizing the model per edge, the model starts to deviate from the norm, so how do you handle that model runaway? Second, when you infer at the edge, the model is not the standard one but a deviated one; at what point do you call home and update the central model? And more importantly, when do you decide to update the centralized model and push it back down to all the edges?

These are very good questions; let me answer them one by one. First, the inference is actually done in the cloud; so far we don't push models down to edge devices, and training and inference all happen in the cloud. And there is no central model as such: as I mentioned, each site has its own model, and that model is retrained and updated every day on a sliding window of the most recent data. Then, on model deviation: that's a very good question. Behavior can change within a 30-day window, and the reason we use sliding windows is to avoid changing the model for a site too dramatically. But I can tell you, when you really go to productization there are a lot of problems; the model can drift not because the behavior changed but because of a data-pipeline issue where you drop data. This is the beauty of a multi-tenant cloud environment: every day, after we retrain the model for each site, we check the deviation for each site, and if we suddenly observe a big change across all customer sites, what does that mean? It's probably not customer behavior changing; it's something changing in our cloud. We use this cross-customer correlation to detect that, and we have even detected bias introduced by our own cloud deployments this way.

How often do you do that, daily or minute by minute?

It's daily. Every day we retrain and correlate across the entire Mist universe.
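A schematic of the daily retrain-and-check loop Jisheng describes: retrain each site's model on a sliding window, measure how much it moved, and treat drift shared by most sites as a platform problem rather than real behavior change. The drift metric, thresholds, and helper names are hypothetical.

```python
# Hypothetical daily loop: retrain every site model on a sliding window, then check
# per-site drift and fleet-wide drift before trusting the new models.
import numpy as np

def weight_drift(old_weights, new_weights) -> float:
    """Mean relative change across all weight tensors (illustrative drift metric)."""
    diffs = [np.linalg.norm(n - o) / (np.linalg.norm(o) + 1e-9)
             for o, n in zip(old_weights, new_weights)]
    return float(np.mean(diffs))

def daily_retrain(site_models, site_windows, per_site_limit=0.5, fleet_limit=0.3):
    """site_models: {site: keras model}; site_windows: {site: (X, y) sliding window}."""
    drifts = {}
    for site, model in site_models.items():
        old = model.get_weights()
        X, y = site_windows[site]                     # most recent sliding window
        model.fit(X, y, epochs=1, batch_size=256, verbose=0)
        drifts[site] = weight_drift(old, model.get_weights())

    fleet_drift = float(np.median(list(drifts.values())))
    if fleet_drift > fleet_limit:
        # Most sites moved at once: suspect a pipeline/cloud change, not real behavior.
        return "hold deployment, investigate platform", drifts
    flagged = [s for s, d in drifts.items() if d > per_site_limit]
    return f"deploy; re-validate sites {flagged}", drifts
```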
And there's also a validation stage; it's not just train and deploy, because other kinds of issues can happen. After training there is self-validation: you check each site model for how much it deviated, and you check the global picture, and unless they pass the criteria we do not deploy. This is the automated training, validation, deployment, and inference pipeline: once a model is deployed safely to S3, for example in AWS, the next minute it is picked up for anomaly detection on the incoming data.

So the validation is automated and the inference is real time, but the update cadence is based on need?

Exactly. All of this is fully automated, we don't do any manual tuning, and it's running at that scale of ten thousand neural network models.

Very quickly, what does this mean to our customers? As I said, one of our very large customers deploys two thousand APs every night, and as they deploy, they need to figure out whether all the config is correct and all the systems are properly set up. Here are three examples; I know the video is probably not legible. First, if you deploy an AP and the wireless user VLAN is missing, we can detect that and alert them; from a user-experience perspective they would say "I walked under that AP and my Wi-Fi dropped," when in reality it has nothing to do with Wi-Fi. Similarly, if devices that were previously connected are now disconnected for whatever reason, or if I suddenly see a spike of failures on a RADIUS server or DHCP server, we detect that with very high efficacy today, without ever speaking to those servers. We call this Marvis Actions, for those of you new to Mist, and when a Marvis Actions webhook alert or streaming alert fires, our customers don't even route it to network engineering; they send it straight to operations: go take care of it. They have that level of confidence in the anomaly detection; they take it to the bank. This has been very helpful in some very large networks: real examples, real feedback.

Now, the next piece. To recalibrate on our journey: we've talked about data, AI primitives, and the AI toolbox, and we'll talk about more tools in that toolbox, but the fourth vector is this notion of a virtual network assistant. Bob, you've been our vision and inspiration for this; I'll have you lead it.

Thank you. If you look at the last 20 years in the industry and the journey we've been on: 20 years ago it was all about CLIs, and everyone was managing these boxes with CLIs. Then came dashboards, and we basically told the IT department and the industry, let's move away from CLIs and manage these networks with dashboards. Then we went to APIs and said everyone needs to learn Python, and as Sudheer said, this is what really took deployment scale up: people who were deploying one store a night now deploy 50 stores a night, and that was really about cloud APIs. Now we're moving into the next era, conversational AI.
This is really going to be the era of making it easier for IT teams to get to the data they need quicker, and it's where the conversational interface becomes the new standard for interacting with your network and getting data out of it. It has two components. One is learning to trust your AI assistant: as we said earlier, the conversational interface is the key to getting to know Marvis. Marvis, how did you get to your answer? As Sudheer pointed out, whether it's a human or an assistant, you want to know if you can trust the answer and you want to see the homework behind it, and you want to be able to give your AI assistant feedback: hey Marvis, you got the wrong answer, and I want you to get better. So this is what I call the new era of the conversational interface. As an aside, I've seen some of our competitors poke fun at this, saying "NLP, that's terrible" and so on, but this is really where we're headed, and you're going to see them talk about it too. Jisheng, I'll have you take this away.

Thank you, Sudheer. This is a high-level architecture of the Marvis conversational interface, and there are three things I want to quickly highlight, from left to right. First is the unified interface: as Bob mentioned, this is going to be the single interface to satisfy all the different requests of day-to-day network operations. Whether you want to troubleshoot something, configure something, or open a ticket, it will all be done through this natural-language-based interface. The second layer is the NLU engine. One challenge of developing a chatbot is feedback: like Bob said, you want to collect feedback continuously and evolve, but people tend to be lazy about giving feedback, and they tend to give only biased feedback, when they're dissatisfied. With this interface we have a built-in ML-based sentiment-tracking engine: you don't have to tell us every time whether you're happy or not; Marvis learns that while talking with the user and continuously improves. Last but not least, rather than being a general-purpose chatbot like ones you may have seen before, this interface is fully powered by the Marvis intelligence we've learned over time for each customer: the baselining, the anomaly detection, the knowledge graph, and so on. That is what differentiates us from general-purpose chatbots in the industry.

Next, quickly, because I don't want to spend too much time on this: the NLU engine is still the core of any chatbot, and the question is how well you can understand the user's intent and questions. Here we use transfer learning again; I talked about it for anomaly detection, and this is another application of it. Transfer learning is the foundational technology behind almost all of the groundbreaking advances in text and voice recognition these days; the most famous example is GPT-3, released by OpenAI a few months back, which got a lot of attention. Similarly, for the Marvis conversational interface we are leveraging a state-of-the-art NLU engine.
It is pre-trained on over 10 million articles, and it's the same engine used by Google Home and Amazon Alexa. What that gives us is a big boost for the Marvis interface in terms of understanding general intent, entity resolution, and so on. By standing on the shoulders of this NLU engine, what we add is the network-specific dialogue that Marvis has learned and accumulated over the last couple of years. The key concept is that we don't reinvent the wheel for everything we do: in this case we leverage a state-of-the-art NLU engine from the consumer world, and it brings the best user experience to networking customers.

So, just as a quick demo, what are we talking about here? Today there is more and more data as you manage these networks at global scale, and more dashboards and more data is not the answer. We believe the ability to interact with Marvis, ask simple questions, and get actionable, real-time, AI-assisted insight is the future, whether I want to take actions with the chatbot, troubleshoot with the chatbot, or just ask what's wrong with the network and get down to work. We're investing quite a bit into the conversational interface, and this is just the beginning. This is our north star: the dashboards will exist, the data will exist, and you can always look at all of it, but ultimately we think networks can be operated through this conversational interface.

I see that you are building a complete conversational interface. Is that a standalone interface just for you, or do you have plans to interface with other existing conversational interfaces, namely Alexa and others?

Today, via API, we integrate with Siri, Alexa, and other systems; our SEs and customers have already written integrations, Andy. This is you interacting directly with our AI engine, speaking to Marvis: a standalone, built-in, system-wide interface across wired, wireless, and WAN, client to cloud. One interface across your entire network, but specific to your network.

To add to that question, Andy: we use some of the open-source chatbot and conversational-interface frameworks, and then we build our knowledge, the intents, the entities, all of these things, on top. That framework comes with APIs directly to Slack, to Alexa, and so on, so as long as we build the core, it can be easily integrated with those other tools.
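Jisheng doesn't name the specific NLU stack, so here is a generic sketch of the approach he outlines: start from a pre-trained language model and fine-tune it on a small set of network-specific intents (troubleshoot, check health, open a ticket). The base model, intent labels, and training utterances are hypothetical, not Marvis's actual stack.

```python
# Generic sketch: fine-tune a pre-trained transformer to classify network-ops intents.
# Base model, intents, and training utterances are illustrative only.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

INTENTS = ["troubleshoot_client", "check_ap_health", "open_ticket"]
examples = [
    ("why can't lisa's laptop connect to wifi", 0),
    ("how is ap-12 on the third floor doing", 1),
    ("open a ticket for the oakland site switch", 2),
]

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(INTENTS))

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for _ in range(3):                                   # tiny fine-tuning loop
    for text, label in examples:
        batch = tokenizer(text, return_tensors="pt")
        loss = model(**batch, labels=torch.tensor([label])).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.eval()
query = tokenizer("my zoom call keeps dropping on the guest ssid", return_tensors="pt")
predicted = model(**query).logits.argmax(dim=-1).item()
print(INTENTS[predicted])                            # the intent the assistant would act on
```

In a real assistant, the predicted intent would then be routed to the corresponding Marvis capability (SLE lookups, anomaly results, ticketing), with the entities in the utterance (client, AP, site) resolved against the customer's own network data.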
Info
Channel: Tech Field Day
Views: 1,328
Id: HPtRwTwZo1w
Length: 67min 29sec (4049 seconds)
Published: Thu Nov 19 2020