Arista Networks CloudVision Demo

Captions
So I'm Andre Pech. I'm a director on the software engineering team here. I actually spoke at Tech Field Day 3, as we were mentioning before, so it's exciting to me to have you guys back here. I guess what we want to do here is make this a little bit more real: talk about one aspect of the CloudVision Portal, really focusing on the network provisioning part and how we can automate things for that large set of customers that Jeff was talking about. Definitely, please jump in and ask questions; make this as interactive as you like.

So to set the stage here: as Jeff was saying, we're really trying to provide that network automation, that cloud automation, for the large tail of customers who have maybe not gone as far down that path. There are really three main parts to this that I want to harp on as we go through this demo.

One is the notion of the network database. We have to get to this point where you have a single point which defines what you want the state of your network to be, and it has to be a network-wide view. It can't be "I've got this config file for this device that I've then copied and modified for this other device," done a thousand times, so that now you have all these files you have to manage. You need that central point that defines what you want, because that's also the point where you're then going to go and say: okay, what's the current state of my system, and is that compliant with what I want it to be?

The second part is we need to enable customers to make changes on their network without ever having to go and touch their boxes. I think it's important that you do actually have the ability to go to your devices and make changes and debug problems as they happen, but when you're making routine automation changes you need that framework to be
able to go define your change, get it approved, and push it across your network in an automated way, with checks and balances and visibility along the way to give you the confidence that the change you thought you wanted to make has actually happened. So it's not just about having it happen automatically; you need the visibility into the fact that it's actually there. A lot of this comes down to the APIs that we expose at the EOS level that really make this possible. These are the APIs we've been building over many years with our largest cloud customers, which they've been leveraging to do this in their [inaudible], and now we're trying to bring that into a turnkey solution for the rest of our customers.

And then the third part is knowing that customers have their own investments that they've made in various products already, whether it's their own config management database or task orchestration systems, and we're one piece of that larger data center orchestration problem. As people are rolling out new infrastructure or new services, there are a lot of different changes that happen across different parts of the infrastructure, not all of which we control. So we need to be able to integrate with those pieces, be able to take information in via API calls or entered by the user, and provide that end-to-end solution for the customer where we're really doing what we do best, which is the network management piece.

So with that I'm going to go in. I'll stop this a couple of times for questions, but I'm going to go in and give you a brief overview of what we have here. Here we are on the home page. You get access to a couple of different apps, and the main thing I want to talk about here is this service dashboard. This is where we pull in information from arista.com to be able to give you information like: hey, this new software release is out, there's a
new PSIRT, there's a new field notice, and it'll bring that to you at the customer site.

If we go to the network provisioning page, what we see here is all the devices managed by the CloudVision Portal. In this case we have all these devices that we can then arrange into what we call containers. These containers provide us a way to group devices in a way that makes sense to you logically. You'll see here that we have a container for the different data centers; within each data center we have a container for each pod; and in this case I've actually created a container for each availability zone within each pod. So if I have availability zones across racks for OpenStack deployments, I can create a different container for that.

This hierarchy then gives us the ability to place configuration definitions or properties at more than just a device level. If you think about what's happening when you manage your devices, if you look at a config for a device, 90% of it is the same for every device, and probably 99% of it is the same if you have a simple templating algorithm. The notion that each device in your network is a snowflake is hopefully long gone. The way to build large-scale networks is to be highly reproducible, where you have a design and you stamp it out. When you're in that environment, you need to be able to apply configuration not to the device but to a property that is associated with that device based on where it is, and have that configuration then get applied. By creating this hierarchy, we can apply configuration at different levels and reduce the duplication that otherwise happens.

The other important point to talk about here is this undefined container. These are all the devices that have come up in zero-touch provisioning mode. That's the standard way that they boot up; it's kind of like PXE booting a server. You boot up, you DHCP
for an address, you get a boot script, and as part of that boot script from the CloudVision Portal you register yourself with the CloudVision Portal and then you wait for configuration. So now, whether you're doing ongoing config management or your initial provisioning as you're rolling out new racks, you can do all that configuration from one place; it's not a separate process. These devices here can now get placed in the appropriate containers, have configuration applied, and be bootstrapped, all without ever touching the device, right from their initial plug-in.

Finishing up on the basic attributes of the CloudVision Portal: if we look at the tenant container here, you can see some of the configurations that we have defined. We call them configlets. They're really subsections of config, and we'll talk about these in a little more detail later. You can see here that at the tenant level we've defined the AAA configuration, the eAPI configuration, and some basic network and time zone configuration. These are the same across everything that I'm going to put within this tenant container. Similarly, I can add configuration at the different levels, and this is what's going to allow me to enforce that, at a high level, everything should be the same in this way; within this data center, these properties are the same; within this availability zone, these properties are the same. That's a really important way to reduce duplication.

The other piece here is that we can tag every device. Some of these tags are automatic and some are applied by the user. Here I have every top-of-rack in data center one, pod one. That was the list view; this is now a logical hierarchy view, where you can similarly filter on these labels. As the number of devices that you're managing grows, as you have hundreds of devices being managed, you can quickly filter down to
what you're looking to make changes on, see the state, make the changes, and go from there. So that covers the basic pieces. What I want to do now is put it all together and go through one example, but before doing that, are there any questions that pop up as we go here? All right, I'll keep going.

So we're saying there's like a parent-child relationship, as with spine and leaf? So if you were to make a configuration change, anything that's in that container, and a child of it, would also get that?

Yeah, that's a good point. The containers are a strict hierarchy. Labels are not a strict hierarchy: you can have multiple labels on a device, and there's no enforcement that if you have this label you don't have this other label; that's applied by the user. The containers are a strict hierarchy. I think the easiest way to think about it is based on location. The data center, pod, availability zone model works pretty well, because that's kind of how configuration is applied. And yeah, you inherit from the top down. Things that are closer to the device, or the device itself, can override what happened above. But basically the configuration of one device, and we'll see this in a sec, is really all the configuration from the tenant, from the top-level container, then all the child containers going down, and then what's applied to the device. In general, what's applied to the device is really just the stuff that's device specific. It's the IP address that you're going to get. Depending on your environment, that may be most of what you have, plus maybe some port-specific speed configurations. But in general a lot of it should be at the container level, and with templating you can put even more at the container level and just apply a template with individual device-specific
variables filled in.

Thank you.

So for this example, what I want to do is go through a common thing that we have to do at the networking team layer, which is to go and apply a new ACL across your environment. This is one of those things that we don't talk about as much, because right now we talk a lot about integration with SDN controllers and orchestration systems, and what you do as virtual services move across your infrastructure. That's really important, and that's what CloudVision does too, through our integration with OpenStack and VMware NSX using OVSDB and similar technologies. But it's equally important for the networking team to be able to make those underlay changes in an automated way, so that as vulnerabilities come up, or as changes need to happen, or as you're rolling out more racks, you can just as easily provision and monitor what's happening on your infrastructure.

So in this case we're going to go look at the configlet page. I've already created this configlet; it's called the server port ACL. All of my server-facing ports have this ACL applied. You'll see that this is just CLI. This is not trying to teach you yet another data model; this is for network engineers to know how to use immediately. We can go and validate this across devices using eAPI, so you know that it's syntactically correct. One thing you'll notice is that this configlet is not associated with any device. It's just there; it's a subsection of config. But then we can go and apply that at, for example, the data center one, pod one layer, and what we're saying here is that we want all of the devices within this pod to have this configuration. When we save this, one important thing to note is that we haven't actually touched the devices. This is configuring our intent that we want this configuration across all these devices. You'll notice there's this pending task for actually running it. There's a whole task management system for
actually approving when tasks can get run, because the definition of what you want to do and when you actually want to do it are separate things.

If we go here and look at the config for a given device, we can see that on the left there are the configlets, in the center there's the designed configuration, which is what the configuration will look like when I push this, and on the right is the actual running configuration of the box. I'm going to pause this here because I think it's really important to understand what's happening under the covers. If you look at a lot of config management systems, there's this whole dance you have to play: this is my config file from before, and this is what I want it to be, so let me diff them and figure out what commands I actually need to send, and oh, but I'm on this type of device now, and you have to send these ones before those ones. It's just a mess.

This is where we're able to leverage the APIs we provide, eAPI, our JSON-based API, and config sessions, which are really transactions at the CLI, to be able to say: look, here's the full configuration; I want the configuration of the box to look like this. There's no diffing on the CloudVision Portal side. It just says: you've told me this is what you want the config to be, here you go. And the trick is that, by leveraging our system database, we take a snapshot of the existing config, we roll it back to the clean state, we reapply all this config, and then we copy the database back over. In doing this we only change what actually is different based on the new configuration you're pushing. So from a system perspective, you have all these agents running around and they only see the difference; in this case you'd only see this new ACL, and nothing else changes. But from a complexity perspective, I know that the configuration is exactly what I want it to be. If it fails, nothing changes. That's
the other thing that is terrible about automation: automation is great until it half worked and didn't quite apply, and then how do you get out of it? So this is an all-or-nothing push. And with this view you can see that there's also nothing there that you're going to get rid of that maybe someone put in manually. We have a whole reconciliation process to be able to say: hey, there's actually a command that someone put here; are you sure you want to blow this away? I don't think it should be there, but maybe you think it should. So I think this is a really important part of the power of the APIs we provide, and how we then leverage them to do automation in a much simpler and cleaner way.

If we go here now, we'll go to the task management page, assuming I've hit play again. Yep. I have this task management page and you'll see the pending tasks. There's one per device, and now you get to decide how you want to apply the configuration you made: whether you do it all at once, or whether you do it one at a time, wait, and do checks in between. You now have that control over how these tasks get approved, how you run them, and how you make those changes. Maybe you still have an outage window or a change control window, to do it at an off-peak time. You can now do it all in an automated way, and with confidence that after this is done it's actually fully complete.

Part of what we provide here, and this is again the challenge of automation, is the whole compliance checking aspect. You want to know that what you think should be there is there, and that someone hasn't gone to the switch and changed it when you didn't realize it, so that now you're at risk of losing that configuration. So the compliance checking is really important.

I think the final thing I want to talk about is that everything we've done here with the UI is actually built on top of the JSON APIs provided by the CloudVision Portal,
so there's nothing that you can do manually that you can't automate. This is really important because, again, our customers have made investments in custom config management databases, and they may want to leverage those. That's great: they can take the configuration and push it into the CloudVision Portal to define what they want the configuration to be, but then leverage our APIs and automation to push that change across the network in a hitless fashion. Or if you're using ServiceNow or some other ticketing system, you take the pending tasks, you push them up to that system, you get them approved, and only once they're approved and scheduled do you go and run them. Again, this is how the CloudVision Portal gets integrated into the larger problem of data center automation. So that's my quick demo. I don't know if there are any other questions.

You guys should talk more about transactions, talk more about truth. I'm not saying do it now; I'm saying just in general, as a company, you should emphasize that.

Oh yeah. I mean, the ability to make changes in the same way, where you don't have to think about the specifics; you just say this is what I want the state to be, and it just happens. This is the power of the database model, for sure.

Yep. Yeah. So under the covers, the CloudVision Portal is a clustered HA model. It's three nodes, and it's quorum election.

So you get N+1 or [inaudible]?

Yeah, N+1. One can go down, and then when it comes back up another can go down. That's the [inaudible] model, similar to many of the controllers out there.

Yes, that's a really important point, because when you think about the OpenFlow model, you have this central controller node that has become central to any decision making you're doing at the data plane level, and really that's not what we're doing here. We're trying to automate the management, kind of the
management plane, not the control or data plane. So this is all about how do I integrate with orchestration systems to provision new services, or with management tools for pushing changes. So that's right: if everything goes down, if you decide to take it all down because you need to upgrade it, everything works fine. You can't provision new services, but given that you're taking it all down, that's kind of to be expected. So yeah, good questions.

Well, can't you just console into any particular box and change configurations from there?

Oh, absolutely. And that's another really important point: nothing of what we're doing is changing the fundamental way in which you can interact with your box. From a scaling perspective, an operations perspective, you need to be able to go network-wide and not deal with each box individually, and do things in an automated way. But at the end of the day you've got to be able to get on your box. The CLI is not going away. When you're debugging, when it's 2:00 a.m.
and you've gotten on to try to debug stuff, you don't want to be told: no, no, those tools you've been trained on, you can't use anymore. So all of that still works. This is the orchestration. And then of course from that box you need to know: what did my orchestration tool do? You can't hide that from them as if it doesn't matter.

So I was describing the case of failure. Let's say the controller isn't there, and they need to apply an ACL on the box, or something urgent.

Oh yeah, just change it. And from the CloudVision Portal side, what you'll see when you go and make the next change is it'll say: okay, here's what your current config is, and hey, there are some commands here that I didn't push. Do you want those, or do you want to blow them away? There's a reconcile button, which basically creates a new configlet applied to the device to say: here's the configuration that's on the device that we're going to blow away unless you change it. Then you can make any edits you want, or you can decide: whoa, I don't want to make any changes right now; my box is out of compliance and I didn't think it should be. In general, what we expect is that you start any sort of change window by checking: is my box in compliance? If yes, start to make changes, then push them. If no, figure out why, get into compliance, and then go and make changes.

Awesome.

What about rogue changes? You've got your controller running, you've got ten folks on your team, you have a meeting with nine of them and say, "All right, controller now." Number ten doesn't show up, and decides the next day to make a change on a switch. How does that get remedied?

Well, sorry, I thought you were asking about the person option, but you're talking about the controller. So you have the controller in operation, and somebody decides to make a change locally on the
switch. Oh yeah. Again, I think none of this supplants the standard process that people are going to go through. You need to go and check, before you push the change, that nothing has changed on the device that you didn't expect; then you push the change; and then you go and check that it's in compliance.

So you can still make local changes, even?

You can absolutely still make local changes. There's no locking you out of the CLI. And from an orchestration system integration perspective, it's actually interesting. One important point is that we separate the notion of underlay configuration and overlay configuration. When the OpenStack integration decides, you know what, I need this VLAN-to-VXLAN mapping on this port because that VM has been placed there, that's not static configuration. If you do a write mem, that configuration doesn't go into the switch, because it's not static. It wasn't input by the user, and it's something that's totally dependent on what OpenStack tells us. So with that separation you can view those separately, and you can always make changes on the static side that integrate with that. So you can always make your own changes if you want to, to get things up and running or to prevent something that shouldn't be happening.

Okay. Well, what happens in the orchestration program when two people try to make a similar but not quite the same change? So with your ACL example, two people go and set up a configlet, but they set the ACL differently.

You had like a queue of 12 things that happened to look like they were for different devices. Would you see multiple entries for the same device?

That's a good question. Right now what we enforce is the one-task-per-device model. It gets very tricky if you start to have people make different changes and then run them in different orders. In general, what you're doing is you're making these
changes and then applying them during a change window. If you're doing multiple changes to a device, it's because you wanted to change the previous one. We may expand on this further, but that's the model: you have one change per device. So if someone comes in and makes another change, it's kind of additional to what was there, in the same config. What would happen is that, before you run the task, you look at it and say: okay, is this what I expected it to be before I go and run it? Oh, that's weird, there's this other thing that got added. And of course there's role-based authentication to be able to control who can make changes, which is also very important.

I guess the problem also becomes, when you have a single change per device, that now you may be bundling four configlets, for four separate changes really, into one push.

Yep.

If one of those fails, it's all or nothing; it all gets rolled back.

That's a choice you get, right, because within your change window you get to decide: okay, am I going to try to push everything all at once, or do I say, okay, I'm going to start with one, and then do the second, and then the third, and then, oh crap, everything went wrong, and it's Sunday night and the markets open tomorrow, so let me go and roll back to what it was Friday. That's where the power of network rollback comes in.

But if you can only stage one at a time, then there's this [inaudible] of having four people stage changes and then someone else executing or scheduling each change one at a time.

I think in this case we're trying to get the simplicity of making sure that you don't end up with a weird order.

[inaudible] more process?

Well, a lot of the companies that we're going out to have a lot of process already, so this is trying to fit into that process, not create a new one.

Just in terms of what was alluded to, in terms of making individual changes on individual objects, in
terms of the next time you push a configuration from a parent object to those child objects, would those changes be overwritten as part of a standard config push, or does it leave those individual characteristics and changes in place?

So the CloudVision Portal defines what the configuration should be. You're using it as the database where you say this is what should be. If there's something on the switch you want, you can bring that in, to say: okay, this is part of the definition of that device. But otherwise the assumption is that you're using this to be able to automate and to be able to say: no, this is what it should be, and I need to know when it's not this. So yeah, we don't leave it there unless you explicitly tell us to pull it in.
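
As an illustration of two ideas from the demo, the top-down container inheritance and the compliance check between designed and running configuration, here is a minimal Python sketch. It is not CloudVision's implementation or API. The `Container` and `Device` classes, the `designed_config` and `compliance_report` functions, and the configlet contents are all hypothetical names chosen for this example. It also simplifies: configlets are treated as flat lists of lines, overrides are not modeled, and the comparison is line by line rather than aware of nested config blocks.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Container:
    """A node in the strict container hierarchy (tenant, data center, pod, ...)."""
    name: str
    configlets: List[str] = field(default_factory=list)
    parent: Optional["Container"] = None


@dataclass
class Device:
    """A managed switch that lives in exactly one container."""
    name: str
    container: Container
    configlets: List[str] = field(default_factory=list)


def designed_config(device: Device) -> List[str]:
    """Concatenate configlets from the top-level container down to the device."""
    chain = []
    node = device.container
    while node is not None:
        chain.append(node)
        node = node.parent

    lines: List[str] = []
    for container in reversed(chain):  # top-level container first
        for configlet in container.configlets:
            lines.extend(configlet.splitlines())
    for configlet in device.configlets:  # device-specific lines come last
        lines.extend(configlet.splitlines())
    return lines


def compliance_report(designed: List[str], running: List[str]) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) lines when comparing designed vs running config."""
    missing = [line for line in designed if line not in running]
    unexpected = [line for line in running if line not in designed]
    return missing, unexpected


if __name__ == "__main__":
    # Hypothetical hierarchy and configlet contents, for illustration only.
    tenant = Container("Tenant", ["ntp server 10.0.0.1"])
    pod = Container("DC1-Pod1", ["ip access-list SERVER-PORT"], parent=tenant)
    tor = Device("tor1", pod, ["hostname tor1"])

    designed = designed_config(tor)
    running = ["ntp server 10.0.0.1", "hostname tor1", "logging host 10.9.9.9"]
    missing, unexpected = compliance_report(designed, running)
    print("missing from device:", missing)
    print("on device but not designed:", unexpected)
```

In this sketch the "unexpected" list plays the role of the lines a reconcile step would ask about: either pull them into the device's definition or let the next push remove them.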
Info
Channel: Tech Field Day
Views: 11,024
Keywords: Tech Field Day, Networking Field Day, Networking Field Day 10, NFD10, Arista Networks, Andre Pech
Id: ts5jh8OrMRg
Length: 22min 26sec (1346 seconds)
Published: Fri Aug 21 2015