NetApp HCI Build Day Live! Condensed

Captions
Ladies and gentlemen, live from Boulder, Colorado, this is the HCI Build Day with NetApp. And now, for your viewing pleasure, Alastair Cooke.

Thank you, Jeffrey, and it is awesome to be here in beautiful Boulder, Colorado: spectacular mountains rising up beside us, and lovely boutique shops all around downtown, which unfortunately we haven't seen because we've been inside NetApp's offices for our whole time in Boulder. Build Day Live, as you know, is an independent organization. We bring you the inside scoop on what it's like to deploy real IT infrastructure products in something close to what you might see in the real world. We are independent, but of course we do this with vendors, so we are here today as NetApp's guests and they've supported us to be here; the content and the opinions, though, are my own and expressed freely. The nice thing is that everything has been working smoothly, so I have been saying nice things all week. Joining me on camera, playing the role of the support engineer while I play the role of the customer, is Kevin. Kevin, what do you do here?

My name is Kevin Patrick, and I'm a Solutions Architect in our cloud infrastructure business unit, focused on our NetApp HCI and storage product lines. I've been here for three years, working in the field on everything from positioning our products to supporting deployments, since the launch of the NetApp HCI product. I'm based out of Chicago, covering mostly the central region, but I've been Americas-based since I started.

Right, and the center of what we're showing this week is the NetApp HCI solution. We're starting with a basic vSphere environment, because at this stage that's the hypervisor the HCI platform supports, and it's a fairly simple environment. But what are we going to be building out? What have we got sitting in the back room, so it's not too noisy in here?

What we're building out starts with the HCI deployment itself: two independent compute nodes that we'll deploy vSphere ESXi onto, plus four storage nodes. After we run through that deployment, we're going to show how easy it is to expand the clusters as well, adding a storage node and then a compute node.

And that's one of the consistent things we'll see through this: there are two independent but intertwined clusters. The difference with NetApp HCI is that the storage cluster and the compute cluster aren't co-located. They are two discrete clusters that you can scale independently but manage as a unified whole.

That is correct, and it's one of the biggest differentiators of our product, and one of the biggest reasons I see customers not only choosing NetApp HCI but then continuing to grow out what they purchased, based on that simplicity and scalability.

That's the feedback I've had from some customers: they weren't interested in deploying HCI because as you scale beyond eight or ten nodes, the east-west traffic between the nodes starts to become problematic. Separating out a smaller storage cluster that doesn't have to span your whole 20-node vSphere cluster avoids that east-west traffic problem as you get to a larger cluster.

Definitely. We just had a large retail win that came off a competitive platform for exactly that reason. They went down the HCI route for the simplicity aspect, but after running it for a year or two they found themselves with all these separate silos of clusters because of that limitation of only being able to scale to eight or ten nodes, as you just mentioned, Alastair. What they purchased NetApp HCI for was to solve that: to truly intermix different application workloads on a single platform, with independent scalability to add just storage if they need it, without limitations.

Do you still have a storage cluster size limit of eight to ten storage nodes, or can the storage side scale large as well? It stays a reasonable size, but that doesn't limit your compute from getting out to the full 64 hosts vSphere supports.

We have customers that go to the full 64 compute nodes, and on the storage side we can go up to 40 storage nodes. We start at four here, but we can continue to scale that out to some pretty large numbers.

OK, now let's look very quickly at the environment I've built. Take a look at my screen here. The nice thing is we chose to bring a very small lab environment: a single ESXi host, built as usual with my AutoLab build, with a small number of virtual machines on it. Really it's just there to make sure we're working with a brownfield environment, so it should be familiar to people who watched previous Build Day Live events. We have only a single host; I haven't brought my two-node workload cluster, because we're going to build new workloads entirely on the NetApp HCI, but it is a brownfield environment. We did have some challenges on Monday afternoon and Tuesday getting the network configured to mirror how I normally run my networking. As usual, if there's going to be a problem, it's the network, but as we worked through those issues everything got resolved.

Now, there was a phone call that we didn't actually have, but should have had, before we turned up on site, where you do the discovery. That's the way a normal engagement flows with real customers: a phone call or a site visit beforehand to gather information. Can you talk us through a little of how that process works?

The way we do deployments is exactly what you're describing: a pre-prep meeting to go over the networking, and I position it so that's done before we even schedule an installation. If we're using the customer's networks, it's on that call, working through the guide we hand out, that we ask for the networks and VLANs we require, so they know exactly the physical cabling as well as the logical networking requirements NetApp HCI has. Even if we're using our own provided switches, because we do offer switches as well, in which case our team does the build-out of the switching environment for HCI, we still want that prep call ahead of time. That way the customer understands how we're going to handle uplinks into their environment, how we're going to design the cluster, and what networking information we need.

And the result of that call is this Excel spreadsheet that I've got open in LibreOffice here, with a whole checklist of things to look at as you build. Presumably during that call you're sitting there filling this out, making sure the data is collected and the right things are checked, and then you move on to the installation details page, where we can start to see how everything is going to be configured for the customer. As we roll down through the spreadsheet we get into some of the really significant details, like how the cabling is going to be configured, and we'll talk a little later, while we're waiting for things to deploy, about how you might build that cabling out. Importantly, it covers the number of ports required, both 10-gig and 1-gig, and what sort of transceivers you're using. All of that is gathered here along with the whole stack of networking information. So this is the one central document you'd hand over to the customer at the end as part of their delivery documentation?

Yes, we do, and one of the reasons for the prep call is that when you look at this sheet there's a lot of data on it, but we've actually automated most of it. There are only a handful of fields we have to fill out, and it will auto-build the IP ranges. For instance, here we just gave the first network IP, and the rest of it, the naming conventions and all of the IP blocks, was filled out automatically.

I kind of struggle with this, because I come from a traditional background where you want matching IP addresses for resources of the same type, and I have this whole careful layout that I'm realizing is completely wrong, because I should just be trusting the system to build it for me. Using these automations means shifting how I approach these resources, much like getting used to cloud-native, where things have indecipherable names because you care that the service is up more than anything else.

Correct, and we do give our customers that option. It auto-fills the different ranges, but if you want to change something you can, and we'll actually go through that: we're going to change at least one of the IPs as part of the deployment to show that flexibility.
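The auto-fill behaviour Kevin describes, give the spreadsheet one starting IP and it derives the rest of the block and the naming for you, can be sketched in a few lines. This is a toy illustration, not NetApp's actual spreadsheet logic, and the role names below are made up for the example:

```python
import ipaddress

def autofill_ips(first_ip: str, roles: list[str]) -> dict[str, str]:
    """Assign sequential host IPs to named resources, starting from first_ip.

    Mirrors the idea of the spreadsheet auto-fill: one seed address in,
    a whole consistently numbered block out.
    """
    start = ipaddress.ip_address(first_ip)
    return {role: str(start + offset) for offset, role in enumerate(roles)}

# Hypothetical resource names, for illustration only.
plan = autofill_ips("192.168.199.9",
                    ["mgmt-node", "storage-mvip", "storage-node-01", "storage-node-02"])
print(plan["storage-node-02"])  # 192.168.199.12
```

The point of doing it programmatically rather than by hand is exactly what Alastair concedes in the conversation: a generated plan is consistent by construction, even if the addresses aren't the ones a human would have picked.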
Also gathered here is the documentation, very much the necessary documentation as you deploy new infrastructure: all the networking, all the IPs. So we've got all our documentation about what we're going to build, and there's also this deployment guide. We were discussing before that normally the deployment is a white-glove service. Would it be your local NetApp SE, or would a partner normally come and do this deployment?

We have it both ways. We have partners trained who are comfortable doing their own implementations; a lot of partners actually have our HCI in their labs and are developing their engineering resources around it, building deployment services and other services to layer on top. We also leverage our own professional services team to do these white-glove deployments, and we see those going on all the time. Like you mentioned, the white-glove treatment is something we always want to be part of, to make sure the setup is not only consistent but reliable, to get customers up and running, and then to continue supporting them through the life of the deployment.

Correct. So this deployment guide, which we're not actually going to need to read extensively, but I mention it, is the core for people building additional services around this and thinking through the deployment process. As always, I'm going to follow my methodology: reading the documentation if, and only if, all else fails. This is also why I have Kevin beside me, to look at how I'm getting things wrong if it goes wrong. That's certainly what we found yesterday as we were getting the Kubernetes piece up: there were a couple of faults in my infrastructure that caused the problems, rather than anything to do with the NetApp product. As a customer that's disappointing, because I like my own infrastructure to be nicely built, but from the point of view of doing this event with NetApp it was good that it was my stuff that broke.

Yes. What we've seen is that once the networking is correct, we have a consistent, repeatable process with NetApp HCI implementations. That's another reason a lot of our customers choose it: we have customers, whether in the mining industry or the pharmaceutical industry, that don't have extensive IT staff in all their locations, and with NetApp HCI you get that consistent setup no matter where you're delivering it or who's doing the implementation.

I've been hearing during the week about just how insanely powerful some of these HCI nodes are, and I can see them going into places where you need a lot of compute but not necessarily a lot of IT staff. Oil and gas exploration, say; presumably there are quite a lot of places where it's deployed to deliver extreme compute power with no local management.

That is correct, especially when we start talking about end-user computing, or media in general, as well as what we're doing now with our new NVIDIA processing nodes. We have both M10 and T4 nodes that can deliver VDI, and the latest node coming out can do inference as well, the machine-learning element, all inside the HCI platform.

Nice. Well, we should build an HCI platform; sounds great. I remember we were going to launch from one of your pre-built nodes, so we set up the IPMI configuration. There's nothing special in that IPMI configuration, since you factory-reset the nodes, apart from setting a static IP address on one. Is that right?

That is correct. We use a private IP addressing scheme, so you're not having to log in to all the nodes. We're starting with four storage nodes and two compute nodes, and instead of logging into all six, it's just one: we assign one temporary IP, ending in 199 for the network, and we'll launch that and kick off the deployment from there.

OK, so it's on my management network at 192.168.199.199, and this is where we find out whether you rebuilt it exactly the same today as you did previously.

We were a little concerned at the time, and if this doesn't match we'll just fix it up. So what we'll do is log in to the IPMI and double-check. Looking at the spreadsheet, this was going to be the first of the storage nodes, this one here, tg1. There's the IPMI address; I'll show you, it's for real that we were having a problem at the initial deployment. Let me take another tab and go there. At least something loading there is much more reassuring. This will be admin and admin, capital A; yes, it's the vendor default. We'll go to Remote Control, go to iKVM, and launch that.

So what I did here is look at this interface, which shows the auto IP, and just hit enter. When I was resetting things up, one of my mistakes was that I didn't actually put the static address on this one. So let's look at another node to make sure, and go down the line. This screen shows nothing filled out here; keep scrolling all the way to the bottom with your mouse. One of the things we did do here is set that private IP address, and we enabled LACP.

Let's go into .215 and check the NIC. Again, this is what I was doing: making sure DHCP is on and LACP is enabled. That's one of the first things I do as part of a deployment. Then we want to look at .216 and see if that's got the static IP address on it. OK, and let's just look at .217. You're expecting the static address to be on this one?

Yes, and if it's not there, then I might have forgotten to do it, since last night you were not able to ping it. We can try. Kevin has been working really hard the last few days to get everything working.

Yep, looks like you just missed one step late last night, on the second rebuild of the day.

Yes, I did. But as we're doing a full build-out, this is actually the first step you would do as part of a deployment. What we'll do is just go down to IPv4, hit enter, and type the IP ending in .199, then hit enter and type the subnet mask, 255.255.255.0.

And then I added the gateway, just dot-two, which will be very familiar to anybody who works with AutoLab: it's the same management network as all my labs.

Right, and then it asks to save: yes, to accept the changes. Now we will try to log in to that node. Give it a moment to complete; the services are restarting in the background as the static address is accepted.
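The sanity check implicit in what was just keyed into the node's console, that address, netmask and gateway actually agree, is easy to script. This is a generic illustration using Python's standard `ipaddress` module, not anything NetApp ships:

```python
import ipaddress

def check_static_config(address: str, netmask: str, gateway: str) -> bool:
    """Return True when the gateway sits inside the host's subnet,
    the basic check worth running before saving a static address."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    return ipaddress.ip_address(gateway) in iface.network

# The values entered above: temporary IP, /24 mask, dot-two gateway.
print(check_static_config("192.168.199.199", "255.255.255.0", "192.168.199.2"))  # True
# A gateway on the wrong network would fail the check:
print(check_static_config("192.168.199.199", "255.255.255.0", "192.168.1.2"))    # False
```

A mismatch here is the sort of typo that otherwise only shows up later as "I can't ping the node", which is exactly the symptom from the night before.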
We'll go back to dot-199 and refresh, and oh, we've got a nice wizard to start with. OK, so I'll just follow along the wizard. Have I read the deployment requirements? That is the deployment guide we had up already today. Powered up the network: yes. I'm authorized. As we are deploying both our nodes and VMware, we have both license agreements as part of this. I've read the NetApp end-user license agreement very carefully, yes, and I accept the VMware license agreement too; I've read that one before and I think it's the same version. Then we'll continue.

Right here, one of the things we're looking at is the configuration. We have the option of deploying a new cluster, which would be a greenfield environment, and we've had a lot of customers do that in the past. What we're doing here, though, as you mentioned, is a join: you already have a vCenter built out with your compute nodes, so instead of building our own cluster and moving things over, we'll join this directly to the cluster you already have.

All right, so I'll put in my normal DC vCenter DNS name and the DNS server, which of course is the domain controller, and then a vSphere user name, administrator@vsphere.local, and that password. That's the account that officially deployed the vCenter appliance, which is at 6.7 Update 3, so we should be able to talk to it. And now I need some credentials for the storage cluster and management.

Per that deployment guide, we'll use admin here, and then we just have a couple of password requirements to meet. We'll use that password and confirm it. Good, reasonably complex password requirements.

Now it's pulling up our network topology, and we have two different options for how we can do the cabling. We can do a six-cable configuration, which allows customers who don't want to use VLANs to have everything separate: management on one gig, then different port groups for iSCSI and the VM networks. Instead, we've cabled our system up for a two-cable configuration. One of the things there, and we mentioned the networking issues we started with, is that when we had that prep call and discussed it, this is a new cluster, a customer build-out.

Yes. Well, I pulled all the cables and we started all over. We followed that deployment guide, which shows the network configuration and how to cable it, and it all came up nicely.

So on the compute nodes we're using just the two 10-gig NICs, with everything converged onto those two 10-gig networks, although there are two more SFP+ ports available.

That is correct, and we do support customers adding onto those later, so if you wanted to enable them after the fact, we can certainly do that.

And if I remember rightly, the compute nodes also have an empty PCI Express slot, so if you need even more networking that would be an option to upgrade as well, although that would limit your ability to use a GPU in the compute node.

Yes, and depending on the version we're at, there are rules for how we can expand out clusters like that. We'll show you the unboxing video very shortly, so you get to see the actual hardware.

And then there's only a four-cable option for the storage nodes?

Yes. With the storage nodes today, and even on the compute nodes, where it shows two cables here, we actually have a third cable plugged in: the third cable is for lights-out management. On the storage nodes we do require four today: we're keeping the management on a one-gig network, and then we're doing all of the iSCSI traffic on the two SFP+ ports. They're actually SFP28, so we can do either 10 or 25 Gigabit Ethernet on them.

Separating out that storage network lets me have a non-routable storage network, which is always preferable, and do management on a separate network. There are some IP-based storage solutions that require routing for management, which is debatable, but there's certainly a historic preference for an isolated storage network.

Yes. And I mentioned that third cable for lights-out management on the compute nodes; on the storage nodes, most of my customers just have the lights-out port in failover mode, so it's four cables total, carrying both the management traffic and the lights-out management.

That works really well if you've got that all on the same VLAN. Let's continue and see what it detects. Now, this is the point where my pestering you about powering off the additional nodes that we're going to use for the expansion comes in, and we'll see whether that registered.

I'll confess that late last night I did not actually power them off, so they're still in here. But what we'll actually see here is another thing about NetApp HCI: we allow mixed clusters. Whether you start with a certain drive size or a certain node generation, we allow intermixing, and we've done that here: the nodes we'll be adding to the cluster are different sizes from the ones we started with.

So we're starting with these two compute nodes, which both have 8 cores and 384 gigabytes of RAM, a moderate configuration. This would be your scale-out kind of configuration. And on the storage nodes, we've got the one we're currently connected to, three more that are in the same enclosure, and another one that's in a different enclosure?

Correct. We actually started with two enclosures: from the original deployment we had the storage nodes in one and compute in the other, and as we went to scale out, there were two empty slots in the one that had the compute nodes, so we ended up adding a compute node and a storage node into it. The enclosure itself isn't specific to compute or storage; it's the node that's specific.

Excellent. And I guess you'd quite like a six-slot enclosure that was three U high, where you could have the entire initial configuration inside one enclosure.

Customers might, but here's what we actually do with the cluster, and part of our flexibility: if I knew you were going to grow, I'd put two storage nodes in each enclosure, and then when you add, say, two more storage nodes in a third, it enables chassis protection by default. We have protection domains; you don't have to click anything, it starts redistributing your data so you have chassis-level protection as well, just another safeguard in our data protection.

So if you were doing this for real with a customer, each of those two enclosures would have two storage nodes and one compute node?

Correct, just not the way it's laid out in this particular case. Again, my preference for things looking pretty is getting in the way of things being really functional.

And we are flexible, like you mentioned: it doesn't matter which slot. We make our recommendations, and either way will work.

OK, now we've got a DNS server taken from the configuration I already provided, and it took a default NTP server; time sync is always important. And then there are a few settings down here that need to be configured.

That is correct, and this is laid out just like the HCI installation deployment guide and the template, the Excel spreadsheet we were looking at earlier. You could follow that and enter everything, but one of the biggest challenges is humans, right? Accuracy of typing: did we type the right IPs? So we've actually developed, and we'll use it here, the easy form. We've added a tab right into that same spreadsheet that shows what to fill out in here.

So if I go back into the spreadsheet, here is the easy form view: the answers to the questions, the short version of the wizard.

That is correct. One difference is that it shows vCenter settings; since we're doing a join, we won't see those in the actual wizard. We're not configuring a new vCenter, so we don't need those settings.

So let's pop back into the easy form. What did we decide we're going to call these? The name is BDL, for Build Day Live, dash NetAppHCI, so we get the product name nicely placed. All right, next things: we do want to use VLANs, but all of the management is untagged, so we won't put a VLAN ID on those. vMotion is always on VLAN 16 where AutoLab uses VLANs, and iSCSI is on 70. The thing that confused me when I first looked at this is that I expected to go across and then down, but you go down and then across, because once we fill things in here, all of the other fields open up.

That is correct. We start here, and as we fill it in it will auto-complete a lot of the other fields for you. Even though this is the easy form, you saw it just fill in the rest of the subnets across the different networks as well. And it corrected your typo there: we want 192.168.

Now, the management node. What is this management node that we're deploying?

The management node is a VM created as part of the deployment, used for upgrading the system and for providing access to NetApp Active IQ support. That's not only for proactive support cases and case management; we also use that portal to send statistics up, so we collect up to five years of historical monitoring of your system that you can look at in Active IQ.

Does Active IQ do forecasting and performance review, that kind of thing?

Yes, it does, and it will show you forecasting. We've built in our own intelligence so that, based on your growth patterns, it can tell you how long you have before you'll run out of storage capacity on that cluster.

Does it also forecast performance, or do we not see that as a common problem for customers on this platform?

On the performance side, because of our scale-out architecture, every time we add nodes we're adding performance as well, on the storage side and on compute, so we tend not to have that issue. But we do show that too.

So it's filled out all of my compute, storage and management IP ranges in here; let's just double-check that it all matches up. Because that range started at dot-9 by default, it started the next one at dot-10, but we're using dot-10 for one of your servers, so we'll take that next one and bump it to dot-11. And now compute management was going to use dot-12 as well, so storage management moves along to dot-13. Good, there's internal validation going on all the way through. Set up the subnet for vMotion with no default gateway; and then storage, which is the standard IP storage network inside AutoLab, starting at dot-30 on the iSCSI network. Actually, I'm looking at the wrong place.

You are absolutely right.

Good, you're saving me from myself.
I like that. Easy. Now it's writing out the network settings: it's filling out that form we saw before, and immediately going on to validating, which is the other piece that's nice.

Yes, we're looking for any IP conflicts or anything else within the systems. We're doing that now, and right now all these fields are grayed out; they'll open up when the validation is done, and then we'll make one of the IP changes, the one we were going to make deliberately.

So what's happening there is that we've moved our management node over to container services, and because containers are ephemeral we've introduced the Trident plug-in, doing persistent volumes off the cluster. That means the management node now needs access to the storage network as well as the management network. The wizard auto-filled one of those IPs, and based on your best practices and how you keep your environment, we're just going to change that IP up by one number.

So the wizard lets us create something that's nicely programmatic and consistent, but here as a customer I can override it and say I want it slightly different, because I have an unusual standard in my organization, which is what we'll do. And all this information: is it stored in that spreadsheet and nowhere else?

Correct, we're storing it in that spreadsheet. For the pre-installation we start it all in that spreadsheet, and we also get that information over to our support organization, so that they understand up front what the deployment looked like and how it was deployed, and if we do an expansion later they have those resources as well. So that's the documentation the support team keeps.
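The conflict check the wizard runs while the form is grayed out can be sketched generically: collect every planned assignment and flag any IP claimed by more than one resource. This is an illustrative stand-in, not the wizard's actual code, and the plan below is hypothetical:

```python
from collections import Counter

def find_ip_conflicts(assignments: dict[str, str]) -> dict[str, list[str]]:
    """Map each duplicated IP to the list of resources claiming it.

    An empty result means the plan is conflict-free, the green-check case."""
    counts = Counter(assignments.values())
    return {ip: [name for name, a in assignments.items() if a == ip]
            for ip, count in counts.items() if count > 1}

# Hypothetical plan where an existing server already holds dot-10:
plan = {"existing-server": "192.168.199.10",
        "mgmt-node":       "192.168.199.10",
        "storage-mvip":    "192.168.199.11"}
print(find_ip_conflicts(plan))  # {'192.168.199.10': ['existing-server', 'mgmt-node']}
```

This is exactly the class of mistake the validation caught earlier, when the auto-generated range collided with an address already in use in the brownfield environment.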
deliver to the customer at the end of the project, but then the customer has probably got their own IP address management tool that they're going to fit this all into as well. Correct. There we go, we are successful with network configuration. We're back at that management node; wasn't that the one where I wanted to change the IP address? Yes it was, and it's right here. What this did, based on that auto IP addressing, is it put your storage iSCSI IP at ten, and you just wanted to keep it consistent with the management addressing, keeping the last octet the same as the management one, and now we'll see that green check come back, so there's no IP conflict there and we can now continue. And continue. So we've got a whole summary of activities, this is what we're going to do. Correct, and there is a button at the top right where you can export this information, but even after you press Submit and we start the configuration you're not lost: you can actually still download it afterwards as well. Right, well, let's start deploying, and I will follow the instructions, because we're looking at the management interface currently through one of the nodes and that node's IP address is going to change. So we'll copy that address and we'll just pop it into another browser tab now, and it won't come up yet because the IP address hasn't changed. So we'll click OK here, and the wizard goes. And how long is this going to take? You know, in a new deployment it takes about 45 minutes. That consists of setting up the ESXi hosts; what we've done here is we've given options. If we'd done a new vCenter build we'd have the option of using 6.7 or 6.5, so we have both of those images on each compute node. So what we do here is we first validate all that deployment data, we create the networks, and then we start installing the ESXi version you picked, so it will install ESXi on
those compute nodes, and then once it's done with that, in our case, since we're doing a build, it's going to join your cluster. So the order of events will be: create the storage cluster, install ESXi, integrate all of this into vCenter, and then create the management node. One of the checkboxes on the last page, at the very bottom, was "do you want to report this to Active IQ", and it's checked by default. What that does is set up the connection to Active IQ so that remote monitoring is in place for your cluster. I like remote monitoring, particularly when there's something smart that's going to look across data from multiple customers, start seeing trends across customers, and let me know how that affects my environment; it's one of the great values of having a management service that runs as a service in the cloud somewhere for me. Yes. You know, we didn't even need to copy that URL to the clipboard, because the web page just redirected: while you were watching the unboxing video, the web page switched from the temporary IP address to the permanent IP address of that node, and here we are with the progress of the cluster. Interestingly, there's lots of detail about what's been completed, all of the steps that are done, which is nice to see, and we can see the progress of where we are: the ESXi deploy is happening at this point on those compute nodes. So while that's still progressing, and we can see the overall progress is just over 50%, I think we're going to beat your 45-minute estimate. Yes, we always like it when it's a little bit quicker, but of course it's completely hands-off, so we could be going out having a nice lunch while this is going on, or more likely dealing with something else. It is nice around here to go get lunch. It is; we've had some beautiful lunches here in downtown Boulder and the surrounding areas. The meal of all meals, of course, was directly upstairs from the office here,
an outstanding steakhouse with wonderful views up into the hills. But most places where we're doing this don't have nearly such nice facilities, so if you haven't been to Boulder, Colorado, come and see the Mork & Mindy house, come and see the downtown; it's awesome. But I did want to look a little bit more at the networking options and configuration and see what we actually got there. So you gave me this diagram of what the back of the rack looks like with the switches, and because we're using your switches (my normal Build Day Live switches just don't have enough 10-gig ports for four storage nodes and two compute nodes, let alone the expansion we're going to do), we're using these two switches that you provided, which is an option for customers. Can you talk us through a little of what we have configured on there and how this all looks? Yeah, so what we've done here is plumbed all the networking up, and I mentioned this earlier, but we did it based off of a diagram that you'll see in the deployment guide, the same installation document that we send to our customers to fill out. What this now looks like is we have different colors, and I've labelled them here. These are 25-gig switches, but in certain environments we may want this isolated, especially when bringing it to a remote site where in some cases they say "we only have one gig today". Well, we can bring these switches in, that's a perfect example, bring them right in with the solution, and we can actually run all of the networking directly on them, which is what we did here. So what we have is the five storage nodes, and as we see on these switches, each storage node has its two one-gig connections, which we can see on the left, and the storage nodes also have their two 10-gig connections. What we've done here is we're actually using breakout cables: there are four 100-gig ports
on each of these switches, so we use two of them for the ISLs in between, connecting the two switches with those, and on the other two ports we're using breakout cables. Those break out to both the compute and storage nodes and provide those services. So you're using fewer physical ports on the switch in order to get all of that high-bandwidth connectivity to the nodes. And it's not clear on the diagram, but the storage nodes are labelled up here in blue as storage 1, 2, 3, 4, and 5, and the compute nodes have the purple labels on them. Yeah, and certainly in a rack those labels aren't going to be there. The easiest way, if you're a customer, to look at this system and tell which is which: the storage nodes have two 10/25-gig ports and the compute nodes have four. We saw that in the unboxing video with Raphael. Right, so that's the easy way to identify them from the back. Once you've got them out on the bench you can see the NVRAM cache module in the storage nodes that's not present in the compute nodes; there are a couple of tells as to which is which. Yes. And then with this setup, we have two green ports which are the uplinks. When we talk about what we require from your networking and your switches, we just took two ports, so we uplinked from our switches here into your network, and that configuration was automated as well. So on my side I ran the script that you provided, ran the commands that you provided, and it built out the switch with those uplinks ready to go. So that switch config was written by that spreadsheet, as long as you were running it in Excel. I'm running it in LibreOffice, so the automation probably won't work there, but back in that config spreadsheet there was a switch configuration page. The switch configuration page is what we'd actually look at next, so you first
need to pick out your configuration number. If you go to the top of this page, we'll see... well, it doesn't look like it did, because it's got config five selected there. What it would be doing, if it had that automation in place, is that if you changed it to config four, for instance, it would start filling out some of the additional information. I think this may be just because in the checklist we may have selected no or unknown options. I think it's to do with how I opened it in LibreOffice; I wasn't confident that the automation that ran so nicely in Excel would run at all in LibreOffice without making a mess of everything. And then what we have on the config planning tab is options for how you want to cable it. So if you're a customer, that top one there would say, "hey, I'm not going to use your 25-gig switches to give me one-gig connections, I have my own one-gig switches for that, so I'll just use them for my management", and we're doing the two-cable option. What this will build out is all of these ports: it builds out that config up front, so even though we're only using five storage nodes and we're going to scale up to three compute nodes, it still has in here, and I don't know the exact number, but something like twelve compute nodes and twelve storage nodes, so that when you go to scale to that sixth node you just plug it right in and the ports are ready. That matters, because of course network configuration changes are very tightly controlled and problematic at times. Yes, and there's a plus box right on that left-hand side. Since we're just looking at one here, we can hit that, and this is what I mentioned about selecting a configuration. Let me move through this a little more slowly so it's visible on the screen. So what this has done here, and there are even instructions on how to use it and how
to deploy with it right at the top, is essentially: we put that purple section there, so you cut and paste those commands first, and after that, cut and paste the rest of it, and it builds out your switch for you. So the whole switch config. Now, this is for the Mellanox switches that you sell; presumably this configuration builder is specific to the Mellanox switches? That is correct. Now, a lot of our customers are using Cisco switching, which is a standard, and when we do that our professional services folks have examples, so we provide those examples of how we want the switch ports configured. That way, in case we're not working with a network team (in a lot of these cases we may be working with a virtualization and a storage architect as part of that deployment), we can at least give them what they're going to ask their networking team for. Hopefully in nice clear language the network team understands, because often there is a mismatch of understanding there. Correct, and as we're doing the two-cable setup it's basic network configuration that we're asking for, but on the storage nodes, as I showed when we launched that IPMI screen, we set them up for LACP, so it's important to know which ports you're plugging those into, to enable that link aggregation. Yeah, and then on the compute nodes, as we mentioned, two cables are sharing multiple networks, so it's really just making sure we hit all of those different VLANs. Nice. All right, so that just gives us nice consistency using these pre-baked scripts. All right, let's take a look and see how far through the deploy progress we are. Well, we saw it at 80%; we're configuring the management node now, and it's going to take a little while. What we should see in vCenter is that there's been some progress, and we actually have our new NetApp HCI data center and cluster here, and our compute nodes are
in, and our management node is being deployed on top of them, so we're seeing good steady progress, but it looks like we've still got a few minutes to run. And wouldn't you know it, the wizard has only got to 97 percent as we've played out those videos, so let's take a look at how things are progressing. It's configuring the management node properties at the moment. We do still see in vSphere our hosts with a health issue. What's that amber warning sitting there? So we have the TLB CPU issue present on these nodes; normally you'd apply a hotfix or disable some features, but we're just going to suppress the warnings here around that CVE issue that we're familiar with. And at the moment we haven't yet got our heartbeat datastores for your HA, so HA has been automatically configured on the cluster and it's just waiting until it's completely happy with that. Let's see how... there we go, we are complete, the installation progress is all done. So that's nice, and at the top it says go start using it. Yes, so at the top here, on the left-hand side, is a link to launch our application to upgrade. What this is for is our hybrid cloud control services. We'll launch this, and there's an SSL certificate warning showing up here, so just wait ten seconds for it to redirect to the secure address. And this is the username and password that I just set in the wizard before, for the NetApp HCI management node? That is correct, that was the administrator and password. So what we have here is the ability to upgrade an installation. We can click on that now, and with this latest build it's already current, but if there were an upgrade you'd press Begin Upgrade and it would pull it in and automatically upgrade the management services to the latest hybrid cloud control services that we have to offer. Right, so that's nicely built into the product. Great, and later on we'll enable the NetApp Kubernetes Service,
and we'll look at expanding as well, although we won't use this interface to do the expansion. Yeah, and what this interface does: we see it right now because we just launched it, and we have the Expand button, which we could bookmark, but there's also a way to get to it right from the vCenter plugin. So within vSphere we can go to our NetApp HCI plugin when it pops in; we will have to log out and log back in to get that to show up, and we should see a blue bar pop up at the top that says, hey, you have a new plugin to enable, and we'll relaunch it when that comes in. So that's lovely, and again, logging back in as administrator@vsphere.local, it should rescan the plugins at this point. There's been a warning about expiring licenses; it's now running the deploy-plugin task, we can see the deploy-plugin process has run, and we refresh the browser to start using the NetApp Element plug-in. Yeah, and so we do deploy with temporary licenses for ESXi; NetApp is not providing permanent licenses for customers, so they bring their own licenses and can apply them later. And as we take a look at the ESXi server itself, in this case it's 6.7 deployed out and updated, managed as a normal vSphere server. That is correct; we keep it consistent with how customers are used to upgrading their VMware environments, so they can use VMware Update Manager to update the compute nodes to the latest versions, and look at it that way. And we can see in there the capacity: some storage capacity, what, four terabytes of storage available, and looking at the datastores we should see... yep, there's some local storage inside the node, a very small amount, two hundred and sixteen gigs of local storage (that will be the boot device), and then we have two new datastores which are coming off that storage cluster. Nice and easy. Not a huge amount of capacity in there,
because this isn't really deploying out a customer's production environment; if it were a production environment there'd be more storage in each of those nodes. Yes, and what we do, though, is auto-deploy two datastores as part of the deployment. We're putting our management node on one of those, and if it were a completely new deployment it would put vCenter on there as well, and from there you can certainly scale out. There's a lot of automation built into this platform, so we can automate however many datastores you want to put in, or we can go through our management plug-in for that. We've got two people in the channel; feel free to go over to youtube.com forward slash wbrownbag to talk with us. Joseph Zoda says it's perfect for them: they got purchase approval for two HCI chassis with 4x compute nodes (40 cores, 768 gigabytes of RAM) and four storage nodes. And then Franklin sent along: "nice setup, you're going to have a lot of fun with that; if you're using Kubernetes, check out the NKS cloud enablement functionality, it's super slick." And we are definitely going to be doing the NKS deployment; that will be the last big piece of the Build Day Live event today, and I think it's pretty cool. Yes, I agree. I've done a lot with NKS and Kubernetes over the last couple of years, though not the NKS side to start, because it didn't exist, at least for the NetApp side of things. NetApp purchased StackPointCloud just recently and integrated that whole platform right away, and so now, as mentioned in the chat, it's bringing those hybrid cloud control services right into our on-premises NetApp HCI to really deliver a private cloud and deliver Kubernetes services. We talked about this before. So now that we have our cluster deployed out with our four storage nodes and our compute nodes, we want to show the scalability, because this is one of the differentiating features of the NetApp HCI solution.
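Before clicking through the expansion wizard, it's worth noting the same thing can be driven programmatically. The sketch below is a minimal, illustrative Python example of the kind of JSON-RPC payloads the SolidFire Element API expects for discovering and adding a pending node; the management virtual IP, endpoint version, and the pending node ID 46 are all assumptions for illustration, not values from this build.

```python
import json

# Hypothetical cluster management virtual IP (MVIP) and API endpoint;
# the real values come from the installation spreadsheet.
MVIP = "10.0.0.100"
ENDPOINT = f"https://{MVIP}/json-rpc/10.0"

def rpc(method, params=None, req_id=1):
    """Wrap an Element API method call in its JSON-RPC envelope."""
    return {"method": method, "params": params or {}, "id": req_id}

# Step 1: discover nodes that are on the network but not yet in the cluster.
list_pending = rpc("ListPendingNodes")

# Step 2: add a discovered node by its pendingNodeID (46 is illustrative).
add_nodes = rpc("AddNodes", {"pendingNodes": [46]}, req_id=2)

# These payloads would then be POSTed to ENDPOINT with admin credentials,
# e.g. requests.post(ENDPOINT, json=add_nodes, auth=("admin", "secret")).
print(json.dumps(add_nodes))
```

The point of the single management virtual IP, as discussed later in this session, is that automation like this never needs to know which physical node it is talking to.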
So where am I going to go, how am I going to go about doing that expansion? Let's start by going to the plugin, and I'll show you what we deploy. So let's go to the menu here and we'll start out with the NetApp Element Configuration. This does allow you to have multiple clusters in here, so it has that Getting Started page, but let's go to Clusters, and we'll see that we have only one cluster that we've built out: it's our four-node cluster. And then we can see here, if you ever needed or wanted to go directly to the storage UI and you don't want to pull up your spreadsheet, it shows it right here, and, as you just showed, you have the option to expand the NetApp HCI right here as well. So there are a couple of different places where you can do that: we saw that screen right after the deployment engine finished, where it said "do you want to expand", we see it right here, and then we also have it on the management side. One of the other features that popped up was encryption at rest. By default we did not enable it, but all of the drives in this platform... we do sell non-encrypting drives for countries that may not be able to accept them, but in this case they're self-encrypting drives, so there's no performance loss; we can just enable that feature right on the cluster. And that was on one of those storage nodes we saw since then, yeah, presumably the same for all of them. Yes. So we have again a Getting Started page here for managing the cluster. From here we could create additional datastores and expand. What have we got in here? So we have the ability to see all of the objects: the datastores and volumes and all the rest. Yes, and if you needed to add another datastore you're not creating all of the provisioning yourself; it launches a wizard to create that datastore and asks what size you want to make it, and then it provisions it across the
different nodes within your cluster. And we can see all the way down to the individual drives in all of the storage nodes in there as well, which is nice. Yes. Now, we're going to do a storage node expansion as well as a compute node, but we can also scale back, and it's the exact opposite process. If we were going to scale back, we would go to this page here, select the drives for that node, remove the drives from the cluster, and once it's done syncing, remove the node. At that point it's safe to remove it. Instead of just pulling it, which we could do, we prefer doing it this way, so the protection is rebuilt on the remaining nodes, rather than having the protection compromised by just pulling the storage node out and having the cluster rebuild afterwards. Yes, keeping the data protected is job number one for any storage system. Yes, but I've had a lot of customers that buy NetApp HCI who may start with two of the exact same configurations, and what often happens is that one site grows at a faster rate than another. Whether it's two sites, five sites, or any number of sites, what they're able to do with this is remove nodes from an existing cluster. Not from a four-node one, since we want a minimum of four, but if we're in, say, an eight-node cluster, we can go remove a node and ship it to another datacenter. And that scale-in is usually much harder than scale-out; scale-out is a much more routine activity, but scale-in is crucial in some use cases. You don't want the older storage characteristic where you can add to the bucket of storage but you can never remove from it, so that capability is pretty cool. Yeah, and another customer I have had one of our original Element clusters, and they went to refresh it with our latest nodes, and we got on a call, just like we're doing here, to prep: how do we want to do this
migration, and what do you want to do? The use case here was mostly VMware, and with VMware we can do Storage vMotion and just migrate the guests as they are, in that traditional way, or even leverage our replication; we have built-in replication to do it for them. But they had some direct-attached, older Windows hosts where they couldn't take that application downtime, so instead of building out a new cluster, they just added the new storage nodes to their existing cluster, and once it was all done syncing and ready to go, they ejected the old ones one at a time: a zero-downtime migration for both the VMware hosts and the direct-attached hosts in the cluster. So you end up with the grandfather's axe of storage clusters: everything underneath the cluster has been replaced at least once. Yes, and over time, multiple times, exactly. Oh, and I've just been poking around pretty much at random here inside the management interface to see what we have, but we did come here to do the cluster expansion. Yes we did, and one question that comes up, especially with customers on and listening to this, is that they look at this page here, right in the middle, and we see node role, and we see that one of those nodes has an empty role. Really, there's nothing wrong with that: in a four-node cluster we will always use three of the nodes for running our ensemble services as well as a cluster master service. Once you're at five or more nodes we'll always have five, so once you have six you'll see an empty role again. So this is the same behavior as any majority-node clustering: you want an odd number of nodes actually involved in that management cluster, in this case the ensemble nodes that are clustered together to provide high availability of management, so that if you lose one of those nodes you've still got a majority remaining. If you have four nodes and you lose one, a majority remains; if
you lose two, a majority remains; but you don't get any better resiliency than if you just had three nodes in that cluster. Correct, and this is why, even if you had six nodes, you're still going to have five; you always want an odd number for the majority vote. Yes, and that's part of the ensemble services I'm mentioning there. The other service is the cluster master. The cluster master is our entry point into the cluster, so as much as we filled out this IP spreadsheet and made sure which node had which IPs, in most cases you're not really looking at it at that level, because you have a single interface. If you were to provision out, and this isn't NetApp HCI specific, but let's say you wanted to bring in a bare-metal host, you don't need to know whether we have four nodes or eight nodes or what the IPs are; you're only looking at one IP, and we do redirects behind the scenes from there. So the configuration of the thing that's consuming the storage is always just pointed at a single IP, and everything else is learnt from the cluster. That is correct, making it very easy for our customers to not only consume storage from a base setup but maintain that simplicity as they scale out. Right, well, let's go through adding another node as we scale out a storage cluster. Sounds great. So from this page we'll actually just click on that Actions button; as I mentioned, expanding your cluster appears in a couple of different places, so we'll click on it from this page here, which launches our scaling wizard. From here we need to log in again with the same login and password; as part of that wizard we made it consistent, using the same login and password across all of the different services. So first it asks if you're going to add a compute node. Right now we're going to add the storage node first. We could actually do them both together if we wanted, but we're
just going to do storage; we particularly want to show the ability to scale storage and compute independently. Yes. So I said no to adding compute nodes, and now it goes and looks for available storage nodes, so this will be doing discovery on the network, the same as in that initial wizard that we ran. Correct, and where that initial configuration had those addresses... again, that node had a 169.254 IP address that we didn't type in, it was automatically discovered. When we built out the cluster that was all replaced, but we're still able to leverage that, so if we launched that same IPMI out-of-band interface on this node we wouldn't see any IPs in it. Well, we would see the 169.254 one, but we didn't manually set anything. Yes. So if we select it, this one's got six drives, again self-encrypting drives, two one-gig and two 10-gig ports, healthy networking; everything has come through. One of the other things here is that we're showing the node type and raw capacity, and I don't recall what the raw capacity was on each storage node from the original deployment, but in this storage node that we're adding the drive size is actually double what we had in the original deployment, and it's a different model. We saw H410 storage nodes originally, and now we're adding this H500 storage node, and this is using 960-gig drives where the original was using 480s. What that's showing our customers is that the scale-out simplicity is there, and customers want that, but you cannot have that architecture and then lock them into a drive size or generation. Because if we're trying to transition people from a three-to-five-year "buy everything today that you need for growth" model to a "buy just in time for what you need, when you need it" model, our architecture has to allow for that: buy based on your budget cycles. What we find is that customers haven't transitioned to consuming everything as
they need it, but they have the flexibility: as your budget cycle allows, maybe once a year you get budget, maybe once every two years; no matter what, you can still add to the cluster. Are there any boundaries around how old a cluster I could add new hardware to? Do I need to be within a certain number of versions? Well, if we were adding a node to the cluster, as part of the deployment we do have automatic upgrades, so we'll detect and upgrade the node. My preference is still to do it up front, so I would at least make sure, at the software level, that we've brought them both up to the same code revision. Now, in terms of generations, we've had a compatibility guarantee across multiple generations that we can add within a cluster, so we still do have clusters with, you know... I see customers with three different generations of nodes within the same cluster. That's really flexible for customers, and just that requirement to be at a consistent software level as you bring the new nodes in reduces the risk. Yes, it does. So we've got all of the same IP configuration to pop in here as well. What have we got? The iSCSI VLAN is defined, we've got our default gateway for our management network, and then we just need to fill in the remaining hostname and IP addresses. Now, we had that back here in our installation details, didn't we, because we planned for this. Correct, so you were right there at that last line; that's the scrolling on this laptop and RDP session. And there we go: that's my hostname, my management IP, and then I need my iSCSI IP out of the spreadsheet as well. And as soon as I entered that last IP address it's gone off, and I presume it's doing that network validation again, the same as it did when we entered the whole lot before. Correct, and we'll continue on. Yes, I do want to continue to use Active IQ, and I'm choosing not to turn
on encryption at this point. So the note down here says data encryption will be available on the cluster afterwards; it's nice that it can be turned on afterwards if I neglected to turn it on at the beginning. Correct. Presumably once it's turned on you can't turn it off? You can actually disable it as well. Oh wow, that's a nice feature. Yes, so you can just click to enable it; the drives are already encrypted, so it doesn't need to move anything around or cause a performance impact; you can just enable it. All right, if I hit Add Nodes it's going to start the wizard. We'll give it a moment to bring up a status for us, but of course it's going to move a whole bunch of data around, and that's going to take some time. There we go, as it starts the progress. If you're watching closely, you'll notice that Kevin is no longer sitting beside me; joining me for this segment of the live stream is my friend Andy Banta. Andy, what's your role here, what's your title? Well, my title at NetApp is actually Storage Janitor; my role is the VMware integration product manager for the CI BU, the cloud infrastructure business unit, so for the NetApp HCI and SolidFire business unit. And the Storage Janitor title suggests that you're cleaning up after a lot of things that have been done. I have spent most of my career cleaning up after storage offences. And Andy's history goes all the way back to the iSCSI stack in vSphere; it actually goes all the way back to the Fibre Channel stack at Sun Microsystems, so there's a lot of history behind there, and this is why we particularly switched in Andy, because he's going to share with us some deep insight into how things go. But, Jeffrey, there's a question that came in as we were switched out: we had one comment from Tech Reckoning, who is very interested to see NKS and how it's differentiated from the other Kubernetes services out there. Yes, nobody should be rolling their own Kubernetes; it's an interesting one, thank you, John.
And then Joseph Zoda asks: for this install, is encryption at rest a software option if the customer does not buy encrypted drives? The answer is no, it's not. The only way that we offer encryption at rest is if you have SED drives; if you do not buy SED drives we have no option to do encryption at rest, and you'd need to implement encryption in another layer. Say, you could use vSphere datastore encryption. Yeah, absolutely you could, and that would actually give you the benefit of encryption at rest and encryption in flight. The disadvantage of using vSphere encryption would be that you've just negated our ability to do any deduplication, any data reduction; you've lost all the efficiencies that are available on the cluster. So the short answer is that if you have any thought that you might need encryption at rest, buy the SEDs, the self-encrypting drives. Yes, and as far as I know there's no price difference; it's one line item. Yeah, so it would only be an export control issue; I believe that's it, but talk to your salesperson to make sure, because Kevin did say there were countries where we can only send non-encrypting drives. Yes, it's US export control; there are export controls on secure technologies, and that's one of them. Now, although our additional node deploy has completed, the reason we want to have Andy in is to talk about what actually happens during that cluster expansion, because when you're adding additional capacity to any kind of storage array there's got to be some movement of data; it's a fairly intrusive thing, and Andy is the man for what goes on on the inside. Right, so let's back up just a little bit and talk about how a SolidFire cluster works. So typically, in a minimal cluster, you'll have four nodes, and the way that we maintain the databases and maintain the data integrity on the cluster is that we have a
set of nodes known as ensemble nodes, which are basically the nodes that are allowed to vote on what happens to the cluster. This is always an odd number of nodes, because you can't allow a tie when you're trying to decide which way things should go. So on a four-node cluster you will have three ensemble nodes; as soon as we went to a five-node cluster, if you take a look at the node list you'll see that all five of them are ensemble nodes. The maximum we'll ever have is five ensemble nodes, so on larger clusters the node list shows ensemble nodes and then things called cluster nodes, which are not voting members of the cluster. There's also something in the node list called the cluster master. What happened when you added a node just now is that you entered a management IP address and a storage IP address for it, and you will never touch those IP addresses again. We also have what's called a storage virtual IP address and a management virtual IP address, and the cluster master owns those IP addresses at all times. So when you issue an API call or use the UI, you're actually talking to the management virtual IP address. The way the storage works is that iSCSI does what's called an iSCSI login, and that login is always directed at the storage virtual IP address. The iSCSI target on the cluster master node will then say, no, I want you to temporarily reconnect to a different IP address, and that's where the storage IP address you entered on that node comes in handy. So you will never actually have an iSCSI session established to the storage virtual IP address; it will always be established to the storage IP address on a particular node. And that's iSCSI session redirection? Yes, a standard iSCSI feature. The important part is that it's what's called a temporary redirection.
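The odd-numbered voting scheme Andy describes can be sketched in a few lines (a minimal illustration of my own, not NetApp code): a strict majority is only guaranteed to be unambiguous when ties are impossible, and the voter count is capped at five.

```python
# Toy sketch of the ensemble rules described above: always an odd number
# of voters, capped at five, and decisions need a strict majority.

def ensemble_size(cluster_nodes: int, max_ensemble: int = 5) -> int:
    """Pick an odd number of voting nodes, capped at max_ensemble."""
    n = min(cluster_nodes, max_ensemble)
    return n if n % 2 == 1 else n - 1

def has_quorum(votes_up: int, ensemble: int) -> bool:
    """A decision stands only if a strict majority of voters agree."""
    return votes_up > ensemble // 2

print(ensemble_size(4))   # 3 voters on a four-node cluster
print(ensemble_size(5))   # all five nodes vote
print(ensemble_size(40))  # still capped at five voters
print(has_quorum(2, 3))   # True: two of three is a majority
```

With an even voter count, a 2–2 split could deadlock the cluster; dropping to the next odd number is the simplest way to rule that out.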
It's a temporary redirection, so if for any reason we lose the connection to that volume, that target, through that iSCSI connection, ESX (or whoever) will issue an iSCSI login back to the storage virtual IP address and get redirected somewhere else. One reason this might happen is that the underlying node has gone away, it failed for some reason; the more common reason is that the underlying volume has migrated to a different node, and this is exactly what happens when we add a new node to the storage. We spread the volumes out across the nodes of the cluster by the size of the volumes and by their I/O requirements; we try to make sure that all the volumes are evenly distributed across the nodes in terms of size and in terms of the IOPS that they need. So as soon as we added the new node, the ensemble nodes start figuring out that we need to migrate some data from one node to another. You never see any of this happen; it's entirely non-disruptive, there's nothing you have to kick off, nothing you have to start, and any volumes in use will get evenly distributed across the nodes, looking at both capacity and I/O requirements. So the two iSCSI datastores that we've got are a very small sample, but in a production environment you might have twenty different iSCSI datastores; initially they'd be spread across the first four nodes, on average five datastores per node depending on how busy and how large they are, and then silently, as I've added the fifth node, they spread out again. Right, and if you've said that one of them has higher I/O requirements than the others, that one might get a node to itself while the others gather on the other nodes. It's also about size: there's no way you can have a volume large enough to consume all of the space on a particular node, but again, we attempt to spread the sizes out.
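The balancing just described — distributing volumes across nodes by both size and IOPS — can be sketched as a simple greedy placement. This is a toy model of my own, not Element OS internals; the names and weights are invented.

```python
# Toy sketch: place each volume on the node with the lightest combined
# capacity + IOPS load, so both dimensions stay roughly even.

def rebalance(volumes, node_count):
    """volumes: list of (name, size_gb, min_iops) -> list of per-node buckets."""
    nodes = [{"vols": [], "gb": 0, "iops": 0} for _ in range(node_count)]
    # Place the biggest/hottest volumes first so they spread out before
    # the small ones fill in around them.
    for name, gb, iops in sorted(volumes, key=lambda v: -(v[1] + v[2])):
        target = min(nodes, key=lambda n: n["gb"] + n["iops"])
        target["vols"].append(name)
        target["gb"] += gb
        target["iops"] += iops
    return nodes

vols = [(f"vol{i}", 100, 500) for i in range(20)]
print([len(n["vols"]) for n in rebalance(vols, 4)])  # [5, 5, 5, 5]
print([len(n["vols"]) for n in rebalance(vols, 5)])  # [4, 4, 4, 4, 4]
```

Re-running the placement with a fifth bin spreads the same twenty volumes four per node instead of five, which is essentially the silent rebalancing that follows a cluster expansion.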
Part of the reason for spreading the sizes out is how we actually handle recovery. If a node fails or a drive fails, the whole idea of having a cluster arrangement is that all of the nodes work together to bring the system back to a healthy state. The way we do our data protection is that if anything fails, we have all of the surviving nodes in the system work to redistribute that data and have it back and protected as quickly as possible. So we don't ever think of ourselves as being in a degraded state; we think of ourselves as being in a state where we need to rebuild if something has failed, and that rebuild is typically very short. If we lose a drive, we typically try to recover from that in eight minutes; if we lose a node, we typically try to recover from that in an hour, and we've done the mean-time-between-failures math to trust that our data is protected. And of course, in this kind of distributed scale-out system your resiliency comes back, and then you can have a second failure. Exactly right. And if for some reason you've had enough failures, or you've filled your cluster full enough that you don't actually have data protection, you'll get a big warning saying you need to add more storage to have your data adequately protected; we normally never want people to get into that state. I think Kevin might have talked about Active IQ, which does phone-home-type reporting, so we will also get notification of that if you have it enabled, and you'll get a call from the support team saying, hey, there's a problem. Right, exactly. But the whole idea with the iSCSI redirection is that it's the entire way we handle scale-out, our self-healing functions, our ability to do rolling upgrades: anytime you need to, you can simply have the volume migrated to a
different node and have the iSCSI connection redirected to another node, so you never lose access to it. If we're doing a rolling upgrade, we will evacuate a node: all the volumes that were on that node get distributed to the other nodes, we go in and upgrade that node, and as soon as it's healthy again we mark it healthy, volumes come back onto it, and a couple of minutes later we evacuate the next node and start upgrading that one. And for people who are used to a dual-controller environment, where one controller owns a particular LUN and it has to be trespassed across to the other controller, that's quite an intrusive thing and usually involves an interruption or a hit to performance. Yeah, and iSCSI is designed not to do that; the whole idea with the iSCSI redirection is that it's a very fast transaction. And completely routine, business as usual? Completely routine, yes. It should be several milliseconds of loss between the iSCSI logout and the iSCSI login, rather than the sixty seconds, or longer, that we sometimes see with a trespass. So we wanted to show off the new node, and we've waited until it has completed coming across, so we can take a look back in our Element manager. Here you have the node list, where you should be able to see the ensemble nodes and the cluster nodes. We're still seeing four, so let me hit the refresh button and get it to reload the data. If I switch to the drives... it's not yet showing me my fifth storage node... now it's reloading the node list, and now I do have five nodes, and we can see all five of them in our ensemble, whereas immediately before the expansion we only had three ensemble nodes and the others were just cluster nodes; actually, there was only one cluster node at the time. Right, and you can actually see at the top which node is the cluster
master as well, as well as being an ensemble node. And there's no distinction there: the cluster master does not need to be an ensemble node; they're two completely separate functions. So we now have our expanded capacity in there; everything is here and available to us, the way it should be, and we could provision more volumes out. We have enough storage for what we need, but if you want to go through the exercise, you can simply go into your vSphere client and say that you want to create a new datastore. If you're going through the vCenter plugin, you can say that you want to add the datastore and that will create the underlying volume as well. So we do it here in the Element management view we were in just a moment ago: Actions, create a datastore. It actually takes a couple of minutes for it to be discovered, because we have to create the volume and then have it rescanned on the other side, an iSCSI rescan on every host. So I need to create a new volume; which account am I using? The one that exists. You want to create a new volume; let's make it a terabyte. And the QoS policy in here, what have I got to do with these policies? It looks like there are no QoS policies in this cluster. Well, you can create a custom policy, or you can just use the default policy that is set up for you, allowing a reasonable amount without more administration, and using the access group as the default. And do I want to do SIOC, doing some quality of service at the vSphere layer? In that case, if you actually turn on SIOC on the vSphere side, what we do on the NetApp HCI storage side, the SolidFire side, is query those SIOC settings and set the QoS on the volume accordingly.
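The same volume we just clicked through could be created programmatically. Here's a hedged sketch against the Element JSON-RPC interface; the method and parameter names follow my reading of the SolidFire Element API documentation, and the cluster address, credentials, and account ID are placeholders.

```python
# Hedged sketch: creating a 1 TiB volume via the Element JSON-RPC API,
# addressed to the cluster's management virtual IP. Endpoint version,
# account ID, and auth are assumptions for illustration.
import json
import urllib.request

def element_rpc(mvip, auth, method, params):
    req = urllib.request.Request(
        f"https://{mvip}/json-rpc/10.0",
        data=json.dumps({"method": method, "params": params}).encode(),
        headers={"Content-Type": "application/json", "Authorization": auth},
    )
    with urllib.request.urlopen(req) as resp:  # add a TLS context as needed
        return json.load(resp)["result"]

# A 1 TiB thin-provisioned volume with explicit QoS settings.
payload = {
    "name": "builddaylive-ds1",
    "accountID": 1,                # placeholder tenant account
    "totalSize": 1 * 1024**4,      # bytes
    "enable512e": True,            # 512-byte emulation for ESXi datastores
    "qos": {"minIOPS": 50, "maxIOPS": 15000, "burstIOPS": 20000},
}
# result = element_rpc("mvip.example.com", "Basic <base64>", "CreateVolume", payload)
```

After creating the volume you would still map it to an access group and rescan from the ESXi side, which is what the vCenter plugin automates for you.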
Nice, so I can then control it on an ongoing basis through SIOC and have that policy applied. Yeah, and being the VMware guy, I would actually recommend you use Virtual Volumes in that case instead of SIOC, because then you can set the individual IOPS for each volume. Exactly, yes. I will finish that out, and the new datastore will provision and be available shortly. Sorry to pull you off script, but unscripted is always good. OK, I think we were also going to add another compute node as well, so let's do that: we go to the cluster, Nodes, we're going to expand again, and pop into that same UI. Well, it says we've already done that; there was a Complete to be hit in there, but I'll log out of that session, close all of these leftover connections, and trigger that expand again. There we go; this time it wants me to log in, so it has lost all the session credentials from the last one. And did I type my password wrong? No, very good. It's still showing us that we've added our storage node successfully, and what I really want to do is come back and start again, so I return to the welcome step. There we go: are we adding compute nodes? The same flow as before, but this time, yes, two compute nodes, which means I have to agree to the license agreement from VMware again. Yes, I agree to the license agreement; pesky license agreements. I type in my vCenter address. I'm in two minds about it not storing all of my credentials for me: I quite like that you're not storing my credentials, and yet it's inconvenient that you're not storing my credentials. I think I must have mistyped, so I'll try it again from the start. We're going to add compute nodes to the existing data center, which is the NetApp HCI one (this tells me we're definitely connected to my vCenter server), add them to the existing cluster, and then we go set a password for
the user; the two passwords must match, which means I need to type them correctly twice in a row. It's always fun when you cut and paste and then mistype the first time; this is why I like password managers that fill it in for me automatically. It begins doing that discovery of the auto-assigned IP addresses. We've got no storage nodes to add, but we do have a compute node, and this one is running 20 cores rather than the eight that we had previously, and twice as much memory. So vSphere will be configuring HA on this, and it concerns me that I've now got one node that constitutes half of the resources of my cluster; I should buy another one of these so that it only makes up a third of the resources. That's actually very important for the storage nodes as well. You added a storage node that was larger than the original ones, but you cannot add a storage node that's larger than one-third of the total capacity of your cluster, because in case you do lose that node, the risk is too great. Exactly. Now, another copy and paste out of the spreadsheet kindly provided by the unstoppable Kevin: the management IP address, and now I have a new challenge. Sorry, Kevin: the iSCSI addresses. So the reason we have iSCSI-A and iSCSI-B on the hosts but not on the storage is that we use ESX multipathing from the hosts: the ESX host will establish two storage paths from the two different adapters that you've given to the nodes and multiplex across the paths. Is there guidance around how I should set up that multipathing? There is: you should set it up for round robin, and I'm pretty sure we actually set this up properly when you install for the first time, and you should set the number of IOPS per path switch to somewhere between seven and ten. Right, so not one, but not one thousand, which is the default.
Right. So the whole reason for this is that you want the number to be definitely lower than your queue size; what you typically find is that if your number is larger than the queue size, you fill up one queue and it has to drain before you jump over to the other one. You don't want it to be one either, because there is a little bit of processing that ESX has to do for a path switch, and you don't want to load up your ESX host with that; somewhere in the order of half the queue depth is going to be a useful value. It's very easy to see if you do a TCP trace of path switching with round robin versus actually having both paths loaded. All right, we don't want to dive in that deep; that's probably best done in a dedicated session where we set up the lab and can generate a decent amount of I/O. That's actually a janitorial task; maybe we will have a janitorial Build Day Live with just Andy, diving very deeply into the depths of storage analysis and fine-tuning, which I know is a very deep topic. Well, Andy, while we're waiting for this additional node to roll out: you've shared such a lot of the insights of how this works and some of the fundamentals of how iSCSI works with this scale-out storage system, so thank you for joining me. All right, we have another question from the gallery here: Matt Judson asked, can you mix and match SED and non-SED drives? I think I can answer that: you basically have to turn the encryption off on the drives to use them together, right? That is correct. We can still support it within a cluster, but the best practice would be to stay with self-encrypting drives; we put that practice in place, so work with the NetApp team and your partner when ordering, and we try to make sure we're ordering those encrypted drives. But if you did have a case where you brought a node back from another country, it can certainly be intermixed within the cluster.
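The queue-draining argument above can be illustrated with a toy round-robin model. This is entirely my own simplification, not ESXi's pathing code; it only shows how an IOPS-per-path switch limit spreads I/O across paths.

```python
# Toy model: rotate the active path every `iops_limit` I/Os, as round-robin
# multipathing with an IOPS limit does. A small limit keeps both paths busy;
# a limit near the default of 1000 leaves one path doing all the work.

def simulate(total_ios, iops_limit, paths=2):
    """Count how many I/Os each path carries when we rotate every iops_limit."""
    counts = [0] * paths
    active, since_switch = 0, 0
    for _ in range(total_ios):
        counts[active] += 1
        since_switch += 1
        if since_switch >= iops_limit:   # rotate to the next path
            active = (active + 1) % paths
            since_switch = 0
    return counts

print(simulate(1000, 10))    # [500, 500] — an even split across both paths
print(simulate(1000, 1000))  # [1000, 0] — one path carries everything
```

The model leaves out queue depths and switch cost, which is exactly why the guidance lands in the middle: low enough to stay under the queue size, high enough that path-switch overhead stays negligible.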
Yes, you can certainly intermix it within the cluster. All right, and our deployment of the third ESXi server is still in progress, still just setting up ESXi, so we're going to have a look at that Active IQ that both you and Andy mentioned is a really important part of the environment. Yes. So again, that checkbox is part of the deployment: do we want to enable Active IQ monitoring and reporting, and we said yes. What actually happens, because we're in the Boulder office, is that our internal systems are put in staging; we want to eat our own dog food, so we put every internal system in there, and that's what happened automatically. It was a challenge, right? We were trying to figure out where it was, looking in the normal customer area for this new cluster we'd just deployed. Exactly, and it did show up; it was just in the staging environment, which we got into last night, where we found our cluster and added it. But since then we completely destroyed that cluster and redeployed. So what we're actually going to look at is a lab environment I've been using a lot; we have a few SEs, my counterparts, who use this, whether it's for helping us learn or for customer demos, so we can look at that one. There's no history on ours; we could show you the cluster we deployed an hour ago, but it's not going to be very interesting. Correct. So this is your cluster then, Kevin, with three compute and five storage nodes, the same scale as what we've just built out? Yes, it is. And as I mentioned at the beginning, when we look at the vCenter plugin, when you click the reporting page, what it's going to show you is what the capacity utilization is right now and what is
the performance of the system; but it starts capturing right when you bring the cluster in, for that long-term historical view, in our Active IQ. Now, one of the big factors of our whole system has been automation, and what's interesting is that I have customers who want to use Grafana or another tool to monitor, and they ask, how did you do this, what metrics are you looking at? If you click on Reporting and then on API Collection, it will show you exactly where we're pulling our data from: these are the statistics, with the collection interval, that we're pulling off of your cluster. And this relates to the fact that the origins of the underlying storage system here are with SolidFire, who initially targeted service providers; for a service provider, you give them an API and they integrate it into their automation tools, so you've always been API-first, and these are just the API endpoints that are provided by the SolidFire cluster. That is correct; there's nothing in here that you can't do yourself. There might be a formula we've created to display something differently, but we are exposing all of those metrics for our customers to consume, whether they're using our tools or their own, so they can make their own meaningful reports and dashboards out of those metrics. Right. But instead of just looking at automation, let's look at what the visual interface can provide. We can go back up to the dashboard, and as you mentioned, Alastair, it shows our storage and compute nodes; it's detecting this. Well, this one's my setup, I just forgot, because we are actually expanding our system to three compute nodes and five storage. If you press Show Details it will give you a little more information, and we can see what we're running: this is running Element version 11.3, and we have the ESX version.
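The endpoints listed under API Collection can be consumed directly, for example to feed a Grafana dashboard. A hedged sketch follows: the method names (`GetClusterStats`, `GetClusterCapacity`) are from my reading of the Element API docs, and the transport and field selection are simplified placeholders.

```python
# Hedged sketch of polling a couple of the cluster statistics endpoints
# yourself. The JSON-RPC endpoint version and auth scheme are assumptions.
import json
import urllib.request

def element_rpc(mvip, auth, method, params=None):
    """Minimal JSON-RPC call against the cluster's management virtual IP."""
    req = urllib.request.Request(
        f"https://{mvip}/json-rpc/10.0",
        data=json.dumps({"method": method, "params": params or {}}).encode(),
        headers={"Content-Type": "application/json", "Authorization": auth},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["result"]

def summarize(cluster_stats, cluster_capacity):
    """Reduce the raw results to a few dashboard-friendly numbers."""
    return {
        "actualIOPS": cluster_stats["actualIOPS"],
        "utilization": cluster_stats["clusterUtilization"],
        "usedBytes": cluster_capacity["usedSpace"],
    }

# summarize(element_rpc(mvip, auth, "GetClusterStats")["clusterStats"],
#           element_rpc(mvip, auth, "GetClusterCapacity")["clusterCapacity"])
```

A scheduled poller pushing `summarize()` output into any time-series store reproduces the shape of what the reporting page plots.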
It also gives us right here what our vCenter is, so we know which vCenter we're connected to, and as part of that consistent deployment we created that new data center and a new cluster within it, so we have our NetApp HCI data center. And I also notice that you've got a much more consistent cluster than we have, with three compute nodes that are all H300s and storage nodes that are also H300s, whereas ours is a little more diverse, to show that capability. That is correct. Actually, as I walked over from the hotel this morning, I was talking to a colleague who has five of our older nodes, and he was upgrading the code today to integrate them as well. So it's deciding whether to just deploy a new cluster, so that if we want to rip this apart and move things around we can, or to actually just add those nodes right into this cluster. That was something we also talked about with Andy. So what other great information are we getting in here? Yeah, scrolling down a little bit, it does show you your overall compute utilization, so that's what we're doing right now, plus some capacity and performance metrics. But maybe we want to go a little more granular, so let's click under Reporting; since we already covered that, we can click on Capacity. From here, by default, it's showing the last seven days, and we can see that some workload got moved about a week ago, because we actually did a rebuild of this cluster recently: we wanted to just scratch this thing, rebuild it, test the new code out, and bring it back up to speed. Part of that is that we also have an ONTAP system, so we brought the ONTAP system in and everything's attached. That's another thing: when you look at our NetApp HCI, we are providing these hybrid cloud
control services. In hybrid cloud control today we have NetApp Kubernetes Service in preview, which we're going to be showing, and shortly we'll be releasing Cloud Volumes on HCI, which is going to give you the easiest way possible to deliver CIFS and NFS services back on your same HCI cluster. But we also have full support for integrating with our ONTAP systems, whether you're integrating like we did in this cluster, where we have it attached so that all of our images come off of it (our repositories and where we store all our templates and other things), or for SnapMirror replication. Right, so before we destroyed this cluster, we SnapMirrored everything off. So you SnapMirrored from the SolidFire-origin Element OS storage cluster to the ONTAP platform: completely different lineages of storage platforms, yet you're shipping snaps from one to the other and then bringing them back? That is correct, and I've worked with a lot of global companies, and we're doing that quite a bit right now; they want that flexibility. You're buying HCI as a simple platform, bringing in NetApp HCI, but what if you want that data protection to be elsewhere? Our system is all-flash, so maybe our cost per gigabyte of storage is relatively high (not ridiculously high), but there are lower-cost-per-gigabyte storage locations, potentially still with some spinning hard drives, and the ONTAP platform can be the destination for your data protection. Correct. Yeah, nice. So back to this page: it's showing the used capacity, and we've grown a little, and this is where capacity forecasting comes into play. We see here that we have a warning threshold, an error threshold, and total capacity. When Andy was in here, he was talking about how, for data protection, if we lose a node we're not in a failed state; we actually
bring it back to full HA again in under an hour, and that's where that error threshold comes in. When I'm talking to customers about what to look at, it's that state: staying under the error threshold is what allows a node failure and a rebuild back to full HA. So the warning threshold means that, at that point, for my data to be at risk I would have to have a full node failure plus multiple drive failures on the surviving nodes? Well, what we've actually done with warning is that it's a percentage under error, and you can change your warning threshold; you're getting all of the alerting that we can integrate with, but the point is that if your buying cycle is going to take you three months, then maybe you need to move it up from 3% to 6% or 10% under error, so you know, hey, I'm going to hit the error threshold soon; give yourself enough time to get through the procurement process to bring a new node in, and to find that monthly maintenance window where you're going to be allowed to add it into the cluster. I might have just written about exactly this issue for TechTarget, to be published very shortly: you definitely need your capacity thresholds to include all of the non-technical time of approval and ordering and delivery. Yeah, absolutely. In this case we only see eleven days between the warning threshold and the error threshold being reached; in a lab environment we can probably cope with that, but probably not in production if we're seeing that trend. Definitely. What else do we see on this page? It's showing block capacity and metadata separately, because we separate those two services out, and customers often ask me when they have to worry about metadata capacity.
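The procurement-lead-time point reduces to simple arithmetic. A back-of-the-envelope sketch, with all numbers invented and none of this being Active IQ's actual forecasting model:

```python
# Back-of-the-envelope: place the warning threshold far enough below the
# error threshold to cover procurement lead time at the observed growth rate.

def days_until(threshold_bytes, used_bytes, growth_bytes_per_day):
    """Whole days until usage crosses a threshold at the current growth rate."""
    if used_bytes >= threshold_bytes:
        return 0
    return (threshold_bytes - used_bytes) // growth_bytes_per_day

TB = 1024**4
error_threshold = 80 * TB     # capacity needed to rebuild after a node loss
used = 60 * TB
growth = 0.5 * TB             # measured growth per day

# If procurement takes 45 days, the warning must fire at least 45 days
# before the error threshold would be crossed.
lead_time_days = 45
warning_threshold = error_threshold - lead_time_days * growth
print(days_until(error_threshold, used, growth))  # 40.0 days — already too late
print(warning_threshold / TB)                     # 57.5 — fire warning at 57.5 TB
```

In this made-up scenario the cluster crosses the error threshold in 40 days but procurement needs 45, so the warning threshold has to sit lower — exactly the "move it up from 3% to 6% or 10%" adjustment described above.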
Really, metadata capacity comes into play because our architecture is always-on thin provisioning, deduplication, and compression, with zero performance loss. If you're getting extremely high efficiency ratios, that's where metadata may start to fill up. We've had cases where we needed to define more metadata capacity, and we could add a second metadata drive in; I've seen that in the past, but it doesn't come up often. Right, and this is because you dedicate a physical storage device to metadata in the architecture, don't you? Yes, we do. So then you would just be allocating another physical storage device in the node to provide more metadata capacity? Correct, and because of that scale-up and scale-out architecture, we could remove a data drive and add it back into that metadata pool if it was ever needed. And that again means there isn't a situation where I'm out of one resource and have to buy a complete new physical node in order to get more of that one resource while all of my other resources are underutilized. That's correct. So then let's move into efficiency, because efficiency is always something that comes up: what kind of efficiency am I going to get? This shows the trends over time; we went up, we went down, and that could be thin provisioning, but what do we see here? Well, part of what's going on is the restore of those snaps you were bringing back. When you create a cluster, we saw it here, right? We created this cluster and got, I think, the default of two 2-terabyte datastores provisioned, and we have a 200-gig and a 400-gig VM in there. So when you look at that, the rest of that space has been thin provisioned, and we start exposing those big numbers just because we're not actually consuming that space yet; as we start consuming, those numbers level off and come down. But what we have here is you're showing
the average dedupe and compression ratio on this lab cluster at 3.3x, and NetApp does have efficiency guarantees; for our customers, if they are interested, we can always have that conversation, and those guarantees will be around the raw storage versus the usable capacity. And if I've got a particularly odd data type, maybe massive non-compressible datasets, then NetApp might have to provision additional storage to deliver the amount of capacity that was guaranteed? Those kinds of guarantees are useful to customers who have very unusual data use cases, but they're usually not a lot of risk for NetApp, because most data types compress and dedupe really nicely. Yes, definitely, especially if you're only relying on a three or maybe four times ratio. One question coming in from the chat, from Jose Oda, is about those SED drives: is there any difference in storage efficiencies between an encrypted drive and a non-encrypted drive? Because we're using self-encrypting drives, all the encryption is done at the hardware level, so it has no effect on what we're doing at the Element software level; it does not affect your efficiency ratios at all. And this is the differentiation between using encryption at the vSphere layer, where the data arriving at Element OS is already encrypted and therefore hard to compress and dedupe, versus using self-encrypting drives that sit on the other side of the data efficiency, right down at the physical storage. Yes, exactly. The other thing with these efficiencies is that they continue to scale; it's global efficiency, so if you start with four nodes and grow to a 40-node cluster, the deduplication isn't tied to particular drives or nodes or anything else; it's global across all the data. There's also an interesting interplay between encryption at rest and deduplicated systems in terms of what an attacker actually gets.
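An overall number like that 3.3x decomposes into separate factors. The sketch below follows the factor definitions as I understand them from the `GetClusterCapacity` fields in the Element API documentation (thin provisioning, dedupe, and compression computed from block counts); the sample numbers are made up.

```python
# Sketch of decomposing an efficiency ratio into its factors, using the
# block-count style of the Element capacity API. Sample inputs invented.

BLOCK = 4096  # block size in bytes

def efficiency(non_zero_blocks, zero_blocks, unique_blocks, unique_used_bytes):
    thin = (non_zero_blocks + zero_blocks) / non_zero_blocks
    dedupe = non_zero_blocks / unique_blocks
    compression = unique_blocks * BLOCK / unique_used_bytes
    return thin, dedupe, compression

thin, dd, comp = efficiency(
    non_zero_blocks=1_000_000,
    zero_blocks=3_000_000,
    unique_blocks=500_000,
    unique_used_bytes=500_000 * BLOCK / 1.65,
)
print(round(dd * comp, 2))  # 3.3 — dedupe x compression, as on the dashboard
print(thin)                 # 4.0 — thin provisioning reported separately
```

Reporting dedupe-times-compression separately from thin provisioning is why the ratio "levels off" as thin-provisioned space actually gets consumed.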
We're usually trying to protect against somebody stealing a hard drive and getting at our data, but what's on a stolen drive is deduplicated, compressed data: you'd have to work out how to rehydrate and uncompress it, and the actual data layout is not at all clear. So encryption at rest is arguably for the paranoid rather than for the mainstream, although if there's no cost penalty, why would you not use self-encrypting drives if you're allowed them? Yes, and the fact that the metadata is somewhere else, on dedicated drives, makes it even harder; you can't do much with the data at that point. It also means you don't have a penalty as we move things around. Andy talked about this: how we dynamically move the blocks, which happens as you scale the cluster out; it just moves the existing blocks around and rebalances the cluster, and that would have been going on in the background as we added our additional storage node. It's actually even more than that: our garbage collection, and all of this, is automatic; you can't change it, you're not going to tweak it. What we do is optimize the data layout: in any storage system there are always going to be blank spots created as data is written and deleted, and as part of our whole process we bring the live blocks back in and just keep writing them out contiguously, so essentially what you end up with is a clean layout, whether you're twenty percent full or ninety percent full, and we're not trying to find the holes. That's really crucial, because in a deduplicated and compressed environment, as it gets very full, you can get into significant performance penalties from some of the assumptions made about the media characteristics. Yes, we've seen that with hard drives; hard drives were terrible for it. Definitely. And then the other things to look
at on this page: we'll go down to Performance. The performance view comes down to what we're seeing in the utilization of the cluster, IOPS and throughput, and again we can highlight over an area. The other interesting thing is that underneath that line we see one of those spikes, so scroll your mouse down a little further... what I like is this little bar: if you highlight over an area there, it will zoom right in, so you don't have to mess with what time range you want to look at; you can do it very easily, and I can drag the bars, so we're just looking at a point in time and we can change between views. Right, I've got a timescale at the bottom and a detail view above it, and it respects that as I go between the different views as well; that's nice. Then as we go down here, let's say we're running into an issue and you're hearing back from users. We don't report the volume name within here, but we can go down to the volume level: under Reporting we go down to Volumes next. Being a lab environment, we really don't have a workload going today, but we can see the quality-of-service bands that are configured for each of these volumes, and on the far right you can go to Actions, click the three dots on one of your volumes, and go into Details. So if you needed to (and this is at the storage layer), if you wanted to look at exactly what was going on, what latency you're seeing on that particular volume, you have all of those details right here: all of the things that your storage administrator traditionally looks at as they help you troubleshoot issues with your applications. Correct. Another thing I like to look at, as we talk about quality of service: if you go back to the Active Volumes
tab, we have some different quality-of-service policies configured — we can see 20,000 burst and these different maxes — and customers often ask: what do I need to configure, how do I consume this? Whether it's service tiers — you mentioned service providers — they can do cost-based service tiers built on quality-of-service bands. So you could do a very simple gold/silver/bronze set of tiers and have just three policies that get applied. Yes, but in our system we do everything on the minimum: instead of trying to limit your quality of service, we really want to guarantee that you hit that minimum threshold. But what often happens — say you're talking to a remote site — is that they don't know those details up front, so we actually brought recommendations right into Active IQ. No upgrade needed; the interface just came in one day, and now you have QoS recommendations. On the left-hand side, towards the bottom, is QoS Recommendations, and we can click on that. There are no maximum recommendations right now, but it's saying: look, volume three has a minimum of a thousand, yet these volumes aren't even being used, meaning a hundred percent of the time it's below the minimum. So the minimum is the minimum IOPS that it guarantees? Yes — you can't set a minimum that isn't actually going to be delivered. This is a guaranteed amount of IOPS; in VMware CPU terms it's the same kind of measure as a reservation, a guarantee that it's going to be available. It doesn't work quite the same way, but here it's telling us: this is a lab, you're not doing anything with this equipment. Yes, exactly. And where you mentioned reservations — with a VMware reservation you're actually
physically reserving that capacity within VMware, and we've all probably run into the case where you try to add a new VM and you can't, even though the cluster isn't really using it. Now with us, we have a minimum, a maximum, and a burst. I have a big customer, a networking company I've been working with, and they guaranteed everything at the minimum level back to their end users, and what we talked about was how they could better utilize it. What we actually recommend, and what a lot of our service providers are doing, is that you guarantee that minimum level, but that's not your SLA, because we don't want to physically reserve like in that VMware example. If we have 200,000 IOPS in the cluster and we set every volume's minimum at ten thousand, what happens after you create twenty? The next one can't be allocated. So what happens in ours: say everything was open — you set that ten-thousand minimum and two-hundred-thousand max and burst. If there's pressure on the cluster — your cluster is running at a hundred percent utilization — it's going to start bringing all those volumes down from where they're running. So if there's a volume running at forty thousand IOPS with that ten-thousand minimum, it's going to ease it back down so that everybody gets their ten thousand. Exactly. But in this case we have a bunch of volumes that aren't even being used, so the other volumes can use their IOPS — because it's all drawn from a single large pool of guaranteed IOPS, and you can only draw from it so much. The recommendation here is just easing back the ones that are light. Yes, and if this were a real environment with a high-IO critical load, you would bring the
minimum up on that workload, just to guarantee that service level via the minimum. Yes, exactly: find those critical workloads, give them the highest minimum, and really make sure they always have that performance. The other thing — Andy talked again about the data layout and how we do this — is that the data is laid out in the most optimal way up front: everything is spread across every drive. When you adjust a minimum, it does not have to do any data movement. There's no moving from tier to tier, because you don't have tiers. Right — it's almost like we're limiting more than we're adding, so if you go and change that minimum from 1,000 to 10,000 it has that performance immediately. Nice. The last thing we could look at here, before we go check on our node expansion, would be the virtual machines. What we're doing there is bringing in the virtual machine list and the VMware alarms, so we can see what is going on, which servers are on and off, and easily apply filters. I see a couple off, so you could click that plus — I think it's Powered Off — no, that did the opposite; it was already filtered to powered-on machines and now shows the off ones. If you click Powered On — that's what I meant to do — now you can just see the list of powered-down VMs. We can see the vSphere datastores coming in, and the other thing is, if customers want to go even more granular, we integrate our HCI with our Cloud Insights. As part of NetApp's whole data-fabric vision — to be fully connected and allow that monitoring — we can bring NetApp HCI right into there, and then you have full granularity from the hypervisor down to which disk and which datastore, point and click right through it. And there was an awesome demonstration of Cloud Insights at Tech Field Day 19 when I was there.
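Stepping back to the minimum/maximum/burst discussion for a moment: the behaviour described — easing busy volumes back toward their minimums under cluster pressure, while unused minimums stay available to everyone else — can be sketched in a few lines. This is a simplified, hypothetical model for illustration, not Element OS's actual scheduling algorithm; the volume names and numbers are invented.

```python
# A simplified, hypothetical model of the behaviour described above --
# not Element OS's actual algorithm. Every volume is first granted up
# to its configured minimum; whatever capacity remains is shared in
# proportion to each volume's demand above that minimum, which "eases
# back" busy volumes when the cluster is under pressure.

def allocate_iops(volumes, cluster_capacity):
    """volumes: {name: (min_iops, demand_iops)} -> {name: granted_iops}"""
    granted = {name: min(mn, demand) for name, (mn, demand) in volumes.items()}
    extra_demand = {name: demand - granted[name]
                    for name, (mn, demand) in volumes.items()}
    spare = cluster_capacity - sum(granted.values())
    total_extra = sum(extra_demand.values())
    if total_extra and spare > 0:
        share = min(1.0, spare / total_extra)  # < 1.0 means easing back
        for name in granted:
            granted[name] += int(extra_demand[name] * share)
    return granted

# Two busy volumes and one idle one on a 50,000-IOPS cluster:
vols = {"sql": (10_000, 40_000), "web": (10_000, 30_000), "idle": (10_000, 0)}
print(allocate_iops(vols, 50_000))   # {'sql': 28000, 'web': 22000, 'idle': 0}
```

Note that the idle volume's unused minimum is not reserved away from the others: with headroom (say a 200,000-IOPS cluster), `share` caps at 1.0 and every volume simply gets its full demand, matching the "other volumes can use their IOPS" point above.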
And it was in San Jose a couple of months ago — absolutely awesome demonstrations, so do check out that Tech Field Day video if you're interested in Cloud Insights or more. Let's take a look now and see our newly added node — our new compute node — because it's now been successfully deployed. What we should see, coming back into our Hosts and Clusters view, is that yes, we've now got a third ESX server in our cluster, and of course it has that same warning, which I'm going to suppress so that we can see if there are any actual errors in there. So now we have our three nodes in a compute cluster and five nodes in our storage cluster: lots of capacity, lots of space to run something. So why don't we run a workload here? All right, sounds great. The workload we chose was the NetApp Kubernetes Service — moving on from traditional on-premises vSphere infrastructure to something a little more cloudy, since I'm keen on cloudy things. Where do I go to get started with the NetApp Kubernetes Service? Well, we may still have that other page open — which one would you like? Close out that console; we don't need the remote console anymore — the one where we had the Deployment Engine. I'll close that. What we're going to do here — and this is from memory, and if I don't have it right, let's go into the deployment guide, because it's all documented there — is go to the mNode IP, the management node IP, slash... I think we might have been here before. There we go: that's the administrator account, and this time I typed the password correctly. OK, so this was the other interface we could have used to expand the cluster, and Monitor would have launched you right into Active IQ. So that's the normal way to get to Active IQ — you don't need the plug-in — and Expand takes us to the same UI
that the plug-in took us to for expanding, and we've already looked at Upgrade — the fact that we're already fully upgraded at deployment. Yeah. And now what we're going to do here is click Enable Hybrid Cloud Services. Sounds fun. What happens here — that first button that showed Upgrade — is that right now we have the NetApp Kubernetes Service in a live preview mode, so our customers can try this out right on their NetApp HCI cluster: delivering the same services that we deliver with the NetApp Kubernetes Service across the different hyperscalers, also brought into your on-premises NetApp HCI. So I can get Kubernetes as a service from public cloud, and this is now giving me Kubernetes as a service — not a science project — on premises. That's correct, and we've designed it that way. There was a comment in chat — one of the guys said you should never be deploying basic Kubernetes by rolling your own — and that suggests Kelsey Hightower's "Kubernetes the Hard Way" is letting us know there is an easy way now too. Is that a timed trial, or do you have it from day one? What are the prereqs before I can actually enable this on the HCI cluster? One of the things is that you have to create an account at nks.netapp.io. We'll go and create an account there, and when you create an account you get a 30-day trial, and that's what we can use to deliver these services and test it out. But is there a minimum version of the NetApp HCI platform that I need to be on? Presumably I'd need to be updated to something fairly recent for this to be enabled. Yes, we would need to update the code. With the release we used here, we saw that when we clicked the Upgrade button it was already done; now, if you're on an older version, we'd bring the mNode up to
speed — up to the latest version — and then you click Upgrade Services, and that brings the latest service bits into the cluster. Once that's there, we can enable it. When we come out with the others that you see here — Cloud Volumes and Cloud Insights right on NetApp HCI — that's where that Upgrade button is going to come into play. With the system that we have, because it's already upgraded, we'll continue on and build ours out. OK, so this is the question you just came up with: where do we go? We can click that link there — let's just open it in a new tab. We pre-provisioned an account that's never been used; tell me if I got that wrong, and now we'll find out... it looks like it worked. So now we are in our NetApp Kubernetes Service. What we need to do first is create a token, so we'll go to our profile here: Profile, Preferences, Credentials, API Tokens. "No tokens found" — Add Token. Just one token, please: "token for HCI". When you create this token you do want to store it; it's only shown here temporarily, and you won't be able to grab it again. Grab that, paste it back into the one place that it's needed, and submit. So now my on-premises NetApp management node is going out to talk to the web service, registering and establishing that relationship. That is correct. And "frosty sound" — that looks like a very container-style name, two words joined together. Yes, and that comes from your profile: it looked at the profile we just created, and if we go back in there we'll see that frosty sound, and we can rename it if we want. There's the Kubernetes button right at the top, and it says I could have another organization in here. All right, so we're currently in Boulder, in the frosty sound organization, and we want to be in our NetApp HCI data center and cluster.
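The "store it now, it's only shown here temporarily" behaviour of that API token follows a common pattern worth spelling out: services typically keep only a hash of the token, so the plaintext can be displayed exactly once at creation time. The sketch below is a generic illustration of that pattern under stated assumptions — it is not NetApp's or NKS's implementation, and the function names are invented.

```python
# Generic illustration (not NKS's implementation) of the one-time-token
# pattern described above: the service stores only a hash of the token,
# so the plaintext is shown exactly once and can never be re-fetched.

import hashlib
import secrets

_token_hashes = {}  # name -> sha256 hex digest; stands in for the service's store

def create_token(name):
    """Generate a token, remember only its hash, return the plaintext once."""
    plaintext = secrets.token_urlsafe(32)
    _token_hashes[name] = hashlib.sha256(plaintext.encode()).hexdigest()
    return plaintext  # the only time anyone ever sees the plaintext

def verify_token(name, presented):
    """What the service does when a client registers using the token."""
    digest = hashlib.sha256(presented.encode()).hexdigest()
    return _token_hashes.get(name) == digest

token = create_token("token for HCI")
print(verify_token("token for HCI", token))          # True
print(verify_token("token for HCI", "wrong-token"))  # False
```

This is why losing the pasted value means creating a new token rather than retrieving the old one: the service itself has nothing but the hash.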
There is actually one thing: did we need to build out those port groups? Because we were going to have some dedicated port groups. It's a good thing that you're here with me, saving me from getting too far ahead, because of course we do need some networking for the Kubernetes elements. Remind us again what the three networks were that we needed. So what we're doing here: we have all these networks that are created by default, and for a Kubernetes best-practice setup on HCI you want to create three new networks. We're going to create a management network, for the management side; a workload network, for the different containers to talk on; and a data network, so it can talk directly to our Element storage. I've pre-created the VLANs behind these, set up DHCP on the subnets underneath these VLANs, and set up routing so that the workload and management networks can both get to the internet to talk to the management service. The storage network that I'm just adding is the final one — for NKS it has DHCP on it, but it doesn't apparently need routing out to the internet, because this is our normal storage network, the same storage network that the ESX servers are consuming the storage cluster over. Correct. So I really don't want a route from that to the edge of the internet — that's VLAN 17. Right, now I have my three networks, and the standard Vyatta router that I use in the lab has these three networks connected to it. We go back in here, we continue on — and it's a good thing that we created those beforehand, though I'm going to have to refresh that list. There we go: this is the management one. Workload network — I'll refresh... that was two; refresh actually clears the whole lot, doesn't it? That's right. And then storage. So here's what we're going to do: I'm going to build out the NetApp Kubernetes Service, make it available for the Boulder region, put the
vCenter in, use these existing three port groups, and enable services. It says it'll take some time — the enable is running at eighty-five percent now, so it's getting close to finished. The thing is, this is deploying infrastructure on top of our vSphere environment, so I'm going to pop back across to the vSphere client, and what we should see here in the Hosts and Clusters view — I'll close up the cluster I brought with me — is the beginnings of the actual new deployment. And it looks even better if I go into the VMs view, because there's a folder containing all of the NetApp Kubernetes Service stuff, with a bunch of virtual machines that have been deployed, which is really cool. Yes, definitely. What we see there is a master server being created — that's going to run the Kubernetes master services — and then we have three worker nodes within the pool that we're starting with. That's where we are with this build: we're building it out and getting it all integrated. One of the other pieces — and we've talked about our Kubernetes service being upstream Kubernetes, so we're not developing off a fork or anything else — is that one of the auto manufacturers I've worked with hit exactly this challenge when they were looking at some of the other services out there. Their developers actually started in Amazon, and they saw the challenges with the master service being locked down, and when they looked further they wanted to bring the actual production workload into their on-premises environment. So they were doing Kubernetes-based development in public cloud for production deployment on-premises. It started in public cloud because of resource constraints — how long it took IT to deliver services on premises. I'd certainly not want to be the IT team on premises building Kubernetes as a service. Exactly. But what they actually did is
they purchased NetApp HCI. So they started out on NetApp HCI using another platform for Kubernetes — they're using OKD — and now they're at the stage where they're ready to evaluate our NKS service on HCI so they can simplify further. What I mean by that: with this service, we just saw what was built out — the VMs right there — by default. But what if you need to upgrade the service? What goes into that when a new Kubernetes release comes out? With our NetApp Kubernetes Service it's one click: we click one-click upgrade to the next version, so we're very easy to upgrade and scale. Actually, I think it's a good idea to scale this out once we finish this deployment. All right. But this is the infrastructure component, so if we pop back into the wizard it says your cloud service has been enabled — congratulations — and now we have our services enabled. Where do we go to manage this Kubernetes environment? From here we'll go to that same interface you went to, nks.netapp.io, because now we're actually working from cloud services. When your customers deploy this — when you guys deploy this — you can pick where you want to do it. For that example we just talked about, let's go to the control plane, and I'll explain where I'm going with this: we aren't tied to just NetApp HCI now. When you click on Add Cluster, you'll see something different — when we launched this before, it popped up really quickly — it brought NetApp HCI to the top. Once you enabled that service it brought it in, but you still have these other targets, so if you wanted to deploy Kubernetes on AWS, it's the same upstream version that we're using with the NetApp Kubernetes Service. So we can manage all of your cloud-based Kubernetes deployments as well as your on-premises Kubernetes deployments, from the same platform, same interface,
right here. So we've got a diverse set of places where I could run Kubernetes, and I just manage it all from this one console. Yes. And that portability of using Kubernetes, with Docker containers underneath, is truly here in terms of being able to take an application and lift it from one location to another. Correct. And while we are in preview mode, as it shows, for NetApp HCI, we'll be bringing in Helm charts, we're bringing in Istio, making this all very easy to consume, plus the ability for you to bring in your own charts. What we'll deploy is a sample application, but we could have brought in a chart here, and when we come out with this in production you'll be able to take those applications in your charts and have them deployed on your cluster with a click. Step me back one moment, because not all of our audience is familiar with Helm. What am I getting out of having charts, beyond what we're going to show here with a YAML-file deployment? With Helm you're able to much more easily package and deploy applications, and bring in your own applications. We have a list of trusted chart applications already, and what we've been doing a lot of work on is bringing these services together. Typically you wouldn't see Elasticsearch by itself, so we bring in Elasticsearch together with Kibana and one other... Logstash? Yes, thank you. We bring all three of those services into a packaged chart. So it's a collection of microservices brought together to deliver a full application. Correct. Nice. But at the moment we're waiting for that to be added to this particular deployment on the HCI. That is true, yes. If we click down to AWS, for instance, we'd see some of those charts there. And just like we had to associate the API token, what's great about this service is that we get the exact instructions of where to go, so you
know, when I first started with this — I think within a week of us acquiring the company — I went right into here, and I didn't have my AWS credentials created or external access or any of that, and it literally just walked me right through it, following the bouncing ball. So as we're deploying here on NetApp HCI, what am I doing? I've got one master being created and two workers being created, in Boulder because it's the only location I have, default workspace, SSH key pair — we'll just say make that for me, please. OK. And just like we had frosty sound, now we have a generated cluster name; you could change it, or even click that little circle to regenerate it. Delicate sky — I approve. It might not be an elastic sky, but a delicate one fits: it has been a little delicate out here, with the hail and everything we've had this week. We had two days in a row of really brief, really violent storms throwing down pea-sized hail, and they coincided with the finish of work as we were heading back to the car. So it's been interesting — it's mostly been beautiful blue skies here in Boulder, just these little narrow storms have struck us a couple of days this week. Feels a bit like being at home, because the weather changes so fast. So we've got our Kubernetes version that we're on here, we're going to install the dashboard to go with it, and submit — we're going with the defaults. It's good when you can see the key values as well. What we saw when we were in vSphere is that we built out a service cluster up front, based on the services that are running; now what we're actually doing is building out the end-user cluster that we can start deploying different container-based applications onto. And so we can see the master node spinning up now in that new Kubernetes cluster. Presumably, if we come in here, it shows a state of deploying at
the moment, and as we drill into the cluster we can see the state of the pool, the worker nodes, and our master node all being deployed out. And once this is deployed we will have the ability to expand it as well, so we can expand out the worker node pool, or even shrink it. It seems that we didn't make enough of an offering to the demo gods, and although we've had much success today, we have struck a little challenge. As our lovely delicate sky cluster was being deployed, and new virtual machines were being rolled out, we experienced a failure on that brand-new — brand-new to us — node that we added to the cluster. Kevin, what was the history on that node? So this node — where we started is, we have our rack, and we brought in a node while we were looking at how we could demonstrate the cluster expansion. Like you were asking where it came from: it's actually a returned unit from a customer site that had been in extreme heat conditions. We were able to rebuild it, but right when we put a workload onto it, it actually stopped responding. One of the things you don't see as we do this event is that there's usually somebody running around scrounging equipment for us, bringing together all the parts we need, with the peculiarities we want — for the cluster expansion we wanted a different node model, and yes, it's a significantly different one. Doug has been tireless in chasing around and finding the things we asked for, and this one was clearly pulled off the wrong shelf; it was not really meant to be deployed into production. The consequence is that we've had a node failure, and we've made sure it's not going to come back and fail again. That also happened in the middle of the deploy of our delicate sky cluster, and it seems our delicate sky cluster is not very happy, so following cloud-native methodologies, rather than trying to heal the sick we're going to build something new.
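That replace-rather-than-repair philosophy is the same pattern Kubernetes controllers apply to pods: a control loop compares the desired state with what it observes and creates replacements for anything missing, while durable data lives in a PersistentVolumeClaim that outlives any individual pod. The toy loop below is purely illustrative — not Kubernetes source code — and names like `ghost-content` are invented stand-ins.

```python
# A toy reconciliation loop (illustrative only, not Kubernetes source
# code). The controller compares the desired replica count with the
# pods it observes and creates replacements for any that are gone; the
# data lives in a PersistentVolumeClaim that outlives any single pod.

import itertools

_ids = itertools.count(1)

def reconcile(replicaset, pods):
    """One pass of the control loop: top the pod list back up to spec."""
    missing = replicaset["replicas"] - len(pods)
    for _ in range(missing):
        pods.append({"name": f"{replicaset['name']}-{next(_ids)}",
                     "pvc": replicaset["pvc"]})  # new pod, same volume claim
    return pods

ghost_rs = {"name": "ghost", "replicas": 1, "pvc": "ghost-content"}
pods = reconcile(ghost_rs, [])    # initial rollout creates one pod
pods.clear()                      # simulate deleting the pod
pods = reconcile(ghost_rs, pods)  # the controller rebuilds it
print(pods[0]["pvc"])             # ghost-content -- the claim, and the data, survive
```

The same idea is demonstrated live later in the session, when a pod behind a replica set is deleted and comes back with its persistent storage intact.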
While Ingo and Keith were having a good chat, we also deployed a new Kubernetes cluster here in NKS, called damp fog — exactly the same configuration, but it didn't experience a failure of an ESX server during the deployment, so it's not so delicate. And of course, being here in Boulder, we don't experience the damp fog that you get in downtown San Francisco. So here we've got our cluster, it's been deployed out, and we've got some IP addresses — these 172.19.2 addresses are on the NKS workload network that I added, so these are valid IP addresses inside our network that each of these Kubernetes hosts has. We've got the worker node IPs as well as the master IP here. So what else do we get in here? One of the things is, once you start deploying new workloads on Kubernetes, you might run out of resources — how do we add to this and make it simple for our customers? What we'd do here is add another worker, so we can add additional worker nodes right to the cluster. We have our default worker pool — you can create new pools, but we'll just keep it on this — and we can pick the number of nodes that we want to add. We've got two nodes at the moment; let's have a third. That puts us on an odd-numbered cluster size, which always feels more comfortable. What this will do is push that out into the same VMware cluster: it's going to bring up another worker node to run additional pods. I can't even see where it's deploying, because it came up so fast — I was hoping we'd see a powered-off machine being created, but it's already powered on and starting up. Cool. What else can we see while that's coming up? Another thing we can do is launch the Kubernetes dashboard — we do bring in the dashboard by default when we're building out a new cluster. As I
mentioned, it's upstream Kubernetes, so this is the same dashboard you would have anywhere — but without the painful deployment process, all of that science project of building and deploying. And again, the web service at nks.netapp.io is just providing me a redirector to get back to my on-premises Kubernetes cluster here, or a Kubernetes cluster wherever else I deployed it. Correct. So what we can do with this is actually deploy an application. Am I going to do that from here? Where we go from here is the plus button on the top right: we click on that, and we can input from a file or just paste some YAML. What we're providing here is a basic Ghost application — it hasn't been edited — but we're doing it with persistent storage, so we're going to create a persistent volume claim for the pod. Because containers are ephemeral, as we start writing our blog posts we don't want a failure of a node, or even just a restart, to bring down pods and lose our work — normally, when a container is restarted, it's returned to its immutable starting state. Yes. This takes a couple of minutes, so we'll wait for it to show up — we have one more pre-recorded video we can play while we wait, since it's going to take a minute or two to start up. Well, now our application has been deployed, the containers have been spun up, we can see the deployment is successful, and here we've got our pods all running, which is nice. Of course, the container we've got here is a web server and blogging platform, so I guess we need to connect to it. If we were doing this in production we'd have a load balancer in front — we'd probably have multiple instances of the web service and a load balancer that brings people in on a friendly port. That is correct. But even in this
case, with what we've done in NKS, you can go to any of the nodes to connect to these pod applications. So what we'll look at is: let's find one of the IPs — because we don't have a load balancer or DNS set up for this part, we'll grab the IP — and then we'll also grab the service port that we're using, which is all defined in the YAML file of the application. One of the ways to get that IP is to go back here into the damp fog cluster and just grab it off one of the workers — and I see the new node has finished, too. So we'll pop the IP address in there, and then we need a port number, which is somewhere down here in the details — Pods, Replica Sets, and Services. There we go: we've got a ghost service, the blogging platform we've got in here, and it's on 31787, the NodePort. There we go — hooray, we've got our web service running. And this redirection of a standard port up to a high-numbered port is standard Docker-container networking kind of behavior, so it should be familiar to people building applications on top of this. Correct. So this is really just a web application, looking reasonably pretty, sitting here inside our container, and if we wanted to customize it at all we could create an account by adding /ghost to the URL. The first part here just asks us to create an account. To be clear, this is not a NetApp product in any way — it's just a very simple demonstration. Yes, just an open-source blogging application. That password looks good. At the bottom you can actually skip this — "I'll invite people later"; there's nobody else working with me, and I don't plan to have this live too long. We do see here on the top left that it says Build Day Live, so if I write "welcome to Boulder" — there are too many people watching to take my time — let's publish it: top right, we'll just publish this
page. We can do it now, sure. And if we actually click on View Post — there was a pop-up that disappeared — OK, there we go, we can see it. So what I'd like to do now, since we have this running — it's looking very good; we've published a new blog. One of the things with Kubernetes — and it really depends on how you deploy your application — is that this application was deployed as a replica set. We can look at that from within damp fog, or actually from this Kubernetes dashboard: we see it right here that we do have a replica set, with pods 1 of 1, based on how we have it set up. The way Kubernetes then works — especially if you're a Docker user and you haven't used replica sets — is that with a simple application, you kill it, it's gone; with replica sets, it's going to rebuild it, because it knows it's been asked to run at least one, and if it sees none, it's going to spin one up. So let's go up to Pods right here, and we should be able to click the buttons on the right and just say Delete. We can see the age hasn't been updated because we just haven't refreshed the page, but we'll delete that, and — OK, this is refreshing for us, so we see the deployment down, and with the replica-set design, what we're seeing now is that it's going to rebuild: it's going to build a new pod, but it's going to use the persistent storage that's provided by the NetApp platform. Correct — because it might not even be on the same node anymore. This doesn't refresh by default, so we'll just refresh and see where the pods are: there it is, running for 40 seconds. Let's see if the service is up — we'll refresh on exactly the same URL, and there we go, it's come back again. We can open the page, and our data has persisted through the
destruction of the pod. Yes, it did. Nice. So that's the end of the content we wanted you to look at here — what have we done today? To recap: we started out with a cluster that, through my mistake, was actually the most realistic situation you can get, because when you plug into a customer environment there's not going to be a preset temporary IP. We set that temporary IP just to get into the cluster, and then we built it out right from our NetApp Deployment Engine: entering simple IP ranges, it built out the full cluster, including a storage node, giving us the full NetApp HCI deployment. From there we expanded the cluster, and we showed that we can not only expand but contract as well — not a typical routine activity, taking nodes out. Correct. But then, as you've seen, NetApp HCI has really transformed into hybrid cloud: able to deliver on-premises-based services with the same consistent experience whether you're deploying on-premises or in the cloud, and what we really did toward this end was demonstrate the services we have available for that with our NetApp Kubernetes Service. Nice. So we're building out that whole hybrid-cloud idea: we're getting more cloud-like services on-premises, and you're caring much less about the underlying infrastructure. That infrastructure always has to be there, whether it's yours or whether you're renting it from a public cloud, but you want to care less and less about the minute detail of that underlying hardware — while making sure you have enough of it, and that it's good quality — and care more about the services that you layer on top of it. I think that's the transition we're seeing: from the IT infrastructure professional caring about all the bells and whistles and the cables, making sure the IP addresses are matched up and looking pretty, towards a higher level of services:
delivering the infrastructure services that, in the case of Kubernetes, can be consumed by your developer community. They're still infrastructure services, but they're closer to what the business is asking for, and I think that's a really nice transition to be seeing. Yes, and the ability to be agile with this: we even showed expanding that Kubernetes cluster. We've shown all of these different expansions, so that as the business needs change, we're giving people that self-service portal to deploy what they need without adding additional resource targets, and the system just balances itself back out.

It's a really nice set of capabilities, and I'm really impressed with what we've been able to show, particularly because Kubernetes is notoriously a science project to deploy, yet it took us 45 minutes elapsed time to deploy Kubernetes and then deploy an actual application on top of the Kubernetes cluster. You've been working really hard all week, so thank you very much for being here, for being such a really useful resource, and also for all of the long hours that you've put in as we've gone through multiple iterations of building out that environment. Thank you, Kevin.

Thank you also to my team: Jeff behind the camera, doing his awesome job of the switching and coordinating and making us all look good, and Tracy, who's been running all the social media for us; quite a lot of you will have heard about this event through that social media. Stay tuned for more Build Day Live events; Tracy will continue to be doing an awesome job. Naturally, there has been an army of people helping us here at NetApp. Dag has been our primary person running point for us on-site, making sure we have the things that we need. My first contact was with Tim, so thank you, Tim, for everything you've done to bring this together. Also, Andy Banta, who was sitting
here beside me, has been a constant back and forth, making sure we had what we needed and that everything was going to be shown well. Thanks also to Michael White for his contributions and helping me along the way, and to Diane, who, despite a family emergency, rushed in here to make sure we would still have a functional Kubernetes cluster, thank you. And thank you, Kevin, for helping us find out what was wrong with my infrastructure the first time we wanted to deploy the NetApp Kubernetes Service; that was awesome. Also with us, of course: you saw the video from Keith Townsend, the CTO Advisor. It's been great hanging out with him and with Melissa this week, and do check out the CTO Advisor YouTube channel, there's lots more great video content on there. Jeff, have you got some closing thoughts as well?

Oh yeah, I'd also like to thank everybody on the chat who asked questions and joined in. We were on YouTube, we were on Twitch, we were on Facebook, we were on Twitter, so tell us where you like to see Build Day Live. Of course, go to BuildDayLive.com/netapp for this build day, and check out all the other build days, and if you want to have your own build day, let us know and we'll go from there. And thanks to Alastair, let's give him a hand; this has been really fun, really nice people doing great work in a fabulous location. So stay tuned for more great Build Day Live, and we will see you on the intertubes soon.
Info
Channel: vBrownBag
Views: 1,260
Rating: 5 out of 5
Keywords: #vBrownBag, vbrownbag Enterprise IT Education, netapp hci, data management, converged infrastructure, hyperconverged infrastructure, build day live, netapp kubernetes, netapp k8s, how to install kubernetes, how to install netapp, enterprise it, data center installation, netapp kubernetes cluster, alastair cooke, jeffrey powers, geekazine, kevin papreck, boulder co
Id: Y0ZUu92xwf8
Length: 153min 27sec (9207 seconds)
Published: Fri Sep 13 2019