Dell EMC SC Series Demonstration with Jason Boche

Video Statistics and Information

Captions
This is the lab environment that we're going to look at. We wanted to save some time, and I think we still have time to go into a live lab environment to take a closer look at some of the federation capabilities and features that Kaushik talked about; there's obviously been a lot of interest in that, and hopefully Howard comes back and gets to see some of this. Mainly what we've got teed up is a demo of Live Migrate with Volume Advisor, the seamless data mobility piece; there are several different use cases we can tie to that. And also Live Volume with automatic failover: the stretched storage, high availability, data protection, automatic failover kind of thing. So we've got two demos to show.

This is just a lay of the land of the lab and what you're going to see in DSM, our federated management tool. We've got four SC Series arrays: two all-flash SCv3020s, that's SC 56 and SC 58, as well as two SC8000s with a mix of spinning 7K and 10K drives. Also in the lab we've got a couple of applications of varying importance. You can see that on VM1 the workload is Iometer, so we can see in real time what's going on and what the experience is for that application where the end users would be. VM1 we'll use as a mission-critical application for the Live Volume automatic failover demo, the metro storage cluster piece, and VM2 is another application that we're going to use for Volume Advisor and Live Migrate.

OK, so I'm going to assume the role of the storage admin, and this is Bob's IT shop. Bob, sitting in the back, is actually the application owner; we designed this to be a little bit interactive. Bob owns the applications and is on the hook for the SLA with the business units, so he has set and defined some of the requirements for these applications. He'll convey those to me, and we'll use the federation capabilities within DSM to set up policies to make sure that we adhere to the SLA.

So without further ado, let's go into the lab; I need to take a seat because I've got to drive this. What we see here in the background is my vSphere environment. Again, these are the applications we're going to work with: virtual machines running in a stretched cluster that have been presented volumes from our SCv3020 all-flash arrays. We're going to be working primarily within our federated management tool, Dell Storage Manager, DSM for those not familiar with it. You can see our Storage Centers in the federation here: the two SC8000s with their mix of spinning disk, and our SCv3020 all-flash arrays, SC 56 and SC 58.

The first thing we need to do is find out from the application owner what kind of policies we need for this application in terms of latency. 'Could you show us the options there?' Sure. We'll go to threshold alerts. Volume Advisor is based on threshold alerts; those are not new, and there are many different types of thresholds we can set, but Volume Advisor works off four types, which Kaushik talked about: front-end IOPS on the controller, CPU utilization on the controllers, capacity (consumption of raw storage in the page pool) on the systems, and, last but not least, latency on the volumes.

So we look at threshold alerts and go ahead and create a definition here, and we can see that depending on the type of object we select, we get different options for the type of threshold we create. Again, there are many different thresholds and alerts that can be generated here, but for the purposes of Volume Advisor we want to choose... I think we were going to talk about latency? 'Yeah, and what latency are we talking'
about? So Bob has a dramatic peak during Black Friday; we need to sell a lot of stuff, so he wants to set up a strict SLA for his application. I think he said ten milliseconds. Well, this is an all-flash array; we can probably do a little better than that, so let's call it... I can probably guarantee you five milliseconds. Bob's sold, so we'll define a threshold alert for Volume Advisor that monitors whether latency on a per-volume basis goes above five milliseconds.

Now, we want to be alerted via email if that threshold is exceeded, and from a Volume Advisor perspective we also want that layer of intelligence to tell us whether there's any other Storage Center within the federation it can recommend as a good placement to migrate this workload to. So we check the box 'Recommend Storage Center,' and that's it.

All right, that's the threshold; now we have to apply it to objects within the system. What do we actually want to monitor: all volumes on the Storage Center, a specific individual volume, or perhaps a group of volumes within a folder? We'll just watch them all; I've got other customers with similar SLAs, and on an all-flash array I'd want to be notified about any volume exceeding five milliseconds, so I'll choose them all, and this application is currently running on SC 56.

All right, our threshold has been created. If we look at the application right now... 'What did you set, six milliseconds?' Five, and as long as we're under that, we're OK; that's my SLA with Bob. If it goes above five, it's going to fire an alert and Volume Advisor should make a recommendation. ('What is Black Friday?' someone asks.) And we set it up for all volumes on that particular array. Right, so this is our application: you can see right now we're sitting at about 1.1 milliseconds and roughly 30,000 IOPS.

Now, Howard, you were asking about QoS. I'm going to go back to storage, because I need to introduce some latency to this application, and I've got a QoS policy for that; Howard, you might want to look at this. I'm calling it my latency generator: what I'm effectively going to do is limit the IOPS to this volume to 600, which will artificially drive up latency on this volume for the purposes of the demo. Normally I would not, but it's only Bob. All right, that's our QoS profile, and we're going to apply it to the volume where this application lives, this SFD LM volume; click OK. Pretty quickly we can see the impact on this volume: IOPS have dropped significantly, by about twenty-nine thousand five hundred, and latency has gone up to about 53 milliseconds. Right now Volume Advisor, and actually the Data Collector, is collecting statistics from each of the arrays in the federation on about a one-minute interval.

'I saw that you can also apply that QoS to a folder, not just a named volume?' Yeah, it mentioned folders: within Storage Center, within DSM, for housekeeping purposes we can place volumes into folders, and we can apply these thresholds to a particular folder containing a volume or volumes. 'So if I applied 600 IOPS to a folder, is that a total, or per volume?' I haven't tried it, but I think the way it works is that it applies that value to each volume in the folder; that's the way I would normally want it to work. 'OK, how does'
this translate into VVols support? Is QoS visible in SPBM, Storage Policy Based Management?' In SPBM I can surface the five policies that we support: encryption, snapshots, RAID type, replication... I don't remember off the top of my head what the fifth one is. 'Clearly QoS is something you're doing; do you expose that as well?' Not with VVols 1.0. 'So a later version will support it, but you won't give it to us until 3.0?' Yeah, right.

'Excuse me... sorry, Jason, we've got a big holiday coming up and we're over the threshold; I need help. I got an alert on my phone, a message on my phone. Jason, you've got to help me out.' That was Bob's phone, and I also get alerts as the storage administrator. So we go back into Dell Storage Manager, look at threshold alerts, click refresh, and we should see that Bob was indeed correct: the current latency value on that volume is 53 milliseconds, well above the five milliseconds. Normally at this point I would probably look at the application itself, but from an application level there's not much to know about Iometer, so we right-click on the active threshold alert, and this is where Volume Advisor comes in: Recommend Storage Center. Volume Advisor says, based on the threshold you set up, this is a latency-based threshold on SC 56 where the volume currently lives, and I recommend moving that volume, SFD LM, to the SC 58 all-flash array that we also have in the federation.

'Am I safe in assuming that the interrelationship between QoS and the thresholds is something that's going to happen in this demo, and that we can make it not happen in the real world? Telling me that the volume I'm specifically throttling has high latency is kind of like a news story that says dog bites man.' Yeah, that's the expected behavior; the QoS profile was set up specifically for this. 'Just what I wanted to hear: yes, we made this happen, and you could only do that by fat-fingering it.'

So I click the link that says OK, let's go forward with your recommendation. 'So that was the place you talked about where you could override it?' Right; in this situation we chose to go with the recommendation the array gave us. Now, to answer the question from the corner next to Howard: we can absolutely use Fibre Channel, iSCSI, or both. This particular array is only set up with Fibre Channel, so that's all we see in the pulldown, but normally there would be iSCSI in here as well, for the replication conduit for that async replication. I don't need to change anything else here; all I need to do, and this is the simplicity, is click Create, and it goes forward with creating the replica volume on the second array. Kaushik, you've got a slide build-up for this: it's establishing the async replication from the source to the destination, and it's also adding the additional paths from the destination to the storage host.

'We should point out the simplicity: it was just one single click. All those things he showed, setting up the LUN, setting up the replication, all of that was that one click.' Right, all of these things are happening under the covers. 'But right in front of us. And is all of this available to me through PowerShell or the REST API?' Yes, just about anything you can do in the UI. 'With that kind of automation framework, it's: get the recommendation, then do a minimum evaluation, and then... I mean, there are two questions. You get the "you should move this,"'
and there are two other questions you have to solve. One: is where you want me to put it a bad idea for reasons the system doesn't know about? But also: when do I do it? Because the big problem with Storage DRS has always been that the act of moving adds more load to a system that's already overloaded.' Well, in this case we wanted to do it right away, because we busted an SLA. 'Right, but in my automation framework I might want to say: do this when the load on the source array falls under some level, so it happens at two or three o'clock in the morning.' Yeah, and you bring up an important point: in a large organization there may be political reasons why you do not want to move a volume from one array, say sales' array, to another array that manufacturing purchased. 'Oh yeah, I've worked in universities: that professor bought that thing on a grant, and you can't let anybody else on it.'

'Do I have to wait for what the system generates as a recommendation?' No, you can generate your own; everything you saw, you can do yourself. Kaushik talked about two ways to kick off a Live Migrate. We just showed one of them, through Volume Advisor, leveraging thresholds. The other is what he called the manual method, where I choose to bypass Volume Advisor, because Volume Advisor works off metrics: it doesn't know there's a hurricane coming into a coastal region and that I want to migrate my application further inland. 'Or, for example, if you're doing a migration: you bought a new array, nothing wrong with the old one, and you just want to move a few LUNs to the new one.'

'Right now I'm actually thinking more about the other side, because what I like about the recommendation is that it tells me: hey, wake up, there's something not great here. But I might want to react differently, from the point of view Howard pointed out, which is that the move itself generates more load. So maybe instead of attacking the volume that's having the problem, I want code that will offload others; don't take the big one that's working hard, offload some of the smaller ones that aren't doing anything, to make some room.' Absolutely, and that's why we also let you set capacity and CPU utilization thresholds. 'For example, you used latency here to keep things simple, but think about this: I have a system at 90 percent CPU utilization and you give me advice. I bring in a new system and start to move things, maybe one at a time, manually; but when I have this federation construct, I get the right advice: this other system is only 20 percent loaded.'

'Ultimately, and don't get me wrong, I like it, but it needs a policy engine. What you're doing right now is: I'm having problems, therefore I'm the one who should move. Perhaps the approach is: he's having problems, maybe we should make more room for him. I don't want to move the guy generating all the I/O load; I want to move somebody else out.' Yeah, I think there are use cases where you want to use Volume Advisor, and other cases where you don't, because you know the system better and what specific volumes are doing. 'I might not know what they're doing, but what I like about Volume Advisor is that it wakes up and tells me: hey, you violated your SLA, you should do something, and you can review that. That could be the trigger that starts my little script: the alert comes in, my REST API call asks who the second and third highest I/O generators on that array are, finds the outliers of the top ten, and I decide which one to move. So all of these are an'
improvement we can obviously add, because the logic is already there.' The feedback is important, from this group and from customers; the code as it is today is like a monorail: it's never complete, and we want to hear it. 'And once you build the first policy engine, people will come back and say: good, you're making us take these four things into account, now take four more.' Absolutely; we never expect it to be done. That's a very good point.

'Like Howard said, this is great, I really like it. You've got crawling down; everybody else is just sitting there like a lump. You're crawling fine, now let's start walking. Ultimately you want to say: there are these 17 workloads, they have priorities, and maybe I want to move the low-priority workload. The more you think about it, the more complicated the policy engine can get.' 'And it's iterative: you're crawling fine, walking will be a policy engine that can make these five decisions, and later it gets more sophisticated.'

There are certainly a lot of things to take into consideration. One thing to think about with 'leave my workload alone, move some other huge workload' is that there's a hidden cost: when we run it, we'll pick up that larger workload and replicate it into tier 1 on some destination array, and the larger footprint will take up more tier 1 space during that ingestion, which could cause a problem on that array. That's something we have to factor in. 'I think the recommendation is intelligent; we just want it to be even smarter than it already is.' Those are all very good points.

'The advantage of being on this side of the table is that you don't have to actually code anything, which is a lot easier; constructing a policy engine is hard.' Right, and if we can build that into the product, it will be very nice. All right, now we need to get to the most important part. 'One other minor point: the policy engine can be external; it doesn't have to run on one of the arrays in the federation.' I'm fine with it running in a VM.

OK, so a couple of things have happened here. Number one, the workload replicated pretty quickly, these being all-flash arrays: my VM, with about a 35 GB footprint, replicated within a minute or two. I also left at its default an option Kaushik touched on: stage three of Live Migrate is the role swap, where the primary moves to the destination and the secondary becomes the primary; by default we do that automatically. And you'll notice what happened: as soon as we role-swapped, the pressure was alleviated, and the latency being inflicted on the application has improved back within tolerances, which is exactly why customers would want that automatic role swap. 'You're at 1.3... 1.3... 1.4 milliseconds.' 'Jason, is that because the QoS setting didn't migrate with the volume?' Right; the QoS setting is only on SC 56. If the QoS setting had been on the SC 58 array as well, we probably would have gotten another recommendation, because the volume would have been burdened over there too. 'He's asking why a policy applied to a volume doesn't move with it across the migration.' Because policies are currently per system; they're static. That said, nothing prevents us from creating the same policy on the other array, or using PowerShell to blast the same policy out to the same volumes. 'But what if the volume doesn't exist on the other'
one yet?' Well, therein lies the problem: we can certainly create the definition on another system, but we can't apply it to a volume that the system doesn't own yet. 'One more thing to build into the policy engine.' Yeah, absolutely.

All right, so life is good and our application has recovered. We have to perform the final step of our live migration, which is to go back into our federated tool, Dell Storage Manager, and look at the status of our live migrations. I'll click refresh, and we can see it's ready to be completed: one hundred percent replicated, ready for the cutover. This is where we tear down the paths from the storage host to the source system, shut down the replication link between the systems, and at that point the storage host is running completely on the destination array. The one thing we do here is acknowledge that yes, we want to do this, and that we've performed a storage rescan on the storage host. If we forget that step and we're still operating on the old paths that were being proxied, we'll lose access to our storage during the cutover; so it's just an acknowledgement: hey, you're doing the final cutover, are you sure you rescanned for storage devices on your hosts? We did; I did that while we were talking, I don't know if you caught it, just a little right-click.

All right, we've done the final step. If we go back to storage and click refresh, the last thing we can do is delete; and remember, Kaushik's slide talked about this: we actually preserve the original source volume on the original array and just sort of abandon it, in case the customer wants to go back and retrieve that copy of the data, although at this point it's out of date, because all the writes have been going to the primary on the destination array. So that's the Live Migrate demo with Volume Advisor.

Do we have time for the Live Volume automatic failover demo? 'We have time; it's three minutes till.' All right, it shouldn't take too long. This is the same infrastructure, the same federation; we'll use the same arrays and the same type of workload. You can see Iometer running, and I've got a little Macromedia Flash stopwatch in there so we can measure how quickly we make progress with the auto failover. But Bob, I believe you had something to interject about this application's requirements? 'Well, in some cases where we need to meet that SLA, maybe the team's not around, working the weekend, so we want to automatically fail over to the remote site.' We're going to skip the recommendation piece. 'I've got to have it running all the time.'

All right, that sounds like a perfect use case for Live Volume automatic failover, for when we have some kind of unplanned outage on either array where a stretched Live Volume is already set up. If we look at our replications and Live Volumes, I've set up the Live Volume ahead of time, but I haven't configured automatic failover; we'll do that right now, and it's actually a critical step for the demo to work. Let's enable automatic failover on this existing Live Volume: it's simply a checkbox. 'You wouldn't believe how many years of engineering went into that checkbox.' Well, since the EMC acquisition it's at least six months of meetings. Now that we have it, and frankly we've had it for over two years; automatic failover was introduced in Storage Center OS 6.7. Customers love it, and I can say with confidence it's bulletproof: every time I demo it for a customer it's live, and it just works. 'Having worked with a lot of'
Compellent over the years, "it's coming in the next release" was always the answer: it's coming, it's coming. So I'm glad to see it here.' You and me both.

So what we've done is enable automatic failover. I'll scroll over to the right and look at a couple of columns which speak to the status of our volume with respect to automatic failover. I can see the failover state is Protected: that tells me automatic failover is enabled, and Protected means the volumes are in sync. It wouldn't do us much good to fail over to a volume that's out of sync; in other words, we'd have a data integrity issue. So they're in sync, and that's it.

Now let's kill it; in this case this is VM1. I've got a script that will shut down both controllers of the SC 56 array instantly; it takes about three seconds to log in and run, but you get the idea: they'll both be powered down. I'll move this window so we can see it. What we're watching for is the Iometer activity sliding all the way to the left, indicating that Iometer and the VM are having issues generating reads and writes. 'I'll start the stopwatch as soon as the I/O drops.' OK; I'm going to hit a key... did that work? There we go. At this point both controllers of SC 56 are down, and the SC 58 Storage Center, which is the replication partner on the other end of this stretched Live Volume, is contacting the tiebreaker service saying: I can't reach SC 56, what do you see? The tiebreaker says: I can't communicate with it either; since you are the secondary Live Volume, the prudent thing to do is activate and become the primary, so we can satisfy read and write requests coming in for the application. You can see it usually takes somewhere in the neighborhood of 17 to 21 seconds when I'm not busy talking, and we've automatically failed over the read and write I/O to the other Storage Center, SC 58.

'What are my options for the witness?' For the witness, the tiebreaker service, the quorum, whatever we want to call it: it can be anywhere, as long as it's not in either of the two sites where the Storage Centers that form the two endpoints live. So call it a third site, whether it's the customer's own third site or elsewhere, as long as we have network connectivity. 'Is it a Windows app?' It's the Data Collector: the tiebreaker service runs as a hidden service within the Dell Storage Manager Data Collector, so wherever you deploy a Data Collector, which is the federated management, it has the tiebreaker service code in it. Whether you actually use it depends: if you create a Live Volume and check the automatic failover box, then that tiebreaker service acts as the tiebreaker for that volume. You may have noticed a column in the DSM UI that said 'is local tiebreaker'; that means the Data Collector I'm logged into right now is the tiebreaker service for that Live Volume. We could have other Data Collectors running in the environment that manage other arrays, or even the same arrays, and we can likewise set up Live Volumes with those, but we only allow one Data Collector tiebreaker service to be the quorum that handles events for an unplanned outage, for obvious reasons: we want to avoid things like split brain, and frankly, not failing over can be just as important as failing over when certain conditions aren't met. 'And you can put it in the cloud?' Yes; for a third site, the cloud is a perfect candidate to be a third
site. All we need is TCP port 3033 open to the Data Collector, and that's really it. The Data Collector, the tiebreaker service, is not in the path of I/O, so there's no customer data flowing through it; it's literally just the tiebreaker.

'One minute warning.' Well, that was the failover; that was really the demo. 'That was exciting. I'd like to see the new UI if you have it.' 'Nobody's all that crazy about the old Unisphere.' All right, let me show you what's coming from a DSM perspective. This is a Java-based UI today; who loves this? What we have coming is a replacement for the Java UI. 'A flat UI?' No, we skip that step and leapfrog straight to HTML5. This demo is running on pre-release code, DSM 2018, which has a new HTML5 UI built into it; this is a preview, not complete, obviously, but built in. 'Built into the manager?' No, it's built into the Data Collector, so there isn't even a client at this point. When I logged into the Data Collector to run those other demos, that was a client-server application going over TCP port 3033; this is a web-based HTML5 UI.

From the landing page we can go into Storage Centers, and we see something similar: here's our federation of four SC Series arrays, and we can drill down into each one and start looking at a lot of the same things: pools, the hardware pieces, bandwidth control, which is probably where QoS lives, and so forth. While the functionality is different from what we have on Unity, you'll see the look and feel is very similar in the way we lay out panels and show the tabs here. 'It looks like I've got an IE problem in my lab.' The idea is not only to embed this and provide HTML5, but to provide the same experience you have in Unisphere for Unity, so a customer who is used to Unity can find their way around here as well, and vice versa. And these are all customizable widgets, basically; you can pick the ones you want for these dashboards. We have that on Unity, and you can now do the same thing here. And when you go to CloudIQ, as we spoke about, the same design is there too, so you have a completely end-to-end experience from the UI perspective.

By the way, back on the tiebreaker service and the Data Collector in the cloud, if it's not already well known: the Data Collector instance runs on Windows, and historically it was Windows-only for a long time, but we also have a fully functional Linux appliance version of the Data Collector. So if Windows, or a Windows instance in the cloud, is an issue, feel free to use the Linux appliance. 'And I get that as an OVA?' As an OVA download, yes. 'Sounds good to me; good news.'
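The Volume Advisor flow demoed above (alert when a volume's latency exceeds the 5 ms SLA threshold, then recommend another Storage Center in the federation) can be sketched in a few lines. This is purely illustrative: the array and volume names, the metrics dictionaries, and the "pick the least CPU-loaded other array" rule are assumptions, not Dell's actual API or selection logic.

```python
THRESHOLD_MS = 5.0  # Bob's SLA: 5 ms latency per volume

def check_thresholds(volumes, arrays):
    """volumes: {name: {"array": str, "latency_ms": float}}
       arrays:  {name: {"cpu_pct": float}}
       Returns (volume, latency, recommended_array) for each SLA breach."""
    alerts = []
    for vol, stats in volumes.items():
        if stats["latency_ms"] > THRESHOLD_MS:
            # Recommend the least CPU-loaded *other* array in the federation
            # (a guessed policy; the real Volume Advisor also weighs front-end
            # IOPS and capacity, per the four threshold types in the talk).
            others = {a: m for a, m in arrays.items() if a != stats["array"]}
            target = min(others, key=lambda a: others[a]["cpu_pct"])
            alerts.append((vol, stats["latency_ms"], target))
    return alerts

volumes = {"SFD LM": {"array": "SC 56", "latency_ms": 53.0},
           "other":  {"array": "SC 56", "latency_ms": 1.1}}
arrays = {"SC 56": {"cpu_pct": 70}, "SC 58": {"cpu_pct": 20}}
print(check_thresholds(volumes, arrays))  # only "SFD LM" breaches the SLA
```

With these invented stats, only the throttled volume trips the alert, and SC 58 comes back as the suggested target, mirroring what DSM recommended in the demo.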
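The QoS "latency generator" effect in the demo, capping a volume that wants roughly 30,000 IOPS down to 600 and watching latency jump from about 1 ms to about 53 ms, is what Little's Law predicts for a fixed number of in-flight I/Os. The queue depth of 32 below is an assumed Iometer setting, not a number stated in the demo:

```python
def avg_latency_ms(outstanding_ios, iops):
    """Little's Law: mean latency = in-flight I/Os / throughput."""
    return outstanding_ios / iops * 1000.0

# Uncapped: ~30,000 IOPS at an assumed queue depth of 32 -> about 1 ms
print(round(avg_latency_ms(32, 30000), 2))  # 1.07

# Capped by the QoS profile at 600 IOPS -> about 53 ms, as seen in the demo
print(round(avg_latency_ms(32, 600), 2))    # 53.33
```

The arithmetic lines up with the observed numbers only under that queue-depth assumption, but it illustrates why an IOPS cap on a busy volume shows up directly as latency.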
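The demo notes that everything shown in the UI is also reachable via PowerShell or the REST API. A very rough sketch of scripting a Live Migrate kickoff might look like the following; the hostname, endpoint path, field names, and payload shape are all invented for illustration and are not the documented DSM REST API:

```python
import json

DSM_HOST = "dsm.example.local"  # hypothetical Data Collector address

def build_live_migrate_request(volume, source_sc, dest_sc):
    """Assemble a hypothetical Live Migrate request body; the real API
    surface should be taken from Dell's REST API documentation."""
    return {
        "url": f"https://{DSM_HOST}:3033/api/rest/livemigrate",
        "body": json.dumps({
            "volume": volume,
            "sourceStorageCenter": source_sc,
            "destinationStorageCenter": dest_sc,
            "automaticRoleSwap": True,  # the stage-three default from the demo
        }),
    }

req = build_live_migrate_request("SFD LM", "SC 56", "SC 58")
print(req["url"])
```

Port 3033 is the Data Collector port mentioned in the talk; a real script would of course add authentication and actually send the request.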
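The delegates' policy-engine idea, "he's having problems, so move somebody else out to make room instead of moving the hot volume", can be sketched as a small planner. The data model and numbers are invented; a real version would pull per-volume I/O stats from DSM:

```python
def plan_evacuation(volumes, hot_volume, iops_to_free):
    """volumes: {name: iops}. Returns volumes to migrate away, smallest I/O
    generators first, skipping the hot volume the alert fired on, until
    enough IOPS headroom has been freed on the source array."""
    freed, plan = 0, []
    for name, iops in sorted(volumes.items(), key=lambda kv: kv[1]):
        if name == hot_volume:
            continue  # leave the busy workload in place
        plan.append(name)
        freed += iops
        if freed >= iops_to_free:
            break
    return plan

volumes = {"SFD LM": 30000, "mail": 400, "backup": 900, "web": 2500}
print(plan_evacuation(volumes, "SFD LM", 1000))  # ['mail', 'backup']
```

As the discussion pointed out, a production policy engine would also weigh migration cost (tier 1 ingestion on the destination), workload priority, and timing, not just raw IOPS.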
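The tiebreaker decision narrated in the failover demo, where the secondary promotes itself only when both it and the tiebreaker service agree the primary is unreachable, can be captured as a small predicate. This reflects the logic as described on stage, not Dell's actual implementation:

```python
def secondary_should_promote(secondary_sees_primary, tiebreaker_sees_primary,
                             in_sync):
    """Decide whether the secondary Live Volume should become primary."""
    if not in_sync:
        return False  # never fail over to stale data (a data integrity issue)
    # Promote only when the secondary AND the tiebreaker both lost the
    # primary; if only one link is down it may be a network partition, and
    # promoting could split-brain the volume.
    return (not secondary_sees_primary) and (not tiebreaker_sees_primary)

# SC 56's controllers are down: neither SC 58 nor the tiebreaker can reach it.
print(secondary_should_promote(False, False, True))  # True -> SC 58 promotes
# Partition only: the tiebreaker still sees the primary, so no failover.
print(secondary_should_promote(False, True, True))   # False
```

This also shows why, as noted in the talk, not failing over can be as important as failing over: two of the three branches deliberately refuse to promote.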
Info
Channel: Tech Field Day
Views: 9,828
Rating: 4.5 out of 5
Keywords: Tech Field Day, TFD, Storage Field Day, SFD, Storage Field Day 14, SFD14, Dell EMC, Jason Boche, SC Series, failover, storage migration
Id: g68CpbvLSLc
Length: 34min 23sec (2063 seconds)
Published: Mon Nov 13 2017