Dell EMC VMAX All Flash with Scott Delandy

Captions
Alright everybody, nice to see everybody here again. Just for the folks on the livestream, a quick introduction: my name is Scott Delandy, I'm part of the product management team for the storage division, based out of Hopkinton, Massachusetts, so part of the initial EMC team, now the Dell EMC team. I've been with EMC, and now Dell EMC, for going on 28 years. Twenty-eight years with the same organization, and if you're trying to do the math, yeah, I was about eleven when I started, so that kind of factors in.

Quick question for you folks: how many people here have past experience, hands-on or just a good knowledge, of Symmetrix or VMAX from back in the day? So we've got a few hands going up. How many folks would say they have limited knowledge of VMAX, what it is, what people use it for, the types of use cases, those types of things? A little bit more than that. Well, the good news for the folks that aren't familiar is that hopefully, at the end of this session, you'll have a better understanding of what the technology is, why customers use VMAX in their environments, and what some of the very common use cases are for how that technology gets deployed. For the folks that are familiar with the product already, the good news is that you don't have to learn anything new. The bad news is that you're going to have to forget, or unlearn, a lot of the things you know from some of the older platforms, because I think what you'll see is that when you look at the technology and how it's evolved over the last several years, there have been some significant changes in terms of the architecture, the software, the management, even things like the go-to-market and how we package the system from a hardware and software perspective. We're going to get into those things, and I know I don't have to say this because I know it's going to come anyway, but as we get into the discussion, questions and thoughts, please go ahead and throw those out.

I'll start here just to talk a little bit about the platform. When you think about VMAX, and what was previously the Symmetrix systems, these were technologies initially introduced in the late '80s. That's how far back this platform goes, and it is still a very relevant technology today; for many IT organizations it still represents the core of what their infrastructure is built around. When you think about that from a historical perspective, how many technology examples can you think of that are that old, that have that kind of legacy? We talk about this a lot, and other than mainframes and VMAX/Symmetrix there aren't many examples we can easily come up with. I'm sure if you sat around and thought about it you could probably come up with a couple of other things, but again, that's the legacy of the platform. When I go out and talk to customers and users, and I spend most of my time doing that these days, the one consistent piece of feedback I hear from them is that the reason they run VMAX, and will continue to run VMAX, is the availability, the rock-solid reliability of the platform, the serviceability of the system, the way we're able to do non-disruptive upgrades on both the hardware and the software side. There's just a tremendous amount of trust that people have built up with that platform over the years, and
if there's one piece of feedback that I consistently get from these users, it's: never, ever do anything to compromise that, because the types of applications and workloads they run are the things that support their business, and they need to know those are going to perform the way they're supposed to. That's why they continue to invest in this type of infrastructure. So that's a key piece of feedback we very consistently get from the user base out there.

Now, one of the things that's changed is the packaging of the platform. When you think of the older Symmetrix systems, even some of the older VMAX systems when they were first introduced, the design goal around that platform was to have a scale-out technology: I want to bring something in that's what I need from a scale, performance, and capacity perspective, but I want headroom for growth, because I don't know if I'm going to grow 30 percent over the next couple of years or 300 percent. So I want to make sure I'm investing in a platform that gives me the ability to scale out and get bigger if I need to. We still have users that do that today; it continues to be a very common way folks deploy the technology: start small, but have the ability to get big. But when you think of where we are from a performance and capacity standpoint, with the density of the systems available today, we're starting to get to the point where people say, I don't know if I need to go that big.

So, for example, we introduced a smaller packaged system, the 250. This is a system that starts with two controllers and can scale up to four controllers, and roughly a petabyte of effective capacity can be installed into that system. The larger 950 system is a two- to sixteen-controller system that can go up to about four petabytes of effective capacity. When I talk to users out there, I don't find a lot of them that want to put four petabytes of anything into a single system. They're comfortable with hundreds, sometimes several hundreds, of terabytes of capacity in a single storage footprint, and from there they want to add on additional platforms. So the trend we're starting to see is that the 250, the smaller version, is creating a really good sweet spot for users that want to go to what I refer to as a storage-pod type of deployment. They need some amount of capacity, say four or five hundred terabytes; they've got their applications and their host environments, they get everything set up; they want some headroom, they want to be able to add some more capacity and grow the system, but they're not looking to take that system and, for example, double its performance. They just want some capacity for growth. That system makes a lot of sense for them, because they can bring it in, run it for a couple of years, and then, if they need more, just bring in another and run it side by side.

And it's interesting, because one of the things that's changed, especially as workloads and environments have become more virtualized, is the challenge you had in managing physical environments, where there was a lot of value associated with
consolidating: if I can consolidate, that means I'm managing fewer things. They're bigger, but there are fewer of them, so there's some simplification, some standardization; it's easier because I can run everything across a common set of software, common management, all of those types of things. That was very, very attractive. But as we began to move into more virtualized environments it shifted a little bit, because management got easier as things became much more automated, and mobility became a lot easier: if I had something over here and wanted to physically move it over there, because the application was virtualized I had other tools that allowed me to do that much more easily, and far less disruptively, than what had been available in the past. So again, it's interesting: if you look at the sweet spot of where we're going with the technology, we are very much going down-market, because that's really where the growth is. That's where we're seeing more and more users looking to deploy, not necessarily wanting to bring in these large systems, although we do have lots of users that do that and want to continue to do that, but also users that want to start small and have a little bit of growth, not look to double the size.

So that gives me Symmetrix-level resiliency at much smaller capacities?

Yes. The difference is it's still a multi-controller system; you have up to four controllers and they all run active-active. The difference would be: if I have a sixteen-controller machine and one of those controllers becomes unavailable, I lose one sixteenth of my horsepower; if I have a four-controller machine and one of those controllers becomes unavailable, I lose one quarter. So there are some differences in terms of sizing. The resiliency and the availability, all those things, are exactly the same. From a performance perspective, is this something that's going to impact the environment? Well, it depends on how hard you're pushing the machine.

And if I've got four controllers, all of the devices are available directly to all the controllers?

Absolutely, yeah. We'll talk a little bit about that when we get under the covers; I've got some architectural slides to take a look at. And yes, very much the tradition in terms of the redundancy: being able to have components go offline, or be replaced, or be serviced, without impacting the availability, and in many cases the performance, of the applications. That absolutely continues to be true.

So, just to shine a light on the 250: when we go into environments where we're doing technology refreshes, we've got systems in there that are four, five, six years old. You think about the available technologies at the time: most of that was based on spinning drives, certainly lower-capacity spinning drives than what we can do with the flash technology available today. So, no surprise to anybody in the room, if I take an asset that's several years old and bring in something new, I'm going to get a higher-performing system, something much more dense in terms of the capacity and performance I can put into a single footprint, and I'm going to get a lot of efficiency out of it as well. One of the things we've done is shift away from systems that support mechanical drives to systems that are based on all-flash technology.
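To make the controller-sizing point above concrete, here is a small back-of-the-envelope sketch. The even active-active split of work across controllers is an illustrative assumption, not a published spec.

```python
# Rough sketch of the controller-failure point above: assuming work is spread
# evenly across controllers in an active-active system, losing one controller
# costs a smaller fraction of total horsepower on a larger configuration.
# The even split is an illustrative assumption, not a published VMAX figure.

def remaining_horsepower(total_controllers: int, failed: int = 1) -> float:
    """Fraction of aggregate controller horsepower still available."""
    return (total_controllers - failed) / total_controllers

for n in (4, 16):
    print(f"{n} controllers, 1 offline -> {remaining_horsepower(n):.1%} remaining")
# 4 controllers, 1 offline -> 75.0% remaining
# 16 controllers, 1 offline -> 93.8% remaining
```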
So when we move into these all-flash systems, there are things we can do around space efficiency that let us get more presentable capacity out of the front end while requiring less physical capacity to actually be in the system. Data reduction technologies like compression are a good example. We rolled out compression well over a year ago, a year and a half ago now, and we're seeing extremely good results with it: typically about a two-to-one data reduction for the types of data sets we typically see going onto a VMAX type of platform, so think transactional workloads. We also see consistent performance: workloads that weren't compressed on an all-flash system perform exactly the same when we turn compression on. A lot of that has to do with the way the data reduction was implemented, as an inline data service designed around preserving the performance and the low latency you expect from an all-flash system. And it's gotten much better; again, no surprise to folks in the room, if you look at flash technology and how it's matured over the last several years, certainly since it was initially introduced, the cost points have come down, the capacity points have gotten larger, the drives are denser, and that gives us the ability to pack more capacity into a smaller footprint.

Yes, help me out with this. I've owned a few VMAXes; well, I haven't, organizations I've worked for have. Okay. I've owned a Symmetrix. Not only do I not have the power, I don't have the power at home, I don't have three-phase power. You'd be surprised; actually, funny little story, we have ways of being able to run these systems: we plug them in when we go to shows, off of a wall outlet, so we can get them to power up. So, I've never considered the VMAX a scale-out platform. Okay. Help me understand: how is VMAX today scale-out? Traditionally, when I run out of space in the VMAX, I buy more shelves. Yep. When I think of scale-out, I think of adding another VMAX cabinet next to my existing VMAX and managing that as a single system, not as a separate system, but multiple VMAX arrays as a single system. Okay, yeah. So VMAX today is now scale-out, in essence? Yes, well, it depends on how far you go back. When we had the older DMX systems, and geez, when was that, the mid-2000s or something like that, you basically had a card cage in there and you plugged in these director boards, but you were limited in terms of how many slots you had within that. With VMAX we introduced a more modular ability to add what we referred to at the time as engines, and within the engines we had some associated drives packaged together with them, where you could start with a single engine and get 1x performance, then add a second engine and get 2x performance, and add a fourth engine and get 4x performance. That's where the scale-out came from. So scale-out was the ability to add not just performance but also front-end connectivity, the ability to fan in. So I guess, when I talked to my EMC, now Dell EMC, sales person about expanding out my VMAX, it was, okay,
you can expand it out to a certain number of engines. And so, by your definition, scale-out is the ability to add additional engines? Correct, and to have linear scale in terms of performance, capacity, and connectivity, because connectivity plays into that as well: the more connections I have on the front end, the more front-end bandwidth I have, and the more connections I have on the back end to the flash. So there are multiple domains to that. Today it's just conceptually difficult: when I walk into my data center, I see a VMAX cabinet with engines stacked and then a shelf next to it, and when I look at it physically I don't associate that with scale-out. That is old packaging; that packaging goes back at least two or three generations. I'm an architect; when I walk into the data center, all I see is VMAX, I don't really dig into what's behind the door. Right. I mean, the truth is, when you say scale-out we think of either an XtremIO kind of architecture or, more likely, a shared-nothing, expand-to-tens-or-hundreds-of-nodes architecture. So it's not definition, it's connotation. For instance, when an engineer came to me and said, Keith, I need to expand the VMAX, I would normally say, okay, how many more shelves do we add? And he would say, we're at the physical limit, we need to buy another storage solution, whether that's another VMAX or something else. So I've always had this sense, and this is as recently as maybe a year ago, that there's a limitation to how far I can scale a VMAX out. So that four-petabyte limit you talked about: the overall system, the overall scale-out limitation, would be around four petabytes? Yes, from a presentable-capacity standpoint. Now, with the data reduction that you have, you may not need four petabytes of actual physical storage in that machine to present the four, but from a presentation standpoint, yes. So I can still scale out to four petabytes before I'm looking at buying a second VMAX? Yeah.

So what I would say is, the deployment and the consumption of that is definitely starting to shift. Yes, I want to be able to scale that system, but what I'm seeing now is users coming in and saying, okay, I need more stuff, and I have this asset on the floor that's now two years old; do I want to go out and add on to an asset that's two years old? And four petabytes of storage is a lot of tier-one storage. Yes. Not that it's a bad thing. No, I have no pushback, I just want to understand the concept. Yeah, absolutely. And the other thing to think about is that if I have a system that doesn't grow as big as these larger systems, there are components I can take out that help reduce the entry cost point. If I'm bringing in one of these machines that's designed to go to sixteen controllers, I'm paying up front for some of the infrastructure to be able to scale that system. If I say I'm never going to go there, I don't need the RDMA switches, I don't need that back-end connectivity, I don't need some of those hardware elements that add to the cost of the entire system. So the entry points for those systems become lower and way more attractive if, again, you're not ever planning to push this thing to a four-petabyte configuration.
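Going back to the presentable-versus-physical capacity point a moment ago, here is the simple arithmetic. The 2:1 ratio is the typical figure quoted earlier in the talk; the specific capacities are made-up examples.

```python
# Quick arithmetic behind "you may not need four petabytes of physical storage
# to present four": presented capacity divided by the data reduction ratio
# gives the physical capacity actually consumed. The 2:1 ratio is the typical
# figure quoted earlier; the capacities below are illustrative examples.

def physical_needed_tb(presented_tb: float, reduction_ratio: float = 2.0) -> float:
    return presented_tb / reduction_ratio

print(physical_needed_tb(4000))   # 4 PB presented -> ~2 PB physical at 2:1
print(physical_needed_tb(500))    # 500 TB "storage pod" -> ~250 TB physical at 2:1
```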
Right, that's kind of the shift, and that's where a lot of the growth is really happening within the platform. We still continue to do very well at the high end, because there are users out there that want four petabytes of stuff in one thing, but it was always difficult for us to crack into that lower part of the market, the hundreds-of-terabytes environments. I've been in environments where they have a VMAX just for one application, because they want the resiliency; they don't need four petabytes, they just want the resiliency. So there are challenges. Absolutely, absolutely.

The other thing I'd touch on, just on the flash side, is that when I talk to users out there, they're seeing obviously much better reliability with the technology, and that's no surprise to anybody in the room. You go from a system based on magnetic drives, which are prone to fail, especially as those drives get older in the environment (we see more replacements as you start to hit the useful life of what that drive can do), to flash media, and the flash media is more reliable. If you look at the rough MTBF numbers from the vendors, the mechanical drives are roughly 800,000 hours in terms of mean time between failures, and the flash drives are about 2.3 million hours, so just from a reliability standpoint you're looking at a drive that's upwards of 3x more reliable than the spinning drives. Right there, you're going to replace fewer of those drives in a user environment.

But the other thing to keep in mind is that even if a drive fails, you have redundancy built in through things like RAID protection. Yes, we're going to talk about RAID, hooray. RAID is nothing new: it's a parity-based protection that allows you to have drive failures within a particular group while things are spread across other drives, which makes sure that (a) you don't lose the data and (b) the application is still able to continue to run. The challenge with spinning media is that as those drives get larger and larger, you're still limited by the physics of the drive: you can only rebuild these drives so fast, because of how quickly you can move data between the drives to do that rebuild process. I'll give you some field numbers. If we have large-capacity drives in a system, say a 2-terabyte SATA drive, and these are older drives, and that drive fails and I need to do a RAID rebuild against it, it's going to take tens of hours, sometimes multiple days, depending on how that drive is set up, to rebuild it. And while that drive is going through that rebuild process, I have to be careful that I don't have other failures impacting that RAID group, because if I do, I could actually have data become unavailable. So I want to reduce that window as much as possible to limit the exposure of having multiple things fail. When you look at the flash drives we ship today, the most popular one we're shipping right now is a 3.84-terabyte drive. We RAID-protect that drive, and if it needs to be replaced for whatever reason, the time it takes to spare and rebuild it is about 90 minutes.
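A rough comparison of those exposure windows, using the numbers quoted here; the 30-hour figure is an assumed midpoint of "tens of hours, sometimes multiple days", so treat it as illustrative.

```python
# Rough comparison of the RAID rebuild exposure windows quoted above.
# The 30-hour SATA figure is an assumed midpoint of "tens of hours, sometimes
# multiple days"; the 1.5-hour flash figure is the ~90 minutes from the talk.

sata_rebuild_hours = 30.0     # assumed, 2 TB SATA drive
flash_rebuild_hours = 1.5     # ~90 minutes, 3.84 TB flash drive

print(f"Exposure window shrinks about {sata_rebuild_hours / flash_rebuild_hours:.0f}x "
      f"({sata_rebuild_hours:.0f} h -> {flash_rebuild_hours:.1f} h), "
      f"even though the flash drive holds roughly twice the capacity.")

# MTBF figures quoted above: ~800,000 h (mechanical) vs ~2.3 million h (flash)
print(f"MTBF ratio: {2_300_000 / 800_000:.1f}x")   # ~2.9x, i.e. "upwards of 3x"
```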
So you're going from a spinning drive that takes days for a couple of terabytes to a drive that's twice the capacity and takes significantly less time, minutes, certainly under a couple of hours, to do that. The key thing is that this translates into overall higher availability, because you've reduced the window of exposure for other things to fail, which could put you into a situation where data becomes unavailable.

And it's interesting: just a couple of weeks ago we had a sort of mini user conference back at corporate, where several customers came in that have large investments in their VMAX platforms, folks that are shifting into all-flash systems going forward. We asked them a general question: what are some of the things that surprised you in moving to all-flash? Consistently, one of the things we heard was the reliability and the reduction in the number of drive replacements they were seeing. As flash started to become available in the enterprise, there were all sorts of theories and speculation around cell wear: how often am I going to rewrite that drive, am I going to burn these things out, you don't understand my workloads, all of that. When we talk about how we implement it, there are a lot of things we do internally within the system to help maintain the durability of the drive: we do write caching, we do write folding, we do write coalescing. There are all these little technologies built into the system that allow us to take advantage of these flash drives, extend their useful life, and not burn them out. And that's something we're definitely seeing from our customer base in terms of the real-world experience and the feedback we're hearing from them.
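Since write folding and coalescing came up, here is a toy model of the idea. It is a simplification for intuition only, not how the array's cache is actually implemented; the LBA-keyed dictionary and the example writes are made up.

```python
# Toy model of write folding/coalescing: if the same logical block is
# overwritten several times while it sits in write cache, only the final
# version has to be destaged to flash, so the media sees fewer writes than
# the host issued. A simplification for intuition, not the array's actual code.

def destage_counts(host_writes):
    """host_writes: list of (lba, data) tuples in arrival order.
    Returns (host_write_count, media_write_count) for one destage cycle."""
    cache = {}
    for lba, data in host_writes:
        cache[lba] = data    # a later write to the same LBA folds into the earlier one
    return len(host_writes), len(cache)

writes = [(100, "a"), (101, "b"), (100, "c"), (100, "d"), (102, "e")]
print(destage_counts(writes))   # (5, 3): five host writes, only three flash writes
```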
Okay, so moving on; we'll do it this way, there we go. So I mentioned the packaging has changed and we've moved to all-flash, all those cool technology things, but one of the other things that has resonated extremely well with the user base is the change in how we package the software. In the old days, I would buy my Symmetrix, I'd roll it in, and I'd say, give me some SRDF, because these applications right here are really important and I want to replicate them over to a second system. And we'd say, okay, how are you going to replicate them: synchronously to a second system in the data center and then asynchronously to a third system? And we'd take out the abacus to calculate what that functionality was going to cost, because everything was licensed individually, it was all capacity-based, and it was very complex in terms of what it was actually going to cost. Or, as Howard would put it, we just gathered up wheelbarrows of money and gave them to the EMC guys; not that that's a bad thing. The point I'm trying to make, probably not that well, is that the functionality worked really well and gave them the ability to support what the business needed, but it was very difficult to consume: very complicated, it could be very expensive, and they really wanted a better way of doing business with us. So we've changed the way we do software packaging. Today, when you get an all-flash system, there are basically two ways it comes. We have the starter base package, the F package, which includes all the functionality most users would take advantage of: things like snapshots, thin provisioning, some of our migration software tools, all basically included in the system. The other, advanced features are all available through what we call the FX package. This eliminates all of those individual software tiers and gives me the ability not just to run SRDF/S here and SRDF/A there, but to run SRDF synchronous and asynchronous from a single system without licensing it individually. That has really helped; users appreciate having a more straightforward way to take advantage of the functionality, and what I can tell you is that it's increasing adoption. Now that we've taken away that software tax, if you would, that was built into the platform, it's basically just included when you bring the system in. It has simplified things and really increased users' ability to adopt and implement the technology, because it's now much more affordable and more straightforward.

From a VMware perspective, it's interesting. Think about virtualization years ago, way back in the early 2000s as it started to show up in the data center: where it first showed up was test/dev, tier-two types of environments. Hey, I have this server infrastructure, and boy, if I could run twenty of these applications on one of these things instead of deploying twenty of them individually, I could get more efficiency, lower the cost, and take advantage of the functionality around mobility and automation. That's how it started. As people looked at the benefits they were getting from virtualization, they said, hey, why don't I take these Oracle databases or these SQL databases and start to look at virtualizing those? There was concern at the time: what about the performance, is this going to limit me from a scale perspective, and what about all of the data services I want for recovery and for the automation around management? In those early days we worked really hard to make sure that, as people moved virtualization from test/dev and tier-two environments to those core applications, we helped them transition by using capabilities users had been relying on for years and years in the enterprise, which would be VMAX. So we were very much at the forefront of helping users shift virtualization into the world we know today, where most core applications run in a virtualized environment.

And when you look at the reasons why: massive consolidation. We have users out there that have massively consolidated tens of thousands of VMs into a single platform; we have users pushing around 50,000 VMs in a single system. So again, the ability to consolidate and collapse those things down into a standardized technology with common management,
common replication, all of those types of things; that's something we're able to do. The other big thing that's really helping is the shift to all-flash. Jodi talked about this workload blender, where writes can be spiky, read access varies, all those types of things. In the days when we had mechanical-based systems, there was a lot of effort required to go in and make sure these were tuned, to make sure that as we saw spikes we could identify why they were occurring and what we could do to flatten them out. We don't see those issues, I wouldn't say at all, but certainly not as much, in an all-flash system. Not only does it give you a very high level of performance and a very predictable level of latency, it lets you do that without a lot of the administration and management that was required in disk-based or even hybrid systems. I was out just a couple of weeks ago talking to a large customer of ours; they've been rolling in all-flash systems, and we talked about the experience as they move from disk-based or hybrid systems into flash, and the efficiencies they were seeing on the operational side. The number they're measuring is that when they move a workload from a disk-based system to a flash system, they see about a six-to-one reduction in the amount of administration effort, and a lot of that is just around performance optimization. If I've got tiered systems, where I've got flash and mechanical drives and I'm trying to figure out which pieces of data belong on this tier and which belong on that tier, I've got all these knobs and levers I have to tweak and pull and try to optimize, and it requires some pretty deep skills to do that effectively. Moving to all-flash gives them a single tier that's very simple to manage. As a matter of fact, their comment to us is that the systems are pretty boring: they bring them in, they go from a world of several to tens of milliseconds down to consistently sub-millisecond, all of their users are happy, and other than provisioning storage and doing some basic reporting, there's not a lot for them to do from an administration perspective. So being called boring, I guess, is nice in one way, maybe not as cool in the other, but that's the feedback we're hearing. And the difference is between being boring and worrying about going out of business; I'll take boring. We just want to give you your weekends back; I think that's the big thing.

From a data-efficiency standpoint: we talked about the cost of the capacity coming down, and the efficiencies built into the system with things like data reduction, plus other classic technologies, thin provisioning, snapshots, all of those things. What we're seeing, moving from an environment of physical disks into all-flash, is easily about a four-to-one space-efficiency improvement over those older systems. So again, now you've got this four-petabyte machine, or a 400-terabyte machine, probably more realistic, but you're only using about a hundred
terabytes of actual physical capacity to present that. So it's a way of reducing the acquisition cost, and the ongoing support cost, by taking advantage of that. And then there's the availability: the resiliency, the things we do under the covers to make sure the system is always up and your applications are always available. Those continue to be some of the top reasons.

We also have a number of integration points with VMware. I think Jodi and Todd did a really good job of taking you through some examples of those integration points; they're very similar to what we've been doing with VMAX for years and years, plugins and APIs and all that kind of cool stuff. I will say one integration users very commonly deploy is with SRM, Site Recovery Manager, because when you look at VMAX systems in particular, there is a huge penetration in terms of how many of those systems are being replicated. We have this replication technology that's been available for years and years called SRDF; it stands for Symmetrix Remote Data Facility, and it's been out there since it was first introduced back in 1994, if you can believe it. I'll give you some color around the name: it was back when we were a very engineering-driven organization and we let engineers name products, and we thought it would be cool to have every product come out with a four-letter acronym that always began with an S, because it would be Symmetrix something-something-something. That's just how we named products back in the day. We've had SRDF out there for a number of years, and from a branding perspective we've always talked about whether we should rename the product, but there's so much brand association built up with that four-letter acronym that it's very difficult to let it go and move on to a new name. So that's where SRDF comes from; FYI, since we've thrown that acronym around a couple of times, that's a little color on the history of the name.

Specifically with SRM, what's cool is that it allows us to use the data services for remote replication under the covers with VMAX, while from an application or VM perspective you use all of the native VMware-based tools to manage and control that environment. It's one of the things we see huge adoption for; in terms of the amount of qualification and integration we do, it's definitely top of the list from an engineering perspective, just because so many of our users take advantage of that capability from a VM perspective. And again, at the high end, the types of customers doing tens of thousands of VMs, financial services, certainly service providers, and some folks in what I'll describe as the shipping-and-logistics verticals, are running these very highly virtualized environments and consolidating like crazy. So that's definitely not a fake hero number; it's something we're seeing in the real world from a consolidation perspective.

Now, are you seeing most of your customers buy the FX software bundle, or...? Yeah, you want the number? It's about half and half. When we look at what we ship, about half of the folks bring the system in with the F, and the other half bring it in
with the FX. The reason for the FX is that if you're going to run SRDF, the remote replication, it's just way more cost-effective to bring in the FX package, and roughly half of our users do remote replication, so it's about half.

But what's cool about it is that when you bump up to the FX package, and I digress with this, we also include PowerPath licenses. PowerPath is our path management software. Users really love it from a technology perspective, but they don't like having to license it, because if they want to PowerPath everything in their environment, that can start to be a costly proposition. Having it built into the FX license gives them the ability to run it across a good chunk of their environment, so we're starting to see a lot more adoption of PowerPath, because from an automation perspective around performance, channel failover, channel recovery, those types of things, it works extremely well and users love it. But having to license it, when there are other things that are free and generally considered good enough, that's what we're competing with, so the FX has definitely helped with PowerPath deployments.

Well, it's more than that. It's not just that I had to pay for PowerPath, it's that I had to pay for PowerPath in tiny increments. Yep. It's like, I just spent four million dollars and you want another 150 bucks because I'm adding another server? It's the nickel-and-diming more than the total amount of money. What I would tell you, Howard, is that that comment is, I wouldn't say well understood, but it's something we are looking very hard at how to deal with going forward, because we still need to be a business and operate a business, but we also want to make sure people get the full value out of the technology without feeling that way. Yeah, but you also have to understand situations like when I was a consultant at Deloitte: they paid me more to fill out the PO to buy a copy of PowerPath than for the copy of PowerPath. Yep. It's one of those things; it's relatively easy to get a four-million-dollar PO, but getting five hundred dollars of licenses spread over three years is way harder. Yeah. So what I would say is, we recognize that we've put a lot of friction in place for a piece of functionality that adds a lot to the value, and what I can tell you is that there are multiple workstreams going on right now looking at how we make that better, because you guys are definitely not the first ones to say it. I hope the corporate people are paying attention to this. Right, because here's another data point for us. It's not only something that I've told Joe Tucci personally; hearing, listening, and doing are different things. Yeah, no, I hear you, I hear you, but there's enough noise in the system and enough people saying it. Well, the trend is more and more to bundle, and you just have to get on the bus. Yeah, I agree with you, I absolutely agree with you; it's definitely something we are working on and will make better, just how and when, those details are still being factored out.

So let's do a little bit of a geek-out here, just because I wanted to make sure that we
talked a little bit about the operation and how a VMAX works. So I mentioned the 250 is a two- to four-controller machine, and the 950 is a two- to sixteen-controller system. When I crack that thing open, that controller, internally we refer to it as a director, but to the outside world it's a controller, a node; there are multiple synonyms you can use to describe it. You basically have this thing that has compute on it, and the way we have it designed is that a number of cores are allocated across that controller. Some of those cores' job is to talk to the front end and service I/O coming from the host; some of those cores are on the back end, used to move I/O back and forth to the flash media; and some of those cores are used for the data services, for things like replication or file services. If I want to take this block-based system and carve up, say, 20 terabytes of file data, I run that data service internally within the system. I was talking to some folks last night who cut their teeth on Celerra systems; Celerra was one of our earlier-generation products that let us take a Symmetrix at the time, a VMAX today, and put some controllers in front to do the file presentation. In the new systems we can still do block and file, but we no longer have that hardware requirement; we no longer have to physically connect something to the system to provide file presentation, it runs as a data service natively within the platform.

Across these cores, the cores are dynamically allocated. What that means is, just to keep the math simple, if I can do 10,000 IOs against each individual core and I've got eight cores, I can push 80,000 IOs across the front end. I can do that across all the ports, or across a single port; I'm basically going to dynamically load-balance the front end, the back end, and even the data services running within the system. The reason I put this in is that it's one of the biggest architectural changes we've made over the last several years. Older systems didn't have the ability to do that, so in many cases there were a lot of rules of thumb in the field around how to do performance optimization on the back end and the front end of the system. The point I wanted to make with this slide is that those rules no longer exist. Things have gotten much simpler because the system does a better job of automating that performance management, taking the worry out of how I'm doing my zoning, how I'm cabling things, how many paths I need from a particular host, and so on. That got much better from an architecture standpoint.
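A minimal sketch of the per-core arithmetic above; the 10,000 IOs per core is the deliberately simplified number from the talk, and the function is only an illustration of the aggregation, since the system allocates cores dynamically.

```python
# Minimal sketch of the core-allocation arithmetic above. The 10,000 IOs per
# core is the deliberately simplified number from the talk, used here only to
# show how per-core capability aggregates across the front end.

IOS_PER_CORE = 10_000

def front_end_ios(front_end_cores: int) -> int:
    """Aggregate front-end IOs the controller can service, whether the load
    arrives on a single port or is spread across all of them."""
    return front_end_cores * IOS_PER_CORE

print(front_end_ios(8))   # 80,000 IOs across the front end, as in the talk
```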
Now, getting to the scale-out point: I was going to jump forward to this earlier and said, nope, I'm going to wait, we'll talk about it when we get there. So here's that same dual-controller system; this is what we would call today a single V-Brick, and I can run it as a single V-Brick. The way the system works is that it's a cache-based system. We have this global memory that's shared across the two controllers within the system, and it caches I/O. Every time a write request comes into the system, that write lands in the cache, it's mirrored over to the other cache, and then we tell the host, we've got the I/O, go ahead and send your next I/O, go do whatever you're going to do next. So we no longer hold the write up until it hits the media on the back end; we take it into cache and away we go. The reason that's important is that the service time on that is sub-millisecond; in fact, in most cases it's under half a millisecond to complete that operation, so it's really, really fast. So writes get cached in the system.

The majority of the read I/O also gets cached. If I go to read a piece of data, I'm going to look in the cache and ask, is that data in the cache? Because of the caching algorithms we run, we do a pretty good job of understanding what the active data sets are and where they need to go, whether they belong in cache or whether we can put them out on the flash drives on the back end of the system. Generally speaking, in the real world we typically see systems getting 50, 60, 70 percent cache hits, across both read and write operations. Obviously your mileage will vary depending on the application read/write profile, block size, all of those variables, but when I look install-base-wide, more than half of the systems are getting better than 50 percent cache hits. That means they're getting really, really good performance whether they have flash in the system or not, because most of that I/O is being serviced out of the cache. So, same thing: if I read a piece of data, I'll look to see if it's in the cache. If it's in the cache, great, I'll take the data and present it right back up to the host, so again, around half a millisecond to perform that operation. If it's not in the cache, then I've got to go find it; now I've got to read it off the media. This is where the big change happened going from mechanical drives to flash drives, because performing that operation with a mechanical drive could take tens of milliseconds, depending on what flavor of drives you had. We put the cache in there years ago, when you didn't have flash, to minimize the performance impact by servicing more and more things out of cache instead of having to move data off the physical drives all the time. The move to flash has made that much better, because now you're only talking hundreds of microseconds to read the data from flash into the cache and then back up to the host. So that's where the performance comes from.
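To put numbers on the cache-hit discussion, here is a small weighted-average sketch. The service times are rough, illustrative figures loosely based on the talk (sub-half-millisecond cache hits, hundreds of microseconds extra for a flash miss, tens of milliseconds for a mechanical miss), not measured numbers.

```python
# Weighted-average sketch of the cache-hit discussion above. Service times are
# rough, illustrative assumptions: ~0.4 ms for a cache hit, ~0.7 ms for a read
# that misses cache and comes off flash, ~10 ms for a miss served by a
# mechanical drive in an older system.

def effective_read_latency_ms(hit_rate: float, miss_ms: float, hit_ms: float = 0.4) -> float:
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms

for backend, miss_ms in (("flash", 0.7), ("mechanical", 10.0)):
    print(f"60% cache hits, {backend} back end: "
          f"{effective_read_latency_ms(0.6, miss_ms):.2f} ms average read latency")
# flash back end:      ~0.52 ms
# mechanical back end: ~4.24 ms
```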
The way the memory works is that we maintain everything within the memory; it's a globally shared pool, so in a single V-Brick configuration it's accessible across both of those directors. They both run active-active, they can see each other's cache, and they're mirroring the data between those caches from an availability perspective. When I add a second engine into the system, the way I do that is with a couple of InfiniBand switches that I connect those engines into, and I run RDMA, which allows each of those controllers to access the other controllers over that high-speed memory fabric. And it keeps working that way as I add engines, all the way up to four engines, six engines, eight engines, as I continue to scale out. What we see, as we talked about, is very predictable performance improvement as I scale those systems: what was 1x with a single V-Brick is 2x with two, 4x with four, 8x with eight, and so on. That's what gives folks the predictability in terms of the scale and performance they can get out of the system.

Okay, from an availability perspective, there are lots of cool things we do under the covers. From a hardware-resiliency perspective: do boards need to be replaced, do power supplies go bad, do batteries need to be swapped? All of that happens, and the system is designed to be fault tolerant, so any of those components, or multiples of them, can go offline and be serviced without any impact to the availability or the performance of the application. All of that hardware resiliency continues to be there. One of the things we do very differently is the way we do upgrades in the system. Hardware upgrades, being able to add things, plug new things in, add more drives, add more engines, are all, as you'd expect, non-disruptive. But one of the unique things we do is the way we do microcode upgrades. A lot of multi-controller systems, when they do controller upgrades, add the new code to one controller, basically fail that controller under the covers and reboot it so it comes back up with the new code, and then stagger this across all of the controllers in the system. That process can take a while, depending on how large the system is and how long it takes to reboot each individual controller, and when they do that, the controller goes offline, so you need multipathing across everything: the paths to the controller that's being upgraded go unavailable, and its status says, I'm no longer here, find somebody else to talk to for your data, because I'm doing something. For a lot of environments that's okay: they have maintenance windows, and for the types of things they're running they can negotiate those windows and work through the process. In our world, and in some of the environments we run in, there just are no maintenance windows; there are no off hours to do this. If you think about some of these highly consolidated environments, where I've got tens of thousands of applications running on a single system, the ability to go find a maintenance window, hey, can I shut this whole thing down or put you in a degraded mode for some number of hours, is just really, really hard. So one of the reasons users love their VMAX is the ability for us to do a non-disruptive upgrade. The way the process works is we apply the new microcode to the system, and then, while I/O is running, we pause the I/O, re-flash the processors, and resume the I/O. The system never goes offline during that process, the time it takes is nearly unnoticeable from the host and the application, and it goes across the entire system.
Because of that, it makes it very easy to upgrade to the latest versions of code, and also very easy, if for some reason you need to downgrade to an older version, for whatever reason that might happen. When I talk to users, this is definitely one of the things they tell us is a differentiating factor in their environment: they can do these types of code loads, and they get their weekends back. We hear this all the time: I had this other thing in here, and every time I needed an upgrade we'd have to come in Saturday night and it would take eight hours, because we had to sit there on site to do it, and we have multiples of these frames; it becomes not just operationally difficult, people just get sick of doing it. They don't deal with that in the VMAX world. One thing this really helps with, and it's an important thing to point out, is software patches. We have these things called ePacks, where we find something in the code, we need to push a patch out there, and we need to get that fix applied in the field because there are users potentially exposed to a bug we found. In some environments they look at this and say, okay, I know I have to apply this patch, but my maintenance window isn't for another however many weeks, so I'm going to run in what I know is a mode where I am potentially vulnerable until I can get to that next maintenance window and apply the fix, unless it's an absolutely critical patch, which is, hey, you have to shut this down and get this applied. What we see is that because we've taken a lot of the pain out of doing those types of upgrades, our user base not only stays current with any software patches that get released, they stay very current with the latest software releases in general. So they can take advantage of the fixes that get bundled in, and they can also take advantage of the newer functionality that gets rolled out, because it's easier for them to apply these upgrades in their environment. Very consistently, one of the features users tell us they love about the VMAX is not just the availability and the resiliency and everything on the hardware side, but the differentiation we've built around the software and how we're able to do those types of upgrades.

So, with regard to the process for that upgrade: we're basically upgrading all the controllers at once? Yep. What's the fail-back process if there's a defect in the code that we didn't know about? How long does it take to get the software to the point where you're switching back? The same amount of time. What we do is flash the code across all the processors at once, boom, and if we need to go back, we can revert that process in the system. Those upgrades are generally done by a part of the organization we refer to as RemPro, remote proactive services. This is a team of people where this is all they do; it's like a medical process, like going in for surgery. We just don't take the code
and say, hey, there's new code, click here to install the software and away you go; that's not how it works. There are pre-checks done through the system: we run all these scripts, we validate the upgrades, we make sure everything in the system is healthy, and all kinds of pre-checks go into place before that code gets applied. Once it's applied, there's no disruption to the environment, and in the rare case, the very rare case, that you need to go back for whatever reason, there's the ability to do that. That's something that's been built in, kind of a legacy mechanism put in years and years ago, but it's still something you can do today. It rarely, really rarely, ever happens, but it's definitely one of the mechanisms built in.

So the actual upgrade is not end-user performable, it's performed by the RemPro folks? For the general user community, yes; it's not like there's a secret society of other folks out there. Those are things done by the EMC folks, as part of the standard support process they get when they bring the system in. Okay, good. I was just going to say, being that it's a VMAX, and given what you're probably running on there, you probably want that anyway. Yeah. Again, it's like going in for surgery, where you walk into a different room and they keep checking whether you are who you are; they just checked it there, they're going to check it again, and they'll check it ten more times. It's that sort of pre-check that goes through the process, and very consistently, users tell us that this, in their mind, is a key reason why they leverage the technology: they know it works, and they know there's a very solid process baked in around it.

So, Jodi showed this slide, so I won't go too deep into it, but we've got our local replication and the ability to do, I'm going to call it, snapshot-based management. Howard, don't fight me on that one; you will see no ICBM on this slide. We also have the integration within that. No ICBMs, those are really dangerous; we won't talk about the bombs just yet. So, ProtectPoint is a really cool feature, and Jodi did a good job describing the under-the-covers of how it works. There is a very specific use case for why somebody would deploy this type of technology, and that's large-scale databases. If you've got a large database and you need to back it up, you've got to worry about your backup windows. If I've got a 40-terabyte Oracle database, that's going to take tens of hours, sometimes days, to back up. I have a customer I work with down in South America: they had a 40-terabyte database, and it was taking them about 30 hours to back up, which exceeds a 24-hour backup window, so they were over a day behind in terms of potential data loss, just because of the backup windows they were given with the technology they had in place. They brought ProtectPoint into the environment, and that 30-hour backup window went down to 90 minutes.
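The arithmetic behind that difference, using the numbers quoted above (decimal units assumed). The point is the implied effective data rate; in practice the snapshot path also avoids moving data through the backup infrastructure, so these are illustrative figures, not wire throughput.

```python
# Implied effective data rates for the 40 TB database example above, using the
# numbers quoted in the talk (decimal units assumed; figures are illustrative).

DB_TB = 40
DB_GB = DB_TB * 1000

for label, hours in (("traditional backup", 30), ("ProtectPoint snapshot to Data Domain", 1.5)):
    rate_gb_s = DB_GB / (hours * 3600)
    print(f"{label}: {hours} h -> ~{rate_gb_s:.1f} GB/s effective")
# traditional backup: 30 h -> ~0.4 GB/s effective
# ProtectPoint snapshot to Data Domain: 1.5 h -> ~7.4 GB/s effective
```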
What's happening under the covers is that we're basically creating a snapshot on the array and kicking that snapshot off to Data Domain. In Data Domain it looks like a traditional backup: it gets cataloged, it gets managed, it gets deduped, it does all the cool stuff that Data Domain does. But you're no longer traversing the backup infrastructure for those backups, and more importantly for the restores, because you're coming off dedicated Fibre Channel SAN connections, which have higher bandwidth, and you're using the efficiencies built between the two technologies. So again, a big use case for us is those large-scale databases. This starts to make a lot of sense, especially if you're blowing past those windows, in terms of not just the backup itself but what can be recovered.

The other thing I want to talk about is remote replication. Yes, SRDF, Symmetrix Remote Data Facility. If anybody has ideas for a better name for the product, we're open to that; we'll see what we can do. We've had sync and async modes of replication available for what seems like a hundred years, and the way replication has always worked is that I have a source copy and a target copy. One copy is active, the other copy is offline. I write to the source, and then the write goes over to the target. If it's within synchronous distance, I can land both copies before acknowledging the I/O, so if something goes bad I've got a secondary copy I can recover against, and there's no potential for data loss because everything runs synchronously. If I have extended distances and I have to worry about the latency of moving that data between the two sites, I may need to go to an async mode, because I can't slow the application down with tens of milliseconds of additional latency just to land that second copy. The trade-off is that I'm getting a copy over hundreds or thousands of miles, but now my recovery point could be seconds or minutes behind, because I'm running asynchronously.

That's how this stuff has worked for years and years. When you look at your choices, you've got your RTO, your recovery time objective: how long does it take to recover a particular workload? And you've got your RPO, your recovery point objective: if I do need to recover, what's my exposure to potential data loss? When was the last time I made a copy I can run against: seconds, minutes, hours, days? Depending on the type of replication technology you have in place, whether that's backup, snapshots, or remote replication, you get some variation between the RTOs and the RPOs. And by the way, when it comes to backup, for a lot of environments one copy every 24 hours is more than what they need; for those workloads it fits perfectly with the service levels they're trying to provide for recoverability. But on the high end, if I've got mission-critical applications, trading applications for example, I don't want to lose any of those transactions, because they could be really, really important. That's why I implement remote replication running synchronously.
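As a minimal sketch of the sync-versus-async trade-off described above, the toy Python model below acknowledges a synchronous write only after the remote copy is updated, while the asynchronous path acknowledges immediately and queues the write to ship later. The class and function names are illustrative assumptions, not SRDF's actual interfaces.

    import time
    from collections import deque

    class Site:
        """Toy stand-in for a storage array at one location."""
        def __init__(self, name: str):
            self.name = name
            self.data: dict[str, str] = {}

    def replicate_sync(src: Site, tgt: Site, key: str, value: str,
                       link_latency_ms: float) -> float:
        """Write locally, wait for the remote write, then acknowledge.
        Recovery point is effectively zero; every write pays the link latency."""
        src.data[key] = value
        time.sleep(link_latency_ms / 1000)   # stand-in for the wire delay
        tgt.data[key] = value
        return link_latency_ms               # extra latency seen by the host

    pending: deque[tuple[str, str]] = deque()

    def replicate_async(src: Site, key: str, value: str) -> float:
        """Acknowledge immediately and queue the write to ship later.
        The host sees no added latency, but whatever is still queued when the
        source site is lost is the data-loss exposure (the RPO)."""
        src.data[key] = value
        pending.append((key, value))
        return 0.0

    def drain(tgt: Site) -> None:
        """Ship queued writes to the target; the queue depth tracks the RPO."""
        while pending:
            k, v = pending.popleft()
            tgt.data[k] = v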
The new thing we introduced, actually it's not new anymore, it's been out there for a while, is the ability to do active-active replication. So instead of having one site that's active and a second site that sits offline as a recoverable copy, I can have both sites active-active, which means I can read and write to both of those systems simultaneously. Now if something goes bad, I no longer have to recover the application, and I don't have to restart the workloads; I just continue to run against the other side, which remains available. That could be a server failure, a storage array failure, even a site failure; it gives me the ability to stay up and running.

What we've done is taken this active-active capability and integrated it with VMware, especially with things like ESX clusters, and it works really, really well. I can take a cluster and stretch it, and have both sides accessing the storage simultaneously, doing reads and writes. So again, if the cluster goes offline, if the storage goes offline, if the site goes offline, there is no RPO, there is no recovery point, because the application simply continues to run.

One of the other interesting use cases is that when I run active-active, if I want to do something like a Storage vMotion, or actually a VMware vMotion, those run much, much faster, because I no longer have to copy the storage from one site to the other; it's already there. If I have a VM running on site A and I want to move it over to site B, and I have SRDF active-active under the covers, it's just a matter of saying take this VM and move it over there, and it takes literally seconds. So what could take a long time, hours depending on how big that virtual machine is, you can get down to seconds, because you don't have to go through the process of sucking all that data up and pushing it over the network to the other side; you're already running it there as a recoverable image. There's a lot of adoption for this, and it's a really cool piece of technology.

The one thing I would say is that back in the day, to set up SRDF you needed PhD-level training in SYMCLI and the science, the wizardry, of getting these things configured. What we've done is gone down a path where we've really simplified the setup and automation for SRDF. It typically takes a couple of minutes now, where it could have been a much more labor-intensive administration effort in the past. You've killed consulting, you guys made it unprofitable. Sorry, Howard; you know what, we'll find something else to make complex for you. We're good at that, we can do that.
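As a rough illustration of why "hours down to seconds" is plausible, the sketch below assumes a hypothetical 10 TB virtual machine and a 10 Gb/s inter-site link; both figures are illustrative, not from the talk. With active-active replication already in place, the bulk disk copy drops out of the migration entirely, leaving only memory state and the cutover.

    # Rough time to copy a VM's disk between sites without active-active replication.
    # Hypothetical figures: 10 TB of virtual disk, 10 Gb/s of usable bandwidth.

    VM_SIZE_GB = 10 * 1024        # 10 TB of virtual disk
    LINK_GBPS = 10                # usable inter-site bandwidth, gigabits per second

    def copy_time_hours(size_gb: float, link_gbps: float) -> float:
        """Hours to push size_gb across the link (no compression or dedupe)."""
        seconds = (size_gb * 8) / link_gbps
        return seconds / 3600

    bulk_copy = copy_time_hours(VM_SIZE_GB, LINK_GBPS)
    print(f"Disk copy alone: {bulk_copy:.1f} hours")   # ~2.3 hours
    # With SRDF active-active, the disk data already exists at the target site,
    # so the move is just the memory transfer and switchover: seconds, not hours.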
Okay, so here I'll come clean: I did have "copy data management" on here, and this really is snapshot-based management, because I absolutely get your point. This is what happens when marketing gets involved: they want to use all these cool names. But I agree with you, what we're really talking about is snapshot management and automation at the application level. And you guys know this, right? People are making all kinds of copies. For the folks that don't deal with this, the reason it's important is that you can go home, talk to the application guys, and ask: are you making copies? Yeah, because I'm doing it for test, I'm doing it for dev, I'm doing it for training, I'm doing it for backup, I'm doing it for a hundred different reasons. And I'm doing it at the application level, because hey, it's a 40-gigabyte database and I'm going to make ten copies of it, so that's 400 gigabytes of capacity. Storage is cheap, right? It's 400 gigabytes, call it half a terabyte; I can go down to Best Buy and get a half-terabyte thumb drive for, I don't know, nine bucks. What's the big deal? The challenge is that it's not just this one application doing it; it's every one of these applications, hundreds or thousands of them. So the capacity that gets chewed up by all this copying really starts to get expensive. What we do on the array with the snapshot technology is different: we don't make full copies like they do at the application level, we make pointer-based copies, which means the amount of capacity you need for those copies is a fraction of a full copy; it's just however much of the data changes, say five or ten percent a day. It's a far more space-efficient way of making copies, and that's the idea behind copy data management.

I threw this slide in just to see who was around for 1990s Saturday Night Live and remembers the copy guy, "makin' copies." For the younger folks, we can Google that. Making copies is the point, but the other thing that's improved is not just the ability to make lots and lots of copies; it's the ability to take those copies, run I/O against them, and get really good performance, and, because they're no longer full copies, to do that without stepping on the performance of the source application. That's a really big deal.
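Coming back to the capacity math a moment ago, here is a small sketch of full copies versus pointer-based snapshots. The 40 GB database, the ten copies, and the five-to-ten percent daily change rate come from the talk; the calculation itself is illustrative and ignores snapshot metadata overhead.

    # Space consumed by full application-level copies vs. pointer-based snapshots.
    DB_SIZE_GB = 40
    NUM_COPIES = 10
    DAILY_CHANGE_RATE = 0.10      # talk cites 5-10% per day; using the high end

    full_copies_gb = DB_SIZE_GB * NUM_COPIES                          # 400 GB
    # Snapshots only consume space for data that changes after they are taken.
    snapshot_copies_gb = DB_SIZE_GB * DAILY_CHANGE_RATE * NUM_COPIES  # 40 GB

    print(f"Full application-level copies: {full_copies_gb} GB")
    print(f"Pointer-based array snapshots: {snapshot_copies_gb:.0f} GB")
    print(f"Savings: {100 * (1 - snapshot_copies_gb / full_copies_gb):.0f}%")

And as the talk points out, that per-application difference multiplies across hundreds or thousands of applications, which is where the real cost shows up.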
Okay, I know where we are on the clock, a little bit past ten, but we have some really good customers out there. This is a customer I've worked with for a number of years now, RC Willey. They're a furniture company out of Salt Lake City, Utah, and Rich Sheridan is one of the guys we deal with there. He's been a VMAX customer for a number of years, he swears by the resiliency, and he was an early adopter of the all-flash. He runs a nearly 100% virtualized environment today, and he's seen lots of improvements. The big one is performance: he thought he was getting really good performance with his VMAX systems on the hybrid side, and he was, around four or five milliseconds, which on a mechanical, disk-based system is really good. He moved onto the all-flash and those four to five milliseconds dropped to 0.4 milliseconds. Really, really good results in terms of what he's seeing.

Another customer, a New England-based customer we've also worked with for a long time, is Joe Passatari at Fresenius. If you don't know who these guys are, that's probably a good thing: they basically manage and administer about 50% of the dialysis care that happens worldwide. They use VMAX in lots of parts of their business, but they also use it to manage their patient systems. What's really important to them is that they run all these dialysis clinics, and those dialysis machines are really expensive, so as a business they have to make sure they're getting patients in and out as quickly as possible, so that they're maximizing the utilization of all the dialysis care they can provide. How quickly a patient checks in, how quickly everything is ready to go, that's what they're really focused on: getting patients in and out. The interesting challenge in their environment is that it's not like a traditional patient-records environment, where if you're a patient you probably see your doctor once or twice a year unless something serious is happening. These guys are a little different, because their patients are coming in three or four times a week for care, and they're coming in for the rest of their lives. So the volume of information and the amount of growth they're seeing is exponentially high, and again, they rely on the VMAX because of the resiliency of the platform, but also because of the headroom it gives them to scale and support that future growth.
Info
Channel: Tech Field Day
Views: 4,083
Rating: 5 out of 5
Keywords: Tech Field Day, TFD, Tech Field Day 16, TFD16, Dell EMC, Scott Delandy, VMAX
Id: mq_ZKKNjK1Q
Length: 65min 40sec (3940 seconds)
Published: Fri Feb 23 2018