Dell EMC Introduction to PowerMax with Vince Westin

Captions
My name is Vince Westin. I'm a technical evangelist for the PowerMax group as part of the storage team at Dell EMC, and I look forward to hosting you all for the next two and a half hours or so. This is going to be a deep dive on PowerMax, and we certainly welcome your questions; I know you all won't be shy about asking them, so we're looking forward to that. As I like to say, I've seen the slides in this deck and I know what they're about, so I'm here to answer your questions and give you information on them. The slides will be available afterwards and it's all going to be on the video, so we welcome you to be very interactive on this.

In terms of what we're doing with VMAX: VMAX runs the world, and our expectation is that the world will run on PowerMax going forward. We expect to do asset rolls, swap out all the VMAXes, move them over to new PowerMax systems, and continue to support the world running its business on our tier-zero storage architecture.

So let's talk about why PowerMax: what are people looking for, and why is PowerMax particularly interesting? We believe the best-in-class performance we bring is one of the key reasons folks are looking at PowerMax: up to 10 million IOPS, 150 gigabytes a second. Yes, those are marketing numbers; no, the average customer is not going to see them. But the numbers are real, we do get them out of real boxes without special configurations, and it does mean we can deliver a whole lot of performance for customers. We've got this massive cache, we can consolidate lots of workloads, we can make all of those things work really well, and we're going to talk a lot about how we focus on performance and availability and all the things VMAX does so well. We've designed this specifically to be able to consolidate everything; not that it will all fit in one array, but in some cases it will, and we can run IBM Z, IBM i, and open systems of any kind all together on a single platform. And we're doing that with compression now, with deduplication, with data-at-rest encryption, with dense packaging, all replicated and managed easily across multiple arrays per site and across multiple sites. We're delivering six nines or better availability, including real NDU (non-disruptive upgrades) that we'll talk through in a few minutes, as well as active-active and remote replication. So that's why people are really looking to buy PowerMax: it's the larger version of what we've been doing on VMAX, improved with things like enhanced compression and the addition of deduplication.

So what are we focusing on, and what else is going on in PowerMax that makes it really interesting? One of the things PowerMax is really focused on as a change is driving down response time. We've been doing caching of I/Os in VMAX and Symmetrix forever, so we've been getting one hundred percent of writes in cache in general, unless you're flooding the system, and we're generally getting 54 percent or better read hits across our average systems. On mainframes it's often 88 to 90 percent; on open systems it's usually in the 50 to 60 percent range, and some systems get down into the mid-forty-percent range, but it's generally not much lower than that. It's a lot of cache, and when reads come out of cache they're at memory speeds, of course. Now we've moved to an all-NVMe, PCIe back end, so it's PCIe hardware and NVMe protocol, all direct CPU talking to drives and pulling data back, so we've got minimal latency and maximal throughput, driving latency out of the infrastructure.
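As a rough way to see why those cache-hit rates matter for front-end response time, here is a small weighted-average sketch; the latency figures are illustrative assumptions, not PowerMax specifications.

```python
# Illustrative sketch: average read latency as a function of cache hit rate.
# The microsecond figures below are assumptions for illustration only.

def avg_read_latency_us(hit_rate, cache_hit_us=100, backend_miss_us=250):
    """Weighted average of cache-hit latency and back-end (cache-miss) read latency."""
    return hit_rate * cache_hit_us + (1.0 - hit_rate) * backend_miss_us

for label, hit_rate in [("open systems (~50-60% hits)", 0.55),
                        ("mainframe (~85-90% hits)", 0.88)]:
    print(f"{label}: ~{avg_read_latency_us(hit_rate):.0f} us average read latency")
```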
We're doing intelligent data reduction, and we're going to go through the details behind all of these over the next slides, but I wanted to make the key point in one consolidated place for you. Intelligent data reduction means we avoid compressing the hottest twenty percent of your data, and because roughly 80 percent of your back-end read I/Os in general go to that hottest 20 percent, we don't have to spend any time decompressing the data when we read it off those drives. So we avoid the latency of a decompress on all of those I/Os; again, a focus on latency. We have hardware there to do all the deduplication and compression. We have new service-level management; we'll talk about how to manage real tiers of data when everything is on flash. We've got a number of partners in the service-provider world who are interested in providing things all the way from Diamond to Bronze, and they need a way to sell Bronze that means they only deliver Bronze, even on an all-flash environment, so I'll talk about what that means. And then we've got a roadmap to take things even further: storage class memory as a tier of storage, as a way of putting capacity in the frame, not just as a buffer or a cache but as actual data storage, will be coming to the box in early 2019, as we announced at Dell Technologies World, as well as NVMe over Fabrics on the front end. So you'll be able to do NVMe into the cache and NVMe all the way through to the drives, end-to-end NVMe, and do that to storage class memory on the back end, so you have a chance to really keep driving latency out of the infrastructure.

Now, when you say PCIe back end, you're not talking about cluster intercommunication; you don't have a PCIe switch between the controllers, do you? We actually do have one for the controllers going to the drives, and I've got a diagram where we'll drill into the hardware on how we're doing that, because I knew you were going to ask. Thank you.

So as we go to market, Caitlin said she wasn't into a lot of marketing, but I do have some of the nice marketing slides in here to help with positioning and to talk about where things are laid out and how this all works. As we look at what we're doing with PowerMax, we have the 2000 and the 8000: the 2000 goes to two bricks and the 8000 goes to eight bricks, so the names are pretty easy to remember. The Essential software comes with everything; it's essential. The Pro software is the option above and beyond. So every box has snaps, every box has compression, every box has dedupe, every box has NDM and Open Replicator support for data migrations, every box has simple copy data management and a number of other things, and all the management tools are included. The Pro version then allows you to add all the versions of SRDF, so you can do Metro and async and synchronous and adaptive copy however you want to use them; it includes the embedded NAS, it includes storage resource management, it includes the advanced version of our copy data management, which means you can use as many copies as you want and manage it all through the one tool. And then we've got PowerPath, on either physical or virtual hosts, for 75 systems, and we believe PowerPath has a number of unique abilities in how we're going to move forward and help our customers do a better job of managing storage.
We've got a slide to talk later about some of the unique things we're doing with PowerPath in these environments.

Okay, you say SRDF comes with the Pro, right? What are the limitations in terms of licensing? So there's the Essential software that comes with the base of every box, and there are no limits on that. You can license the individual pieces of Pro separately, which I would generally not recommend, because once you do one or two you're going to wind up doing them all; even if you do just two variants of SRDF it's more expensive than just buying Pro and being done. When you license Pro, it's licensed for the entire capacity on the frame, and you're welcome to use it however you want, with no restriction.

And when you mention PowerPath, you said 75: is that 75 endpoints, 75 hosts, 75 what? Connections, right, whether it's a giant 32-CPU AIX server or a small VMware box, whatever; physical hosts, CPU cores, it doesn't matter. I appreciate the simplification, but the nickel-and-diming with PowerPath still offends me, because it's not the "I need 76 hosts when I connect on day one," it's the "I'm connecting four more and have to issue a PO for that piddling amount of money to get the four additional licenses" that annoys me. Caitlin's getting your feedback there in the back on the pricing, and I'm sure she'll take that into account as we look at how to do things in the future. As you know, there's a whole lot of challenge in how you take different pieces of a company and bring them together, and we worked to make sure that the first 50 or 75 hosts didn't have any of that; we're looking at what we can do to expand it, so I appreciate the feedback. Oh no, I'm with you, it's way better than it used to be, but if I add four hosts it costs me more to issue the PO than the dollar amount of the PO to license PowerPath. I'd say the same thing. So again, there are no limits elsewhere; the only thing with a limitation here is the 75 licenses for PowerPath. The copy data management is unlimited, SRM is unlimited, and all of it manages everything in the box, so everything else is not capacity-bound in any way. Go use it, have fun. We've tried to get to simple licensing; if the VMAX group owned all the software in PowerPath, I think that would be a different license key. We're negotiating.

So as we scale across the systems: the 2000 scales up to 1.7 million IOPS in the two-brick configuration, a petabyte of effective capacity, 64 front-end ports, which can be Fibre Channel or iSCSI, no mainframe support; it's two to four controllers and fits in a half rack, so small, dense, easy. The 8000: 10 million IOPS, four petabytes of effective capacity, 256 front-end ports, up to 16 controllers, all in two racks. So this is, in essence, double the density. And you've actually more than linearly increased your capability in IOPS by going from two bricks to, is it eight bricks? Yeah. So it should be a factor of four, but it's a factor of, like, seven or something. Well, the CPUs in this one are different from the CPUs in this one; we'll drill into those. It's fairly linear on the bricks within a model, but it's not linear between the two models, because this one has 50 percent more cores per brick: instead of 12 cores I've got 18 cores. So yeah, it's nonlinear. Good catch.
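A quick back-of-the-envelope check on that nonlinearity, using only the marketing numbers quoted above:

```python
# Back-of-the-envelope check on 2000-vs-8000 scaling, using the quoted marketing numbers.
iops_2000, bricks_2000, cores_2000 = 1_700_000, 2, 12    # PowerMax 2000: 2 bricks, 12 cores per brick
iops_8000, bricks_8000, cores_8000 = 10_000_000, 8, 18   # PowerMax 8000: 8 bricks, 18 cores per brick

per_brick_2000 = iops_2000 / bricks_2000    # ~850K IOPS per brick
per_brick_8000 = iops_8000 / bricks_8000    # ~1.25M IOPS per brick

print(per_brick_8000 / per_brick_2000)      # ~1.47x per brick
print(cores_8000 / cores_2000)              # 1.5x cores per brick, which roughly accounts for the gap
```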
All right, so why did we do NVMe, and what's going on with all of this? The goal of NVMe was to lower latency; in the process we also got several other benefits. The flash-optimized protocol allowed us to do some things around latency for reads under heavy writes and to drive more I/O; I've got a slide in a minute that talks about what happens with latency on random reads as you push writes to the drives. But getting the SCSI protocol out of the way, getting NVMe in there, and doing a memory-to-memory protocol gives us a much better chance of driving I/O to the drive without waiting on any other pieces. It gives us more queues, and more queues means better processing. For those of you who have looked at NVMe: SCSI has one real queue on a drive, NVMe has 64,000 queues on a drive, and each of those queues can have 64,000 items. Now, if you've got 64,000 things sitting in one queue your response time is going to be horrendous, so you're never going to use that, but because you've got thousands of queues you have the ability to set up queues for every initiator. Since we have each director, each controller, acting as multiple initiators talking to the drive, I can have all of the paths to the drive alive at one time, which gives me separate queues and means I don't have to worry about queue management when I'm moving things back and forth between the various controllers. As my workload moves around, I'm able to keep all of that balanced and push all the I/O back and forth to the drive, so I get more bandwidth. Again, when these were spinning drives it was a different story; now that they're all solid state we treat them as memory devices, and we get a whole different paradigm. That's what NVMe has changed: it has really moved these from being drives to being randomly accessible media devices.

These are also dual-ported drives, and we believe we're the first enterprise storage array to incorporate them; obviously that's a short race and everybody else will join in rapidly. But because we're using dual-ported drives and dual connections from everything, it allows us to drive I/O to the drives more consistently; with those queues I can push I/O to both ports from a single director, and then from both directors to both ports, and so I scale up the I/O. This lets us do some interesting things around RAID granularity, reliability, serviceability, and I/O balancing. We're going to talk in a minute about how we wire all this together, but essentially one of the fundamental changes we made when we went from the older VMAXes to the VMAX3 and VMAX All Flash is that we did this thing called local RAID. Instead of having the RAID distributed across the directors, across the entire box, we distributed it across the two directors in a brick and managed the RAID there. That allowed us to do a few things around rebuilds that were pretty fantastic: we shrunk our rebuild times by almost an order of magnitude just because everything stayed local; you weren't doing queries across the global cache, and all that overhead went away. Moving this to NVMe lets us do things a little differently again: with a single group of drives, we can have half the capacity active on one director and half on the other, because they get separate queues and can manage it all separately. So we've actually cut the rebuild times by another 60-plus percent while increasing the IOPS per drive, all while driving down latency at the same time.

So you're creating two namespaces on each drive and assigning one to each controller? We're actually creating 64 namespaces on each drive and assigning 32 to each director. And the unit of the RAID is a namespace? Yeah, essentially. I've actually got a slide on the RAID coming up, so wait for that.
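A toy sketch of the namespace split just described; the director names and data structure here are hypothetical, purely to illustrate the 32/32 ownership idea with independent queues per director.

```python
# Toy model (not PowerMax code): each NVMe drive is carved into 64 namespaces,
# half owned by each director in the brick, so each director can keep its own
# submission queues and ownership can shift without rebuilding queues.

NAMESPACES_PER_DRIVE = 64

def assign_namespaces(directors=("dir-A", "dir-B")):
    """Split a drive's namespaces evenly across the two directors in a brick."""
    half = NAMESPACES_PER_DRIVE // len(directors)
    return {d: list(range(i * half, (i + 1) * half)) for i, d in enumerate(directors)}

ownership = assign_namespaces()
print(len(ownership["dir-A"]), len(ownership["dir-B"]))  # 32 and 32
```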
So, we have the controllers. This hasn't changed a lot, but the idea is that we have the ability to do CPU pooling. As we get controllers with different numbers of CPU cores, all we do is change the number of threads in each of the pools: it's very easy to adapt to new chips, and it's very easy to scale the architecture. It also gives us benefits like having the same code running on both threads of the same CPU, so there's more of the same code and data in the CPU cache, and that actually does make some difference. I can also do fault isolation: if I've got a problem in my back end, I can reboot the back-end pool while the front end keeps running. I can restart chunks of software because they're independent. So it gives me a fault-handling enhancement, a performance enhancement, and the ability to scale, and I do have the ability to go in, grab threads, and move them from one pool to another if I so choose.

All right, so how do we build the hardware for this thing? We're only a few slides into hardware and then we're back to the software, because in today's world software really does run everything. Everybody wants to buy SRDF, and PowerMax is what they buy to run it on; but it's the hardware that supports the software that drives the business that makes it all work. We scale this out as a multi-brick architecture. Each brick starts with a dual-director engine; the engine has the CPUs, the cache, the access ports, the NVMe drives for vault, and the connections to the NVMe drives on the back end. The DAEs and their NVMe drives are also part of that brick, we have a virtual matrix connection that allows InfiniBand connections to the other directors, and then we have Ethernet for management. So it's a fairly simple set of components. The other piece you see here is the standby power supply. We actually have batteries in the box for each of these, because we have data sitting in cache that is not written to disk: when you write, we write it to two copies of volatile memory, acknowledge it, and move on. In the event of power loss, the batteries hold the director up, we vault to flash SLICs in the director, and we shut down. The DAEs are not powered by batteries; the DAEs just hold drives. But the individual directors that hold cache have vault SLICs, and we vault to those. That gives us the ability to live through power outages, even like that wonderful outage we had here in the Northeast, what was that, 15 or 20 years ago, when the grid was down for 72 hours. If you have data living on batteries and the batteries only last so long, then in an outage like that all the data you had buffered in cache is gone, and now you've lost essentially the contents of the array, because you don't know what made it to disk and what didn't. You don't need a secondary disaster like that, so we vault everything before we shut down completely.

On the engine hardware, on the front side you've got fans and you've got power. Interestingly, we actually have four power supplies here, not two: each director has dual power supplies, one off A and one off B, so you have to lose three power supplies before you're down to a single point of failure on power.
Because if you lose these two, yes, that director goes off, but this one still has redundant power and keeps going; I can lose two power supplies here plus one of my power feeds to the wall and this director is still up, happy, and running. So we've designed this to a paranoid level of "I get power no matter what." Behind those fans are the directors. The directors pull out the front of the box; you'll notice there are no cables on the front, just the fans. So when I have to pull the big director out and swap it, I don't have to go past a whole bunch of cables; I just pull it out the front, swap it, and push it back in. Guess what, we learned to do that because we used to pull them out the back, and out the back you've got a whole bunch of cables, and when you put the CE in there under "urgent, I've got to do this now" and they pop the wrong cable, things go badly, because you already have something pulled out. So we found ways to fix all those problems; there's nothing like learning from experience. We have an 11-slot director that allows us to do all these back-end slots: we have the management modules that control the way the directors boot, the vault SLICs for putting data on, the InfiniBand connectors, and then all the front-end ports as well as the drive ports, so there are lots of connections going on here. Interestingly, these little management modules control the booting of the director. They talk to each other over Ethernet, and they're actually what drives the box as it boots up, until it's operating and the directors take over across the fabric. Because this is effectively an MPP computer where you're sharing the global cache and everybody has to play nice, you can't have one director deciding to come up and start doing things in global memory while the others are in a different phase. So we have these little guys able to communicate outside the fabric and say, okay, we all agree we're at this stage, now we can go to the next stage, and synchronize all those kinds of things. Loads of fun there.

All right, the bricks. We talked about CPU cores earlier: this is 12-core and this is 18-core, and that's the big difference in the performance you're seeing between the two. The 8000 also has the ability to do mainframe, in addition to the extra cores. Other than that, the front-end port counts are the same, 32 ports per brick, and things scale up from there. In the past the 950 only had 24 ports per brick, so we're getting better port counts as well as better density.

In terms of the director architecture, a single director looks like this. We have the CPUs with their memory and PCIe switching; obviously this is an oversimplification, because there are a whole lot of parts and that's challenging to draw. PCIe talks to the system interface board, and InfiniBand goes to the other directors. We have PCIe going directly to the other director in the same brick, which we use for RAID communications when we're doing rebuilds across drives in the same brick, and we also use it for other pass-through kinds of things. And in the 2000, since we don't actually have an external fabric, these ports talk to the two directors in the other brick, and this is the way we talk to the director in the same brick, so we emulate a fabric without having to make room for a switch or pay the cost of a switch. We then have the data reduction module: this is the hardware that provides all the new data reduction.
It does compression; it'll do the LZS compression, and I've actually got a whole slide on that, so I'll wait for it. We have a nice hardware module that does all our compression work. We have the vault SLICs; this is where we store the data in case of power failure. Yes, those are dedicated to vault; they're not counted as part of the usable space in the system. You bought the system, and based on the cache you put in it, we put the right amount of vault in, and away we go. The management modules I talked about are also how we connect in for service, as our external management path: we have the phone-home with the ESRS gateway, and we also have the ability for the local customer service folks to come and plug in; these are the Ethernet ports they plug into so they can talk to all the directors and figure out exactly what's going on in the box. All those kinds of connections are provided through the management modules. We have the back-end I/O modules, so I've got PCIe coming in here, and on the next slide I'll talk about how that cascades down all the way to the drives. And then we have the front-end modules doing iSCSI, Fibre Channel, FICON, however you want to connect things.

Vince, do you have any feel for what percentage of VMAX customers run iSCSI? So I have a number of very large customers, probably five to ten percent of my very large customers, who either have moved to buying a number of all-iSCSI boxes or are planning to head that way, and then I have probably 80 percent of my large customers who say no. The primary reason they say they're not going there is that the storage team and the network team don't play nicely together and are run by separate managers. Yeah, I kind of knew that part. Right, or they already have Fibre Channel in place, and who would buy a whole new Fibre Channel infrastructure? You don't want to go implement a new infrastructure. Again, we have some folks who are moving to all iSCSI because they're done having two infrastructures, their storage and network teams roll up into the same management structure, and they have clear agreement on who's going to do what and what the bandwidth needs are. Because the first time the storage guys come to the IP guys and say, "we need a network that looks like this for redundancy for iSCSI," well, first of all the storage guys are talking about things they maybe don't know as well as they should in terms of IP, and the network guys say, "I've been doing this a long time, I know what you need, you don't know what you're talking about, just sit down and be quiet," and then of course they do it wrong. Yes. The only thing I would add to that is, as Vince said, we do have a handful of very large customers that are going that way, and what that's done is shift some of our development efforts and priorities to pay much more attention to iSCSI in the last couple of years than I think we did in the past. So when you look at the automation, the reporting, and some of the things we can now support specifically for iSCSI environments, it's certainly much better than where we were before. So there is interest and there is some pull, but as Vince said, it's not what I would consider to be a trend.
The iSCSI interface has virtual front-end ports: you can have multiple IP addresses on the same port on multiple VLANs, and you can manage it really well. It's not like the early versions of iSCSI, which were "yeah, we can have one address on this port, do whatever you want with it." It really does act almost like virtual arrays on the front end and allows you to do some very nice things, so it has become of real interest to a number of big companies for how they're going to do things.

In terms of the rest of the architecture, drilling down to the DAE: we talked on the last slide about the CPUs, their memory, the PCIe switches, and the back-end I/O modules; I dropped those down onto the bottom of this frame so we can draw the rest of the connections. Each back-end I/O module has two cables, and each cable is four lanes of PCIe. This cable here goes to one link control card and this cable goes to the other, so everything is cross-connected, and then each link control card is connected to one port on each of the drives. So this is all redundantly connected: if I lose a link control card, I still have full access from both back-end modules to every drive; if I lose a cable here, I still have full access through that same card to all the drives. If you look at this one director, I actually have four connections to each drive, two back-end modules with two ports each that I can go through, and we set up separate queues as if each of those were a separate logical path. So if I lose anything along the way, I can drop some of those connections while the rest are already there; I'm not doing discovery and trying to build new PCIe connections as failures happen, I simply let some queues drop away and bring them back later.

And there's another director here, right, which is also connected to all these link control cards and all that? Well, this is the director, so it's two separate controllers right here. Yes. Then there's another whole director that has another set of four cables? Right. So a DAE has eight connections, 32 lanes, to two controllers, right, two bricks? Yeah, except in the 8000, where we share one DAE between bricks, so it actually has double that; each link control card actually has eight ports on it, and we only use all eight ports in that configuration on the 8000. And everything is direct-connected: there are no daisy chains, none of that, it all just works directly. The way it should be. Yep.
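Here is a hedged sketch of that back-end fan-out, as an illustrative model only (the component names are made up): two back-end I/O modules per director, each cabled to both link control cards, and each card reaching one port of every dual-ported drive.

```python
# Illustrative model of the four paths from one director to each drive, and how
# a component failure just removes queues rather than forcing path rediscovery.
from itertools import product

backend_modules = ["BE-0", "BE-1"]                    # two back-end I/O modules on one director
lccs = ["LCC-A", "LCC-B"]                             # each module has one x4 PCIe cable to each LCC
drive_port = {"LCC-A": "port-0", "LCC-B": "port-1"}   # each LCC reaches one port of every dual-ported drive

def paths_to_drive(failed=frozenset()):
    """Enumerate the surviving (module, LCC, drive-port) paths after component failures."""
    return [(be, lcc, drive_port[lcc])
            for be, lcc in product(backend_modules, lccs)
            if be not in failed and lcc not in failed]

print(len(paths_to_drive()))                  # 4 paths from this director to each drive
print(len(paths_to_drive(failed={"LCC-A"})))  # lose an LCC: 2 paths remain, no rediscovery needed
```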
All right, so now we get to the virtual matrix. On a single brick we have the cache slots and the track tables. Now we add a second brick; what do we do? We add a fabric, we cross-connect everything, we extend the fabric, we extend the metadata, we extend the cache, we extend everything across. And if you add more, we just scale that on out to eight. So you have the ability to start at one or two or three and add in increments of one or two or four or whatever until you get to eight, unless of course you start with the 2000, in which case you can only go from one to two, because you don't have the fabric.

In terms of scale, you mentioned earlier that the scale was not linear; in the 8000 it is. On the 8000 we drew a linear line at two, four, six, and eight bricks, those being the gold dots you can somewhat see here. The cache-miss reads off the back end run very close to the gold line, and the light blue is 128K reads off drives; it's within the testing margin of error to say this is basically a linear scale. Some tests go a little below, some a little above, depending on the particular pass of the test, but you're within a few percent of linear across the bricks. These are cache misses, so cache hits are even better; with cache hits you get much bigger numbers, and you hide a bunch of the fabric pieces, so you can't see this as well. We use the cache miss for this because it does the best job of testing how well you're scaling in those pieces, and yes, this is my test, I get to decide what to test. Can you go back to the director view where you had all the hardware? This one? Thank you. Yeah, it'll be posted afterwards and you can have the pictures and talk through them and whatever else you want. There you go.

All right, so as we look at the scale, this gives us on the 2000 the ability to start at a single brick: we have the standby power supply, the engine, and the two DAEs, and then we put the second brick right on top of that, and in half a rack you have a full two-brick system. Again, no fabric switch, because we use direct InfiniBand connections, and we talk to the other director in the same brick over the cross-memory interconnect, which is a PCIe connection. I've got two DAEs that support 24 drives per DAE, so I have 48 drives per brick, and I scale that up. I can go from a 1.9 to a 7.7 terabyte drive, and our expectation, based on what's going on in the industry, is that by the first half of next year we'll be offering a 15.4-terabyte version of the NVMe drive; the drives keep growing. And the interesting thing is, as we look out, a year after that it's a 30-terabyte drive, a year after that a 60-terabyte, and a year after that it's over a hundred terabytes. What do you do with a single RAID group that gives you over a petabyte of effective capacity when somebody says "I want to buy a small box"? You don't sell it to them, because the failure domain is just way too big. The rebuild times on these have been improving so nicely that it's going to be interesting to watch where this goes: our rebuild times on the 7.7-terabyte drives in this box are under two hours. And given that the replacement rate is every two and a half million hours, and we have had zero hard failures so far in the flash drives on these, including on VMAX3 and VMAX All Flash, until you get to the first hard failure you don't have to worry about the dual failure, so we're still feeling very comfortable with 7+1 RAID 5 on these drives.
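Rough arithmetic on where that failure-domain concern comes from, assuming 7+1 RAID 5 and an illustrative data-reduction ratio (the ratio is an assumption, not a quoted figure):

```python
# Rough arithmetic on 7+1 RAID 5 group capacity as drive sizes grow (illustrative only).
# "Effective" assumes a modest data-reduction ratio; that ratio is an assumption, not a spec.
drive_sizes_tb = [1.9, 7.7, 15.4, 30, 60, 100]
data_reduction = 1.5   # assumed average reduction, for illustration

for size in drive_sizes_tb:
    usable = 7 * size                    # 7 data members in a 7+1 group
    effective = usable * data_reduction
    print(f"{size:>5} TB drives -> ~{usable:.0f} TB usable, ~{effective:.0f} TB effective per RAID group")
# With 100+ TB drives a single group lands around a petabyte effective, which is the
# failure-domain concern raised above.
```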
There's something on the slide that says "1 gigabyte NVMe"; what does that mean? Where does it say that? Ah, I see, I was wondering about that: "24 drives, 1 gigabyte NVMe." Each of the PCIe links, each lane, is one gig, but there are four lanes per connection in general. Sorry, I forgot somebody put that in there; that's how they spec the drives, as though the NVMe link is one gig, but it's multiple lanes per connection, and of course you have dual ports, so you get tons of bandwidth. I'm not saying the drive can sustain all of that, but we have lots of bandwidth to the drives. What's the rebuild time again? Under two and a half hours for a 7.7-terabyte drive. It's gotten amazingly fast. The nice thing with what we're doing in PowerMax is that the rebuild now uses both directors at the same time: you get the benefit of the NVMe drives sustaining more I/O, and the benefit of using two directors instead of one, so things just accelerate.

In terms of the 8000, it's the same kind of thing. You start with an engine, two DAEs, and your power supplies, and when you want to grow, you add the fabric switches up top, then a second engine and one more DAE. DAE number two here is actually shared: half the slots go to engine one and half go to engine two. Why did we do that? Because when you build out two of these it fits in a 40U rack, and the truth is this gives you 32 usable drives per brick plus a spare; you don't need more drives than that, in general, to drive the capacity. It's a similar drive count to what we've got on the 2000, but it gets us into a tight rack, which means as we scale up I can get to the eight-brick configuration and keep the whole thing in two racks. It's really dense, and customers really like that. The other option was to do two DAEs per brick, and then you do three bricks in the first rack, three in the second, and two in the third, and everybody looks at you like you've got three heads. So okay, we'll just share a DAE on every other brick and be happy; close enough.

Now, the interesting thing is that if you look at the VMAX All Flash 950, you can do nineteen hundred and twenty drives in that thing. This is two hundred and fifty-six drives plus spares; it's an order of magnitude lower in drive count, and it's also why the power for the maximum system is 20 kVA instead of 34. You drop the drive count down, you drop the DAE size down, you drop all the power for that, you drop the floor space, and you get those benefits. The only challenge is that you have fewer drives, but these drives are so fast, and fewer drives means fewer things to fail. And honestly, on the 950 we didn't have many people buying that many drives; there were still some people saying "I want to buy 1.9-terabyte drives because I want lots of drives, because that's how I get performance." We've pretty much gotten them past that, and they'll either keep their 950s or buy this. I'm glad to hear that. You know, we've talked about the fact that we moved away from spinning drives. Somebody pointed out that last year the number-one-selling external storage type was hybrid, this year it's likely to be hybrid, and next year it's likely to be hybrid again. The world is still doing a lot of hybrid, but not in our market; we've moved to all flash and we're not going back. We do want to make sure we address how people do lower-cost things in this market, so we've added deduplication, we've got better compression, and we're doing things to make the economics work better. We're also looking at how we incorporate things like the much larger drives: as you look at the 30-terabyte, 60-terabyte, and 100-plus-terabyte drives, how do we incorporate those and make it all work in a way that's meaningful and helps drive down costs? Obviously it's only worth it if it drives down cost somewhere.
Back there you said you've never seen a hard failure in all the years of VMAX all flash. So we've been selling VMAX with flash drives for, keep me honest on this, six years? Ten years? That's a ten-year number, huh? In general, over a decade. Vince, it started in 2008, when we introduced it on the DMX; we had a number of customers who had systems with flash configurations. Yes, so it's been a decade. And in that time, I don't know, there may have been one and I haven't tracked it, for the DMX and for the VMAX ones and twos. Once we went to VMAX3, shortened our rebuilds, and changed our sparing types and all that, the rates are so low that we have not seen a flash drive get to the point of failure before we copied all the data off and were on our spares. So we've had zero. So you've got preventive maintenance: if a flash drive is starting to fail, we get off of it. On the hard drives, part of the reason we don't have hard drives anymore is that 90 percent of the time, when a hard drive said it was going to fail, it was "I think I'm failing" and then it was gone, and all you could do was rebuild, and of course if one of the others did the same thing, that was a problem. And when did they get the most stress? During a rebuild, because you didn't use the hard drives any harder than you had to at any other time, but during a rebuild you have to hammer them all. So what you would see is one would fail and another would go, "oh, I'm not feeling so good," so please get off that thing before there's a second failure. We've had second failures in mirrored pairs on spinning drives; it doesn't matter what size the RAID group is, it happens. The MTBF number, or I should say our mean-time-between-replacement number, on the flash drives is three times what the replacement number is on hard drives: instead of 800,000 hours it's two and a half million hours, and they're not hard failures, so we get proactive indications, we replace them, and it's done. That's one reason we do seven plus one. We also do it because most of our customers who really value this data are running SRDF and have a remote copy, and we can use that to rebuild the spare if we need to. So we've got belt and suspenders; we're only so worried about a dual flash failure.

All right, so that's the hardware section of this. Before you move on: is SRDF smart enough now to send a drive's data over that link, or are you replicating the whole volume back? So when we do SRDF, we mirror, and we mirror individual tracks between the arrays; we don't care about the underlying drive technology. I have a certain set of tracks that I've designated as a LUN that I'm mirroring between these two frames, and if my local RAID can't provide access to a track, I will read it from the remote copy at a track-level granularity. Okay, and a track is? 128K. So if I somehow have a failure of my local RAID, whether it's one track, five tracks, whatever, because in the event of a failure it's usually down to a few tracks, I reach across the link and read those tracks automatically and never ask anybody. It's treated as a mirror on top of the local RAID, at the track size of 128K. Yes. And I know which copy is valid where at all times, on both systems.
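A hedged pseudocode sketch of that track-level fallback, not actual SRDF internals; the objects here are hypothetical stand-ins for the local RAID group and the remote mirror.

```python
# Illustrative sketch of the read path described above: serve from local RAID,
# and on a local failure fetch just that 128 KB track from the remote SRDF mirror.

TRACK_SIZE = 128 * 1024  # bytes per track, as described above

class _Stub:
    """Hypothetical stand-in for a local RAID group or remote SRDF mirror."""
    def __init__(self, tracks, unreadable=frozenset()):
        self.tracks, self.unreadable = tracks, unreadable
    def read(self, track_id):
        if track_id in self.unreadable:
            raise IOError(f"unreadable track {track_id}")
        return self.tracks[track_id]

def read_track(track_id, local_raid, remote_mirror):
    """Serve the read from local RAID; fall back to the remote mirror for that track only."""
    try:
        return local_raid.read(track_id)       # normal path: local RAID members
    except IOError:
        return remote_mirror.read(track_id)    # fallback: pull the track across the SRDF link

local = _Stub({7: b"x" * TRACK_SIZE}, unreadable={7})   # pretend track 7 is unreadable locally
remote = _Stub({7: b"x" * TRACK_SIZE})
assert read_track(7, local, remote) == b"x" * TRACK_SIZE
```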
So this is a cache-to-cache thing; I don't care about the underlying media, so I can have hard drives on one side and flash on the other. All of that is below, inside the box, and we don't care; SRDF just manages it at the logical track level. SRDF will compress the data, send it, and uncompress it, but the actual back-end storage of compressed or deduped data doesn't matter to SRDF; it's all hidden. So if the data is deduped, it's possible that the logical track gets sent across SRDF and is deduped away again on the other side, or something like that. Right.

All right, so if you look at refresh technology, look at the old 20Ks. Amazingly, the old VMAX 20Ks are now end of sale and almost end of support life; they've got, what, a year to go? Something like that, yeah. Next September those are going to be ending service life, so it's time to get going. Those boxes typically had a lot of spinning 300-gig drives, so an 800-terabyte box was a couple thousand spinning hard drives that took up nine bays of floor space and was just a monster on power and cooling and all of that. We can replace that with a couple of bricks in an 8000, so half a cabinet: ten times the performance, forty percent lower cost of ownership, over ninety percent less energy, over ninety percent less floor space, and ninety-eight percent fewer drive replacements. You've got one-tenth the number of drives at three times the reliability, so over 30 times the resilience on drives in the new configuration. That makes a monster difference in how often anybody touches this array to service anything, because the drives have always been the most common point of failure, and now the drives are all solid state, so the whole thing is solid state. We don't have these folks running around swapping drives every day; keep in mind, a thirty-times reduction means that if there was a guy there once a day swapping drives, he now comes once a month. Our CEs get to go do a lot of other things rather than swapping drives, and that's a really good thing.
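A back-of-the-envelope check on those replacement numbers, ignoring RAID and spare overhead (illustrative arithmetic only):

```python
# Back-of-the-envelope check on the drive-replacement claim above (illustrative only,
# ignoring RAID/spare overhead).
old_drives = 800_000 / 300                # ~2,667 x 300 GB spinning drives for ~800 TB
new_drives = old_drives / 10              # roughly one-tenth the drive count in the new config
reliability_gain = 2_500_000 / 800_000    # ~3x mean time between replacements, flash vs. spinning

replacement_reduction = (old_drives / new_drives) * reliability_gain
print(round(old_drives), round(new_drives), round(replacement_reduction))  # ~2667, ~267, ~31x fewer swaps
```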
All right, so that's the hardware discussion. Hey, Vince, real quick: when is it appropriate to use a PowerMax versus where an XtremIO would be a better fit, and what about the overlap between the two? Okay, so as we look at the tier-zero, tier-one, tier-two classes of storage, we put PowerMax in tier zero, which in our mind means things like non-disruptive upgrades, meaning we load code once and it all just goes, nothing ever goes down, nothing ever loses an I/O; that's fundamental to tier zero. We're able to support all the platform types, so whether it's IBM i, IBM Z, or open systems, it all comes, it all plays, and we have the massive scale of this. The resilience of the two boxes is similar. If you look at what we're doing for data reduction, they're both doing dedupe and compression; I would say the XtremIO guys still win on dedupe. We're not as good at it, we're trying to learn, but we have it and we're getting going. If you had a VDI infrastructure where you wanted the best dedupe ever, you should always be starting with XtremIO; it's a great fit. If you've got databases that tend to compress really well but not dedupe as well, PowerMax actually has the better compression technology, and we'll talk through the compression hardware in a few slides. So we think that if compression is your key thing, PowerMax is going to do a better job of getting you value on compression than you'll get from XtremIO. You've got a couple of options, and it really depends on what you need: if you need really advanced replication, if you want SRDF, if you want Metro, right now your choice is PowerMax. Does that help? Yep, thank you. Although, allow me to say, I question anybody who puts "how well it dedupes" at the top of their "why I buy this storage" list. I don't presume to tell my customers how they should choose their storage; that's their job. But yeah, I sometimes have questions about where their priorities are, though I try not to voice those too loudly, because, as the saying goes, the customer is always right, and certainly the customer is the only one in the room with a paycheck for me; I can't buy my own storage. So if I have a pretty small array, what kind of trade-in do I get on it? We'll talk about that; the last slide in this section covers trade-ins and credits and all that. It's awesome.

All right, so PowerMax and storage class memory and where we're going with that, because we've talked a bunch about what we're doing with NVMe and how we've changed things, and now about what we're doing with storage class memory in the future. I have one of Caitlin's wonderful slides here that talks about the magic we're unlocking with NVMe. As we talked about before, with SAS there's really a single queue, so you have all this wonderful power and all these CPUs, and you funnel everything into a single queue to talk to this wonderfully fast flash drive, which was kind of silly. By moving to the NVMe world we have multiple connections from the CPUs to the drives, and we can really push the drives and get the extra performance out of them. That opens things up and lets us drive lower latency and better throughput even on the same NAND architecture; it lets us get more out of the flash drives, because we're treating them as memory. Now, the next thing we do is come along and say, okay, let's add some storage class memory to that: now that we've got an architecture designed to get the most value out of the back end, let's put storage class memory drives behind it so we can push them. I think you can see that if you took storage class memory drives and put them behind a SAS architecture, it would be kind of a waste: the SAS protocol time is longer than the response time of the storage class memory, so the protocol overhead would be silly. This really allows us to take advantage of the reduction in response time you get from moving to storage class memory.

And to make that point very clear: the light blue line here is the SAS-based NAND media, the gold line is the NVMe-based NAND media, and the dark blue line that goes up somewhere out here is the 3D XPoint storage class memory. We eliminated all the other pieces; this is just a drive on a cable talking to a CPU. It doesn't have all the magic of PowerMax around it or anything else; this is just looking at what the drive can do. We see the same things when we put them in PowerMax, but for the simple version of the view, this is just a drive directly on a controller.
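To make the protocol-versus-media point concrete, here is an illustrative latency stack-up; all of the numbers are assumptions chosen only to match the rough magnitudes discussed above (SAS protocol overhead over 50 microseconds, SCM media well under 50).

```python
# Illustrative latency stack-up for the chart described above; every number is an
# assumption for illustration, not a measured or published specification.
protocol_overhead_us = {"SAS": 55, "NVMe": 5}
media_latency_us = {"NAND": 150, "SCM (3D XPoint)": 40}

for proto, p_lat in protocol_overhead_us.items():
    for media, m_lat in media_latency_us.items():
        print(f"{media} over {proto}: ~{p_lat + m_lat} us read latency")
# Note how putting SCM behind SAS roughly doubles the drive's response time from
# protocol overhead alone, which is the point made in the talk.
```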
And what we see is that this is the write pressure on the drive: as you push megabytes per second of writes, your read response time goes up. The read response time on the SAS drive climbs because, again, I've got the one queue, the reads and writes are going into the same queue, things fight with each other, and fairly quickly my response time moves from 200 microseconds up to half a millisecond. When I change over to an NVMe connection on the same basic drive, I'm able, first of all, to drop the SAS protocol overhead, and then to extend what I can do on the drive before I start hitting the real limits of what I can push into the NAND media. Then, with the media, I start rising up the curve, because I'm now waiting for the writes to happen and buffer in the back end. These are random reads. This is random reads with sequential writes? The writes can be sequential or random, it doesn't really matter; these are just large-block writes, with random reads going to places in memory, because the drive internally is doing garbage collection and other things. So as I'm doing this, my drive is doing lots of back-end work, and the more writes that get pushed in here, at some point the read response times start going up. Because our drives tend to run 50 to 100 megabytes a second of writes, this allows us to double the write pressure on a drive in an array and still see a huge difference in the effective response time on the back end, which means a difference in the response time on random reads out the front to my users. What's this one down here, 20, 30, 40 microseconds? It depends on which drives you're getting, the firmware they're running, and what else you have going on, but yes, around 50 mics, which is why I said the protocol latency for SAS is well over 50 microseconds and the latency on this drive is well under 50 microseconds. So if you ever put one of these on a SAS protocol, you double the drive's response time just from the SAS protocol; there's no way you would ever want to do that. And the knee of the curve on the 3D XPoint drives is somewhere around 1.6 or 1.7 gigabytes a second before it starts going up. The NAND drives are one-write-per-day drives, and this one I jokingly call a thousand-writes-per-day drive, because it seems to act like that, even though in the real world it's more like sixty or eighty writes per day. But it doesn't matter; once you're over ten it's huge. This thing just eats writes for breakfast. It's memory, real random memory, and NAND just can't handle writes the same way, because you have to do the block erase: you're doing 32-megabyte or larger block erases in the NAND before you can write to it, so you've got different ways of managing things. And SCM doesn't have garbage collection, or let's say it hides it all much, much better; again, it's designed as memory, it's designed to run at these kinds of speeds, so you've got a whole different bandwidth capability for writes. It really changes the world. Now, having said that, these drives are not cheap: the current pricing from Intel is four times or so the cost of the NVMe NAND drives, so you have to figure out how much that latency difference is worth to you. And you couldn't get enough capacity anyway: when we start shipping these in the first half of next year, the largest drive we'll be shipping will be a 1.5-terabyte drive, so a single RAID group is about 10 terabytes, and with 32 drives, maybe a 40-terabyte brick.
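Rough arithmetic behind that "10-terabyte RAID group, maybe a 40-terabyte brick" comment, assuming 7+1 RAID 5 groups across 32 SCM drives (the group layout is an assumption):

```python
# Rough arithmetic on SCM capacity per RAID group and per brick (illustrative only).
scm_drive_tb = 1.5
raid_group_usable = 7 * scm_drive_tb        # 7+1 RAID 5 -> ~10.5 TB usable per group
groups_per_brick = 32 // 8                  # assume 32 SCM drives per brick laid out in 7+1 groups
brick_usable = groups_per_brick * raid_group_usable
print(raid_group_usable, brick_usable)      # ~10.5 TB per group, ~42 TB per brick
```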
You spend a lot of money for those drives in that brick to get 40 terabytes of usable flash, but it makes a great place to put your hottest data, and, oh gee, we already know which data is hot; we might be able to use that. All right, so we have the wonderful Intel Optane drives. We have this up here partly because everything runs on Intel chips in our box and we're looking forward to this, but also because Intel has been a great partner; they spend money with us at Dell Technologies World and such, and we're very closely tied with Intel. The graphic up here does a nice job of showing where they came up with the 3D XPoint name, because it really does look like a 3D lattice with connected points to drive all the memory. So, what we've been doing: we did SCSI on hard drives internally, then we did SAS and single-level-cell flash, we've moved to NVMe on NAND now, and early next year, storage class memory. No news there; this was all announced at Dell Technologies World at the end of April, beginning of May. So that's our plan, that's where we're going, and that's how we're making all the magic happen. We'll talk about how we're going to manage things across SCM and NVMe NAND in a bit.
Info
Channel: Tech Field Day
Views: 10,071
Rating: 4.8481011 out of 5
Keywords: Tech Field Day, TFD, Storage Field Day, Storage Field Day 16, SFD, SFD16, Dell EMC, Vince Westin, PowerMax
Id: d6gR4kGntv8
Length: 49min 8sec (2948 seconds)
Published: Thu Jun 28 2018