Introducing Dell EMC XtremIO X2 with Todd Toles

Captions
Good morning, my name is Todd Toles. I am the CTO of the modern infrastructure team, and within that team my focus is specifically high-end storage. Today we're going to talk about one of our high-end storage products, XtremIO X2. I've been here since 2007 and was one of the original XtremIO team members, in a pre-sales role. Let me also introduce Zvi Schneider, the chief architect for XtremIO: I've been with XtremIO since late 2011 in different roles within engineering, and I've held the chief architect role for about a year and a half.

So first off, let's go into what X2 is. We've done a lot of work on the modernization of XtremIO into X2; however, some of the core underpinnings of X1 have carried over to X2, things like consistent performance. Everything we're doing, in both X1 and X2, is done inline: all our data services, the data reduction, compression, deduplication, snapshots, all of that is done inline. Efficiency: as Vince mentioned earlier in the VMAX All Flash presentation, XtremIO gives us unparalleled efficiency, and we'll talk about that and about some improvements we've made in X2. And finally, application-integrated copies, which is a stellar function of XtremIO.

Let me give you an example. One of my customers runs an Oracle database that happens to run pricing for a fast food company. Every time they have an application problem they make a copy of that database. In the past they could make about four copies, because each had to be a physical copy, which is very expensive. Last I looked they were making 80 copies with XtremIO: a 10-terabyte database, 80 times over, is about 800 terabytes of logical space in 13U of XtremIO configuration. Very dense, but all integrated with the application: when they want a copy, it drives through VMware, they instantiate another copy into their farm, the developers take a look at it, and when they're done with their testing the snapshot is removed and everybody goes back to normal. That's one of the things we deliver with XtremIO, both in X1 and now in X2.

So what are we doing? Our design goals. You always have goals when you do a major release, and ours were to take the best of XtremIO and make everything better. We listened to you, the delegates here, as well as our customers, and asked: what are you looking for in the next evolution of XtremIO? As we moved to X2 we settled on five areas of innovation.

One thing we heard loud and clear was that people wanted more flexibility in capacity additions. In the past you basically had to add an X-Brick, or a pair of X-Bricks, which is 25 drives at a time. In X2 we said, let's make it a little simpler: you can certainly still continue to add X-Bricks, but you can also add drives, in increments as small as six drives. We'll talk about that as we go through.

Another area was innovating on the software, because that's where the secret sauce really is; we're using commodity hardware, nothing proprietary in the hardware itself, so the software is where we wanted to innovate. Zvi is going to talk a little bit about that.
One of our features, called Write Boost, is an example of that innovation. We also wanted to double down on iCDM, integrated copy data management, because we think it's an important use case: customers have lots of copies of data. If you talk to industry analysts, IDC for example, they'd say 60 percent of data is copies, so we wanted to innovate there; the customer example I gave you before is an example of that. We wanted a completely HTML5 UI, and we've delivered on that, actually on both X1 and X2, and removed the last vestiges of Java from the product, because we know everybody loves Java and all the security issues associated with it. (Yes, I am not a Java drinker, I'm a soda drinker; I've been on the no-Java bandwagon for a long time.) And finally, we wanted to innovate in the area of replication: take what we're already doing with our snapshot technology and leverage it to do native replication for customers who are looking to do that.

All right, so let's look at the architecture. Things that are the same from X1 to X2: it's scale-out, and our metadata and our content addressing are all kept in memory. We're not doing metadata destaging, or a metadata cache where we cache the metadata and then have to destage it when things get busy. We don't want to do that, and the reason is consistency of I/O. If we're looking at consistency of I/O across the platform, we want it to be very deterministic, and in X2, for example (and I'll show you a couple of examples), we're delivering response times in the 0.2 to 0.3 millisecond range, so very quick from a response-time standpoint. And of course we want to do that while keeping all the data services intact and doubling down on them, with some enhancements in the compression area, for example.

All right, how things work, just as a review for those of you who are not familiar with XtremIO. Our technology is all based on content addressing. We take an incoming data stream and run it through what's called the content engine. The first thing we do is compute a fingerprint, a hash, of every block, because we want to make sure that only unique blocks are written to the flash media, and written once. One of the things you always have to be concerned about in all-flash is write amplification: if you're writing data to disk and then later coming back, doing some processing and moving things around, first you're stealing performance, and second you're consuming flash media, so endurance becomes an issue. And finally, we want performance to be very consistent, so we do everything such that we only write the data to the flash media once.
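As a rough illustration of the inline fingerprint-and-dedupe idea just described, here is a minimal Python sketch; the ContentStore class and its methods are hypothetical names for illustration, not XtremIO code. Each block is fingerprinted on ingest and only previously unseen content is written to the backing media.

    import hashlib

    class ContentStore:
        """Toy content-addressed store: each unique block is written once."""

        def __init__(self):
            self.blocks = {}      # fingerprint -> block bytes (the "flash media")
            self.refcount = {}    # fingerprint -> number of logical references

        def write(self, data: bytes) -> str:
            # Fingerprint the incoming block (XtremIO uses a SHA-1 variant).
            fp = hashlib.sha1(data).hexdigest()
            if fp not in self.blocks:
                # Unique content: written to the backing media exactly once.
                self.blocks[fp] = data
                self.refcount[fp] = 0
            # Duplicate content only bumps a reference; nothing is rewritten,
            # which avoids write amplification on the flash.
            self.refcount[fp] += 1
            return fp

    store = ContentStore()
    fp1 = store.write(b"A" * 16384)   # 16 KB block, written to "flash"
    fp2 = store.write(b"A" * 16384)   # identical block, deduplicated inline
    assert fp1 == fp2 and len(store.blocks) == 1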
So let's take some blocks of data, just some random ones. Our hashes are much longer than this, but for the purpose of discussion let's make them very small. In this case we're going to take just the first byte and use that to distribute blocks across the environment. That's the core of how XtremIO works; we like to say everything is in balance across the environment, and that includes CPU, memory, and disks, and this is the core technology of how we do it: we use that hash to balance across the environment.

Now let's take an instantiation of that. We of course support multiple X-Bricks; remember, everything is multi-controller. In this case it's two X-Bricks, four controllers. Over 50 percent of our install base is multi-controller, so it's not just a single X-Brick with two controllers; over 50 percent have multiple controllers in the environment.

Let's take that incoming data stream. I'm looking at just the first byte; the algorithm does a lot more than that, but for purposes of illustration we'll just use the first byte. We distribute on that byte across the environment such that each of the controllers owns a small component of it, so it's evenly balanced. Now take that same data stream and notice there are two blocks that are the same, shown in the slightly darker blue here. We process them through: we redistribute them and say, based on the first byte, this particular block goes here, this one goes there, and so forth. And we do that as the I/O is coming in; we're not doing it as a post-process, we're not doing it later on, we're doing it as we go.

But you are doing it in one of those controllers? You are doing it in one of the four controllers in the two bricks down there, because in the animation you're making it look like it's an external process.

It is happening across here. For example, if these strings came through, they could be coming across multiple interfaces; they don't have to come in one stream on one interface. Remember, everything gets balanced across the environment.

I get it, there's no magic cloud.

Absolutely not; it's the XtremIO operating system operating in a distributed fashion. Does that make sense? Okay.
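A toy Python sketch of the "first byte of the fingerprint picks the owner" illustration above; the real XtremIO distribution algorithm does more than this, and NUM_BRICKS and owner_brick are made-up names for the purpose of the example.

    import collections
    import hashlib
    import os

    NUM_BRICKS = 2        # assume two X-Bricks for the example

    def owner_brick(block: bytes) -> int:
        """Pick an owning brick from the first byte of the block's fingerprint."""
        first_byte = hashlib.sha1(block).digest()[0]      # value in 0..255
        return first_byte * NUM_BRICKS // 256             # equal slices of the byte range

    # SHA-1 output is effectively uniform, so a large stream of blocks spreads
    # evenly across bricks no matter which front-end interface it arrived on.
    counts = collections.Counter(owner_brick(os.urandom(16384)) for _ in range(10_000))
    print(counts)   # roughly 5,000 blocks per brick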
So Todd, the assumption there is that each brick controls an area of the address space, a range of the fingerprints that you build. When you expand by adding new bricks, clearly there's a redistribution that needs to happen to reallocate the address space ranges. How does that happen? If you're going to talk about it later I won't preempt it, but what happens to the stuff that's already written, and how does the lookup work for data that is now potentially on the "wrong" brick?

Let's talk about it now, because it's an important question and because we're talking about multiple controllers here. First let's take the easiest scenario, which is not the question you asked, but I'll cover it first: what happens if a controller fails? You're asking about adding, but we also have to subtract; it's hardware, and eventually it's going to fail. Say, for purposes of discussion, this particular controller fails. Its work will be redistributed across the environment. What we do architecturally is say: this controller has to support the back-end disk a little more than the others, because it's going to be talking to the back-end DAE. It's easier for me to draw it; we like whiteboards. ("I thought you'd get through this without a whiteboard." "We'll have to fix that.")

All right, so X-Brick 1, X-Brick 2, storage controller 1; I'll try to draw a little bigger for those of you on the web stream, my apologies. We have, of course, the back-end disks, and in X2 (we haven't gotten to it yet, but we'll talk about it) you can go between 18 and 72 disks. Let's say we have 18, and then we'll talk about expansion. So we have 18 drives in here, and we're full.

Now let's say this particular controller fails. In XtremIO there is front-end, middle, and back-end processing; R, C, and D modules is our name for it. What happens is we redistribute that work. We know that if this storage controller fails, its partner has to do more work on the back end, but it's not smart for me to just say, "this partner pair needs to take all the work"; that's dumb, we don't like that. Instead, this controller will do more of the D work, the back-end work, and the other controllers will take some of the C work, the middle-layer work, from it. So, to make my math easy, if we were at 50%, 50%, 50%, and 50% from a CPU perspective, that's 200% in total; if I take one controller away, each of the remaining ones, if I do my math right, should be at about 67%. We rebalance across the environment. It's a little different from a dual-controller architecture, where if one controller fails the other has to take everything; we're doing intelligent failover, logically as well as physically, so we don't overload the survivor.

So let's take your second question. Just to clarify first: in this specific scenario that Todd mentioned, we obviously don't relocate any data; we use the redundancy of the other storage controller to access it. What we do use is the XtremIO architecture with its two layers: we have the logical layer, which is also segmented and distributed, and the hash-space layer. The hash space must stay on a controller that can see those drives, so that controller does more work now, but because we have this scale-out architecture we can move the logical pieces and spread them around, making the impact of the failure roughly 1/N rather than a doubling for some of the I/Os. So here there is no data relocation, obviously.

When you look at VMAX and you lose a director, you lose 1/N of your performance. You're describing a similar sort of architecture, where the loss of an individual controller doesn't adversely affect you beyond that one fraction. But what's the actual impact when one occurs, in terms of performance? How long does that rebalance process take, and what does the end user see?

What we're talking about here is only a rebalancing of logical space. It happens really quickly; it's not about moving data, so it's a very quick process.
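To make the failover arithmetic concrete, here is a small sketch (a hypothetical model, not XtremIO code) of spreading the failed controller's logical work across all survivors instead of dumping it on its partner.

    def rebalance_cpu(loads, failed_index):
        """Spread the total CPU load evenly across the surviving controllers.

        Toy model of the behaviour described above: the failed controller's
        logical (C-module) work is shared by every survivor, so each one rises
        by roughly 1/N of the lost capacity instead of one partner doubling.
        """
        survivors = [load for i, load in enumerate(loads) if i != failed_index]
        total = sum(loads)                      # the work still has to be done somewhere
        return [total / len(survivors)] * len(survivors)

    # Four controllers each at 50% CPU; controller 3 fails.
    print(rebalance_cpu([50, 50, 50, 50], failed_index=3))   # ~[66.7, 66.7, 66.7]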
Really quick because there are no hashes to move? Right, it only reconstructs those lazily. Is it microseconds before it happens? A bit more than that; seconds or so, I assume. Basically what happens is we need to redistribute across the cluster and re-protect journals, for example; that has to happen before we take any more I/O. The metadata, as Zvi mentioned, loads lazily later, so all we have to do is recompute where things go, move them around, make sure the journals are re-protected, and then basically start up again. So it's a short process.

So SC2 on brick 2 fails; reads to the data that's on brick 2 are processed by SC1? Yes, because that's the only place the data is, as is true of almost all multi-controller architectures. But brick 2 was responsible for some range of the hash space, so are you temporarily reallocating some of that hash space to other bricks? No, the hash space is still managed there. We have a hash space and a logical space; the hash space must be on a controller that physically sees those drives. Moving the logical space around is an additional optimization to really balance the resources.

Let me go through it: I write data whose hash falls in the hash space managed by brick 2; SC1 on that brick still handles it? Yes, the D work, the back-end workload. What we've taken away from it is logical-layer and compression overhead, but not the host connectivity; it still takes host I/O, because hosts are connected there.

The problem is you've talked about there being a front end, a middle end, and a back end, but you haven't said what each of them does; you're assuming we know. Okay: the front end only handles the target side, the physical connectivity to the host. Then there's the logical layer, the most important one, the metadata layer of the logical space, what we call the control or C module in our terminology; this is what handles the addresses, the snapshot layers, and the metadata handling, and hash generation can happen in that layer as well. Then you have the back end, where the hash space lives; each piece of hash space is attached to a specific set of drives, so the data eventually gets there.

The earlier explanation sounded like, if I did a rolling upgrade, the hash space would end up getting scattered around. You don't want to do that, right? This is very different from scale-out in that sense.

Going back to the creation of your hashes, a few questions. First of all, is the algorithm that creates the hashes mathematically guaranteed to create a completely balanced distribution across all bricks? "Guaranteed" is a hard word, but SHA-1 does that pretty much natively. It is SHA-1? It's a version of SHA-1, and it depends on the data, so you can't strictly guarantee it. And I'm assuming it's based on fixed blocks, of what size, 16K? Yes; that changed in X2: X1 was 8K, X2 is 16K.
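Here is a minimal sketch of the two-layer lookup just described: the logical (C-module) layer maps addresses to fingerprints, and the data (D-module) layer maps fingerprints to a location on the brick that owns that slice of the hash space. The dictionaries and function names are illustrative assumptions, not the actual metadata structures.

    import hashlib

    BLOCK = 16 * 1024                 # X2 block size
    NUM_BRICKS = 2

    # Logical layer (C module): logical block address -> content fingerprint.
    address_to_fingerprint = {}
    # Data layer (D module): fingerprint -> (owning brick, physical slot).
    fingerprint_to_location = {}

    def write(lba: int, data: bytes) -> None:
        assert len(data) == BLOCK
        fp = hashlib.sha1(data).hexdigest()
        address_to_fingerprint[lba] = fp                   # update logical metadata
        brick = int(fp[:2], 16) * NUM_BRICKS // 256        # hash-space owner
        fingerprint_to_location.setdefault(fp, (brick, len(fingerprint_to_location)))

    def read(lba: int):
        fp = address_to_fingerprint[lba]       # address -> fingerprint (C layer)
        return fingerprint_to_location[fp]     # fingerprint -> placement (D layer)

    write(0, b"x" * BLOCK)
    write(1, b"x" * BLOCK)                     # duplicate content, same location
    print(read(0) == read(1))                  # True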
We'll show you, when we talk about Write Boost, that we actually did some interesting things that improve performance for smaller-block I/O. Normally you'd think that if you go to a bigger block size, small-block I/O performance suffers; we're actually better. X1 was originally 4K, so this is two generations of change, but that difference is the software. Sorry, was there another question I missed? Okay, getting back to your original question; I haven't forgotten it.

There are two permutations; let me take what I hope is the easier one first. Let's add drives: take these 18 and add 6 drives, so now I have 24 drives in the configuration. Do we move data? No, we don't, because under the covers all that happens inside XtremIO is that the algorithm says, "I'm going to look for the emptiest stripe," and by adding drives I now have empty stripes. I don't actually move data around; there's no point, no advantage from a performance point of view or a capacity point of view, in moving the data around.

So the content addressability extends to the brick but not to the drive? Right, the content addressability is a pointer to a brick; it doesn't extend all the way down to the SSDs. The drives in an X-Brick are one big pool of capacity, as opposed to, say, SolidFire, where it maps down to a drive in a node? Correct; we don't want any hot spots, we want to use all the resources evenly and distribute evenly. What will happen is that, because the emptier stripes get the new data, the added drives will receive more of the new data and it will balance out over time. So rather than forcing a move for no real performance advantage, we just let it happen naturally. That makes sense.
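A sketch of the "write to the emptiest stripe" behaviour described above (the data structures are invented for illustration): newly added drives simply show up as empty stripes, so they absorb new writes without any forced migration.

    def pick_stripe(stripe_fill):
        """Return the index of the emptiest stripe, where the next write will land."""
        return min(range(len(stripe_fill)), key=lambda i: stripe_fill[i])

    stripe_fill = [0.92, 0.88, 0.90]      # existing stripes on the original 18 drives
    stripe_fill += [0.0, 0.0]             # stripes created by adding 6 more drives
    print(pick_stripe(stripe_fill))       # 3 -> new writes fill the added capacity first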
Now let's take your real question, which was: I want to add an X-Brick. So I add the X-Brick to the environment, and it has 24 drives, since I've already added drives, just for illustration. Here's the interesting thing that happens. From an OCE, online cluster expansion, point of view, we add the compute first: if we think of the R, the C, and the D, we basically add those first, so we have the additional CPU right away. Then, on the D side, we're able to reallocate the data in the background. So we actually get more performance once we add the X-Brick, and during the time of the migration we get as good or better performance than we had before, because we have the additional CPU; remember, with SSDs in any all-flash array your bottleneck is almost always the CPU, almost never the drives. So for example, to make my math easy, if this brick is 90% full and this one is 90% full, I'll drain them, and if I do my math right I should end up at about 60% each, rebalanced across the environment. In this case we actually do move data, because we want to take advantage of the SSDs in the new brick.

What triggers that move? The online expansion process is basically a controlled event: our professional services group goes ahead and adds the X-Bricks first.

Wait, did you just say that to add a brick to an existing cluster I need professional services? Yes. And if you want to upgrade an XtremIO solution you still have to call support as well? Yes; as on all of our high-end platforms, it is delivered one hundred percent by professional services.

Can you talk about some of the automatic housekeeping processes that run as part of the XtremIO subsystem? Which housekeeping processes do you mean? CRC checksums and so on. I was going to talk about that a little later; I have a whole section on high availability, so if it's okay with you I'll defer that.

Another quick question, at the forty-thousand-foot view: with the first generation of XtremIO, for the XMS, the XtremIO Management Server, you could have a virtual appliance and also a hardware-based appliance. Has that changed? No, it's still the same. I will tell you most of our customers run it virtual, but you can have a physical one as well. Remember, it's a control plane; think of it more like a Cisco or Brocade SAN switch, where you have a control plane and a data plane. The control plane can actually go away and the I/O processing continues; for example, if you're using RecoverPoint replication, replication still continues to happen, and when the XMS comes back you've only lost access to the control plane in the meantime; we just re-instantiate it. Does that make sense? And you can automatically back that up as well.

What about integration into the rest of the Dell EMC management ecosystem? Sure. As we talked about a little in the previous VMAX presentation, there's ViPR Controller, which we have full support for; full support in the VSI plugins for vCenter; the vRealize plugins, also full support; full support for the Microsoft Hyper-V plugins; let's see, what else am I missing; Splunk support; and through open source code we have drivers for things like Docker, and Cinder support for OpenStack, for example.

So if I've got VMAX All Flash and XtremIO, can I use the same management components that I use for VMAX All Flash? At the ViPR Controller level you can abstract that layer if you want; that is possible, and we've done it for provisioning. From an element manager point of view, though, the element managers are different; there is not a common element manager between them. And what's the limit on the number of XtremIO systems I can have with one XMS? Eight; that is the same for X1 and X2, it has not changed.
Just to finish up on that one: there is some physical data movement, but if you've got a reasonable degree of write activity there will be a natural sort of drift for the emptier brick to fill up, because you'd now be writing to it; a certain percentage of writes, just by mass, would fall onto it. So how much of that are you just letting happen, and how much physical movement are you directing in the background? There's a trade-off there: if you were massively full, say 99.9 percent, it would be a problem and you'd probably want to move data, but below that you might want to just let that drift occur, because it would save I/O at the back end.

If I understood your question correctly, there are two different aspects to it. First of all, each brick is responsible for the capacity it brings with it: you take the entire hash space and partition it into chunks, and each brick gets the chunk corresponding to its relative capacity. That is a fixed responsibility until you scale out or something changes, and it means that each hash, according to this partitioning, goes to that place; the portion of I/Os that go to a given capacity element will match its ratio within the cluster. That's separate from the fact that those drives are empty initially and then fill up.

So, going back to the scale-out example: when we did that scale-out operation, we told the system to repartition the hash space according to the new proportions of capacity, so now one third of the writes go to the new brick, and the existing bricks each keep getting one third of the writes as well. In the background, online, we also have a process that relocates all those hashes that were here and, according to the new scheme, now need to move over there. That happens over time, and thanks to the fact that we added controllers the impact is very low; we do it very efficiently, so the impact on the user is as low as possible.

I get that, but some of that will occur naturally, because some of those writes will be rewrites of existing data. Remember it's stored by hash, so new writes go one third to the new brick. But every single write that changes data produces a completely new hash, so can you really expect a rewrite of existing data to dynamically move content? Let me go with you: if I have, for example, a rewrite of an address, the question is whether I gain from the fact that I wrote the new data over here. The answer is yes: when I overwrite the data, assuming it's not a duplicate, the new data is written under the new scheme and the old data here is simply dereferenced, so it doesn't need to move. You get it naturally, so long as you're not maintaining snapshots that still reference it.

So I was right originally; sorry, Howard and Bert. What that means is that as you write data, it will naturally drift to re-disperse itself simply because it's been rewritten, so there's a percentage of the data that you just never need to logically move; depending on the amount of write activity, it will rebalance naturally.
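A hypothetical Python sketch of the capacity-proportional hash-space partitioning just described: each brick owns a slice of the fingerprint space sized by its share of total capacity, and a scale-out simply recomputes the slices; only the fingerprints whose owner changed would need to relocate in the background. The function names are mine, not XtremIO's.

    import bisect
    import hashlib

    def build_partition(brick_capacities):
        """Split the fingerprint space into slices proportional to brick capacity."""
        total = sum(brick_capacities)
        boundaries, acc = [], 0.0
        for cap in brick_capacities[:-1]:
            acc += cap / total
            boundaries.append(acc)          # cumulative fraction of the hash space
        return boundaries

    def owner(fp: str, boundaries) -> int:
        # Interpret the leading bytes of the fingerprint as a fraction of the space.
        frac = int(fp[:8], 16) / 0x100000000
        return bisect.bisect(boundaries, frac)

    # Two equal bricks, then a third equal brick is added: each slice shrinks to 1/3.
    old = build_partition([100, 100])
    new = build_partition([100, 100, 100])
    fp = hashlib.sha1(b"example block").hexdigest()
    print(owner(fp, old), owner(fp, new))

Comparing owner(fp, old) against owner(fp, new) over the existing fingerprints identifies exactly the set that the background relocation process would have to move.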
So then there's a question: if you weren't at 99 percent full on the older bricks, if you run at something like 60 percent, then there's almost no need to rebalance, because it'll just happen naturally? Before you answer that, let me expand on it: because it's a content-addressable back end, when we add brick three, brick three is now responsible for a section of the hash space. Is that responsibility only on the write side, or also on the read side? Because it seems to me that having hashes that brick three is responsible for, but that still sit on brick one because they haven't been moved yet, would add complication. That's just a metadata-driven read redirect, though, isn't it, with little latency? Right: reads that come here and find we don't have the data need a redirect, which is why we do want to move it; we don't want to stay in a state where the data hasn't moved. If it were a more traditional back end it would be much less important to move the data; because it's a content-addressable architecture, the content is in the "wrong" place and therefore you get those redirects. Let's move on; we can continue that discussion later. I think you've answered what I was trying to get to, so carry on.

So our goal eventually is to balance the capacities. We need to remember that capacity is not necessarily proportional to the bandwidth you have, especially as you grow: you scale up the capacity, and maybe you don't increase your IOPS but you do go up in capacity, and you want to reach a point where all the drives are evenly loaded and busy at the same rate, to really get the full performance of the system. All right, moving on.

I think we've actually talked through this slide pretty well in the discussion, so I'll just skip ahead. The data services: everything we do is always inline, and that has not changed; it's true on X1 and true on X2. We just did a little additional work on compression.

So what do we get from a performance standpoint? Certainly predictable performance. This is an example of the web UI; I have a longer session on it, but I just want to show you a little bit here. This system is doing about 68,000 IOPS and change at 0.14 milliseconds, which is pretty good, or 2.313 gigabytes a second. This happened to be booting desktops, 2,500 desktops simultaneously. I actually have a 4,000-desktop example which I don't think I'll get to, but you can certainly do a heck of a lot here, whether it's VDI or VSI, from a high-performance standpoint as well.
Info
Channel: Tech Field Day
Views: 4,032
Rating: 5 out of 5
Keywords: Tech Field Day, TFD, Storage Field Day, SFD, Storage Field Day 14, SFD14, Dell EMC, Todd Toles, XtremIO X2
Id: awtWh4J-P6Y
Length: 32min 25sec (1945 seconds)
Published: Fri Nov 10 2017