Pure Storage FlashArray History and Overview of New Technology

Captions
Hi, my name is Maya and I'm part of the FlashArray product management team on the platform side. What I'm going to do first is introduce two new, exciting announcements on the FlashArray product side, but before I do that I'm going to walk through a bit of the FlashArray journey. So let's get into it.

Since Pure Storage's inception, there has been a focus on getting flash media mainstream into data centers, which had historically been dominated by spinning disk. Through each of our generations we've delivered improvements in density and improvements in performance, tracking the evolution on both the storage and the compute side. With these generations of FlashArray, what we've offered our customers is seven generations that they can seamlessly upgrade within, with zero downtime, and with pretty much everything included in their Evergreen subscription. With every upgrade they've had access to the latest hardware and all the software features enabled on FlashArray, which I think has been pretty phenomenal.

Talking about software innovation, the Purity operating environment is pretty much the heart and soul of FlashArray, and it has had a rich evolution on its roadmap as well. There have been data protection services such as synchronous replication, there have been expansions into cloud integrations that you just heard about with Cloud Block Store, and there has always been pushing the envelope on performance, with recent announcements such as DirectFlash Fabric earlier this year. This year we celebrate our 10th anniversary, and with that we've delivered a product that is roughly 10x better along the parameters you see behind me.

Now I'm going to switch gears and talk about what's next. New media types are evolving, on both ends of the spectrum. With Optane, or storage class memory, we have the opportunity to break barriers on the performance end; on the other end, with QLC NAND, we have the option to break economic barriers for our customers with a lower cost per gigabyte.

With that, I'll lead into the first announcement we have today, which is DirectMemory technology. This expands read caching functionality on our existing FlashArray//X, and it will deliver up to 50 percent performance improvement for our customers, mostly on the latency front. The second key announcement is FlashArray//C. FlashArray//C is a new product family we're introducing within FlashArray: a capacity-optimized, 100 percent NVMe offering that extends all the rich data services you've seen with FlashArray//X. It's a similar experience, now brought to use cases outside of tier one. What you can expect is six nines of availability, Pure1 cloud management, all the API integrations we have, as well as the predictive support we've always been proud of.

What those two announcements do to our product portfolio is expand the spectrum: at the top, SCM expands use cases into the performance realm, and at the bottom, FlashArray//C can address the use cases outside of tier one. A lot of our customers have been giving us this feedback, and we're very happy to enter that space as well.

Next up is a two-part presentation. Steve will give a deep dive on DirectMemory technology, and then we'll follow up with Pete, who's going to give a deep dive on FlashArray//C, so please hold your questions on each of those products until then. Thank you.
Good afternoon everyone, my name is Steve and I'm a technical marketing engineer on the platform team. Today I'm going to talk to you about DirectMemory, specifically DirectMemory using storage class memory.

As Maya went through the history of the product, we can step back and look at what has happened from a performance standpoint across that same history. We all know where we came from: legacy disk arrays. We introduced our products and added flash; once we started adding flash we started building our own platforms, and of course we started doing DirectFlash. Within each step of this platform we innovated somewhere to get some additional throughput, some additional performance. DirectFlash allowed us to reduce latency by about 50 percent. In January of this year we announced DirectFlash Fabric, which is essentially NVMe over Fabrics for us, the initial offering being NVMe over RoCE. We had already been using NVMe over RoCE with the DirectFlash Shelf since about this time last year (actually a little sooner), but DirectFlash Fabric offered it on the front end. It gives customers options to take advantage of lower latencies connecting to these arrays, and it opens up different use cases, specifically around disaggregation and building out rack-scale types of technologies.

Of course, there's still the other barrier: the read barrier. How do we increase read performance a little bit more? What we're doing with DirectMemory is adding storage class memory into the platform as a read cache, to give us yet more performance improvement. Now we can really reduce end-to-end latencies down into the 150 to 200 microsecond range, so we're getting near direct-attached-storage-like performance on a shared storage array. That opens up a lot of new opportunities and use cases where customers may not have looked at this type of technology in the past.

Q: Are those latencies reads, or reads and writes? A: Good question. DirectMemory addresses read latency specifically, so this is a read latency chart we're talking about here. Q: But you already have a write cache, right, the NVRAM. Is that already SCM? A: No, the NVRAM is actually DIMMs. Pete: DRAM. A: Yeah, so those are at the top, much faster. Q: Will DirectMemory support multiple classes of flash, like SCM as well as normal flash? A: The answer is yes, we'll actually have multiple types of flash in the system, DirectMemory as well as regular flash. As I go through it that will become more evident; if it doesn't, please come back and make sure I clarify it.

DirectMemory read caching is actually two different components: part of it is hardware and part of it is software, so there are two pillars that make up the product. The first is the hardware portion, which is storage class memory.
When an application is actually trying to read information, the closer we can get that data to the CPU, the better the performance; we go from SRAM cache to DRAM, and SCM becomes the middle ground between DRAM and flash. There are two different types of storage class memory that Intel (and others in the industry) offer. One is a DIMM that uses the DDR-T interface on the newer Intel platforms; that's down in the microsecond range of latency. The thing about it is that it has to go on the processor board, so you're swapping out DRAM for storage class memory. It's not as fast as DRAM, but it is persistent, so there are trade-offs. There are also configuration constraints: it has to be four to one, so for something like 512 GB of storage class memory you need 128 GB of DRAM. It's complex, not very flexible in the way you configure it, and a little harder to consume. The other type is the drive, the Optane drive, and that's what we've introduced: an NVMe Optane drive in a module that plugs into FlashArray. It's a high-speed caching system that reduces read latency for high-performance applications, hinged on storage class memory, Intel Optane technology, used for caching. It's looking for data that's being used repeatedly, and again this is read latency; we've already got the NVRAM handling writes.

The key is that the applications we're targeting are those with high locality, where the data is read multiple times over some lifetime or some short period of time, so the cache actually gets used. Analytics-type workloads, for example. We may expand it into other workloads eventually, but initially it's these high-locality, high-throughput, latency-sensitive applications. It will be available to anyone, but you would actually look to use it if you were trying to reduce application latency, either to get some kind of advantage in how quickly you can return data or to shorten the run time of an actual job.

Q: What is your mechanism to populate the DirectMemory read cache? There are plenty of things you can do with a read cache, some sophisticated, some not; I'm trying to understand what level of read cache capability you are building into the system. A: Can you ask the question again? I missed the very beginning. Q: There are many different types of read caching available in the world: recency, sequential prefetch, things of that nature. Do you do that? A: Not today. We're using an LRU algorithm, least recently used. Q: So any block, however many fit in whatever amount of DirectMemory we have, as long as it's being referenced actively enough, will stay there? A: Correct. Q: You're not doing pre-warming of data into the read cache, or anything like that? A: Say that last part one more time, sorry. Q: Pre-warming or pinning of data, like pinning a volume to the read cache. A: No, we don't support that today. It's literally looking at the locality of the data and how frequently it's used.
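To make the eviction behavior described above concrete, here is a minimal sketch of a least-recently-used read cache in Python. It is purely an illustration of the general LRU idea, not Purity's implementation; the class name, the block-granular interface, and the `capacity_blocks` parameter are assumptions made for the example.

```python
from collections import OrderedDict


class LRUReadCache:
    """Minimal least-recently-used read cache.

    A toy model of the behavior described in the talk (block-granular,
    capacity counted in blocks); not Purity code.
    """

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()              # block_id -> data, oldest first

    def get(self, block_id):
        """Return cached data and refresh recency, or None on a miss."""
        if block_id not in self.blocks:
            return None
        self.blocks.move_to_end(block_id)        # touched: now most recently used
        return self.blocks[block_id]

    def put(self, block_id, data):
        """Insert a block, evicting the least recently used one when full."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)      # evict the coldest block

    def invalidate(self, block_id):
        """Drop a block whose backing data was overwritten on flash."""
        self.blocks.pop(block_id, None)
```

A block stays resident as long as it keeps getting touched; once the cache is full, the block that has gone unreferenced the longest is the one that gets evicted, which matches the "as long as it's referenced actively enough it stays" answer above.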
Q: With the DirectMemory module, you're actually replacing an SSD in the //X? A: Yes, we are going to replace an SSD with the DirectMemory module. Q: Is the cached data before or after deduplication and compression? A: We'll get more detail on this later, but I'll answer anyway: it's actually going to be after deduplication, but it's going to be uncompressed. So we'll have uncompressed data in the cache, but it can still be a deduplicated block. Again, we'll talk through the details; let me get through the glossy slides and we'll get to the real meat of the action in just a second.

So, the module slides into a slot in FlashArray: you are swapping out a DirectFlash module for a DirectMemory module. That's one pillar, the hardware pillar. The other part of this is the DirectMemory Cache software, which is part of Purity Optimize. The two pillars of Purity Optimize that we'll talk about this week are the DirectMemory Cache software and always-on QoS. In this case, the software is the algorithm, the part that's choosing what data gets put into the DirectMemory cache.

Q: What did you do before you had DirectMemory for read caching? Did you use DRAM? A: Yes, we do have caching that happens in the system today, in DRAM. Of course, that DRAM is used for a lot of different things, and it contains metadata as well as data. What this essentially does is replace the DRAM for the data portion: if you're putting in DirectMemory, the data caching will be done with SCM; if you don't have it, it will still be DRAM that does it. Q: And metadata? A: DirectMemory will not cache metadata; metadata will still be cached in memory. That is correct.

Cache operation: this will hopefully explain it a little better. At the top level we've got Purity running. We're still doing our writes the way we've always done them: we're still putting all of the data on the DirectFlash modules, so the DirectFlash modules remain the persistent data and the data protection. We're doing dedupe, we're doing encryption, we're doing all of the other data protection and data reduction algorithms on that data. That's where the data is persistently stored, and any changes are made there. What we're doing with the read cache, as I mentioned, is expanding the cache we have available today, which is in memory and very small, gigabytes, into terabytes of read cache; we're going to have a three terabyte and a six terabyte configuration. These DirectMemory modules will contain the hot data: as data gets used multiple times it will be placed into DirectMemory, and then when we do a read we go to DirectMemory for that read.

Here's how that shortens things up. Today we do the read I/O, a metadata lookup, a read from flash where we decompress, and then we return the data. With DirectMemory there's essentially no need for decompression, and because we're not reading from flash, which is in the hundreds of microseconds, but from SCM, which is in the tens of microseconds, we're able to reduce end-to-end read service latency on the system from about 200 to 300 microseconds down to about 100 to 200 microseconds, depending on what's happening.
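A rough sketch of that read path, building on the LRU cache sketched above. The helper names (`metadata_lookup`, `read_from_flash`, the in-memory `FLASH` dict) are hypothetical stand-ins so the example runs; the real lookup, flash read, and data-reduction paths are far more involved, and the latency notes in the comments are just the approximate ranges quoted in the talk.

```python
import zlib

# Hypothetical stand-ins so the sketch runs; real metadata lookups, flash
# reads, and data reduction are far more involved than this.
FLASH = {}                                    # block_id -> compressed bytes


def metadata_lookup(volume, lba):
    return (volume, lba)                      # pretend the mapping is one-to-one


def read_from_flash(block_id):
    return FLASH[block_id]                    # hundreds of microseconds on real flash


def read_block(volume, lba, scm_cache):
    """Sketch of the read path described above.

    Cached data is deduplicated but uncompressed, so a hit skips both the
    flash read and the decompression step; a miss falls back to flash.
    """
    block_id = metadata_lookup(volume, lba)   # metadata stays in DRAM, not in SCM

    data = scm_cache.get(block_id)            # tens-of-microseconds Optane read
    if data is not None:
        return data                           # hit: no flash read, no decompression

    data = zlib.decompress(read_from_flash(block_id))  # miss: flash read + decompress
    scm_cache.put(block_id, data)             # admission policy simplified here
    return data


# Example usage with the LRUReadCache sketch from above.
cache = LRUReadCache(capacity_blocks=4)
FLASH[("vol1", 0)] = zlib.compress(b"hello")
print(read_block("vol1", 0, cache))           # first read: served from "flash"
print(read_block("vol1", 0, cache))           # second read: served from the cache
```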
Q: Is there any pre-staging or pre-loading of information it thinks is going to be hot, based on prior reads of certain things? We've seen that done in Oracle and SQL contexts. A: When you initially start it up, there's no prefetching and there are no pre-reads; it accumulates hot data as the read process goes on. And there's no time involved in it; it's essentially a set of buffers. If something is replaced by something else that's being used more frequently, then it gets, we'll call it, aged out. I hate the word "aged" because the LRU doesn't really use aging in the sense of timers. It's really a set of buffers: if I've got six terabytes and I've read a three-terabyte working set into it, then don't use it and read another terabyte, I've got the room to do that; and if I go back and start reading that first three terabytes again, assuming nothing has pushed it out, it will still be there. It's not tied to a timer; it's tied to the buffer space.

Q: Without a timer, what about data that hasn't been read in a while becoming stale? How often does the cache update itself from what's actually sitting on disk? A: It's not a write mechanism; writes go through another path. If anything on disk changes, the block that's in the read cache will no longer be valid for a read of that data, and that particular block will age out over time; eventually enough data gets pushed in that it's evicted.

Now, say you read something once and then don't read it again for a period of time, and by a period of time I mean, you know, about an hour. Is that interesting enough for us to put into cache? Probably not. If it's read again within another two or three seconds, it is going to be interesting. Again, it's a set of buffers, there's no timer involved, but based on how frequently we see reads we essentially do a look-back: "we read that, that's interesting, we saw it again, let's put it into cache" versus "we read it, it went through the buffer system, it's no longer interesting, and when we read it again it doesn't go into cache." So a block has to be read frequently enough to be populated, and then there has to be enough newly populated data to push the older stuff out.
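Here is a minimal sketch of that "seen again recently, so it's interesting" admission idea, plus the invalidate-on-overwrite behavior, layered on the LRU cache from the earlier sketch. The bounded look-back window, its size, and the single-repeat threshold are illustrative assumptions, not the actual Purity heuristic.

```python
from collections import OrderedDict


class LookbackAdmission:
    """Admit a block into the read cache only on a repeat reference.

    The bounded look-back window and the single-repeat threshold are
    illustrative assumptions, not the actual Purity heuristic.
    """

    def __init__(self, window_size=1_000_000):
        self.window_size = window_size
        self.recent = OrderedDict()            # recently seen block IDs, oldest first

    def should_admit(self, block_id):
        if block_id in self.recent:            # seen again recently: interesting
            del self.recent[block_id]
            return True
        self.recent[block_id] = None           # first sighting: just remember it
        if len(self.recent) > self.window_size:
            self.recent.popitem(last=False)    # the oldest sighting falls out of view
        return False


def handle_overwrite(block_id, scm_cache):
    """Writes land on the DirectFlash modules as usual; the cached copy,
    if any, simply becomes invalid rather than being updated in place."""
    scm_cache.invalidate(block_id)
```

On a miss, the read path above would consult `should_admit` before caching a block, so a block only lands in SCM on a repeat reference within the look-back window; an overwrite never updates the cached copy in place, it just invalidates it.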
Q: I'd be curious about garbage collection and how it handles blocks changing on disk. Over the past several decades, when others have done similar approaches, they've had various mechanisms for detecting hot data and deciding how long it stays around, aside from occasionally checking on disk to see whether a checksum changed so the information has to be re-read. You can get latency blips: things look really, really low, and then you see spikes once it's actually in production. A: Yeah, and I think the difference we're talking about there is the idea of a tiering technology versus a caching technology; they're similar, but there's the tiering part and then there's a separate caching component. Q: Agreed, but you can still see what look like artificial performance problems that aren't actually there, because of how the cache is playing the data back. A: Yes, I would agree with you.

So as far as the algorithm is concerned, it's just an LRU algorithm, and the key to how we populate it is: is this data interesting enough? We're not prefetching today; we're not going out and looking for anything that tells us specific data needs to go in because it's been read in the past. This is the amount of cache we have; we'll fill it up, and once it fills up we'll start evicting things and putting new things in. The way it's playing out with what we're testing now, it's working well for specific workloads; there are other workloads it may not work as well on. As we work with customers on this and see different workloads, there are things that have already come up, like "can I pin a volume to this?", questions customers have asked. Those are things we'll continue to investigate, along with whether we want to prefetch some data. Those are the types of things where we'll continue to watch what's happening and see how this evolves. Pete, anything to add?

Pete: We can already tell that the algorithm we selected, which was actually derived from analyzing customer data, will have wide applicability across a lot of customers. In Pure style, it won't be the best possible thing for a small subset, but it'll hit the sweet spot, and like Steve said it will evolve; we'll take feedback from arrays and from customers on how to evolve it.

Q: One question: that first stack there, is that a read hit or a read miss? A: That would be a read miss. A read hit on cache would be this stack; without DirectMemory, a read hit means a hit in DRAM. Q: What would the service latency be for a read hit in RAM? A: If we have a read hit in RAM it's obviously going to be less than a hundred microseconds. But I think the point is that the cache that exists today is so small it's not even worth talking about; it's trivial. I would actually challenge you to even craft a benchmark that shows it. Pete: We can craft a benchmark that accesses the same block five thousand times; if that's all you do, you'll see it, and if you access a couple of other blocks you'll probably blur it out. It's a really, really tiny data set read over and over. It's not going to show up in a real-world application, maybe in a benchmark.
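As a toy illustration of that working-set point (a tiny, repeatedly read data set sees nearly every read hit the cache, while a working set much larger than the cache mostly blurs the hits out), here is a small LRU simulation. The sizes, the uniform access pattern, and the block-count units are arbitrary choices for the example, not measured behavior.

```python
import random
from collections import OrderedDict


def simulate_hit_ratio(cache_blocks, working_set_blocks, reads=100_000, seed=0):
    """Toy LRU simulation: hit ratio for uniformly random reads over a
    working set. Sizes and access pattern are arbitrary illustrations."""
    rng = random.Random(seed)
    cache = OrderedDict()                      # block_id -> None, oldest first
    hits = 0
    for _ in range(reads):
        block = rng.randrange(working_set_blocks)
        if block in cache:
            cache.move_to_end(block)           # refresh recency on a hit
            hits += 1
        else:
            cache[block] = None
            if len(cache) > cache_blocks:
                cache.popitem(last=False)      # evict least recently used
    return hits / reads


# Working set that fits in the cache: essentially every read becomes a hit.
print(simulate_hit_ratio(cache_blocks=10_000, working_set_blocks=1_000))
# Working set 100x the cache size: the hits mostly blur out (~1% hit ratio).
print(simulate_hit_ratio(cache_blocks=10_000, working_set_blocks=1_000_000))
```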
Q: I guess that bleeds into the next-level use case. I'm from the ERP space, where we're looking at in-memory databases and looking to extend in-memory databases from DRAM into storage class memory; that's where the advantages are, and the vendors are working with Intel to optimize those applications to perform better because of these exact challenges. When you expand the idea to an entire storage array with mixed workloads, do you have numbers on what the hit ratio is for this expanded level of storage class memory in the storage array? I can clearly see the advantages for direct-attached storage; I'm not as readily seeing the advantages when I have a hundred different workloads spread across a storage array, and how that would help my overall performance in a data center.

A: I'm going to skip this slide for just a second, because your question is actually the next slide, and then we'll come back to this one; it's fairly evident. This is an analysis of Pure1 metadata across our customer base. We started tracking data locality, basically how frequently data is read in a particular area on our customers' systems, around Purity 4.10, I believe, somewhere in that timeframe. So we're able to take that information and see what would happen if we put this cache in a system, and these are mixed workloads, not a specific workload. We've actually had this discussion internally: how can we tell, on a customer array, which workload would benefit from this? We basically have to figure it out as we go along, and we're working on things that will model it better. But 80 percent of the arrays out there in our fleet (and this is not an analysis of all of them; it's the higher-end arrays, because we expect customers with higher-end arrays to care about this; if you've got an //X10 or an //X20 you probably don't care as much about performance) could get 20 percent lower latency, and 40 percent of them could get 30 to 50 percent lower latency, based on the workloads they're running today. If you've got an existing array, we can model this for you; we're going to have tools within Pure1 that let us model this type of data for customers. That's one of the things that's really nice: based on the algorithm and the work we've done, this is what we expect our customers to see. We've had a couple of betas; one customer had pretty high locality, getting about a 30 percent cache hit rate in some instances, and for short periods of time with the data they had on there they went to 90 percent, and it reduced read service latency by roughly 50 to 60 percent, which is about what you'd expect. So to answer your question: yes, we actually have a way of looking at this.

Q: Can you tell it not to cache certain volumes? A: No, we don't have that ability today. If it's hitting the algorithm and it's being read frequently, it's going in. Is that something we could look to add in the future? Absolutely; a lot of this will come from use cases and customer feedback. So to the earlier comment: "I can't pin my SAP HANA today, but I want to, and I also happen to be running Exchange and a bunch of other applications on that array, and I want to prioritize HANA." The answer is no, not today.

Q: Theoretically, though, I can see that if your SAP HANA data set resides on one FlashArray, and whatever other workload that's intensive for the read cache sits on a different FlashArray, even if they're clustered, and you have this storage class memory sitting in both arrays, you could effectively pin it, because this cache won't cache the HANA data and that one will cache the Cassandra data. A: I heard you say "buy an array for each application." Q: I'm in sales; it's called right-sizing. You laugh at that, but it's the real world; that's what we do. We buy an array for SAP HANA because we can't solve this problem, either with direct-attached storage or otherwise.
Direct-attached storage, or in theory storage class memory in the server, works if I can get the density I need; it's when I surpass the density I need that I'd start to look at something like this. A: So I kind of ignored that part of the question earlier, but since you brought it back up, I won't ignore it. When we started talking to Intel about some of this, they were interested in some of the workloads we've run with SAP; our solutions team did a lot of testing with SAP, though not specifically with storage class memory as the in-server memory for the compute node. That becomes a tier, and I think we've got some material on that this week. It's really interesting: the economics of storage class memory in the server itself make it a very interesting use case, and it's much faster, but it's not necessarily the same speed and cost as DRAM in the server, and there are still obstacles to overcome, such as what the data protection mechanism is, what happens when it fails, and how you service it. There are a lot of serviceability questions. Q: And practical problems: I could have a race car inside the server if I want, but I need to replicate data off-site, and SAP isn't the best mechanism for me to replicate my data; that software mechanism is not good enough. I need storage-class replication to solve my data protection problems and my snapshot problems, and SAP hasn't solved that, so there's definitely a piece to this. I'd also like to know what other HPC-type workloads benefit from the trade-off of not having the data local while still having the advantages of a storage array; particularly viable models would be high-frequency trading, biomedical engineering, or HPC in petrochemical and seismic work. If I can get to the point where I can abstract my compute away from that storage class memory and still get some of the same benefits (a little bit higher latency, but with the ability to attach different workloads to it), then I can see some advantages. A: I think what we're trying to address is that middle ground: give you some real latency performance to address the direct-attached storage market while you still have the benefits of the storage array. That's the model we've been building with NVMe over Fabrics as well as DirectMemory, and we'll evolve it; clearly this is an introductory product as a storage class memory offering, so it will be interesting to see where it goes.

Now, about the numbers down at the bottom here; I kind of skipped over the slide that explains them. This is the lower latency achieved: to get 50 percent lower latency you have to have a 100 percent hit rate. So what is the expected hit rate across the fleet? Expecting a 100 percent hit rate isn't realistic (I was going to put a slide here that said "60 percent of the time, it works every time"), and we've never had as much cache as we have in this space in an array before. From a real-numbers perspective, this is our read service latency with a 100 percent hit rate, and it's pretty consistently 50 percent lower all the way across the board; as the IOPS go up, it doesn't really change. The scale over there is something like 240 and 480, and this one is like 300-something; those are a little bit higher just because of the block size, but still consistent. It doesn't matter what block size we throw at it: if the hit rate is consistent, it's linear. At a 100 percent hit rate you're at 50 percent lower latency; at a 50 percent hit rate you're at 25 percent. It's purely linear.
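That linear relationship can be written as a one-line model. This is a sketch only: the ~250 microsecond baseline is a rough midpoint of the 200 to 300 microsecond figure quoted earlier, and the 50 percent ceiling is the quoted best case at a 100 percent hit rate, not measured specifications.

```python
def expected_read_latency_us(hit_rate, baseline_us=250.0, max_reduction=0.5):
    """Read service latency under the linear model described in the talk.

    baseline_us (~250 us) is a rough midpoint of the quoted 200-300 us
    no-cache latency, and max_reduction (~50%) is the quoted improvement
    at a 100% hit rate; both are illustrative numbers, not specifications.
    """
    return baseline_us * (1.0 - max_reduction * hit_rate)


baseline = expected_read_latency_us(0.0)
for hit_rate in (1.0, 0.5, 0.3):
    latency = expected_read_latency_us(hit_rate)
    print(f"hit rate {hit_rate:.0%}: ~{latency:.0f} us ({1 - latency / baseline:.0%} lower)")
```

Under this model, the 90 percent hit rate mentioned for the beta customer would land near, but just under, the 50 percent ceiling.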
Q: That should look the same if this were 64K? A: It does; 64K, 128K, we've got it all, we just picked a couple of sizes. The numbers clearly change in terms of read service latency, because read service latency is going to be higher at a larger block size, but as far as the percentage at a given hit rate goes, it's consistent across all block sizes. Q: And is this all free? Tell me what's really representative. A: I work in platform; I don't do pricing. I honestly don't know what this stuff costs, and for the most part I don't want to know.

So, DirectMemory modules insert into FlashArray: these are PCIe-connected NVMe devices that go into the same slots as the modules we have on the shelf today. This is, again, a testament to the fact that when we build our own platforms and our own products, we start having options; this is the first offering of something we've plugged into a FlashArray that is not a DirectFlash module. It's still very similar, but it serves a different purpose from the Purity perspective. As I mentioned, there are two sizings, the three terabyte (four by 750 GB) or a six terabyte configuration; those are the offerings today.

To wrap it all up, and we've kind of danced around this in a lot of the conversations we've been having about these types of applications: where do we see things going from a high-performance application standpoint? We've seen this real transition back and forth, for a couple of reasons: economies of scale, and having the type of storage software out there that can give us the redundancy and the types of services we want to see, as well as the performance. That world shifts back and forth from time to time. The nice thing about shared storage is that it is shared: you get the economies of scale, you get the data reduction, you get a lot of the benefits, but you start compromising on performance, so you have to find the balance. What we're trying to do is bring back some of that compromised performance: how can applications that have run on direct-attached storage run with a shared storage array? That's the model I mentioned earlier when I talked about disaggregation, whether you want to call it disaggregation, or rack scale, or rack deployment. We've got customers who are starting to do this: they've looked at where they had straight NVMe devices in servers in a rack, and with NVMe over Fabrics they find they can get the same type of performance, or near that performance, with better efficiency and better management. Now, when we decrease the read latency even further and increase the performance, we've got yet another way that we can look at starting to change these applications.
Q: And also adding in the benefits of a consolidated storage array. A: Exactly, yeah, with having those benefits. Thank you; you completed my thought, I didn't actually say that. Thank you so much.
Info
Channel: Tech Field Day
Views: 4,306
Rating: 5 out of 5
Id: gJz1Q3CtvNA
Length: 32min 20sec (1940 seconds)
Published: Tue Sep 17 2019