SolidFire Solution Overview

Captions
So we're going to start with a pretty quick run-through of the SolidFire product for those who aren't familiar with it and some of the key differentiating features, and then in the second half we'll get into the architecture underneath those features. But before we talk about how great the architecture is, we want to understand what it is we can do that has all these customers buying: why are they buying this instead of any other flash or disk-based storage system? What is so unique to SolidFire?

At a high level, SolidFire is a scale-out, high-performance primary storage system, and we'll talk more about how we scale out today. It is the largest and fastest all-flash system on the market today. That's not necessarily the most important thing about us, but we certainly think we have a claim to it, with scalability to over a hundred nodes in a single cluster, up to three and a half petabytes of capacity, and 7.5 million IOPS. It's all industry-standard hardware, with 10 Gigabit Ethernet for the connectivity. We also have Fibre Channel, which Adam will be talking about, but 10GbE is always there for connectivity: nothing proprietary, nothing too exotic. It's also a very rich feature set; we'll be talking about some of these in more detail, but we certainly think we have the broadest feature set in the flash space today for an enterprise storage system.

When we talk about the things that really make SolidFire unique, they fall across five key areas that we think are very important in the next-generation data center: scale-out, guaranteed performance, automated management, high availability, and inline efficiency.

So let's talk about scale-out first. SolidFire uses a node-based scale-out architecture. The building blocks are 1U x86 nodes; sooner or later we might actually have one show up in this room — I think we're waiting for FedEx — so we'll bring that in later and you can poke around on it. They're 1U x86 servers with CPU, memory, two 10-gig ports, and ten 2.5-inch MLC SSDs. Again, nothing particularly exotic. Cluster them together over a 10-gigabit network, with all of our software magic running on top, and you get a single logically managed storage system: a single pool of capacity and a single pool of performance in the cluster. The great thing about being scale-out, of course, is that you can then add additional nodes and additional capacity. And yes, we can actually do this; yes, this works, and it's been in the product from the beginning. It's completely non-disruptive: data is redistributed in the background, and we rebalance the capacity and the performance non-disruptively, and it happens pretty quickly.

How fast is pretty quickly?

It takes lower priority than active I/O, obviously, so it depends on how heavily loaded the system is. The other thing you'll find is that the larger the system is, the faster this actually happens. But in general, minutes: you can typically add a node within 15-20 minutes, and if you're adding multiple nodes it does it all in parallel, so it doesn't really take much longer to add multiple nodes to the system.
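The talk doesn't specify the placement algorithm at this point (consistent hashing comes up later in the Q&A), but the reason adding a node only takes minutes is that only a proportional slice of the data has to move. Here is a toy Python sketch using rendezvous hashing as a stand-in for whatever SolidFire actually does — the node names and block counts are invented for illustration:

```python
import hashlib

def owner(block_id: bytes, nodes: list[str]) -> str:
    # Rendezvous (highest-random-weight) hashing: each block goes to the
    # node with the highest hash(block, node) score. Adding a node only
    # reassigns blocks whose new top score belongs to that node.
    return max(nodes, key=lambda n: hashlib.sha256(block_id + n.encode()).digest())

blocks = [f"block-{i}".encode() for i in range(100_000)]
before = {b: owner(b, ["n1", "n2", "n3", "n4"]) for b in blocks}
after  = {b: owner(b, ["n1", "n2", "n3", "n4", "n5"]) for b in blocks}

moved = sum(1 for b in blocks if before[b] != after[b])
print(f"moved {moved / len(blocks):.1%} of blocks")  # ~1/5 = 20%
```

Going from four nodes to five moves only about a fifth of the blocks, which is consistent with the rebalance completing in minutes rather than requiring a full re-layout.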
The really cool thing about this, which we've actually been surprised to see, is that customers are almost as excited about the scale-down capabilities as they are about the scale-out capabilities. That seems strange, because people always just get more data, right? Well, yes and no. Yes, they're getting more data, but sometimes the data is going to different locations: maybe a different data center, maybe somewhere else within the data center. What we've seen is that, just as easily as you can add nodes to a SolidFire cluster, you can remove nodes from it and send them somewhere else to build another cluster. We've seen a couple of real-world examples of this. In one case a customer had multiple data centers; they had more capacity than they needed in one and were running short in the other. Now, if we were any other storage company, what would they have to do? Well, you can't really do anything about that, so they'd have to buy more storage.

I don't know why you think that's a good thing, Dave.

I don't know, but the customers seem to like it. So what did they do? They removed a node from one cluster, shipped it a couple of countries over — this was in Europe — put it in the other cluster, and they were up and running within 24 hours. That level of agility is really important. In another example, a customer had built a very large cluster and decided they actually wanted to move a bunch of that data somewhere else. They took out part of the nodes of the cluster, built a second cluster, migrated the data, and they were done with the split. Again, that's something really cool about scale-out storage that you just can't do with a traditional array: you can't take a VNX or a VMAX, take a chainsaw to it, and send half of it somewhere else.

Well, you can. You just can't expect it to work.

Yeah, it just may not work too well when you're done. It is good stress relief, though.

And as a start for a new cluster?

Absolutely. You can take a cluster — the minimum is three or four nodes to start one — remove a couple of nodes, and put them together. And as we'll talk about later, you can now mix and match nodes of different capacities and generations in a single cluster as well, so there's lots of cool stuff there. One of the great use cases for that — I don't want to steal too much of Adam's thunder — is that you start off with a current generation of node, maybe you start adding in denser nodes, and then later on you pull out those smaller nodes, put them in a lab, and use them to spawn another cluster somewhere else. Really, really cool. When we talk about storage agility, that's a lot of what we're talking about: being able to use, repurpose, and deploy your infrastructure in ways that previously had not been possible.

What are you using to measure IOPS?

We tend to use a tool called vdbench, mainly because most of the other tools, like Iometer, can't actually push the performance that we can push through the system.

We have a question from Twitter: could you theoretically run a one- or two-node cluster?

Not really. We need three nodes for a quorum, and our recommended best practice is four or five nodes. It mainly has to do with — as we'll discuss with high availability — the ability to sustain failures and keep running, so we recommend a four- or five-node starting configuration.

If you look at where this scale-out architecture positions us in the market, you can draw the disk-based guys out there with upwards of petabytes of capacity but not that much performance, and the other all-flash guys down in the bottom-left corner: millions of IOPS — and this is straight from their spec sheets — but really not that much capacity. Typically most of these guys have a hundred terabytes or less of effective capacity available today.
So again, they're very good for performance-constrained applications and point solutions — places where you're just trying to drop something in and make an application go faster — but not really good at the core of the infrastructure, places where you really need scalable capacity. SolidFire — and we have three node models available today — reaches the capacity of the disk-based systems with the performance of the all-flash systems. Being a scale-out system, you can land anywhere on any of these lines that you want, and now with our mixed-node capability you can really land anywhere in here as well. The architecture is flexible enough that you can expect, over time, lines that go like this and lines that go like that, and we will encompass a broader and broader set of capacity and performance.

The ends of those lines assume a hundred-node cluster?

A hundred nodes, yep. But the ends of everybody else's lines are their maximums too — the max you can buy from them. We tested at about right here today, and most of our customers are about right here today. The important thing is that most of our customers are already beyond where they could get without buying a bunch of those other systems.

The second thing — and this is probably what we historically have been best known for — is our guaranteed performance capability, our guaranteed quality of service. If we have time in the demo section at the end, we'll actually demo some really interesting new things we're doing with this combined with VMware. This is enabled through something in the architecture we call performance virtualization. In the architecture section I'll touch on how we do this, but the simple way to think about performance virtualization is separating the provisioning of performance from the provisioning of capacity in a storage system. In basically every other storage system on the market today, you provision capacity, and how you provision that capacity determines the performance you get: what type of media you provisioned it on, how many disks are in there, how fast those disks are, what RAID level they have, how fast the controllers are in front of them, how much cache you have sitting in front of those. Through the combination of all that, that volume and that particular application will get some level of performance. The really complicated thing is that the performance can easily change tomorrow if you go and provision something else in that same configuration, so performance management has always been very challenging in storage.

Approaches to quality of service before this were all fairly basic, typically one of two approaches. One was rate limiting, where you just say: I want to make sure this volume doesn't go any faster than this, so it can't be too much of a hassle to the other things on the system. That's OK, but the real problem with rate limiting is that while it's nice for the other guys, it doesn't really help the one being limited: he knows what his maximum is, but it tells him nothing about his minimum, and actually managing that across a lot of different applications can be very challenging. The other approach is prioritization. Prioritization works well when you have a very small set of known applications where you know the relative priority of all of them and can make relative judgments to make sure that the high guys are high and the low guys are low, and where it's not really a problem if the high guys are noisy and cause problems for the low guys.
When you move into these more cloud-like environments — and even large virtualization environments have this problem — number one, you've got a lot more applications, and it's hard to assign relative priorities to all of them; everybody thinks their thing is the most important. In the service provider case, a lot of times they don't even know what the applications are. Maybe they can make somebody pay more to be high priority and less to be low, but if they can't tell the high guy how fast he's going to go, he's not going to pay more for it. If you can't actually give them metrics and numbers and minimums and SLAs around performance, the quality of service isn't really that meaningful.

So we've designed our system so that you can allocate performance independently of capacity for every volume in the system, with a very fine-grained model that has a minimum, a maximum, and a burst. The minimum is the key: actually knowing what minimum level of performance you'll be able to deliver for that volume at any given time. Sometimes the max is important too.

So rate limiting can be for when you want to make sure people are only getting what they pay for?

Exactly, and particularly in the service provider realm: you paid for eleven thousand IOPS, so you're going to get, on a consistent basis, a max of eleven thousand IOPS, and if you want more than that, pay more. But because hard rate limits actually cause problems for a lot of application workloads — things that are very spiky, like a VDI boot storm or a database checkpoint, where you don't want huge latency spikes — we have a burst capability as well. We use a credit-based scheduling mechanism: if you're running under your maximum, you build up credits, and you can then burst higher than your max for a period of time, similar to the bursting models you can get in network quality of service. Again, all very new and interesting on the storage side of things. The performance allocation is also completely dynamic, so you can take a volume that had a thousand IOPS, turn a dial, give it five thousand IOPS, and literally within seconds it gains the advantage.

You said you can show what someone gets as a minimum, and you can boost, but you might want to know what the exact amount was after a period of time.

Yep, exactly. Part of our API and reporting capability is the ability to get that data out, including data on how much they actually did. Maybe you want to do usage-based billing and chargeback based on what they actually did, not what was allocated to them. You can also measure whether the SLAs were met, because our system gives you a lot of different options here: you can under-provision the sum of all the minimums, you can over-provision the minimums — you can do a lot of different things to either help yourself or get yourself in trouble. But our reporting can actually tell you, after the fact: did everybody get what they were guaranteed? Did everybody get their minimums like they were supposed to, or for some reason did they not? You can actually see that in the reporting.
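The transcript doesn't detail the scheduler internals, but the behavior described — running under your max accrues credits that can later fund a burst above it — is essentially a token bucket. Here is a minimal sketch; the class name, the one-second tick granularity, and the credit cap are all assumptions, with the 11,000 IOPS figure borrowed from the example above:

```python
class BurstBucket:
    """Toy credit-based limiter: running below max_iops accrues credits,
    which can then be spent to burst up to burst_iops for a while."""

    def __init__(self, max_iops: int, burst_iops: int, credit_cap: int):
        self.max_iops = max_iops
        self.burst_iops = burst_iops
        self.credit_cap = credit_cap
        self.credits = 0            # accrued headroom, in IOs

    def allowance(self, demand_iops: int) -> int:
        """IOPS granted for one one-second tick of demanded load."""
        if demand_iops <= self.max_iops:
            # Under the cap: bank the unused headroom as burst credits.
            self.credits = min(self.credit_cap,
                               self.credits + self.max_iops - demand_iops)
            return demand_iops
        # Over the cap: spend credits to burst, never above burst_iops.
        extra = min(demand_iops - self.max_iops, self.credits,
                    self.burst_iops - self.max_iops)
        self.credits -= extra
        return self.max_iops + extra

vol = BurstBucket(max_iops=11_000, burst_iops=15_000, credit_cap=60_000)
for demand in [5_000, 5_000, 20_000, 20_000, 20_000]:
    print(vol.allowance(demand))   # 5000, 5000, 15000, 15000, 15000
```

Two quiet seconds bank enough credits that the spike in the last three ticks is served at the burst ceiling rather than being hard-capped at the max — the VDI-boot-storm behavior described above.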
Excuse me — can you make a pool of LUNs and apply the QoS to that pool?

Great question — not today. Where we typically see that being done, for a tenant or a customer in a VMware environment, is by creating a datastore and putting the QoS on the datastore; then they really do get a pool for all of the virtual machines in that datastore. Arbitrary grouping of any LUNs on the system into a QoS group is a roadmap item, but we don't have a lot of demand for that outside of VMware today.

Dave, the tables suggest that you can adjust the number of IOPS per block size — is that correct?

Not exactly. When we talk about IOPS, just to keep things simple, we talk about everything in terms of 4K IOs. What you're seeing down here is that different IOs obviously have different weights, and the table translates it for you: if your average workload — and this is totally based on averages — is 16K IOs, this is what that actually translates to. That's really important, because with some other approaches to quality of service that just bound things by IOPS, a tenant doing 1MB IOs can crush your storage system even though you only gave them a hundred IOPS. So it's very important to actually expose how what we call the QoS curve works at different IO sizes. It's not linear, because there are some savings we get from larger IOs, but there is additional cost.

Where we see this in practice — and this is a great illustration of it; we do this demo all the time — is when you have a relatively noisy environment sharing a set of resources. It could be a storage controller; it's certainly very apparent when you're sharing the same set of spindles. A couple of guys go crazy and start saturating it with a workload, and everybody else gets affected. In this demo we turn on the SolidFire quality of service and dial in each volume in the system, in this case to four different performance levels, and literally within a few seconds everybody gets exactly what they want — exactly what they're paying for, whether it's a service provider model or an internal chargeback model. And if an application that's down here at performance level three is getting maxed out and really needs more, you can literally turn a dial and give it whatever performance level you want.

Dave, just to clarify — your quality of service is on a volume or a LUN? When you say volume, what is it?

It's basically the same thing; in our system a volume is a LUN, so what the host sees is a virtual disk. Now, things like VMware like to put many disks inside of that; if we get to the exciting demo at the end of the day, you'll see how we break that out.

How does that map with something like CloudStack or OpenStack?

With CloudStack or OpenStack, the volume corresponds directly to a virtual disk. With VMware it tends to correspond more to a datastore, but there are some interesting things we're doing there — I don't want to ruin my surprise. All right, any other questions on the quality of service piece? It is really cool.
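Before moving on: the actual QoS-curve weights from the block-size question above aren't given in the talk, only that the curve is sub-linear in IO size. A hypothetical illustration of the idea — the weights in IO_WEIGHT are invented for demonstration, not SolidFire's published numbers:

```python
# Illustrative only: a QoS "curve" charges each IO a weight relative to a
# 4K baseline. Larger IOs cost more credits, but sub-linearly (a 16K IO
# is cheaper than four 4K IOs). These weights are made-up examples.
IO_WEIGHT = {4: 1.0, 8: 1.6, 16: 2.7, 64: 7.5, 256: 22.0, 1024: 70.0}

def effective_4k_iops(io_size_kib: int, raw_iops: int) -> float:
    """Translate raw IOPS at a given IO size into the 4K-equivalent
    IOPS they consume from the volume's QoS allocation."""
    return raw_iops * IO_WEIGHT[io_size_kib]

# A tenant "limited" to 100 IOPS issuing 1 MiB IOs really consumes:
print(effective_4k_iops(1024, 100))  # 7000.0 4K-equivalents, not 100
```

This is the failure mode described above for naive IOPS-only limits: without size weighting, a hundred 1MB IOs per second draws far more from the system than a hundred 4K IOs.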
The third thing is shared-nothing high availability. We'll talk more about this when we get into the architecture, in terms of how it contrasts with other approaches to HA, but suffice it to say the mechanism we use for data protection and high availability in SolidFire involves not a traditional RAID algorithm but a distributed replication algorithm that distributes redundant blocks within the cluster, so that we have multiple copies of the data. The data is not replicated within a node but across multiple nodes, allowing us to survive failures of individual drives as well as entire nodes — and, importantly, to very quickly and easily redistribute data when a failure occurs. That's not a perfect illustration, because it's actually a mesh rebuild process: when a node disappears, the other copies of the data that was on that node are spread across the remaining nodes in the system, and we do a mesh rebuild. This literally lets us rebuild from a drive failure in about five minutes, which is extraordinarily fast, and the scary thing is that as the cluster gets larger, it gets faster: on a really big cluster we can do it in under a minute. Once that rebuild is complete, we've restored full redundancy in the system and you can sustain additional failures. When a failure occurs, all we're basically consuming is free space that was in the cluster; there are no dedicated spare drives and no dedicated spare nodes. The system just rebuilds as if it were a smaller cluster — one drive smaller, one node smaller, whatever failed. And the really cool thing is that we handle every failure in the system basically the same way, whether it's a hardware failure in a node, a hardware failure in a drive, a backplane failure, a network connection failure that makes a node unavailable, or even a software failure where something is crashing and just can't come back up: we kick it out, we rebuild, and we restore the redundancy.

Is the redundancy one level of replication, or two, or three?

Today it's two. We will have an option for three in the future, but based on our metrics, the speed at which we rebuild means that our two-way replication has an equivalent or better chance of avoiding data loss than RAID 6 in a typical system.

What is the interconnect used to manage that?

10-gig Ethernet. All of these nodes are connected with two 10-gig Ethernet ports, and that's used for both the storage network and the cluster interconnect.

OK, so the interconnect would be some sort of 10-gig network — pretty much anything?

Yeah. Typically you've got a cluster plugged into two 10-gig top-of-rack switches, and that's it.

So you're limited by the number of ports you can plug into, and then you'll have certain latency steps depending on the hops?

Depending on how you build your network. Typically a cluster is plugged into one switch, and particularly with the 72-port 10-gig switches available today, you can build really large clusters even on a single-switch architecture. But you can also go leaf-spine, add one more hop, and still have very good latencies.

Customers are supplying their own networking hardware for this, right?

Exactly — we are network agnostic. We work with Arista and everybody else.

Just say you like Arista.

Sure, we like Arista; they work really well. But we've got to pay the rent.

That actually solves a problem, in that you don't have to play the game of "you guys like Arista, but my networking silo likes Cisco and screw Arista."

Right, we don't have to negotiate that — fine, use a Nexus, we don't care. We've worked with pretty much everything under the sun in the field today, and it all works pretty well.
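Going back to the mesh rebuild: the claim that rebuilds get faster as the cluster grows falls out of the arithmetic, since both the surviving copies and the rebuild targets are spread across all remaining nodes. A back-of-envelope model, where the per-node rebuild bandwidth is an assumed figure rather than a spec:

```python
def mesh_rebuild_minutes(failed_tib: float, nodes: int,
                         per_node_gib_s: float = 0.5) -> float:
    """Back-of-envelope: after a failure, the second copies of the lost
    blocks live on all n-1 survivors, and the new replicas are written
    across all n-1 survivors, so rebuild bandwidth scales with cluster
    size instead of bottlenecking on one spare drive.
    per_node_gib_s is an assumed rebuild budget per node, not a spec."""
    aggregate_gib_s = (nodes - 1) * per_node_gib_s
    return failed_tib * 1024 / aggregate_gib_s / 60

for n in (5, 20, 100):
    print(f"{n:3d} nodes: {mesh_rebuild_minutes(1.0, n):.1f} min "
          f"to re-protect 1 TiB")
# 5 nodes: ~8.5 min; 20 nodes: ~1.8 min; 100 nodes: ~0.3 min
```

Under these assumed numbers, the model reproduces the shape of the claim in the talk: minutes on a small cluster, under a minute on a large one.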
Cool. So, do you see the two-port 10-gig as a bottleneck yet?

It's not the bottleneck yet, but we can see it heading toward being the bottleneck as other parts of the system get faster. We actually have 40-gig in the lab today, and with the price of 40-gig ports and switches coming down, it's a reality pretty soon. I'd say that even when we see it approaching the bottleneck, it's only a bottleneck on the theoretical max throughput of the system; most customers probably won't be approaching that anytime soon.

Next, something very core to the architecture: inline efficiency — compression, deduplication, thin provisioning. We'll cover more of the how in the architecture session, but suffice it to say the architecture was designed from the ground up around being able to do these inline, in real time, without significant performance impact, so these are always on. We do global deduplication across the entire cluster — any volume, any piece of data is deduplicated across the entire cluster — plus real-time compression, and space-efficient snapshots and cloning are integrated with this as well.

Is the hash algorithm Skein-256?

Good question. I followed the discussion on that, guys — I beat that one into the ground. Can we talk about it for like 15 more minutes? That would be great.

You're more likely to be killed by an asteroid than to see a hash collision.

Yeah, and we definitely would be. I've run the numbers.
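The asteroid comparison comes from the birthday bound. For n unique blocks and a b-bit hash, the probability of any collision is roughly n²/2^(b+1), which a couple of lines can evaluate (the 256-bit size here assumes the Skein-256 question above was answered affirmatively):

```python
from math import log2

# Birthday bound behind the "asteroid" quip: for n unique blocks and a
# b-bit content hash, P(any collision) is roughly n^2 / 2^(b+1).
def collision_exponent(n_blocks: float, bits: int = 256) -> float:
    """Return log2 of the approximate collision probability."""
    return 2 * log2(n_blocks) - (bits + 1)

# Even 3.5 PiB of entirely unique 4 KiB blocks is only ~2^40 blocks:
n = 3.5 * 2**50 / 4096
print(f"log2 P(collision) ~ {collision_exponent(n):.0f}")  # about -177
```

A probability on the order of 2^-177 is vastly smaller than any physical-disaster risk, which is the point the speakers are joking about.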
Now, just a very simple comparison — and we can give you the full breakdown of exactly what we did here — to demonstrate the efficiency of the data reduction. These are, by the way, very conservative estimates for data reduction: 2x compression and 2x deduplication, nothing like 20x ridiculousness. Against a traditional disk-based system — actually, I think, a hybrid SAN configuration — here's what you get for an equivalent amount of capacity in terms of space, power, and cooling reductions and performance increase: one rack versus 5U, 52,000 IOPS versus 375,000 IOPS, 90% less rack space, 78% less power and cooling, and so on. We've priced this out, and it's pretty competitive — there's not a huge difference in price between the two — so particularly for the guys deploying this at massive petabyte scale, the operational cost savings add up really quickly. And that's important to point out: the OpEx savings are really gravy on top, because the CapEx is pretty comparable to that configuration.

From a management perspective, we've really done a couple of things. One is drastic simplification. This is something I talked about in the software-defined data center section the other day: you can't automate anything that's overly complex. The best you can hope to do is put a layer of indirection in that hides the complexity and simplifies it, but it's actually better to start with something that's far simpler to begin with. So we've done that. You get a global unified pool of capacity and a separate global unified pool of performance, and you just provision your individual volumes out of those; as you scale your cluster, the pools just get bigger. There's no magic, nothing fancy: no separate pools of different capacity types, no separate RAID groups, no RAID levels, no aggregates, no volume groups, no five levels of things you need to set up. You plug in the system, you turn it on, and you just start provisioning. There's automatic load distribution across the cluster with no hotspots; whether the cluster gets bigger or smaller, you never have to manually place or re-provision things. Single-click provisioning of volumes — very, very simple — plus the self-healing capability we talked about.

Is the automatic load distribution done by your hash algorithm, something like consistent hashing?

Yeah — the data and load are distributed through the hashing across the cluster, and we also have some things that take into account variable-size nodes and other factors to keep things balanced. The way to think about it is that the hash algorithm is random: it distributes the blocks across the cluster in a random fashion, and it ends up distributing the load very evenly as well. You could say, well, what if the actual load coming in isn't random? It doesn't really matter. Unless a host is just sitting there hitting a single 4K block, even if it's hitting, say, a 1-meg section of really hot data, that 1-meg section is already distributed across the entire cluster. We don't have to do anything special to redistribute based on load; the distribution of the data ensures an even distribution of load.

How do I access the cluster, then?

The connectivity is iSCSI — we'll have Fibre Channel very soon, and Adam will talk about that — but to keep it to iSCSI for a second: hosts connect to a specific volume through iSCSI, and we have a redirector service that balances those connections across the cluster.

Does that run on a single node?

It does, but it fails over to other nodes. The redirector service doesn't handle any I/O, though; it just redirects connections to other nodes. Hosts make their connections to the individual nodes and do their I/O through that particular node, which talks over the back-end network to the rest of the cluster.
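To make the "hot 1-meg section is already spread out" point concrete, here's a toy placement function keying off a hash of each block. The simple modulo mapping stands in for the real consistent-hashing scheme, and the node count and block contents are invented:

```python
import hashlib

NODES = 10

def node_for(block_data: bytes) -> int:
    # Placement keys off a hash of the block, not its volume address,
    # so logically adjacent blocks scatter across the cluster.
    return int.from_bytes(hashlib.sha256(block_data).digest()[:8], "big") % NODES

# A "hot" 1 MiB region = 256 consecutive 4 KiB blocks. The strings below
# stand in for each block's unique contents. Even though the host hammers
# one address range, the blocks land all over the cluster:
hot_region = [f"block-contents-{off}".encode() for off in range(0, 1 << 20, 4096)]
placement = [node_for(b) for b in hot_region]
print({n: placement.count(n) for n in range(NODES)})  # ~26 blocks per node
```

Because the hot region's 256 blocks spread roughly evenly over all ten toy nodes, no single node absorbs the hotspot — which is the load-distribution argument made above.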
So are the hosts possibly talking to all of the nodes, or just a subset?

It depends. For a single volume, they're talking to a single node. If they have multiple volumes, those connections will probably end up spread across multiple nodes.

OK, so a single volume is associated with a single node, and that node manages any other distribution — the actual data is distributed over the back-end network?

Exactly.

It's been my experience that a lot of enterprise customers have been leery of iSCSI, for whatever reason, for tier-one applications. Do you have people running tier-one applications on this?

Absolutely — Oracle, SAP, you name it, and a lot of them. And I think they've been leery for good reason in the past; iSCSI has a little bit of a murky history. But a lot of that has changed. The move from one gig to 10 gig has been a big part of it, with people actually building decent networks as opposed to just plugging into the wall into very poor networks. Another piece is that the maturity of the iSCSI initiators has gotten dramatically better, and compatibility is really a non-issue at this point. The other thing that has tainted iSCSI to a certain extent is that it was associated with lower-end, less powerful storage systems. People had performance problems, upgraded to a Fibre Channel system, and everything got better — but that may not actually have been the protocol choice per se; it might just have been the fact that they got a more powerful storage system. For us, when we look at Fibre Channel versus iSCSI, we see no performance or reliability difference; if anything, there are certain things in iSCSI's favor at this point. And if you really want Fibre Channel, Adam will talk about that later.

Excuse me — do you support DCB, data center bridging?

We've tested with it, but we don't have it enabled for customers right now, because quite frankly none of our customers run DCB today and we haven't seen much benefit from it. It really only helps if you've got a network that's really noisy and has a lot of packet loss, and if you have that, you probably have other problems.

And since they're dedicated ports, per-priority pause isn't necessary?

Yeah. It makes your Ethernet fabric lossless, but again, that only helps if you have a fairly lossy Ethernet fabric, and most well-built 10-gig fabrics don't have much packet loss.

But now you're making the "bandwidth trumps QoS" argument.

It's really not either: unless you're oversubscribed to the point where you're dropping packets, you're not going to have that issue.

I'm not going to talk too much about our UI. It's very simple and very easy to use, and most people hardly ever touch it once the system is set up, because our focus is really on automated management — and this is where people get really excited about the change in managing storage with SolidFire. All the capabilities we've talked about, everything we do, is exposed through a very easy-to-use REST-based API that's completely comprehensive. You can literally take a SolidFire node out of a box, put it in a rack, never log into anything at all, and drive the entire lifecycle of that system through the API.
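As a sketch of what that "never log into anything" lifecycle management might look like: the endpoint path, method name, and field names below approximate SolidFire's Element API from memory and should be treated as illustrative rather than authoritative, and the account ID and credentials are hypothetical.

```python
import requests  # talks to the cluster's management endpoint over HTTPS

def create_volume(mvip: str, auth: tuple, name: str, size_bytes: int,
                  min_iops: int, max_iops: int, burst_iops: int) -> dict:
    """Provision a volume with per-volume QoS entirely via the API.
    Method and parameter names are an approximation of the documented
    Element API; check the real API reference before relying on them."""
    payload = {
        "method": "CreateVolume",
        "params": {
            "name": name,
            "accountID": 1,                   # hypothetical tenant account
            "totalSize": size_bytes,
            "qos": {"minIOPS": min_iops,      # guaranteed floor
                    "maxIOPS": max_iops,      # sustained cap
                    "burstIOPS": burst_iops}, # credit-funded ceiling
        },
        "id": 1,
    }
    r = requests.post(f"https://{mvip}/json-rpc/7.0", json=payload,
                      auth=auth, verify=False)
    r.raise_for_status()
    return r.json()["result"]

# create_volume("10.0.0.10", ("admin", "secret"), "tenant-a-db", 1 << 40,
#               min_iops=1_000, max_iops=11_000, burst_iops=15_000)
```

The point of the sketch is that capacity (totalSize) and performance (the qos object) are provisioned as independent parameters in one call, which is the performance-virtualization model described earlier.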
And I know you can do this, because our own tools are built on top of this API; there's nothing our tools can do magically that you can't do through the API. All of our integrations, which are really great, are built on top of this API as well: our VMware plug-in, which you might see later; our integration with OpenStack, which is open source — you can actually go see it in the repository today; and CloudStack. We've taken the API and integrated it into these management stacks, so if you're using one of these platforms — OpenStack, VMware, CloudStack — 98% of what you need to do is just taken care of by that platform: all the day-to-day provisioning and management activity.

Hey Dave, can we stop for a second and go back? There's a question from the Internet: for VMware and such, do you supply your own PSP, your own multipathing?

No — we don't need it. The stuff out of the box works fine with us.

Round-robin?

Without getting into too many options, there are different choices depending on the environment, and we work with all of them.

Cool, thanks.

Yep, nothing special there. We do have an optional VMware plug-in, which just means you don't have to go to our web interface — you can manage everything through the VMware UI — and of course we support all the VMware primitives for block acceleration. So with that, I think I will hand it over to Adam to talk about some of the new stuff in Carbon, and then I'll come back and talk about the architecture.
Info
Channel: Tech Field Day
Views: 12,673
Rating: 4.7948718 out of 5
Keywords: Tech Field Day, Storage Field Day, Storage Field Day 5, SFD5, SolidFire, Dave Wright
Id: tfP4q3DIvz0
Length: 28min 23sec (1703 seconds)
Published: Fri Apr 25 2014