Ceph, Now and Later: Our Plan for Open Unified Cloud Storage

Captions
All right, hello everyone, can you hear me okay? Welcome, and thanks for coming. My name is Sage Weil, I work for Red Hat, and today I'm going to be talking about Ceph, now and later: our vision for an open, unified cloud storage system.

Just to give you a bit of an outline of where we're going: I'm going to start by talking a bit about why we developed Ceph, why it's open, and what unified storage means to us, and a little bit about what kind of hardware you can deploy Ceph on and its relationship with the OpenStack community and software. I'm also going to take a minute to talk about why people don't use Ceph, because I think that's a useful retrospective exercise, and then we'll get to the fun stuff: what's coming down the pike, roadmap highlights, and community updates.

The current release of Ceph is Jewel; it came out in the spring. We actually started working on Ceph over ten years ago. It was originally designed as a distributed, scale-out file system, but one of the very frustrating things over the past five years, whenever we talked about Ceph, is that we had a stable object and block interface but we didn't quite have that stable file system yet; it was the last thing to mature. So we'd always have to show slides describing the RADOS Gateway and the block device as awesome, but CephFS as only nearly awesome. The big milestone with Jewel is that we can now declare that Ceph is fully awesome: the file system is fully stable and ready for production, and we're very happy about that.

Ceph gives you a unified storage platform. You have the RADOS Gateway, which gives you S3- and Swift-compatible object storage with object versioning, multi-site federation, and replication. The block interface, RBD, gives you a virtual block device with snapshots, copy-on-write clones, and multi-site replication across clusters for disaster recovery. And for the file interface we have CephFS, a distributed POSIX file system that gives you scale-out metadata, coherent client caches, and snapshots on every directory. All of this is built on top of RADOS, a software-only distributed storage system that's self-healing and self-managing, based on intelligent storage nodes, and it figures out how to distribute your data across racks and racks of storage devices. So in a nutshell, that's what Ceph is. I think most of you probably already know that, since you're here at OpenStack.
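As a rough illustration of the RBD snapshot and copy-on-write clone workflow mentioned above, here is a minimal sketch using the python-rbd bindings. The pool and image names are hypothetical and the keyword arguments are written from memory, so treat it as a starting point rather than a recipe.

```python
import rados
import rbd

# Connect to the cluster and open an I/O context on a (hypothetical) pool.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')

try:
    # Create a 10 GiB image with layering enabled so it can be cloned.
    rbd.RBD().create(ioctx, 'base-image', 10 * 1024**3,
                     old_format=False, features=rbd.RBD_FEATURE_LAYERING)

    with rbd.Image(ioctx, 'base-image') as img:
        img.create_snap('golden')      # point-in-time snapshot
        img.protect_snap('golden')     # snapshot must be protected before cloning

    # Copy-on-write clone: shares unmodified data with the parent snapshot.
    rbd.RBD().clone(ioctx, 'base-image', 'golden', ioctx, 'vm-disk-1',
                    features=rbd.RBD_FEATURE_LAYERING)
finally:
    ioctx.close()
    cluster.shutdown()
```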
But I want to step back for a minute and talk a bit about what Ceph is about: what motivates its design and the development activity around it. First and foremost, Ceph is distributed storage, of course, but it's designed from the get-go so that all components scale horizontally. Ceph is really about scale, about cloud scale. It's designed to have no single points of failure. It's a software-only solution; we don't rely on specialized hardware, and in that sense it's hardware agnostic, so you can deploy it on commodity components of your choosing. We provide object, block, and file interfaces in a single cluster, so it's unified. And whenever possible we make the system self-managing, because when you're operating at scale things are going to go wrong, and you can't have operators going in and intervening whenever there's a small issue. Last but not least, Ceph is open source, and I believe that's actually one of its most important features.

So I want to take a moment to talk about why Ceph is open source and why that matters. Open source is important because you avoid vendor lock-in: you can get Ceph supported by Red Hat, by SUSE, by a half dozen other companies. You also avoid hardware lock-in, because you can choose your software solution and then buy hardware from whichever vendor gives you the best price or the best reliability or performance, and that has the effect of lowering your total cost of ownership. Those are the "free as in beer" benefits of open source. But there are also the freedom benefits. Because the system is open source you have transparency: you can go look at the source code and see what it's doing, and whether it's actually doing what we say it's doing, instead of just taking the vendor's word for it. You have the option of self-supporting the system: if you want to develop the expertise in-house, you don't have to have anybody support it; you can just fix the bugs yourself and be off to the races. But most importantly, because it's open you have the ability to add, extend, fix, and improve the system, which is really what open source communities are all about.

Let me take a moment to talk about what unified storage means to us. There are a couple of key advantages to having a unified storage system. The first is simplicity of deployment: you can deploy a single Ceph cluster and run object, block, and file all against that same storage platform in your infrastructure. As a result you get efficient utilization of storage: you don't have to do capacity planning for your block separately from your file and your object. And finally you get simplicity of management: a single set of skills that you have to train your operators on, and a single set of tooling to develop, to manage your entire storage infrastructure. At least, that's the unified storage story. I think the more experienced operators in this room might question just how true and valuable this really is, so you'll forgive me if I call a bit of BS on some of the rhetoric here. The first two points, simplicity of deployment and efficient utilization of space, I would call half true. For a small deployment, when you're setting up a small cloud, it's absolutely valuable that you can set up a single Ceph cluster and put everything on it: your object and your block and your file all mixed together, and it all just works and you don't worry about it. But once you start to scale to cloud scale, as everybody in this room presumably is planning on doing, it usually becomes important to optimize the type of hardware you're deploying workloads on; for block workloads you might be using flash, for object workloads you might be using spinning disk, and so forth. So when you're operating at scale you're not necessarily running all of your object and block mixed together. But that last benefit is still, I would say, completely true: if you're deploying Ceph across your entire infrastructure, even in separate clusters or separate hardware pools, you still have one set of management skills to develop and a single set of software running across your infrastructure. This is valuable on the development side as well, because when we develop new functionality in RADOS, which all of these interfaces sit on top of, like erasure coding or compression, we can take advantage of it for all of those use cases without having to reimplement the same capabilities three times.

So you have a choice: many operators will build specialized clusters that are tailored for a specific use case, especially when they start to scale very large, but you don't have to do that. Ceph is flexible in that you can also define RADOS pools within the same cluster that allocate storage to specific storage devices, so it's designed to be a very flexible architecture.
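For example, here's a rough sketch of carving out a flash-only pool inside a shared cluster. It leans on the CRUSH device-class feature that arrived around the Luminous timeframe (on older releases you would edit the CRUSH map by hand), and all the names are made up.

```python
import subprocess

def ceph(*args):
    """Thin wrapper around the ceph CLI; assumes admin credentials on this host."""
    subprocess.run(['ceph', *args], check=True)

# A CRUSH rule that only places data on OSDs whose device class is 'ssd',
# with host as the failure domain.
ceph('osd', 'crush', 'rule', 'create-replicated', 'fast-rule', 'default', 'host', 'ssd')

# A pool for latency-sensitive block workloads, bound to that rule; spinning-disk
# pools for object data can live in the same cluster under a different rule.
ceph('osd', 'pool', 'create', 'fast-block', '128', '128', 'replicated', 'fast-rule')
```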
In particular, it's designed to be hardware agnostic; it's software-defined in the true sense of the word. If you have high-performance workloads, you can run Ceph on flash appliances and it's going to go really fast. We did a number of reference architectures recently, or Red Hat did, I should say, with Samsung and SanDisk and Intel, although I don't have a pretty picture for that last one. Samsung has this great 2U box that's just packed with NVMe; they got 700-some thousand IOPS on 150 terabytes. SanDisk has a similar solution that's much more dense, not quite as high performance, but designed more for capacity storage, and it's much more cost compelling. So if you want to run Ceph on flash, go for it; it's going to go really fast, and these companies have done a lot of work to optimize Ceph to perform well in those environments.

If you want to run Ceph on SSDs and hard drives, you can do that as well, and this is what most people do. You can go out and buy hardware from pretty much any vendor on the planet and put Ceph on top of it, and you can find reference architectures from all the major players where they've pre-tuned things, so you can find out how it's going to perform before you buy it. You can even go to somebody like Fujitsu and get a turnkey appliance that's rack-scale and ready to go, with Ceph pre-installed; just plug it in. Or you can run something like Open Compute hardware from somebody like Penguin and be off to the races as well. So you have a lot of flexibility here.

And finally, you can even run Ceph on an actual hard disk. This is bleeding edge, but Western Digital Labs has a prototype hard drive, based on their helium platform, where they essentially added an extra ARM processor onto the PCB of the hard disk that runs Debian Linux, and we run the Ceph OSD right on the hard disk. They swap out the SATA interface for dual Ethernet, and in the chassis they swap out the SATA backplane for an Ethernet backplane, so you can imagine racks and racks of hard drives plugged directly into the network, running Ceph on the drives themselves. This is a prototype; their next generation is going to move to ARM64, and they're working through all the issues around building a completely new product, but the other hard drive manufacturers are looking at similar designs. So it's very exciting to see this full breadth, all the way from hard disks to really high-end storage systems, and you can run Ceph on all of them, which really underscores one of the key advantages of free and open source software: it leads to fast and open innovation.
Obviously open source software enables innovation just by letting you clone the code and go hack on it, but it also enables hardware innovation, because those ODMs and OEMs can just get the code and start hacking on it to support their platforms, without having to have some cumbersome business relationship with a proprietary software firm to get access to the code and deal with all the licensing and NDAs. None of that friction is in place for open platforms, which makes them ideal for pushing computing into the future.

A good example of this is persistent memory. Persistent memory is coming: there's 3D XPoint from Intel and Micron, and we actually have NVDIMMs today, although they're still pretty expensive. These things are going to be really fast, somewhat more dense, and very high endurance. They will be expensive, but most of all they promise to be very disruptive, because they turn all the assumptions that we've built our storage designs around over the last three decades on their heads, by giving us a completely new persistent storage medium to design for. Intel recognized that Ceph is a key platform to experiment with here, and they went and developed something called PMStore, a prototype backend for the OSD that's designed to store data directly on 3D XPoint memory using their NVML library. It hasn't moved past the prototype stage, but I think it's a good example of how innovation around these new hardware technologies can happen very efficiently in the open source space. And that's really what we're here to do: in the Ceph community our goal is to create an ecosystem around Ceph that allows Ceph to become, I guess, the Linux of distributed storage. We mean that in a couple of different senses. First, open source and open development are critical. We also want to build a collaborative environment where lots of different organizations contribute development effort to improve Ceph and build upon it and make it better. And finally, we want to build a general-purpose platform: in the same way that you can run Linux on embedded devices, on your phone, and also on big iron in the data center, we see Ceph as a general-purpose storage platform that can run on hard disks, on commodity servers, and on high-end, flash-optimized platforms as well.

Which brings us back around to OpenStack. Ceph and OpenStack get along very well; there are lots and lots of integrations. On the object side, the RADOS Gateway talks to Keystone to do authentication, and it provides S3 and Swift APIs; it doesn't so much talk to Swift as it can be used in place of Swift, because it speaks that interface. On the block side we have drivers for Cinder, of course, and Glance for managing all your images, and Nova knows how to start up your KVM instances so that they're backed by virtual disks stored in Ceph, and it all works very seamlessly; that's why lots of people in OpenStack like us. But I think the most exciting piece to call out here is the new Manila driver that lets you orchestrate file volumes stored in CephFS and have them plumbed through to your virtual machines. That's new in one of the recent releases, I think, but very exciting.
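Since the RADOS Gateway speaks the S3 protocol, any stock S3 client can talk to it. Here's a minimal sketch using boto3 against a hypothetical RGW endpoint, with placeholder credentials of the sort you would normally create with radosgw-admin.

```python
import boto3

# Endpoint and credentials are placeholders; 7480 is the default civetweb port
# for the RADOS Gateway, and the keys would come from `radosgw-admin user create`.
s3 = boto3.client(
    's3',
    endpoint_url='http://rgw.example.com:7480',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)

s3.create_bucket(Bucket='demo')
s3.put_object(Bucket='demo', Key='hello.txt', Body=b'hello from ceph')

for obj in s3.list_objects_v2(Bucket='demo').get('Contents', []):
    print(obj['Key'], obj['Size'])
```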
Cinder is really where Ceph and OpenStack have shined. If you look at the user surveys over the last two and a half or three and a half years, Ceph has consistently been adopted by roughly half, or a little more than half, of the deployments out there, rivaled only by LVM, which is ephemeral, non-reliable storage on the local disk; that makes sense, because it's really addressing a different use case anyway, whereas RBD is replicated and reliable across the cluster. I bring this up not to pat ourselves on the back, although I might do that too, but to point out that there is still a whole lot of other storage systems that people use alongside Ceph, and I think a useful exercise for us as a community is to look at why people are still choosing those other storage platforms for their clouds when Ceph is supposed to be the be-all end-all and so great. So the important exercise is to ask: why are people not choosing Ceph? I think there are a couple of important reasons that we need to pay close attention to.

The easiest one, and the easiest one to write off, is simply inertia. People have been buying proprietary appliances for decades, and it takes a while for change to happen; even if you had a perfect storage system it would take time to change those buying habits and actually have people adopt it. That's not particularly illuminating, but let's get it out of the way. Another reason is performance. Ceph has developed a reputation for not being as fast as other storage systems. Partly this is history: Ceph has gotten a lot faster over the past several years, and people have been using Ceph with OpenStack for a long time, so some of those opinions are outdated. But there's also a kernel of truth here: the backend that Ceph uses to write data to local storage is definitely aging, and there's a lot we can do to improve that; I'll talk more about that later. Another reason is functionality: sometimes there are just things that Ceph doesn't do that other storage systems do. The big one in that category, for me, is quality of service; we don't have QoS in Ceph yet, other systems do, and sometimes you need it. An important one, obviously, is stability. I think one of the reasons Ceph and OpenStack work so well together is that OpenStack was growing up at the same time as Ceph, so as we were working out all the kinks and making the system stable, so was OpenStack, and the people deploying OpenStack had a high tolerance for that sort of failure in Ceph at the same time; it ended up working out pretty well. I think we've done well for ourselves, but you shouldn't take my word on stability; you should talk to other operators and listen to the other talks where people describe their Ceph experiences. Which brings me to my last point: listening to other people talk about Ceph at events like this, it strikes me that a lot of the issues people have with Ceph aren't necessarily that it's doing something wrong or that it's broken, but that it's really hard to use. Distributed storage is complicated, and we do a lot to try to hide that complexity, but we by no means hide all of it, and Ceph, compared to a lot of other technologies, is just very difficult. More than anything, I think that's one of the key areas where we as a community need to do better, and where you as a user and operator community can help us identify what to improve.

Which brings me to the more interesting part of the talk: what are we doing about all this, and where are we going from here? The great thing about Ceph is that it's software, and software can be upgraded. We have a regular release cadence: Ceph has named releases every six months, in the spring and in the fall.
The spring releases are LTS releases, which means we do regular backports of bug fixes, so you can run them for multiple years without upgrading. Our current release is Jewel; we're just coming up on the Kraken release, which is going to be out in the next couple of months, and Luminous is going to come out in the spring of next year.

So let's talk a little about what's coming, what's new in Kraken. It's going to be terrible and wonderful and scary and all the good stuff; it's actually going to be the most fun to announce the release of Kraken of any Ceph release. But besides that, the big headline feature in Kraken is going to be BlueStore. Currently the OSDs use something called FileStore to write all their data to the local disk: they write files to an XFS file system. BlueStore cuts out that entire layer and writes the data directly to a block device. We use an embedded key-value database, currently RocksDB, although we could swap something else in later, for the metadata, and then all the data goes straight to the disk. BlueStore can combine hard disks and SSDs, and we can even use NVRAM or persistent memory for some of the journaling functionality, so it's targeted at all the current-generation technology that's out there. It has a few key headline features. The first is that we're going to have full data checksums across everything written to disk, which means that whenever we read something off disk we're always going to verify that checksum before returning it to the rest of the system. This is going to be huge. It also features inline compression, if you enable it, driven by policy, by client hints, or by whatever you define on the pool; we can use zlib or snappy, which will hopefully reduce the amount of storage you have to buy for certain workloads. But the biggest thing is that BlueStore is roughly twice as fast as FileStore. That's true for SSDs and hard disks, for large I/O and small I/O; give or take, it's much, much faster. We get better parallelism and efficiency on fast devices, and we eliminate the double writes where we used to do data journaling on a journal device; we don't do that anymore. It also performs very well even with a very small journal, so you might use SSDs to accelerate those metadata updates, and those journals can be hundreds of megabytes instead of gigabytes like they are right now. Lots of people are working on this; it's been a real pleasure to work with people from SanDisk, Mirantis, and ZTE on developing this new feature, and it's coming along quite well.

The biggest question in everyone's mind, though, is probably: when can I have it? The current master is doing quite well; we've nearly finalized the disk format, and in Kraken we're going to have a stable code base that hopefully won't crash, and a stable disk format. It's almost certainly still going to be flagged as experimental, because it's brand-new code and you don't want to put your production data on it just yet, but we do want as many people as possible to try it out, maybe in your dev, test, or performance-testing environments. The goal is that for Luminous, the next stable release, we'll have a fully stable version that's ready for broad adoption. We hope to make it the default instead of FileStore, but that really depends on how the next six months go; we need to make sure it's stable and that we trust it, because your data integrity is definitely more important than getting a feature out the door as quickly as possible.
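As an aside on the inline compression mentioned above: in the Luminous-era tooling this ends up being controllable per pool. A rough sketch of enabling it is below, with a made-up pool name and option names written from memory.

```python
import subprocess

def ceph(*args):
    subprocess.run(['ceph', *args], check=True)

# Ask BlueStore to compress data written to this (hypothetical) pool with snappy.
# 'aggressive' compresses everything unless a client hints that the data is
# incompressible; 'passive' would compress only when a client hints that it is.
ceph('osd', 'pool', 'set', 'object-data', 'compression_algorithm', 'snappy')
ceph('osd', 'pool', 'set', 'object-data', 'compression_mode', 'aggressive')
```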
A big question that also comes up is how you migrate from FileStore if you're already using it, and it's really pretty simple: you take the existing FileStore OSDs, either evacuate them or not, and just kill them, reprovision the same storage devices as BlueStore, and let the regular Ceph recovery take over.
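A rough sketch of that per-OSD conversion loop is below, wrapping the Kraken/Luminous-era CLI in a bit of Python. The OSD id, device path, and the ceph-disk commands are illustrative; you would adapt this to your own deployment tooling, and the evacuate-first path shown here is the conservative variant.

```python
#!/usr/bin/env python3
"""Hypothetical helper: convert one OSD at a time from FileStore to BlueStore
by draining it, tearing it down, and reprovisioning the same device."""
import subprocess
import sys
import time

def sh(cmd):
    print('+', cmd)
    subprocess.check_call(cmd, shell=True)

def wait_until_healthy():
    # Crude: wait for recovery/backfill to settle before touching the next OSD.
    while True:
        out = subprocess.check_output('ceph health', shell=True).decode()
        if out.startswith('HEALTH_OK'):
            return
        time.sleep(60)

def convert(osd_id, device):
    sh('ceph osd out %d' % osd_id)            # drain: let data re-replicate elsewhere
    wait_until_healthy()
    sh('systemctl stop ceph-osd@%d' % osd_id)
    sh('ceph osd crush remove osd.%d' % osd_id)
    sh('ceph auth del osd.%d' % osd_id)
    sh('ceph osd rm %d' % osd_id)
    # Reprovision the same device as a BlueStore OSD and let recovery refill it.
    sh('ceph-disk zap %s' % device)
    sh('ceph-disk prepare --bluestore %s' % device)
    wait_until_healthy()

if __name__ == '__main__':
    convert(int(sys.argv[1]), sys.argv[2])
```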
The other new item coming in Kraken is AsyncMessenger. It's actually been in the tree for a while as experimental, but it's now the default. This is a reimplementation of the network layer in Ceph. It features a fixed-size thread pool, so you don't get the thread thrashing you have with the legacy implementation, and it behaves much better with tcmalloc, so that's good. One of the nice things about AsyncMessenger is that it's a new software architecture that abstracts out the transport layer, the part that actually sends bits over the wire. So we have the normal implementation that uses sockets and TCP, but we also have two experimental backends, motivated by the fact that when you profile Ceph on high-end storage devices, you see that a lot of time is spent in TCP, reading and writing data over the wire; those are the two big peaks you'd see in the flame graph. One backend is based on DPDK, an Intel library that lets you move the network driver and TCP stack out of the kernel into userspace, with a different threading and memory model, which gives you very, very low latency network I/O; that's a pretty cool prototype. The other one is based on RDMA, so you send data over the ib_verbs interface on RDMA hardware; we keep TCP connections for the control path, mostly because that made it possible to put the prototype together and make it work in less than a month, so it's a proof of concept. This is all contributed by the XSKY folks, who are doing good work. So there's interesting stuff on the network side coming down the wire, although honestly the network really isn't the tallest pole in the tent as far as performance goes; we're mostly focused on getting the storage stack to perform better.

Moving forward to Luminous, which again is coming out in the spring: the first big thing coming in Luminous is that the multi-MDS configuration in CephFS is finally going to be completely stable, so you can have scale-out metadata for CephFS. That fills in the last missing piece of the full scale-out story for Ceph: object, block, and file. So that's exciting. The other big thing in Luminous is going to be erasure code overwrite support. RADOS pools have supported erasure coding for a long time, but the current implementation only lets you append to erasure-coded objects. That turns out to be simple to implement and completely sufficient for RADOS Gateway workloads, where you're using the S3 protocol to dump objects into the cluster, but it doesn't work directly for RBD or CephFS, where you have to modify existing objects. EC overwrites will enable RBD and CephFS to consume erasure-coded pools directly. It turns out this is really hard: the implementation requires a two-phase commit to avoid the EC equivalent of the RAID write hole, and it's also complicated to avoid an inefficient full-stripe update when you do a small 4K write in the middle of a stripe. The implementation also relies on an efficient internal primitive that lets you move data between our internal objects and do the roll-forward and roll-back efficiently, and that's only done efficiently in BlueStore; we were trying to do this on POSIX, on top of a file system, and it was very difficult. So it's hard, but it's going to be huge: it will have a tremendous impact on your TCO, going from a 3x storage overhead down to something more like 1.3x or 1.4x, and we believe this is really going to make RBD great again.
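To make that concrete, here's roughly what consuming an erasure-coded pool from RBD looks like with the Luminous-era commands: the image data goes in an EC pool with overwrites enabled, while the image metadata stays in a small replicated pool. Pool names, the EC profile, and the sizes are all made up.

```python
import subprocess

def run(*args):
    subprocess.run(list(args), check=True)

# A 4+2 erasure-code profile and a pool that uses it (1.5x overhead vs 3x replication).
run('ceph', 'osd', 'erasure-code-profile', 'set', 'ec42', 'k=4', 'm=2')
run('ceph', 'osd', 'pool', 'create', 'ec-data', '64', '64', 'erasure', 'ec42')

# Allow partial overwrites so RBD/CephFS can use the pool directly (wants BlueStore OSDs).
run('ceph', 'osd', 'pool', 'set', 'ec-data', 'allow_ec_overwrites', 'true')

# RBD metadata still lives in a replicated pool; only the image data is
# placed in the erasure-coded pool via --data-pool.
run('ceph', 'osd', 'pool', 'create', 'rbd-meta', '64', '64', 'replicated')
run('rbd', 'create', '--size', '10G', '--data-pool', 'ec-data', 'rbd-meta/myimage')
```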
There's other new stuff in Luminous. We have a new daemon in Ceph called ceph-mgr, the Ceph manager, motivated by the fact that the Ceph monitors currently do a lot, and it turns out a lot of what they're doing really isn't necessary. They spend most of their time storing and aggregating statistics about all the placement groups, so you can do things like 'ceph df', and that really isn't critical to the functioning of the cluster, but it has the effect of limiting the scalability of the monitor cluster and, in turn, of the overall Ceph cluster. So ceph-mgr moves those non-critical metrics out into a new daemon that's much more efficient. It's going to let us stream things off to InfluxDB or Graphite or whatever else you use for your pretty graphs, and it's also a good place to do efficient integrations with external modules, even modules written in Python, which lowers the barrier to entry for adding intelligence to Ceph. So it's a good host for integrations like API endpoints; we've taken the existing Calamari API and just plopped it into ceph-mgr with almost no modifications. It's going to enable some coming features like 'ceph top' and 'rbd top', which let you identify which objects are getting the most I/O and which clients are doing the most I/O in the system: better introspection and instrumentation. It's also a good place to implement high-level management features and policy, for example gradually weighting up new OSDs, or identifying OSDs that are flapping, all those sorts of policy-based decisions where you want to automate management of the system; those could be put in ceph-mgr if you want.
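For a flavor of that Python integration point, here's a very rough sketch of what a manager module might look like, based on the Luminous-era module interface as I remember it; the module name, data keys, and method names should be treated as assumptions rather than gospel.

```python
# Hypothetical module, e.g. dropped onto the mgr module path as hello/module.py.
import threading

from mgr_module import MgrModule  # provided by ceph-mgr itself


class Module(MgrModule):
    """Toy module: periodically logs how many OSDs are in the OSDMap."""

    def __init__(self, *args, **kwargs):
        super(Module, self).__init__(*args, **kwargs)
        self._shutdown = threading.Event()

    def serve(self):
        # ceph-mgr keeps cluster maps in memory; self.get() hands us a dump.
        while not self._shutdown.wait(30):
            osd_map = self.get('osd_map')
            self.log.info('osdmap epoch %s with %d osds',
                          osd_map.get('epoch'), len(osd_map.get('osds', [])))

    def shutdown(self):
        self._shutdown.set()
```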
The other big thing we're working on right now is quality of service. I mentioned that this is one of the big functionality gaps we currently have. The goal is to be able to set policy both for reserved or minimum IOPS for a particular client or workload, and also to specify proportional sharing of the excess capacity. Initially we're going to do this based just on the type of I/O, so you can separate client I/O from background I/O like scrub and recovery and get better performance isolation there; then we can extend that to QoS policies associated with different pools in a RADOS cluster, and eventually we'll extend it all the way up to the client, so you can say that this particular VM gets this many guaranteed IOPS across the system and everyone else fights over the scraps. The implementation is going to be based on the mClock paper from OSDI '10, which features an I/O scheduler and a distributed enforcement mechanism that maps very cleanly onto RADOS. The basic idea, as you can see in the rather ugly graph I took from the paper, is that if you have one client, it consumes all the IOPS in the system; if a second client comes in with a higher proportional priority, it takes the lion's share of the IOPS, but that first client is still given a guaranteed minimum number of IOPS so it keeps that minimum level of performance.

One of the other things we see, particularly with RBD, is concern around latency. Obviously, when you move to a shared storage system you're introducing latency, because your writes go over Ethernet and get replicated multiple times, and you have to wait for the acknowledgements to come back; that's a fact of life. In contrast, if you write to a local SSD in your hypervisor, you get very good performance and very low latency, as expected. The problem is that if you use local storage devices and that SSD fails, or your client host fails, you lose all the data, and obviously that's not what we want. So naively what you'd like is some sort of write-back cache, where you write to your local device and asynchronously flush things out to the cluster. The problem is that typical write-back schemes are unordered, so if you do that and then lose your client cache, the copy stored in the cluster will contain out-of-order writes and will be left in an inconsistent state that looks corrupt from the file system's or application's perspective; plain write-back caching doesn't quite solve the problem. So what we're looking to do in RBD, the RADOS block device, is create an ordered, persistent client write-back cache, where we're careful about the order in which we write things back from the client cache to the cluster, so that even if you lose the client host or the client cache SSD, the version of the image stored in the cluster is in a fully crash-consistent state. It might be stale, but it's crash consistent. So you get low-latency writes to the local SSD, you get a persistent cache, you get fast reads from the cache, and you have an ordered write-back that gives you a point-in-time-consistent RBD image. This fills out the spectrum: currently you have to choose between a local SSD, with low latency but a single point of failure and no data left if you lose the SSD, and full Ceph RBD replication, with higher latency, no single point of failure, and perfect durability. This gives you a middle point, where you have a single point of failure only for the most recent writes, the last few seconds, and if you crash you get a stale but fully crash-consistent copy. We think that's going to be useful for a broad class of applications that don't need perfect durability.

One of the biggest things we try to keep in mind on the team is that in the future, despite the fact that this is a cloud conference where we all talk about VMs, most data is going to be stored in object stores: all those cat pictures and videos and whatever else are going into an object store, not block devices or file systems. So the RADOS Gateway is a key strategic component of the Ceph system, and there's lots we're working on to enable that: things like erasure coding, multi-site federation is big, and tiering features are going to be really big. But one of the cool new things coming, which is actually in a prototype state in Kraken and will be stable in Luminous, is RADOS Gateway metadata indexing, and this was actually grafted onto the new multi-site federation feature. With RADOS Gateway multi-site you have multiple Ceph clusters and multiple RADOS gateways for each cluster; each of those is defined as a zone, and the gateways talk to each other to do asynchronous replication between sites. We've extended this: we built a plugin that looks to the system roughly like a replication target, except that instead of replicating all the data, it reads the same logs but only replicates the metadata, and it pushes it into Elasticsearch, where you can query based on file type or object attributes or whatever else you put in there. And in fact, although I've shown it here as a separate cluster, it turns out these zones don't actually have to map to a cluster; a zone is just a set of pools in an existing cluster, so you can have an extra RADOS Gateway instance in an existing cluster that's just tasked with feeding all this data into Elasticsearch, where you can do your queries. This is something people have been asking about for a long time, so it's pretty exciting.

I'll wind up by talking a little bit about development activity and process. We have a growing development community; the number of contributors has been increasing roughly linearly for several years now, and the amount of code and features we're able to produce has been increasing as well, thanks to the effort of a lot of different organizations. These are the top contributors for the current Kraken release: lots of organizations besides Red Hat, and I think the cool thing here is not just the number of organizations but the breadth of types of organizations. You see the usual suspects, the cloud service providers and OpenStack companies, but you also see cloud operators, people like DreamHost and Tencent, you see telecoms, you see OEMs who are building hardware and enabling Ceph to make use of that hardware, you see storage companies, and in fact you see some very old, traditional storage companies seeing Ceph as something they need to pay attention to.

So I have just a few takeaways from this talk. The first thing I really want to drive home is that nobody should have to use a proprietary storage system out of necessity; there's no reason why there shouldn't be an open solution that provides everything you need, and we're here to fix that. Furthermore, the best storage solution should really be an open source solution, and that's part of our goal. Ceph has been growing up, but we're certainly not done yet; there's still a lot of work to do on performance, scalability, features, and ease of use, but we're highly motivated and very interested in working with everyone here to get there.

So how can you help? As an operator, first and foremost, you can file bugs. I've heard about all kinds of issues, here in the sessions and elsewhere in the hallways, problems people have hit with Ceph that I hadn't heard about before, and I admit I don't religiously follow the bug tracker, but I think a lot of these issues simply haven't been reported, so communicating with the upstream development community can be very helpful. There are also features that are very hard to use, and contributing documentation is a very low-barrier way to help the ecosystem. And you can blog about your experiences, what works well and what doesn't.
If you do all these things, you build a relationship with the core team that helps you become involved, helps us understand what pain points you have, and lets you influence the future direction of the storage system. As developers, there's even more you can do. You can go fix those bugs you reported, and we love that. You can help us design new functionality and actually go implement it, and that's very important. It's equally important to help integrate Ceph with other platforms: we've done a lot of work with OpenStack, obviously, and Ceph and OpenStack work very well together, but there are a lot of other emerging infrastructure systems we can work with, like Kubernetes and whatever else. You can help make Ceph easier to use; this is really one of the biggest challenges we have as a community, and you can help. And you can participate in the monthly developer meetings we have; they're all on video chat, at EMEA- and APAC-friendly times, and we highly encourage you all to get involved. That's all I have. Thank you very much, and I think I have time for questions.

Yes? So the question is about third-party software using librados directly instead of RGW or RBD or CephFS. We're seeing that more and more often, where people build integrations directly with librados instead of the other, higher-level interfaces, and we love it; I think it's great. Librados is a much lower-level interface, but it gives you a lot of power that the S3-type object storage interfaces don't. We've seen time-series databases, we've seen archival systems, we've seen databases; recently we did an integration with RocksDB, so you can run RocksDB directly on librados, which means you could run MySQL on RocksDB on librados and do a database-as-a-service kind of thing if you wanted to; I'd love to see somebody go implement that. Other questions?

Yes? So the question is about limits of scalability: five petabytes, tens of petabytes. The system is designed to scale infinitely; of course, that's never really true in practice, and the limitations on Ceph scalability are really around the number of OSDs in the system, not the number of bytes, so if you have big devices you can store a lot more bytes. The biggest Ceph clusters I've been directly involved in building are test clusters we did at CERN, and I think we were in the neighborhood of four or five thousand OSDs; they'd earlier done one that was more like 8,000. We've just done a bunch of work in Kraken that reduces the map sizes, which were limiting our scalability, so we should be able to go well past that, but we haven't had a chance to retest; it's hard to test big clusters when you don't have big clusters, and I wish we could afford one, but we don't. Eventually I see us going to 100 petabytes and well beyond, but it's hard to actually build those, and the biggest one I've personally used was around 5,000 OSDs, if I'm remembering correctly. I've heard of ones that are bigger, but I don't know how they went. Other questions? All right, thank you very much.
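On the librados question above: the bindings are available for several languages, and a minimal Python example of talking to RADOS directly looks something like this. The pool name is hypothetical, and you need a client keyring with access to it.

```python
import rados

# Connect using the local ceph.conf and the default client credentials.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

ioctx = cluster.open_ioctx('app-data')   # hypothetical pre-created pool
try:
    # Objects in RADOS are flat name -> blob, plus xattrs and omap key/value data.
    ioctx.write_full('greeting', b'hello from librados')
    ioctx.set_xattr('greeting', 'lang', b'en')
    print(ioctx.read('greeting'))
finally:
    ioctx.close()
    cluster.shutdown()
```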
Info
Channel: Open Infrastructure Foundation
Views: 4,559
Rating: 4.92 out of 5
Keywords: OpenStack Summit Barcelona, Architect, Upstream, Project Technical Lead (PTL), Cinder, Manila, Sahara, Swift, Trove, Sage Weil
Id: WgMabG5f9IM
Length: 36min 2sec (2162 seconds)
Published: Thu Oct 27 2016