Today's ZFS - Michael W Lucas

Video Statistics and Information

Reddit Comments

MWLucas' books are wonderful as well. At least the Unix ones, haven't read his fiction.

๐Ÿ‘๏ธŽ︎ 3 ๐Ÿ‘ค๏ธŽ︎ u/JFoor ๐Ÿ“…๏ธŽ︎ Apr 06 2020 ๐Ÿ—ซ︎ replies

Thanks mate :)

๐Ÿ‘๏ธŽ︎ 2 ๐Ÿ‘ค๏ธŽ︎ u/Sixkillers ๐Ÿ“…๏ธŽ︎ Apr 06 2020 ๐Ÿ—ซ︎ replies

Home builds used to be glitchy; some still are. Blue screens of death were the norm. Cheap power supply? It just browns out, occasionally. Cheap SSD? It just has trouble with cached writes if the power is flaky. 64 gigs, really smooth, can't pass memtest for 24 hrs, but who runs memtest as an OS? ECC? Yup, that's extra cost, don't need it. I'll back off overclocking when I see artifacts. Data integrity? If there is corruption I'll just data-hoard something else. No big deal. Backups? Just hit reset; look, it's back up.

Great video

๐Ÿ‘๏ธŽ︎ 1 ๐Ÿ‘ค๏ธŽ︎ u/notrhj ๐Ÿ“…๏ธŽ︎ Apr 07 2020 ๐Ÿ—ซ︎ replies

10:30 ECC RAM is a should-have item, not an absolute must that's cause to take up pitchforks and torches. Without it, you're equal to other filesystems with ECC.

๐Ÿ‘๏ธŽ︎ 1 ๐Ÿ‘ค๏ธŽ︎ u/wing03 ๐Ÿ“…๏ธŽ︎ Apr 06 2020 ๐Ÿ—ซ︎ replies
Captions
Hi folks, are we live? Okay. Hi everyone, my name is Michael Lucas. Thank you for inviting me here. I write technology books, a whole bunch of them, and I've been using Unix since sometime in the 1980s; I forget the exact year. I've been a sysadmin since '95, and I now make my living writing about all the things I learned the hard way as a sysadmin. I'm a founding member of SEMiBUG, the Southeast Michigan BSD Users Group. They meet one week after the MUG meeting, so it's very easy to remember; just come on over to Altair Engineering. I'm here tonight because of ZFS: Allan Jude and I wrote a couple of books on ZFS. I should probably also mention my new books are SSH Mastery and, of course, Ed Mastery; remember, ed is the standard Unix editor. As Michael Warren Lucas I write novels, like git commit murder, but sadly Michael Warren is best known for his cutting-edge work in Linux erotica.

So let's talk about ZFS. What is ZFS? Is anyone here using ZFS? Okay, a few of you, good. ZFS is a modern, fully featured filesystem, and by that I mean it has things we've wanted for decades in other filesystems, such as built-in integrity checking, and snapshots built in from the ground up. ZFS was created by Sun, and some of you may remember that Sun open-sourced it, and then Oracle bought Sun and darkness descended upon mankind. Fortunately, although Oracle pulled ZFS back and closed the source, once it was open, it's out there. The OpenZFS project is now the central coordinator of all things ZFS, and updates and improvements are fed from OpenZFS down into all of the operating systems that use it, so the ZFS on Linux project, FreeBSD, Canonical, and so on all get their ZFS implementations from OpenZFS. Some of these projects make their own improvements to ZFS and feed them back upstream, and those get distributed out to all the projects, so there's a very active development environment and community around ZFS.

So what makes ZFS special? When I say it's fully featured, what does that mean? It means ZFS takes very well understood technologies and puts them together into a seamless whole, to take advantage of all the things we know about filesystems and data today. We have checksums; everybody uses checksums, every packaging system uses checksums, but by putting them in the filesystem you can test and verify the integrity of the data in the filesystem, so that if a stray cosmic ray hits your disk, ZFS notices and can fix it. It has very sophisticated compression; there's really no need to gzip your log files anymore, because ZFS transparently compresses for you. ZFS even has some parts of diff: you can examine the differences between two different versions of a filesystem. By combining all of these well-understood techniques, it creates something greater than the sum of its parts. And I would like to again thank Gibb for the root beer; life is good. We've all seen how disparate parts can come together to create something stronger; for example, if I can get Brian up here, we can demonstrate... no? No. Okay.

The one thing that might be new to you is this phrase "copy-on-write." So what is that? ZFS never changes a written disk sector. If the data on a sector is changed, ZFS allocates a new sector, writes the data to that, and then deallocates the now-unused sector. This means that the data on the disk is always coherent; you never have a half-written block. If you get a power loss halfway through a write, and the write cache runs out of juice, and lightning strikes, or whatever, the data on the disk is a coherent filesystem. Maybe you didn't save what wasn't written to disk, but nothing's going to save what's not written to disk. This is really no different than you copying a file before you edit and make changes; it's sort of version control at the disk level. An interesting effect of this is that you get effectively free snapshots, which we'll talk about in a bit.
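Copy-on-write is also what makes that filesystem-level diff cheap: ZFS can look at which blocks were born since a snapshot instead of walking file contents. A minimal sketch, with hypothetical dataset and snapshot names (output abbreviated):

    zfs diff tank/home@monday tank/home
    # M  /tank/home/notes.txt    modified since the snapshot
    # +  /tank/home/new.c        created
    # -  /tank/home/old.log      deleted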
But before we go deeper, let's discuss how you need to think about ZFS. ZFS is not extfs. It's not UFS. It is not XFS. If you treat it just like UFS, somehow it will bite you. You need to know what it is you're doing and how ZFS works. That's not hard, but a little bit of reading will save you a lot of pain. Non-ZFS tools like dump might appear to work; they don't, but they might look like it. It's much like switching from, anyone remember S51K filesystems, before UFS? We treated UFS very differently than we treated those old filesystems. ZFS is the same kind of change.

So, ZFS hardware. You'll see a lot of gossip and discussion on the internet about what hardware you should have for ZFS, but these are really the essentials. Don't use a RAID controller. ZFS expects to talk to the disk. ZFS will actually read information from the disk, and it will notify you if it's getting bad sectors, and RAID controllers hide all of that. Some RAID controllers say they can give you raw disk access, what they might call JBOD mode, "just a bunch of disks." Often what they really do is create one-disk RAID containers in their own proprietary format, and they hide all of the performance information ZFS is expecting. So if you're using a RAID controller, check carefully: make sure you're actually getting disks, and not some vendor-proprietary who-knows-what being passed off as a disk.

Another subject of discussion is ECC RAM. There are people who will say you must have ECC RAM for ZFS. Well, the question is, where does the cosmic ray hit? Does it hit the memory, or does it hit the disk? A host running ZFS without ECC RAM is no worse off than a more traditional filesystem with ECC RAM; the question is where the checksumming is taking place. Now, if you can get ECC RAM, and you can checksum everywhere, certainly do that; I would never argue against ECC RAM. One of the advantages of ZFS is that it is extremely redundant; it has all kinds of redundancy built in. But hardware redundancy only works if you have more than one hard drive, so for a server I would encourage you to have multiple hard drives. There are ways you can work around a laptop, but we'll get to those.

So, some ZFS terminology and language. A vdev, or virtual device, is a group of storage providers: disks. A pool is a group of identical virtual devices. A dataset is a named chunk of data on a pool: it could be a filesystem, it could be a block device for a virtual machine, it could be a snapshot, it could be any number of things. It's just a named lump of data. (Am I moving around too much? Okay.) And you can arrange the data in your pool any way you want.
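To make the terminology concrete, a minimal sketch; the pool and dataset names here are hypothetical, and gpt/disk0 and friends are FreeBSD-style GPT labels:

    zpool create tank mirror gpt/disk0 gpt/disk1   # one mirror vdev makes the pool "tank"
    zfs create tank/home                           # a dataset that is a filesystem
    zfs create -V 10G tank/vm0                     # a dataset that is a block device (volume)
    zfs snapshot tank/home@fresh                   # a dataset that is a snapshot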
One other thing to remember with ZFS and the ZFS tools: the -f flag is important. It's dangerous. Any time you come across -f, you should think hard and thoroughly before typing it, and then you shouldn't type it. -f stands for "force," and we'll talk about it; yes, there are more slides. The -f point is worth hammering home twice in this talk, because one of the most common messages you'll see on forums and mailing lists is "I used -f, and now my life is terrible." Yes, it is. You used -f.

So, a virtual device. A virtual device is the basic unit of storage in ZFS. All redundancy happens at the virtual device level. You can have a group of three hard drives in a virtual device; you can have three SSDs, five, ten, however many. You lump them together, and that makes a virtual device. Most commonly these are just raw disks. They can be GPT partitions. If you're on FreeBSD (I'm a FreeBSD bigot, as it said right up front, so deal), we have stackable storage, so it could be a crypto device, it could be a software RAID volume; I wouldn't, but it could be. If possible, put it right on the raw disk. I'm going to call these "disks," even though they might not be in your deployment.

So how do these virtual devices tie to pools? Basically, a pool can only contain one type of vdev. You can put 200 vdevs in a pool, but they all have to be the same type. You'll hear "a type-X vdev" and "a type-X pool" get thrown around interchangeably. They really aren't interchangeable, but if you say "I have a RAID-Z pool," people will understand what you mean. You add virtual devices to pools; you generally don't add providers to virtual devices. That is, once you have your virtual device of three disks, it stays three disks forever.

One of my personal favorite types of vdev is the stripe... sorry, no, I misspoke: my personal least favorite type of pool is the stripe. Each disk is its own virtual device, and data is just striped across all of the virtual devices. You can add more virtual devices to the pool, and there is zero redundancy. There is no self-healing, and if any one of these devices fails, the entire pool collapses and dies. Stripe is fine if you're, say, in a computing cluster and you need a great big scratch space and you don't care if it dies, because the whole job has to be restarted anyway, so who cares. And, for a lot of people: if you're using ZFS on your laptop, most laptops only have one disk, so you're stuck with a stripe. You can set the copies property to get self-healing, which we'll talk about later, but that will not protect you against hardware failure.

Mirrors. Mirrors are my favorite. This is where each virtual device has multiple disks, and all the disks are copies of each other: a good old-fashioned RAID mirror. The nice thing with ZFS is that if you want to convert your mirror to RAID 10, striping on top of mirrors, you just add another identical virtual device, and ZFS says "oh, I have more space." RAID 10 is delightfully fast; it's wonderful for databases, and you can just pile on virtual devices until you run out of server.
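A sketch of that mirror-to-RAID-10 growth; the pool name and labels are hypothetical:

    zpool create db mirror gpt/zfs0 gpt/zfs1   # one mirror vdev
    zpool add db mirror gpt/zfs2 gpt/zfs3      # a second identical vdev: striped mirrors, RAID 10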
RAID-Z is where ZFS starts to go off in its own special direction. Each virtual device contains multiple disks, at least three. Data integrity is constantly checked by parity, much like RAID 5. If you lose a disk in the vdev, there's no data loss, and the pool can heal itself via its redundant checksums. A RAID-Z pool can have multiple identical vdevs. One important thing: you cannot expand the size of a virtual device. If you have a virtual device of five disks, it is five disks forever; you may have to buy external storage controllers and daisy-chain them together and get some power strips, but the disks are fine. There is work going on to make vdev expansion possible, but who knows when we're going to see that; for the foreseeable future, it's a constant number of disks per vdev.

There are three types of RAID-Z. RAID-Z1 is single parity: three or more disks, and you can lose one disk per vdev without data loss. RAID-Z2 is four or more disks; two disks go to parity, so you can lose up to two disks without data loss. And, as I think some of you are starting to see the trend here, RAID-Z3 is five or more disks, with three disks used for parity: triple parity.

There are reasons these days to have triple parity. Disk size far exceeds the throughput you get to the disk. I haven't been to Micro Center lately; how big are hard drives now, ten terabytes? I mean, my laptop has a terabyte and I'm nowhere near filling this thing, and yet we're still at SATA 3 speeds. You can write data to that disk and it won't be seen again until who knows when. There are a couple of interesting research papers showing that there's a really good chance that if one of your disks fails and you have to rebuild from parity, which means walking every disk that has data on it, you may find another flaw on another drive and lose that drive too. The math says that for really large pools, RAID-Z3 is a serious consideration for your data.

So you have all this flexibility; what do you do with it? You might say, "I have a 100-disk array; I'm going to RAID-Z3 it, lose just three disks to parity, and have a huge amount of space." It's not quite that simple. In general, don't put more than nine to twelve disks in a vdev. Pool sizing really is a matter of hot debate: go to a conference where there is heavy ZFS use, go to the bar, walk up to someone, and ask how big a pool should be. Out in the Old West they called this game "let's you and him fight." Sit back, watch the fun. On my servers, I put the operating system in a mirrored pool; if it's very high availability, I'll add a third disk to that mirror just to be sure. Then I put data in separate pools, depending on the data. This is where you look at your application and try to make an intelligent guess at what this horrible piece of software is going to do to your innocent server. And to be fair, the database admins don't know either; they're making their best guess.

So, RAID-Z versus traditional RAID. ZFS combines a filesystem and a volume manager, which means ZFS knows what data has been written, and it only has to reconstruct that. A traditional RAID controller will rewrite the entire disk, even if the disk is 90% empty. Now, some RAID cards have proprietary extensions that only copy data, which means they have to have some awareness of your filesystem, and this is all proprietary vendor crud. There are people who say that's fine; there are people who hate it. I don't have the time to come up here and get on a soapbox one way or the other, so I'm not going to offer an opinion about whether you should have this proprietary stuff. The efficient rebuilding happens because of copy-on-write: ZFS knows which blocks have been written, and it only rebuilds those blocks.

So how do you actually use this? In our example I've labeled my disks; if you're not familiar with BSD, each of the disk devices I list here has an implicit /dev in front of it. These are GPT labels. I use the zpool create command, and here I'm making a striped pool. It's disposable; whatever I throw in can rot. So with zpool create I'm calling my pool of disposable data "compost," and I've added these three disk devices to it. That's it: here's the command, here's the type of pool, here's the name, here's what you put in it, and that's what you get.
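The command is roughly this; the labels stand in for whatever you named your disks:

    # three disks, each its own vdev: a striped pool, zero redundancy
    zpool create compost gpt/zfs0 gpt/zfs1 gpt/zfs2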
The zpool status command shows you all of your pools. Our pool compost is online, and it shows each of the disks and any errors.

Creating mirrors is much the same, except for the mirror keyword. Here our pool is called "reflect"; it's a mirrored pool, and here are two storage providers. Again, zpool status: we have our pool, a virtual device named mirror-0, and there are the members.

For RAID-Z pools, you have to use the keyword for the type of RAID-Z pool you're making, but otherwise it's very similar: this is a raidz1 pool with three disks. Once you figure out how to read one type of pool, they're pretty much all the same.

Where things get fun is when you have multiple vdevs. All you do is use the type keyword multiple times. Here we're creating a pool with a raidz1 vdev and these three devices; then the raidz1 keyword again says "this is a new vdev," and here are the three devices for that one. And you can look at the pool and see exactly that: our first vdev, our second vdev, and the members of each vdev in the pool. So when we get to errors, you'll be able to see exactly which virtual devices have the error and where the problems are.

Remember when I said -f, force, was bad? This is exactly what I mean. Here I'm creating a pool called "daft." It's meant to be a RAID-Z pool where each vdev has three members, but I haven't had my morning caffeine yet, and I used the raidz keyword for one vdev and the mirror keyword for the other: I made a typo. ZFS refuses. If I add -f, ZFS says, "so, you wish to shoot yourself in the foot; I'm okay with this plan, here you go." ZFS will not handle different types of virtual device in the same pool well. If you use -f anyway, you will start to get errors, because it's trying to do two different things in the same pool, and you'll go searching forum posts and mailing lists asking "how do I fix this," and you will be told: destroy the pool, start over, restore from backup. Now, there is one place where it is okay to use -f, and that's when you're recycling disks. You made a pool, you destroyed the pool, you want to create a new pool. ZFS marks the disks, so each disk has a label saying what pool it belongs to. If you know the pool, or the disk, is not in use, go ahead and use -f, because you really do intend to reuse that disk.

Once you've created your pools, you can view them. Here I've got a system, and I run zpool list. I have a db pool that contains my data, and a zroot pool, which is the traditional ZFS root pool for the operating system. Or I can use zpool list -v and get all the details, which is worth doing now and then on your system, just to learn what normal looks like.

So let's talk about pool integrity for a moment. ZFS is self-healing at the pool and at the vdev level. Basically, every block is hashed, repeatedly; there are checksums everywhere, on everything. This is laid out as a tree, and the hash of each child block is stored in its parent. ZFS has this concept of a scrub: start at the root of the tree, walk down the tree, get the stored hash for every block on the disk, compute the actual hash of every block on the disk, and make sure everything matches. If something doesn't match, look around, rebuild the block from parity, figure out where the problem is, and restore it.
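A scrub runs against the live pool, and it's a one-liner; something like:

    zpool scrub db
    zpool status db   # shows scrub progress while running, repair counts when done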
How many of you have had a photo, a JPEG, go bad because of an error on the disk? Half the picture looks great and the other half looks like garbage. This always happens with, like, your kid's wedding pictures, or the newborn baby, or that hated relative's funeral where you want that one last picture; you know, the important things. ZFS automatically handles all of that for you.

If you have a virtual device that is not redundant, if you're using a stripe on your laptop, there's the copies property. We'll talk about properties later, but the copies property says "keep an extra copy of everything." Yes, telling it to keep two copies cuts your storage space in half, but when ZFS detects an error, it can compare the copies, figure out which one is correct, and heal it. If we have ten-terabyte hard drives, I have no idea what you're doing with all that space anyway, so set copies.
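Setting it is a single property change; a minimal sketch, assuming a single-disk laptop pool named zroot. Remember that it only affects newly written data, so do it at install time:

    zfs set copies=2 zroot
    zfs get copies zroot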
Now, I hear objections now and then about scrub versus fsck. How many of you know and love fsck? Okay, don't love; it's a straw poll. How many of you know and love fsck more than you know and love fsdb and clri, the filesystem debuggers? We have some fsdb fans in here; awesome. ZFS has no offline integrity checker. You don't shut down the filesystem to check it; it does all of this live. If you've run Unix for a while, there's a certain gut-level feeling that no, you shouldn't probe live filesystems: there's live stuff happening on them; you should quiesce the filesystem and check it in peace, so you know it's clean and pristine, polished to a high gleam, before you take it back out on the road where some stupid gravel truck throws something at it. But that's just not the case with ZFS. A ZFS scrub does everything fsck does, and more. It is possible to offline your pool and scrub it; I mean, it's a Unix system, you can take the filesystem offline if you want. An offline scrub won't help anything, but you can do it. There are lots of little nitpicks against scrub and things it supposedly can't find, but back in the day, way back when (and Mr. O'Connor proudly remembers these days), when UFS was a new thing and fsck was a new thing, there were objections to fsck too: how can you possibly have an automated program check a filesystem? You need the human hand to go in with fsdb to debug the filesystem and clri to wipe away unneeded inodes. Sysadmins are this perverse blend of bleeding-edge and conservative; we tend to treat our filesystems as the precious things they are. But scrub is a better thing than fsck.

So, pool properties. Properties are tunables or values; both pools and datasets have properties to play with. For properties on a pool, you use zpool get and zpool set. Some of these are read-only, some are calculated from a pool's characteristics, and some you can change. For example, the pool's size is a property you cannot change: you cannot change the size of your pool by toggling a software setting; we do not yet have software that makes additional disks appear. On the other hand, there are tunables like bootfs. Here it's pointing at zroot/default; this is where your root filesystem is, and you could change it and boot from an alternate root. And here we use zpool set to make a couple of minor changes: a pool can have a comment, saying what the pool is for, perhaps, or perhaps saying "Craig, don't touch this," whatever. And here we set that copies property I talked about to two; there's your laptop redundancy.

Good question: properties only take effect on newly written data. So if you want to set copies=2 on your laptop, you need to do that during the install, before anything is written to disk. Or you can decide: you know, I don't care about the base OS files, I'd just restore those from media, but I do care about my pictures and my game scores. Then you set it immediately upon install, before copying anything over.

Another useful thing is the pool history. ZFS records every action that changes the pool. It doesn't record when you run zpool get; when you're just looking, who cares. But it records every time you run zpool set, every time you create a dataset, every time you change something. This is incredibly useful. More than once in my corporate life, I came into work and some flunky came running up to me saying, "oh my god, everything has gone terrible and I don't know what happened." And there is the cynical and experienced part of me that says: no, you know exactly what happened, because when I was you, I knew exactly what happened, and there was no way I was going to fess up. With zpool history (you can even read the history off of a destroyed pool), once you've dealt with the problem, you can look at the pool and say: "okay, you don't know what happened, but the pool history shows that five minutes after you came in, this command was run on this pool. Can you tell me how that happened?" Let's just say I'm really glad that today's new and upcoming sysadmins have more accountability than we had back in the day.
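A sketch of both the comment property and the history log, with a hypothetical pool named db:

    zpool set comment="production postgres; ask Craig before touching" db
    zpool get comment db
    zpool history db    # every pool-changing command, timestamped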
Destroying a pool: zpool destroy. Boom, gone; you can reuse the disks.

Okay, feature flags. ZFS has a whole panoply of features, nifty things it can do. When Sun was in charge of ZFS, ZFS had version numbers, and every time they added a new feature they cranked the version number, like you do. And then, Oracle. The final open Oracle ZFS was version 28; they may or may not have had further versions internal to Oracle, but the OpenZFS story ends with Oracle ZFS version 28. Remember how ZFS is developed: you have the OpenZFS body, and you have all of these contributing organizations, consumers, and developers sending stuff up and down, with OpenZFS as a clearinghouse and an arbiter; they're kind of like the Computer Systems Research Group back at Berkeley in the day. So how were they going to keep all this straight? They cranked the version number to 5,000. That was to give Oracle lots of room so they could continue to develop ZFS in-house, because we're nice people that way; not that Oracle would ever let Sun die. And they deployed this thing called feature flags. Basically, a feature flag is a way for the pool to say "this is the stuff I support": run zpool get all on your pool and just grep for "feature" on ZFS version 5000 or later. Assuming you are putting ZFS on raw disks, and not on an LVM or a FreeBSD crypto layer or something, you can look at the feature flags that are live on your pool, compare them to the feature flags available on another host, even one running another operating system, and provided all the features are supported, you can pick up those disks, move them from one OS to the other, and they're good to go. You can also enable and disable features. For example, here you'll see this particular feature is "enabled" while this one is "active." Active means there is data written to disk using this feature; enabled means we've turned the feature on, but nothing is actually written with it, so you could turn it off and move the pool to an OS that doesn't support that feature.

Okay, how are we doing for time? Datasets. A dataset is a named chunk of data. It could be a filesystem; it could be a volume, which is a lump of block storage for, say, a virtual machine; it could be a snapshot, a clone, or a bookmark. Properties and features work on a per-dataset basis. The rule with ZFS: lots of datasets. Here we have some samples; you can see how much is used, how much is available, and where each is mounted. You might notice we seem to have four root datasets; well, we have the mountpoints, but three of those are not actually mounted.

You create a dataset on an existing system with zfs create. The first one I've created here is a dataset just for MySQL. Now, if you have a complicated MySQL install, you might want datasets under there for certain of your databases. And I've used -V to say "create a ZFS volume," which is a lump of block storage. You can run UFS on ZFS in a virtual machine and have a self-healing UFS that the VM never notices. You can run Windows in a virtual machine and have the filesystem self-heal, and there are times I would have given a kidney for that; I mean, not mine, but someone's.

Move datasets by renaming them. Destroy datasets with zfs destroy. When you're destroying, there are a couple of useful flags; these work anywhere, but they're particularly important when destroying. -v is verbose mode, and -n is a no-op. Combined, they say: if I were really to destroy this dataset, what would I actually be destroying? Because zfs destroy can be kind of like rm -rf, and let's be sure about what we're doing here. It's a dry run, exactly.
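The dry run looks something like this; the dataset name is hypothetical:

    # -n: don't actually do it; -v: narrate; -r: recurse into descendants
    zfs destroy -rvn db/mysql
    # read the list, then drop the -n once you're sure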
A ZFS property is a per-dataset characteristic, much like a zpool property. I'm not going into detail on that, because you're all pretty smart people. One place where ZFS is a little different is the parent-child relationships. Datasets inherit their parents' properties. So you can set a certain type of compression on home directories and a different type of compression on var/log, and then have datasets beneath var/log, say var/log/mysql; those child datasets will inherit the parent's settings unless you change them. If you change the parent, the change propagates immediately through all of the children. Renaming a dataset changes its parent, and so changes its inheritance.

Mounting and unmounting ZFS: that's the zfs mount command, not your regular mount command. There is a mountpoint property, so you can say "I want my old MySQL dataset mounted, but I want it on /mnt."

So, pool repair and maintenance. Resilvering is the term for rebuilding from parity. Resilvering uses the vdev redundancy, like we talked about with scrubs; if there's no redundancy, there's no resilver. The catch with repairs, as you might guess, is that they're throttled by disk I/O. When you replace a disk, resilvering happens automatically. Remember, you add vdevs to pools, not disks to vdevs. One thing to watch out for: not all ten-terabyte disks are the same size. If you look carefully at the box, it will say how many sectors are actually on the disk. If a disk dies and you try to replace it with a disk that is, say, two sectors smaller: nope, there's not enough room. So pay careful attention to the actual size of your disks.
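Replacing a failed disk is a single command, and the resilver kicks off on its own; a sketch with hypothetical labels:

    zpool replace db gpt/zfs2 gpt/zfs9   # dead disk out, new disk in
    zpool status db                      # watch the resilver run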
You can add a vdev to a pool. For example, here I have our scratch pool; it's just a stripe. I want more space: stick another disk in, fine, it stripes. Here I have my database mirror pool, and I'm adding two more disks as a mirror vdev. And here I have a raidz1 pool called db, and I'm adding three disks, making the pool larger.

So, hardware. The important thing about storage is that disks are terrible. They will die; the only question is when it will be least convenient, because that's when. ZFS shows hardware in several different states. Online is working normally. A degraded vdev has something wrong: one of the disks is kaput, is dead; do something. A faulted disk is generating too many errors (your RAID controller and its proprietary gunk will hide that); you want to know when a disk starts generating errors, so you can replace it at a time that is convenient for you, not convenient for the disk. An unavailable disk can't be opened; maybe it's unplugged, who knows. Offline means you turned it off, so turn it back on, idiot. And removed means the hardware has detected that a drive has been removed.

Here's how this might look in production. I have a host here, a FreeNAS box. It recently did a scrub that took 15 hours and 57 minutes; these are some really big disks. But here we have two disks that are unavailable. This raidz2 vdev is degraded, so the pool it is in is degraded. And hopefully, when you put together your big storage array, you kept track of disk serial numbers and what tray they're in, or how your host maps device IDs or labels to the physical array. There are lots of ways to do that; I don't care which you use, just use one of them. And now that I've shown this example: John, if you're watching the video, you really ought to look at this host and fix those two disks. Thank you very much.

A couple of neat things you can do on ZFS that you can't do elsewhere: log devices and cache devices. The question in high-performance storage is, where is your bottleneck? Sysadmin work, in a performance setting, is all about arranging your bottlenecks. Sometimes it's like rearranging the deck chairs on the Lusitania, but we do the best we can. A read cache, what they call an L2ARC: you all know how your host will store recently accessed files in memory, up until it uses all the memory it's allowed, and then, too bad. A read cache is a very fast drive, usually an SSD, used as a second layer of read cache. It holds stuff that's used often enough to almost stay in memory, but not quite, so you keep a handy, fast copy.

Similarly, a write log: the ZFS intent log, or SLOG. Suppose your database is quiet most of the time, but every so often you get hammered with an insane amount of data. A write log is kind of like the write cache on a hard drive: you put in a really fast device and say "just dump it here," and when the rush is over, the data moves off that device and into the main pool more slowly, at a speed the main storage can handle. Don't just buy these arbitrarily; look at your system, do performance testing, and see what you need.

Yes, most filesystems already have some level of caching, absolutely: journaling, and so on. The point of these caches: we would all love to buy the fastest disks available on the market; we would all love screaming-fast storage. In reality, we have to make trade-offs, and what a cache like this lets you do is buy fair-speed main storage (you can save a huge amount of money by not buying the top-of-the-line fastest drives), then buy one small device that is screaming fast and use it as the cache. The application sees that your storage is screaming fast, the system behaves as if the storage is screaming fast, but you didn't spend the money. And personally, I want to spend money on important things: you know, people, pizza, things like that. There are proprietary systems that do things like this; generally speaking, ZFS does it better and faster. This is tiered storage, yes; a lot of caching devices are RAM, right, and this is an approach to tiered storage. Go look at some benchmarks that are not produced by vendors and see how ZFS stacks up. RAM is going to be miles ahead of your fastest disk write, yes, until you pull the plug. If you have that much RAM... I mean, we could go around for hours on how to stack all of this. It's a tool; don't go buying it blindly, but if you need it, you'll be really grateful it's there.
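Both kinds of device are added like any other vdev, pointed at something fast; a sketch with hypothetical SSD labels:

    zpool add db cache gpt/ssd0                 # L2ARC read cache
    zpool add db log mirror gpt/ssd1 gpt/ssd2   # mirrored SLOG for the intent log

Losing an unmirrored cache device is harmless, but losing the log can cost the last few seconds of synchronous writes, hence the mirror.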
So, built-in filesystem compression. Compression exchanges CPU time for disk I/O, and on most systems disk I/O is the limiting bottleneck; that's where everything slows to a crawl, getting stuff on and off storage. CPU time? This laptop has, like, 87 cores; I don't even count anymore. We have all the processor time we could want, so I'm happy to make this trade. ZFS's default compression algorithm, LZ4, is very good at detecting when data is compressible and when it's not. You don't want the filesystem trying to compress your binaries; there's no advantage there. You do want the filesystem compressing your text log files, because if you can shrink them by a factor of ten, you can fit ten times as much onto the disk. Always enable compression. It's on by default, but if you want to play with it, set it before writing the data; ZFS will not go back and rewrite old data. For some cases gzip is better than LZ4: I was responsible for a system that had about fifteen years of phone call records, fairly small files, all plaintext, only accessed when something went annoyingly wrong, and gzip-9 got us an extra 25% on that, if I recall correctly. So there's no more userland log compression; the filesystem does it. Move on.

(Questions from the audience: Is it decompressed when you read it? Yep, LZ4, as soon as you read it through the operating system. What about moving the data? Depends on how you're moving it: within the same dataset, or are you giving it to the Russians? Yes, it gets decompressed, but that CPU time is effectively free. You don't do it by hand; the filesystem auto-compresses and auto-decompresses, you don't even have to worry about it. If you want, you should compress data in transit, but that's a different thing: you pipe it through gzip, pipe that to SSH, and get on with your life. Yes, that's absolutely correct, CPUs are bulletproof. And yes, encryption is separate: encryption and compression have nothing to do with each other.) And note: if you can take a ten-meg file and shrink it to five megs before you write it to disk, that file gets read and written twice as fast.

Similarly, while disk I/O is the scarcest resource, memory is somewhat limited too. If you're reading these text files that you've compressed down to almost black-hole density, well, why not store them in RAM the same way? You can vastly improve the amount of disk-style storage you have cached in memory if you compress the cache. The ZFS ARC, the buffer cache if you will, does that for you. It's just automatic.
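Compression is a per-dataset property and is inherited by children; a sketch with hypothetical dataset names:

    zfs set compression=lz4 db/logs
    zfs set compression=gzip-9 db/logs/phone    # heavier squeeze for rarely read text
    zfs get compression,compressratio db/logs   # compressratio reports what you actually got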
One feature that gets touted on ZFS is deduplication: if you have identical blocks on disk, you can merge them into one block and keep track of it. Most data is not deduplicable with ZFS; you get some benefit, but not much at all. Deduplication also increases the amount of memory you need to use the pool. Generally speaking, I have never seen a ZFS pool where deduplication was really useful. It is theoretically possible that you could have data that deduplicates well and that it might be worth it; it's conceivable, but who knows. Blocks are 128K by default; smaller block sizes would deduplicate better, but these are modern disks.

So, let's talk about snapshots. We've gone over an hour, so I'm going to skim through these a little. Copy-on-write means you never overwrite an existing allocated block; you always allocate a new block. ZFS basically keeps a ledger, and you can say "I want a snapshot of this filesystem." When you snapshot, ZFS lists every block that is in that filesystem, and when you later change a block and ZFS allocates a new one, it just changes the accounting in the ledger: the old block, with the old version, now belongs to the snapshot. So you have a hundred-gig filesystem, you take a snapshot, and you change one gig worth of data: the live filesystem is still a hundred gigs, and your snapshot is one gig, because it only contains what changed. Getting rid of a snapshot is very easy, because you just go to the ledger and say "throw that away." And snapshots are why you want those -v and -n flags on zfs destroy: to ZFS, snapshots are just bookkeeping. (I got ahead of my slides, but it's cool stuff. Yes, there are many articles on how ZFS protects against ransomware, and basically all you do for that is on the next slide.)

The catch with snapshots: suppose your disk is getting full and you want to clean up some space. You go into your home directory, and hey, you have all these old ISOs; you throw them out, but weirdly, no space is freed. That's because those blocks are still in use by the snapshots of that dataset. You have to be careful about what you store and what you snapshot, because blocks are only freed once no snapshot uses them.

ZFS rollback: suppose you're using ZFS to back your office file store, and one of these ransomware programs comes through and encrypts the whole thing. You can restore with the zfs rollback command: just roll back to the most recent unaffected snapshot, and boom, the Windows admins owe you beer. Do wait until they've fixed the virus problem before rolling back, or you're going to be rolling right back again.

A nifty side effect of this is the clone. A snapshot is read-only; it's a photo of a point in time. A clone is basically a read-write filesystem built on top of a snapshot. Here, I've taken a snapshot of today's MySQL database, and I want to test the new version of our software. I clone the snapshot, run the new version, run the upgrade, and I'm only using the small amount of space for the differences between the live data, the snapshot, and the clone. If the test goes horribly wrong, no harm: I file my bug report and throw out the clone.

Here's another nifty thing, built on clones and snapshots: boot environments. Basically, a boot environment snapshots your root dataset, with all your binaries, before you upgrade. You run the upgrade. Everything is great? No problem. Things go horribly wrong? You roll back. The nice thing about boot environments is that you can keep them around. How many of you have had an upgrade go bad, but you didn't realize for days or weeks just how vile and insidious it was? You have a month-old boot environment. Now, you'll want to take steps to protect your database data, or what have you, but you can at least get the OS back to something that wasn't horrendous.

The last nifty ZFS thing we'll talk about is zfs send and receive. ZFS knows which blocks have been written since a snapshot. How many of you use rsync to synchronize? Okay, it's everywhere. You fire up your rsync job, and on your local host it goes through and compares all the files, looking at mtimes usually, or some other characteristic depending on how you've configured it, and it picks up everything that's changed and shoots it over, and it pounds on that disk. ZFS knows which blocks have been written. When you synchronize two hosts initially, there's nothing for it: if you want to move your hundred gigs of data and replicate it on another host, you suck it up and move a hundred gigs of data. But after that, say you have a daily synchronization job: ZFS knows all the blocks written since the last snapshot you sent, and instead of walking the filesystem to figure out what it needs, it just goes, "here's my list of blocks; send those," and that's it. Another nice feature: it's resumable. If you're halfway through your hundred-gig transfer and someone cuts the cable, you don't have to resend what was already sent; you pick up where you left off. This really blows rsync out of the water for any kind of performance-related work. Remember, disk I/O is the most precious resource, and rsync eats it.
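A sketch of the pattern, with hypothetical hosts and dataset names:

    zfs snapshot db/mysql@monday
    zfs send db/mysql@monday | ssh backuphost zfs receive tank/mysql               # the one full copy
    zfs snapshot db/mysql@tuesday
    zfs send -i @monday db/mysql@tuesday | ssh backuphost zfs receive tank/mysql   # only blocks written since @monday

On versions that support resumable send, zfs receive -s keeps a resume token, and zfs send -t picks up where the cut cable left off.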
So, I'd like to open the floor for questions. Don't ask me about the differences between the OpenSolaris CDDL and the GPL, because damned if I know. I'm going to do one thing before taking questions: dear IRS, my drive out here and dinner are now tax-deductible, thank you very much. You've seen everything in these books tonight; if this is stuff you use, you might like some of them. So, any questions?

Yes, sir? No, not multiple hosts; there are add-on ways to do that, and if you're doing something like HAST or one of the other networked mirrored-storage systems, you can certainly put those on top of ZFS.

Yes, sir? You can do a read-only mount. You can use tools like fsdb... I'm sorry, not fsdb: zdb, and walk in and start looking, and you will discover that ZFS has these things called znodes that look an awful lot like really fat inodes, and you can back-trace which files go where, what blocks they are, and when they were touched. So it is really not that different, and if you wanted to look at it from a user level, you'd mount it read-only just as you would anything else.

The question over here? Right, you're fine. Oh, no, it will use the smallest. Yeah, they either thought of that ahead of time or they caught on real quick. There are ways to grow onto the disks' extra space, but you have to give that command. That's how you'd upgrade a pool in place: say you have a pool of one-terabyte disks and you want to replace them with ten-terabyte disks. You fail one disk, plug in the ten-terabyte drive, resilver, tell it to grow that disk, fail the next disk, and so on. It's tedious, but it's a lot more possible than on other OSes. Oh yes, this is all live; there's no taking it offline to resilver. We don't play that.

And I think we're going to take a break and let folks vote. "I nominate Craig for everything." Thank you very much; hopefully we can get back to the next presentation. "So, Mr. McCullen, do you want to take the lead on this?" "All right, I don't think we have any nominations that I've seen... one nomination. I nominate myself, but I also say we nominate the existing board." "Okay, let's do that. Does anybody else want to run? Don't feel steamrollered. Any other nominations? Okay. Amongst the board positions, I am running, yes. All righty, the ayes have it. Thank you, and welcome, our 2018 to 2019 board. Okay, let's get back to it. Michael, do you have enough time? I didn't want to cut you off completely at the legs. Just a reminder, while we're getting everything reset up: we have our comment cards, and we do collect these at the end of the meeting. Please note how you felt about the meeting, and any comments you have about the meeting as well."

If you actually move the data on disk from one dataset to another, it copies the data; if you rename it within the same dataset, ZFS is lazy and just changes the label. Right; well, the file name will be written twice, because it's something you wrote new. ZFS does not care; it is like the honey badger. Okay, sure, one more. Yeah, no: a new version is written that doesn't have those blocks in it, and the old index is either discarded or kept by the snapshot. That's another great discussion topic, and the short answer is, it depends. I keep mine under 90 percent: I use a trick called a reservation, where I block off ten percent of the disk so that nothing can write to it. That way, when my disk hits that 90-percent point, it's not actually full; I just have a real hard warning that I need to look at. You can throw in some snapshots. And your enterprise, of course, has a fully functional battlestation network monitoring system that will tell you what the differences are.
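The reservation trick he describes, sketched with hypothetical names and sizes:

    # an empty dataset nobody writes to; it pins down a slice of the pool
    zfs create -o reservation=100G tank/slack
    # pool "full" at 90%? release the slack while you clean up:
    zfs set reservation=none tank/slack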
Info
Channel: Michigan!/Usr/Group
Views: 9,562
Rating: 4.8994975 out of 5
Keywords: mug.org, oss, open, source, linux, unix, Michael W Lucas, Michael Lucas, bsd, freebsd, openbsd, zfs, netbsd
Id: x9A0dX2WqW8
Length: 86min 55sec (5215 seconds)
Published: Wed Apr 25 2018