TrueNAS 12: Replacing Failed Drives

Video Statistics and Information

Captions
Tom here from Lawrence Systems. We're going to talk about replacing failed drives inside of TrueNAS, and we're going to forcibly fail a drive for this demonstration. How well a system tolerates a failed drive depends on its resilience level. That goes beyond the scope of this video, and I have other videos that dive deeper into the different configurations for ZFS arrays, but this particular array can suffer a drive loss without a problem, and we can rebuild it. Whether an array can survive a one-drive, two-drive, or three-drive loss, or how certain types of mirrors behave, is, as I said, a little beyond the scope here, but I'll leave a link for those of you who want to dive deeper into all the different ZFS RAID configurations that are out there.

A few other notes: this is a 320 GB hard drive, and these are 320 GB hard drives. You can use a larger drive than the pool was built with, but the pool will not use the full extent of that drive. These being 320 GB, if I use anything bigger than 320 GB (you can't go smaller), a one-terabyte drive would still only use 320 GB of its capacity, so you end up with a bunch of dead space on the drive. But sometimes when you're dealing with older drives (we've run into this problem before) you have one failed drive in an array, and it's hard to find that older, smaller drive, so it's not a big deal to put a larger drive in; it just won't take full advantage of it.

Now let's look at the system and dive into how we're going to fail it. We start with everything working: it's all up and running, we have a green online check box, and there are no problems with this system. The first thing I'll show you is what happens when you have something like a loose cable, a scenario that has actually happened to me a couple of times. We'll switch over to the overhead camera, look at the drives, and simply unplug a cable. Back in the web interface, we go to the pool status so we can see the drives, and right away it lets me know a drive has been removed and the pool is degraded.

Sometimes, before a drive fails (not always), you'll get read, write, or checksum errors, so you'll have some indication that the drive is going bad, but that's a maybe, not a guarantee. We saw this in one of the enterprise systems we were dealing with: a sled that wasn't snapped in all the way would cause a drive to come and go intermittently. We're not exactly sure what the problem was; we moved the drive to another slot (and by the way, ZFS doesn't care what slot things are in) and the problems seemed to go away. We still don't have another drive to put in the original slot, so we don't know if that slot is bad, but we know the drive is happy in its new spot.

Now, even though the pool is degraded, let's go in and move some data around. This is where some test data lives, so let's go ahead and rm -rf this folder. The reason I'm doing this while the drive is disconnected from the array is so that some data has changed since the drive went offline. You can just reconnect the drive and it will rejoin the pool, and how fast it rejoins depends on how much the pool has changed since the drive was detached. Now that we've removed some data, let's go back over here; the pool is still degraded. All right, the drive is plugged back in; let's hit refresh. It shows online, and it has already started resyncing.
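As a side note, you can watch the same recovery from the TrueNAS shell with the zpool command. Here's a minimal sketch, assuming a pool named tank and a device name like ada2; your pool and device names will differ:

    # The disconnected member shows as REMOVED and the pool as DEGRADED
    zpool status tank

    # After reattaching the cable, tell ZFS the device is healthy again;
    # ZFS resilvers only the data written while the drive was disconnected
    zpool online tank ada2

    # Confirm the pool has returned to ONLINE
    zpool status tank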
The resync pretty much happens really fast, and the system is back online with the drive synced up to the others. Those types of scenarios are actually really easy: if you pull a drive, even in production, the pool only has to resync whatever data changed while the drive was disconnected. That's why I made some changes first, and since they were so minor, it synced up very quickly; there probably wasn't much in that particular folder.

What about the next scenario, when it's not just a loose cable and the drive has a catastrophic failure? Same thing again: we go ahead and unplug the drive. This time I have another drive here, so let's plug this one in; I just have to get some power over to it. All right, now we have a different drive plugged in, and we'll do a refresh over here. It sees that the original drive was removed, and even though another drive was plugged into the same cable, ZFS isn't looking at which physical cable something is plugged into in order to replace a drive. ZFS actually checks whether the new disk was part of the pool, and it wasn't, so it says: you added a drive, but it wasn't the drive we expected. This is where manual intervention comes in to actually replace the disk.

So we go here, choose Replace, and pick the only disk that's spare in the list. I'm using the force option because I think there's data on this drive. Force means: don't ask questions, just erase it. I'm not worried about the data on this drive, but if it does detect data or something else on there, that's why the force option exists. Go ahead and hit Replace Disk: replacing, formatting, successfully replaced disk. That took just a very short amount of time. It's scanning and about to go through the resilvering process... and, well, that's it. It has taken the data from the other drives to rebuild the pieces of data that need to be on this particular drive, and we have successfully replaced the drive in this pool.
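For reference, the same replacement can be done from the shell with zpool replace. A rough sketch, assuming a pool named tank, a failed member ada2, and a new disk ada4; again, your pool and device names will differ:

    # Identify the failed or removed member in the pool
    zpool status tank

    # Replace the failed member with the new disk; -f forces the replace
    # even if the new disk appears to already contain a partition or data
    zpool replace -f tank ada2 ada4

    # Watch the resilver run until the pool reports ONLINE again
    zpool status -v tank

Note that the TrueNAS web interface does some extra housekeeping (such as partitioning the new disk) when it replaces a member, so the GUI workflow shown in the video is the recommended path.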
Now, a question that seems to come up quite a bit: can you expand the pool? That's out of scope for this video. I have a video I'll link to about expanding vdevs, where you can add another vdev, but no, as I stated before, you can't just keep popping in more drives; you can only replace the existing ones.

One more note: these other two drives happen to be the boot pool, and I'll show you briefly where those are. In another video about backing up FreeNAS, which I'll also link, I've covered how to replace one of the boot pool drives or how to add a mirror to a boot pool that doesn't have one, and the concepts work exactly the same. The difference is that the boot pool lives under System > Boot rather than under the storage pools; go to Actions and you'll see Boot Pool Status, but the commands are the same. You can remove a drive, of course, because it's just a mirror, and there's the Replace option, where you choose the spare member disk and submit. So it's the same concept, just in a different location, for those of you wondering how to handle the boot pool.

Hopefully this is helpful, and hopefully you're not watching this in a panic wondering what to do. Provided you have your array set up with enough resilience that losing a drive isn't a big deal, it's really not a big deal to replace one: you can go in there and get the drive replaced.

One thing of note: this particular board is older and doesn't always like hot swapping, so it sometimes requires a reboot when I swap drives around, but it's inconsistent. Most modern boards will support hot swapping, especially if you're using enterprise equipment, and when they do, it's no big deal; all of this can be done live without any real downtime. So if you're working on a production system and you don't have the option to shut it all down, yes, this can be done in real time. That's one of the reasons hot-swappable racks are so popular, and because drives fail, especially under heavy load, or just over time, or occasionally, it feels like, for no reason at all. That's why servers usually have those easy-to-reach bays in the front.

Thank you for making it to the end of the video. If you liked this video, please give it a thumbs up. If you'd like to see more content from the channel, hit the subscribe button, and hit the bell icon if you'd like YouTube to notify you when new videos come out. If you'd like to hire us, head over to lawrencesystems.com, fill out our contact page, and let us know what we can help you with and what projects you'd like us to work on together. If you want to carry on the discussion, head over to forums.lawrencesystems.com, where we can carry on the discussion about this video, other videos, or other tech topics in general; even suggestions for new videos are accepted right there on our forums, which are free. Also, if you'd like to help the channel in other ways, head over to our affiliate page; we have a lot of great tech offers for you. And once again, thanks for watching, and see you next time.
Info
Channel: Lawrence Systems
Views: 29,481
Keywords: lawrencesystems, freenas replace failed drive, freenas, freebsd, nas, zfs, network attached storage, open source, how to, storage, freenas (software), failed drive, ixsystems, truenas replace failed drive, bsd, home server, freenas replication
Id: TvaK2I3LY68
Length: 8min 36sec (516 seconds)
Published: Sat Jan 30 2021