Why you DON’T want a 20TB Hard Drive

Reddit Comments

These WD SMR drives are host-managed SMR, not drive-managed SMR like Seagate's "Archive" drives, the now-endemic unlabeled drives, or the 2.5" drive-managed SMR drives.

Linux has some beta-level support for host-managed SMR file systems AFAIK, but I wouldn't trust a single copy of any data files to it yet.

👍 8 👤 u/babecafe 📅 Jan 02 2020

Personally I pretty much agree with Linus' interpretations, but these drives could still be useful for long-term storage arrays. You just have to use a topology without distributed parity and make sure any dedicated parity drives in the system are non-SMR. This will optimize performance as much as is possible with this hardware.

👍 14 👤 u/pmjm 📅 Jan 01 2020

It's a mix of a click-bait title and some "Linus points" that are, as usual and as expected, not what the "other normal" people would care about (and I'm vastly enlarging the usual definition of "normal" to include most DHers too).

He starts by acknowledging the speed isn't an issue at all during normal use but then it's a big problem:

  • when doing data migrations. While raw bandwidth does scale up with the number of drives, "logical" migrations (as opposed to, I don't know, doing dd on a bunch of disks in parallel) don't really scale in speed with the number of drives, and more drives usually mean more headaches. Last time you bought one or a bunch of large disks, let's say 10TBs, do you think you could have migrated faster if you had bought 5 times as many 2TB drives (even assuming they had about the same raw speed)? Only in very theoretical cases; in practice they would've been a nightmare.

  • when doing RAID rebuilds he's speaking about biting his nails for the whole day while doing a recovery on a 160TB array in RAID-5 (hypothetically on the 160TBs with the new 20TB drives but he's done it on smaller ones). Well, of course but he's Linus and this shit of RAID with no backup is his hallmark. It has nothing to do with speed or even with losing one disk at all. He had a PRODUCTION WITH NO BACKUP array messed up and it wasn't because of (lack of) speed, in fact they were very expensive SSDs and NONE actually failed!

👍 6 👤 u/dr100 📅 Jan 02 2020

As someone who only updates roughly 100GB/month to my server, these would be fine - though I can still see why everyone dislikes SMR.

👍 3 👤 u/alexnelson689 📅 Jan 02 2020

Looks like these drives are Host Managed, so I'd assume some optimizations could be done system side for random IO and/or TRIM.

👍 2 👤 u/SirCrest_YT 📅 Jan 01 2020

tl;dr the video - Current drive write speeds are too slow, and HDDs still cost too much money. To make this size drive viable, write speeds need to increase significantly, and prices need to come down a lot.

Assuming 100 MB/sec transfer speed:

1 min = 6 GB

1 hour = 360 GB

24 hours = 8,640 GB

So on average, 1 TB (931 GB) every 3 hours.

I'm currently in the process of migrating and organizing data across 4x 6TB WD Reds (2 of which are SMR). I pretty much see a solid transfer speed of about 100 MB/sec when moving data. I only transfer about 100-200 GB at a time, and all the files are 3-5 GB mp4 video files. All the drives are connected via HBA, no raid.

If I needed to rebuild/clone a single drive in one sitting, it would take ~18 hours for 6 TB.

Seems like a lot of people in this thread are missing the point of this clip, and instead are getting hung up on the SMR bit. If you had an array/storage pool made up of 4x 20 TB HDDs, it would take ~60 hours to rebuild a single disk. That's a long time to hope nothing else breaks while transferring data. If the HDDs were cheaper though, it would mitigate the risk, since you could literally just throw more HDDs at the problem to solve it. But given that 20TB HDDs are ~$500, that's a very expensive solution.

I had this same fear when 1TB HDD appeared on the market - when I got up to about 9 drives, I made the switch to 6TB (@ $300 a drive) - bought 2 at first, then another 2, and just recently, another 2.

👍 2 👤 u/Halzman 📅 Jan 02 2020
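The napkin math in the comment above can be sketched in a few lines of Python. This is only an illustration under the comment's own assumption of a sustained 100 MB/sec; the function name `transfer_hours` is made up for this sketch, and decimal units (1 TB = 1,000,000 MB) are assumed.

```python
def transfer_hours(capacity_tb: float, speed_mb_s: float) -> float:
    """Hours needed to move capacity_tb (decimal TB) at a sustained speed_mb_s."""
    capacity_mb = capacity_tb * 1_000_000  # assumption: 1 TB = 1,000,000 MB
    return capacity_mb / speed_mb_s / 3600  # seconds -> hours


# Drive sizes mentioned in the comment, all at the assumed 100 MB/sec.
for size_tb in (1, 6, 20):
    print(f"{size_tb} TB at 100 MB/sec: {transfer_hours(size_tb, 100):.1f} hours")
```

With these assumptions the exact figures come out slightly below the comment's rounded ones (roughly 2.8, 16.7 and 55.6 hours versus ~3, ~18 and ~60), which does not change the argument.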

Linus needs to close his mouth more often.

👍 2 👤 u/Duamerthrax 📅 Jan 02 2020

Would these be good for an Unraid setup? Data is only written, then read.

👍 1 👤 u/[deleted] 📅 Jan 02 2020
Captions
why don't we talk about Western Digital and their 20 terabyte hard drives here is and we can we can we can talk about sort of the the technical details a little bit later so this is posted by Andre Arjun l-arginine our genie Andre our Janu on the forums and we'll go through this stuff later is anybody actually asking for a 20 terabyte hard drive you are aren't you am I I don't know you just bought a bunch of hard drives recently oh wait no not hard drives SSDs are these hard drives or SSDs are hard drives oh is this more like no that can't be right do do data centers use hard drives or SSDs they use both okay so is this more for data centers and so again here's the thing um okay so you know what fine let's run through let's run let's let's run through this so they're sampling enterprise OEMs with 20 terabyte Ultrastar DC HC650 hard drives utilizing shingled magnetic recording and helium-filled enclosure so the helium is to improve the efficiency of the drive as the platter spins because there's more air resistance although it's helium resistance now it's not air and shingled magnetic recording takes the tracks that are laid out in circles on the drive platter and actually overlaps them a little bit we've done some really good content on this in the past basically what it means is that there's no penalty while you're reading it means you can squeeze them closer together there's no penalty while you're reading it but if you need to write it well here give me another shingle perfect thank you if I need to change this middle shingle now I need to also delete this one so that I can get at it change it and then put this back so there is a write performance penalty with shingled magnetic drives so Seagate has used shingled magnetic recording in their Archive series in the past and I've actually used them I have found that the write penalty particularly if it is a write once read many application doesn't really matter that much but your mileage may vary if you're gonna try 
and write a ton of random data to it like sequential is okay if you're just archiving like a bunch of video footage which is the only thing I was using them for and the time we hadn't deployed hey no big deal but if you're gonna hit them with a bunch of random I/O you're really gonna suffer because every single one of those bits that you need to flip is gonna have a whole bunch of them that you need to then up and move somewhere else flip it and then rewrite it's it's a total mess and really it's actually kind of akin to the way that SSDs write which is part of why writing performance on SSDs is so much worse than shingled magnetic drives yes guys I understand that they are functionally very different on a hardware level I just mean conceptually they are similar in that you might need to change a very small thing but you actually have to erase and move then change it and then put other stuff back once you're done the new drives utilize nine platters if really earth what a hard drive platter is it's like a little metal disk little metal disk so traditional hard drives used a single platter and you could have recording space on both sides yeah now imagine this having nine you've got nine of those packed until a no wonder they need to fill it with helium in order to keep the air resistance down low enough to make this thing even reasonably efficient so if one platter goes does your whole hard drive go no if one platter goes you can in some cases we actually did a great video about this where we toured DriveSavers so someone like DriveSavers might be able to go in replace the the read head so that's like this arm yeah that stick so overall that all the different platters they might be able to replace it and then put on a new one because usually it would be that that fails not like the platter itself that fails cuz it's just it's just magnetic ones and zeros on a metal disk there's not as much to fail unless the the read head were to fail spectacularly and crash it's called 
a head crash if that were to hit it well it might scratch a bunch of the data but you might still be able to get a bunch of it back assuming that you can get a new read head in so anyway where was I going with this right you've got nine of these monsters you've got this immensely complicated apparatus that's required to read off of them what's up so it has a two and a half million hour MTBF mean time between failures so okay that's really kind of where I was eventually gonna make my way to with this so they're advertising them for their sequential write use case because yep shingled thing they're writing it's not fast enough for random writes so here's my problem the only way that we can make hard drives faster is either by shrinking the size of the bits on the platters or by spinning them faster really that's that's it because if you think about it like the way that a record player works yeah right the only way to make the song go faster is to put the notes closer together or to spin the record faster yeah that's all that you can really do now Seagate actually is working on a dual actuator technology that would allow two separate read and write operations to take place simultaneously that's kind of cool then not the nine that are on these 20 terabyte drives they would all move together so if you need something on like the top platter the bottom one can't go get something from somewhere else well I'm getting to that okay so on paper this is great 20 terabyte drives would mean that in something like a 60-drive Storinator let's just do a quick calculate I don't I don't what why is PowerShell opening wow I hate you so much okay so 60 times 20 is something like a 60-drive Storinator in a single 4U rack with just 60 drives you could put 1.2 petabytes of data even if you're giving up let's say one in every six drives as you might to something like you know RAID-Z2 or you know some some kind of parity data protection you could put a petabyte of data in a single 4U 
rack that's incredible the problem is that the only way to speed up the performance of these drives is complicated technology like dual actuators remember these tiny three-and-a-half-inch enclosures spinning it faster which adds more it adds more drag as the platters spin it adds more complexity because all of a sudden everything has to be done to within failure right that's exactly where we're going with this so hard drives their capacities keep getting bigger but their performance is stuck because we can't spin them faster and we can't really put the bits meaningfully any closer together all we can do is add more platters at this point and so what that means is that a 20 terabyte drive at what is pretty typical you know let's say like a yeah these are shingled so let's say it can write at 200 megabytes a second alright so let's do let's do some quick napkin math we can do what just happened hold on 200 megabytes a second times uh hold on a second 60 seconds okay so we've managed well what can I do for you wants meat okay yeah lots of shirts at lttstore.com do we even have stock of anything right now oh black and gold water bottles are back in stock right oh stealth water bottles okay all right yeah we had a great holiday season thank you guys so much by the way on LTT store okay so in a minute I can write 12,000 megabytes okay so let's do times 60 again in an hour I can write 720 thousand megabytes so times 24 so let's say in a day alright in a day I can finally write almost the entire surface of this drive so do you remember I think we actually talked about this back when we were in university although I think it was in the context of USB thumb drives okay okay this is a long time ago to try to remember you're gonna remember but okay the point is guys when we like probably I'd say around grade 10 USB thumb drives were like the biz they were so cool remember that I don't know about grade 10 maybe and I know it's a university for me okay well it was it 
was I remember when I got my first what was it it was I think it was 16 megabytes I got my first USB thumb drive so you can remember how many megabytes are in your very first thumb drive I can't remember my birthday I remember your birthday now I forgot her birthday the first year we were dating yep I have never I have never heard the end of it as you guys could see okay so anyway back to USB thumb drives my first USB thumb drive was 16 megabytes and I was jazzed because I didn't have to use floppy disks to carry my assignments to and from school anymore like wow and I could even put files that weren't just text like if I wanted to share a cool song that I definitely got totally legally on Napster with a friend I could bring that to school and give it to them and they could transfer it to their thumb drive and it was like oh I can take this to my computer at home and put it on my my MiniDisc player whatever the case may be but around the time that I met you we started to run into this problem so USB 3 hadn't come along yet to fix it but USB 2 thumb drives were getting to the point where the capacity was so large that there were still some kind of reliability issues with them especially when you were using them on older systems but you couldn't really count on them to not have some kind of a problem if you needed to get all the data off of it in an emergency or something like that like it wasn't really safe to store that much data on this tiny thing and I remember we had this conversation do you remember this at all no yeah I knew you weren't listening it's fine I listen to lots of other things it's not this part yeah yeah that's fine so the point is the point that I'm getting to and we solve that problem with faster interfaces so now you can use USB 3 or Thunderbolt 3 to read and write off of these multi terabyte portable drives and we kind of fixed it by just going really really way faster but hard drives in that time hard drives have maybe doubled in speed except 
that when I met you 500 gig hard drive was pretty sweet and now we're talking drives that are literally what 40 times that size is that right am I doing the math right can you open with the math here 20 terabytes you got 500 yeah yeah yeah 40 times the size so here here's my here's my rough what is this why is this full screen here's my rough napkin ah here guys ah let's just make sure that nothing you mean there's my rough document so in 24 hours I can write not quite that entire drive so at what point are hard drives just not practical for this kind of storage even if they are cheap well we might be there unless they're really cheap unless you can literally buy two or three of them for the price that you could store that data on something else and then you can just have it like duplicated and in the event that something fails while you're trying to recover it you just grab it from the other one copied over to I'm like it starts to become this very complicated data management scheme so hard drive manufacturers are still pretty bullish on their future they think the hard drive is going to be around for a good long while but I think that for general consumers who can't afford these complicated data management schemes we might be getting close to the end because I don't know about you guys but I don't necessarily want even in my home NAS I don't necessarily want a drive that remember if I'm copying this over gigabit is now gonna take more than two days I don't necessarily want a drive that in the event that I'm dealing with some kind of data emergency is going to take two days for me to know if it's gonna be okay and the like especially crazy thing about this is if you talk to some datacenter clients some enterprise clients they're not even asking for this Backblaze famously uses I think it's shoot don't quote me on this it's either four or six terabyte drives Backblaze drives which ones do they use and the reason is that they've found that from a from a cost per 
terabyte total cost of ownership perspective the newer models actually have not had a great advantage for them and they have preferred the better reliability of these less complex drives even if it means that they ultimately have to deploy more servers over more cabinets in a larger area in the data set in the data center let me just have a look so drive counts so they are using up to 14 terabyte drives now but I can't remember which one was the bulk of their storage it looks like they're using 12 terabytes now so maybe I maybe I'm mistaken on that one they did have higher failure rates with those twelve terabyte drives than they did on here guys I can pull up their latest reliability report they actually published this back in it looks like March remember too though guys that see drive days oh wow actually those Toshiba ones are looking pretty good single failure so what are drive days the number of times it was used or like number of days it was used I think that's the total date like it's how long it's been used times the number of drives would be mine would be my understanding here so yeah it might be that that was might be that was outdated information so either way it's gonna come down to this complex relationship between how much the drive costs versus how much it costs for the thing that it needs to go in versus how often it fails and how complex and disastrous it is when it fails some people are like what about SSHD so SSHD is basically a small SSD that's kind of like soldered on to a hard drive and I think you may be sort of misunderstanding the purpose of SSHD if you think that it's applicable here so any hard drive whether it's got a small SSD cache on it or not is ultimately limited by how fast it spins and how close the bits are and SSHD takes frequently accessed data and caches it on to an SSD so that in the event that there's like a game that you always play instead of grabbing it from the magnetic platters it can grab that off the SSD and you feel 
like it's super responsive now usually these these SSDs are so small that something like a game wouldn't end up being cached it's more like files that Windows frequently accesses stuff like that or maybe you know components of DirectX or your graphics driver that it would access as you're loading up the game so it can accelerate almost anything but it doesn't help if the data doesn't fit in the cache so when we're talking about writing to the entire surface of a drive it would do meaningfully nothing and when you're talking about reading from the entire surface of the drive it's negligible again so it wouldn't really help with that who writes to an entire 20 terabyte hard drive over one or two days though um well okay so here's an example we're rolling a new petabyte project so I understand when you're during deployments okay I understand during deployment but that only happens theoretically once or twice like for day-to-day use isn't this okay this is a great way to like if you don't have a lot of space in your house and you are the Slow Mo Guys or whatever and you need lots of space so another scenario and this is the dangerous scenario and it might not matter today but it might matter five years from now here's the scenario you've got eight of these things spinning in a little enclosure like this sitting on your desk one of them dies all right so that's a total of eight times twenty is a hundred and sixty terabytes of data right that right hundred and eighty terabytes help me no 160 160 terabytes of data all of it is now at risk because what does of 20 terabytes of data that now needs to be restored to another drive now during that restore operation I pull that failed drive as soon as I possibly can I put in a new one now I need to wait two days every one of those drives has to be read entirely and one of them has to be written to entirely a process that could take over a day long sir we're going and I can't sleep you can't can you tell I've gone through this 
before you've seen me go through this before okay maybe I don't know maybe it's just the way I use hard drives or used to use hard drives like I would write to my C drive and everything would be on my C drive yes if I lose a drive might like my D drive or E drive or whatever it's fine but that was back when a hard drive might be 80 gigs so what have you got on there you got two Blu-rays so that's fine 20 terabytes you can have your whole life on 20 terabytes okay and and there's that okay there's like a psychological component to this too when people see 20 terabytes just fill it well think about it cuz we got to remember that this is from not necessarily a techie person's perspective as well when you see a 20 terabyte storage drive in your computer you might go oh gee you know I can store everything on this and people don't practice safe backup that's true I don't either so well no I I take care of it I make sure actually our stuff could be safer right now it could be safer I'm just glad it's faster it was driving me crazy I would sit there and try to open a file and would take like 10 seconds to load we've been through this that was a power saving measure don't I had the disking about how like I had the disks parked when they were not being used which by the way means they're not sitting there spinning idly which I just made it so that I never used our server at home yes I saved everything I know okay so I told very safe I told the drives not yeah that's not safe at all well because I don't have your desktop backed up at all because why would I it's just Windows it's utterly meaningless doesn't do anything well it's faster now so that's good yes okay so I changed it so the drives don't go to sleep you're welcome my princess I actually considered just moving the Storinator to home that all-SSD server that's probably worth like 30 grand again but Dave got mad at me he was like you what Nate and so I brought it back
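The napkin math done live in the captions above can be reproduced with a short Python sketch. The 200 MB/sec write speed, the 60 x 20 TB layout and the one-in-six parity ratio come from the captions; the function names, the use of decimal units, and the 125 MB/sec figure for gigabit line rate are assumptions made for this illustration.

```python
MB_PER_TB = 1_000_000  # assumption: decimal units, as drive vendors count them


def tb_written(speed_mb_s: float, seconds: float) -> float:
    """Terabytes written at a sustained speed over the given number of seconds."""
    return speed_mb_s * seconds / MB_PER_TB


def usable_tb(drives: int, drive_tb: float, group: int) -> float:
    """Usable capacity when one drive in every `group` drives is given up to parity."""
    return drives * drive_tb * (group - 1) / group


def copy_hours(capacity_tb: float, speed_mb_s: float) -> float:
    """Hours to copy capacity_tb at a sustained speed_mb_s."""
    return capacity_tb * MB_PER_TB / speed_mb_s / 3600


per_day = tb_written(200, 24 * 60 * 60)   # one day of writing at 200 MB/sec
raw = 60 * 20                             # raw TB in a 60-drive chassis
usable = usable_tb(60, 20, 6)             # one in every six drives for parity
gigabit = copy_hours(20, 125)             # 1 Gbit/s line rate is 125 MB/sec

print(f"written per day: {per_day:.2f} TB")
print(f"raw: {raw} TB, usable: {usable:.0f} TB")
print(f"20 TB over gigabit at line rate: {gigabit:.1f} hours")
```

Under these assumptions a day of writing covers about 17.3 of the 20 TB, the chassis holds 1,200 TB raw and 1,000 TB usable, and a gigabit copy takes about 44 hours at line rate, so any real-world overhead pushes it toward the two days mentioned in the captions.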
Info
Channel: LMG Clips
Views: 876,444
Rating: 4.8817048 out of 5
Keywords: linus, tech, tips, wan, luke, podcast, clip, snippet, opinion, technology, gaming, pc, hardware
Id: 8IRjFZ9xEj8
Length: 20min 13sec (1213 seconds)
Published: Wed Jan 01 2020