Intel Optane and a Whole lotta IOPS: A Chat with Allyn Malventano

Captions
It's another storage video, and I mean, you know, what sort of crazy person is gonna have, you know, just all this storage? It's just completely, completely insane. Are those P5800X Optane? [Music] Yup.

So I'm joined by a very special guest, Allyn Malventano. You know your stuff. It's exciting. We've got a lot more Optane stuff, and, well, basically the story is I wanted to run a whole bunch of experiments. I've only got the two, uh, vastly superior, let's say, Optane drives, and he needed a few more. Yeah, we needed to run some more tests. We wanted to put some, you know, fabric pressure testing, in order to figure out how many threads it takes to load up a single drive in a worst-case scenario, and what is the absolute maximum number of IOPS that we can see on any platform. And we have the highest-end platforms to test: both, uh, EPYC 7763, uh, Threadripper workstation, 3975WX. I think it's the 36... oh well, we did 32 and 64 cores, yeah, so the 3995WX as well. And Xeon Platinum 8380s, 80 cores of madness. And, uh, hitting 10 million IOPS takes about five drives. That's insane. Ridiculous.

Admittedly, that is a best-case scenario. You know, Intel's rating for these drives is what, like one and a half million IOPS? Yeah, it's around there. But if you create some artificial workload scenarios, you can actually achieve more than 1.5 million IOPS, which is really pretty impressive. I mean, you contrast this to NAND flash, and, you know, we've done reviews of other NAND flash devices, and out of the box it's like one and a half million IOPS, and then you put more than 20 percent of the capacity on there and then it's like 600,000. These don't care. Yeah, they really don't. They don't care, especially if you do full-span I/O, like all day long, it just doesn't change. All the way up, it doesn't matter. It doesn't care. Yep.

It is really exciting. You know, every time I'm gushing about Optane, I see a lot of people ask questions
about things like, well, you know, if you have efficient caching, doesn't that make a lot of the need for Optane go away? And yes and no. So for, like, just raw "I'm gonna recompile OpenEmbedded," yeah, you can make a lot of that go away with RAM caching and just throw more RAM at the problem, that kind of thing. The problem is that not all the software engineering has been done in all the scenarios where there is actually a caching mechanism there. Yep.

One of the other things is Intel SPDK. So I've been messing around with SPDK a little bit, and spoiler alert: the hard limit in the Linux kernel is around 10 million IOPS, at least on modern hardware. Yeah. Because of the context switching and the spin locks, the I/O and the interrupt handling or poll queues, you can't throw more hardware at it and make that go away. Yeah, no. And so what Intel did was they ripped out all of that and replaced it with something called SPDK, and it gets rid of a lot of the layers. Yeah, it's more like interfacing with memory, really, because there's less overhead, there's less you have to think about. And it really is a computer science problem. It's a really interesting computer science problem. Storage is fast enough to where, yeah, everything else becomes a bottleneck. Yeah.

Databases are probably the easiest way that you could explain it. So for you non-computer scientists in the audience: when you're working on a database, a database is a huge, complicated spiderweb black box of insanity, and all modern databases implement something called a transaction log. If something unpredictable happens, there's a software bug, a power outage, something, you can shatter the entire structure of the database to where it doesn't make sense anymore. Yup. A transaction log is a ledger of all the stuff that has been done to that database since the last backup. You can take last night's backup plus the transaction log and get to where you're supposed to be, or troubleshoot the bug
because databases are that critical. And if you really dot your i's and cross your t's, because Optane is so low latency, and because of some of the mechanisms that it offers from a hardware perspective (this is more to do with DIMMs than NVMe, but you can do this on NVMe for, you know, a more ghetto-fabulous version), you can store the transaction logs on Optane. Yeah. And a lot of the architectural shenanigans that happen in a database go away. So you end up with much better database scaling, because you're able to make certain architectural changes, architectural differences, and these same kinds of things on NAND are really hard to do. When you look at competing NAND technologies, things like the Samsung 983, they have to implement it with power-loss capacitors and onboard DRAM, and that introduces... it's like, well, it's okay in most scenarios, but it's not okay in every scenario. This, you don't have to think about that. It's just going to do it.

Yeah, for NAND, usually, if there's not a bunch of load on the device, your write latencies will usually look pretty darn quick. They'll almost look like, you would think, oh, it's almost like it's going to RAM or something, like it's super, super low. Technically it is, right? It's going through the controller and into the, you know, RAM buffers, basically SDRAM, on the dies of the flash, right? So it hasn't actually moved from that into the flash yet in that moment, but it's told the computer, "Yeah, that was written out." It's like, well, it wasn't really. Right. And so if you compare these to a brand-new, modern Gen 4 NAND-based SSD, its QD1 write latencies will actually be a little lower than these, yeah, because it's going to RAM. But when the Optane drives respond to the host, it's already committed. Yeah, it's actually been written. Yeah, it's done. You could lose power right then, yeah, and it'll still
be there when it comes back. Right. The level of engineering... I mean, even the Intel P4511 M.2 that I have, probably ten percent of the board area on those 110-millimeter M.2s is power-loss capacitors, and they have, you know, this incredible engineering to be like a computer within a computer, so that when the main computer loses power, they've got enough backup power that they can dot the i's and cross the t's. Yeah. But that is a much more elaborate and complicated setup than just "the medium is that fast." Yep.

And that's also really exciting from the DIMM side of things, because the DIMM side of things ups the throughput even more. It's like, yeah, it's not as fast as traditional memory, but when the computer loses power and gets power again later, it can just resume right where it left off. Yeah, and the latency of the DIMMs is insanity. Yeah, it's like 300, 350 nanosecond latency to the DIMMs, compared to, you know, I mean, those are like six, seven microseconds. So it's almost like another 10x on top of that. And it's the same media; it's just all the other hops you have to make to go across PCIe versus just direct to the CPU. Yeah.

You know, in some ways, where we are with the software engineering is it requires less software engineering for a better result, because all of the layers that we've added to protect ourselves from the crazy go away if you have this additional storage layer. Yeah. But also, because of that, all of the traditional existing stuff needs more software engineering to take advantage of that, or more software engineering to take advantage of that as some type of caching or middleware layer. Because if you just add it as caching, then you've also got the latency of, like, driver overhead to deal with. Whereas you can just store it on this: you don't have driver overhead, you don't have anything to think about, you don't have a lookup table to figure out, okay, is this something I
should put in cash or should i write this to the device that requires computational overhead so funny it's like decades of worth of engineering efforts spent on something that like imagine if that came to be like 20 years ago yeah it would be computers would be completely different yeah by now right people will just be like you have to have this and say oh okay well not only that but everything would be kind of like designed around it right you would just you know instant on would actually be a thing right because just all your stuff is already there you just turn it on it just starts executing right off of it like that those sorts of things could happen it would be a return to the way it was in the 60s where like the big main frames they had the execute bit oh my pi dp11 i can hit a button on the front of it and halt execution change the state of the machine yeah and then flip a button and resume execution that's how it was in the 70s it was like oh the machine lost power it's like well the the information would be retained in the bits for hours after power loss so like if you only lost power for 10 or 15 minutes you just turn the machine on and then you start executing again and it's fine i am a bit jealous of that thing over there i need i need to get me one of those yeah so we've had a lot of fun benchmarking uh intel 8380s and got some surprising results and we've got to uh we've got to we've got to do some homework there yeah and then we've also tested you know four or five octane in you know threadripper workstations and the thing that my main takeaway from this is that you know there's not really there's really pretty severely diminishing returns much past you know four or five octane devices as long as you're not using something like spdk on top of it right because between the spin locks and the software overhead and that kind of thing you really lose a lot well it's gonna i've got some content planned to go over um some of the things like gpu direct 
because I was doing some testing with GPUDirect, and the numbers there are just as bananas. Because without the CPU doing anything, it's just like, oh, the video card could read directly from storage, and how fast you can load a texture changes dramatically. It is completely insane. So you add something like this and it's like, well, it's not quite as fast as main memory, but it's not ephemeral like main memory. It's like, "Oh, I need to pre-load the game texture." No, no, it's just there. Yeah, you just load it, good to go. That is cool stuff coming, all that DirectStorage stuff and everything Microsoft's working on. Yeah.

Just the little things, little inefficiencies, will snowball into something that is just completely untenable. See also: the 10 million IOPS limit in the Linux kernel. Now, you can go a little beyond that with tuning, and if you really want to dive into it, I can squeeze 15 million IOPS out of a single socket. But, you know, when you get to two sockets, weird things start happening. Yeah, yeah. And that's some of the stuff I think that we're seeing, but you're gonna have to stay tuned for that. I've got a write-up with some more benchmarks and some more fio scripts on the forum that you can check out.

I really like breaking systems, just to see, like, okay, there's so much storage here that the BIOS has locked up. Yeah, yeah. Or the whole system. Like, there's that weird thing with md that you and I have been trying to get to the bottom of, right? Only if you have fast enough storage in the system, and enough load to throw at it, do only some particular ways of making an array in Linux... not just crash the kernel or anything like that, no, just hard lock. Yeah, you don't get any error message, you don't get nothing. System request doesn't work, nothing. And with the thing that we reproduced earlier, the IPMI was working initially, and then the IPMI hung. Yeah. And so I think
we accidentally overwrote, like, memory space that IPMI was using or something, because the IPMI was like, okay, here's a login form, then you log in successfully, and then it was like... yeah. And you couldn't even reconnect. Yeah, I've never seen an out-of-band management like that just become in-band and scrambled. And, like, you can send it commands, I mean, it's connected by the message box to other things, but yeah, something weird is going on there. I don't get what was going on. And then the other instance where, I mean, I actually scrambled the BIOS, or scrambled something in such a way where I couldn't even reboot the system. I had to reflash. I did that too, but I can't reproduce it reliably. I can get this, this needs a lock, but at least one time it was like, oh, I can at least get the... I can reprogram it. We're five layers deep on the obscure configuration, like bleeding-edge, leading-edge CPUs and platforms and storage, and it's a leaky abstraction. Really, really leaking.

So it's pretty exciting stuff when you think it through and you do all that kind of stuff. So be sure to check out that forum thread, look for that, and look for more future videos with SPDK. But if you also have any questions about super insane fast storage and enterprise storage, and it's like, "Hey, does this make sense for my workload?" I don't know, but we can figure it out. Let us know. Maybe I can get him back and we'll be like, let's solve your problem. He keeps pulling me back in. Well, there's so much storage happening, what else do you expect? That's true. [Music]
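The transaction-log idea described in the conversation (last night's backup plus a ledger of everything done since) can be sketched roughly as follows. This is only an illustrative toy, not how any particular database implements its log: the JSON-lines format, the `set`/`delete` operations, and the names `append_entry`, `recover`, and `demo` are all assumptions made up for this example. The `os.fsync` call stands in for the point made in the video: a write should only be acknowledged once it is actually durable, and that is the step where storage latency matters.

```python
import json
import os
import tempfile


def append_entry(log_path, entry):
    """Append one operation to the log and force it to stable storage."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
        log.flush()
        # Hypothetical durability point: do not report success until the
        # operating system says the data has reached the device.
        os.fsync(log.fileno())


def recover(backup, log_path):
    """Rebuild state from a backup snapshot plus every logged operation."""
    state = dict(backup)
    if not os.path.exists(log_path):
        return state
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            entry = json.loads(line)
            if entry["op"] == "set":
                state[entry["key"]] = entry["value"]
            elif entry["op"] == "delete":
                state.pop(entry["key"], None)
    return state


def demo():
    backup = {"alice": 100, "bob": 50}  # stands in for "last night's backup"
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "txn.log")
        append_entry(log_path, {"op": "set", "key": "alice", "value": 70})
        append_entry(log_path, {"op": "set", "key": "carol", "value": 30})
        append_entry(log_path, {"op": "delete", "key": "bob"})
        # After a simulated crash, replaying the log over the backup
        # brings the state back to where it is supposed to be.
        return recover(backup, log_path)


if __name__ == "__main__":
    print(demo())
```

Every call to `append_entry` waits on the storage device before returning, so in a sketch like this the per-write commit latency of the medium directly bounds how many transactions per second a single writer can log. A real database would also have to handle a partially written last line, which this toy does not.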
Info
Channel: Level1Techs
Views: 10,217
Keywords: technology, science, design, ux, computers, hardware, software, programming, level1, l1, level one
Id: 9Gu_rT8N0-U
Length: 13min 45sec (825 seconds)
Published: Fri Aug 20 2021