Memory, Cache Locality, and why Arrays are Fast (Data Structures and Optimization)

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

tldw: The video is about L1/L2/L3 cache misses.

Not declaring/filling arr in that first screen (where we're supposed to guess why one loop runs faster than the other) was quite misleading. I kept thinking "this shouldn't run at all". Or, if arr is defined but empty, both loops should be equally fast.

👍︎︎ 10 👤︎︎ u/MotleyHatch 📅︎︎ Jan 11 2021 🗫︎ replies

This isn't just some basic video, this has to be the best educational video I've seen on youtube, that guy deserves a reward or something

👍︎︎ 1 👤︎︎ u/Curious_Boy42 📅︎︎ Jan 11 2021 🗫︎ replies

Captions

why does the code on the top run 10 times faster than the code on the bottom watch this video and find out today we're starting a new series on data structures and optimization and we'll start by talking about arrays or the array data structure having an understanding of what's happening under the hood is super important for performance so we'll be going into depth on what arrays are the cool things your computer does to make things faster we'll talk about why arrays are fast and when they're not and if you stick around to the end we'll talk about how this fits into what's coming up next what are arrays to put simply they're a way of storing several items at once typically in code you'll have a situation like this where you want to store some random crap i might want to declare some items like float prices equals 199 299 etc etc this is the c plus syntax or in javascript it'll look something like const price equals 199 299 you get the you get the idea now here in the code prices is my array and i've got three items in the array and generally the property of arrays is that you can look up individual elements in the array by their index typically arrays are zero indexed meaning the first element starts with index zero so if i want to refer to the first element its price is at zero and then the second is prices at one and so on what does this have to do with memory so now when i talk about arrays i'm going to be talking about them in the sense of c plus or c sharp or a language like that where they're the exact same type of items be it an array of floats integers or complex structs and the memory is contiguous so what's a contiguous array when we're talking about a contiguous array what we mean is that the memory for the array is all together in one big lump if you're not quite sure what that means recall how memory works now this is a simplified view of how memory works on your computer you've got this giant block of data i'll draw this big rectangle that represents memory on your computer and it's divided up into billions and billions of these slots that fit a single byte each slot has an address which is incrementing from one to the next in the same way that houses on streets have addresses which means that every byte in memory is addressable by its address when you allocate a variable what you're doing is allocating space for your variable somewhere in memory be it on the stack or heap for example if i'm allocating space for a single 32-bit float that's 32 bits or 4 bytes so we get this little block of memory here so a contiguous block of memory would mean that let's say i allocate a big chunk of memory it would mean that all of the bytes for the memory i requested are consecutive with no gaps between them and non-contiguous would mean that it's fragmented between different places and memory so maybe a little here and a little there with spaces in between it's the difference between having a coal cookie or a fragmented one all good so far let's talk about some of the cool things your computer does for you but to do that we're going to take a step back and walk through a real world problem let's say that you're my assistant i work on big important things and i ask you to look up something for me let's say it's a chapter in a book so you head to the library look up the book copy the chapter and bring it back to me job done then i ask you for something else maybe the next chapter in general if your job is to head to the library and get crap for me how can you save some time there's two easy strategies you can employ the first is instead of just copying the chapter i want you could have just checked out the entire book that way once you get back to the office if i ask you for something else and it happens to be in the same book well job done you get it back to me in a fraction of the time this is what the l1 l2 l3 caches are for another thing you can do is try to predict what i will want next if you can see that i'm binge watching seasons of rick and morty and i'm on season 2 you can predict i'll need season 3 really soon and get it before i even ask this is prefetching the l123 caches when you go to buy a computer for example let me head over to the apple website and i'll just click on the macbook pro option i don't actually own one of these they're super expensive and i'm super cheap but i like looking at them anyway here in the tech specs section you can find mention of the size of the l3 cache over here on the amd website i can find mention of the l1 and l2 caches as well so what are these and how do they work most modern cpus come with a bit of built-in memory in the form of these caches and there's often an l1 and an l2 cache and usually an l3 cache as well this is basically super ultra fast memory that the cpu can use to save things when you do a read from memory you don't just pull a single byte or whatever it is that you asked for from memory what actually happens is a whole sequence of operations starting at the cpu it goes and checks the l1 cache then the l2 cache then finally the l3 cache looking for the data you're requesting each of these is successfully costlier to fetch from but also each of these is successfully bigger and of course they're all way way faster than getting it from main memory and if at one of these stages you get a cache hit like let's say the data was in l1 already or maybe it missed l1 but you hit l2 great you return the data and move on with your life if you get a cache miss that means that you have to reach all the way out to main memory which is pretty costly on the order of at least 100 times more costly than having it immediately available at that point instead of just pulling in that small bit of data you asked it pulls in what's called a cash line which is 64 bytes which is why on subsequent requests you may get a cash hit you see the whole 64 bytes was cash somewhere and so you don't have to request it specifically now but wait there's more remember i mentioned that you might also try to predict what i'm going to watch next based on what i'm watching now well guess what the cpu does for you there's a prefetcher unit that fills that exact same role it lives sort of between the cpu and the caches looking at what you request from memory the exact details of how any given hardware prefetch is implemented is totally vendor specific and obviously complex but the gist of it is that if your memory accesses follow an easy recognizable pattern for example you're iterating over the array one by one or with a constant stride that works too then what happens is the hardware can pull memory into the cache that you haven't specifically requested yet and when you do use it cache it what does any of this have to do with arrays if anything we're almost there but first one last quick tangent dynamic arrays these are arrays that can grow to accommodate more items and these are just implemented really simply by allocating more memory than you needed to begin with and keeping track of where the end of the array is that's all there's still contiguous chunks of memory and we'll be talking about dynamic arrays at this point they're practically the same thing for our purposes okay again what does this have to do with arrays arrays are contiguous chunks of memory cpus love operating on contiguous chunks of memory which is why operations like say iterating over a list they're super fast pretty much can't give the cpu a better scenario accessing elements randomly is also pretty fast accessing a random element just involves a memory read since you can calculate the address trivially from the start of the array although you'll probably incur cash miss but that's not a big deal and appending items to an array assuming that you have space or removing them from the end of an array is super fast it should be somewhat obvious that it just involves copying or removing a single thing so what sucks it's not particularly difficult to run into scenarios where arrays are pretty bad finding an element in an array is slow since you kind of have to check every element one by one this might not be a big deal when you have a small array but imagine you have billions you want some way to do it faster so finding sucks also inserting or removing items from anywhere except the end it gets progressively slower and slower the closer you are to the start of the array and it's not hard to see why here's your array as a big rectangle and i'll just go ahead and dump a bunch of numbers in there and now you want to remove the second last item well that's not the end of the world you simply copy the last item into that spot and then mark the array as being a little bit smaller job done but if you have to do an operation at the beginning you can kind of see where this is going let's say you have to remove the first item well this is going to suck because i have to move every single item in the array over by one how do we make this better this is where more advanced data structures come in but it's not actually possible to understand more advanced ones without a solid basis in arrays funnily enough they're foundational for example hash tables these are often implemented as a combination of arrays and linked lists so yeah you might be able to understand what a hash table does but not how it does it anyway we'll follow up this with linked lists in the future so stay tuned for that hope you enjoyed this and learned something in the process cheers

Info

Channel: SimonDev

Views: 218,349

Rating: 4.9099698 out of 5

Keywords: simondev, game development, programming tutorial

Id: 247cXLkYt2M

Channel Id: undefined

Length: 9min 38sec (578 seconds)

Published: Mon Jan 11 2021