WHY IS THE HEAP SO SLOW?

Captions
In my previous video we learned about the stack, a memory region where we can keep data organized and compact. Keeping data on the stack can improve the performance of our programs, but it also has limitations. Because data on the stack is compacted, we cannot accommodate dynamically sized data: for example, if we want to add elements to an array, we might have to overwrite other values stored on the stack. Additionally, the stack is limited in size, which means stack overflows can occur if we attempt to fit very large data on it. In this video we will explore the solution to these problems.

Hi friends, my name is George and this is Core Dumped. Today we are going to learn about the heap, a memory region that is very feared, but which exists because it is necessary.

For the more experienced people, please consider that the information in this video is oversimplified for easy understanding. If you are wondering why I have to clarify this: well, I just received a comment from someone saying I'm stupid for showing an example where I use an 8-bit unsigned integer to represent an age. According to this guy, my example program is garbage because someday in the future people will live for more than 255 years and my program will crash, then some Hacker News article is going to talk about my mistake, and eventually ThePrimeagen's great-great-great-great-great-great-grandson is going to react to my failure. If you agree with this guy, I'm sorry: as an AI model, I need to be retrained with this information to understand whatever weird coding cult you guys belong to. Anyway, let's continue with the video.

As always, let's take a look at some concepts. The first one is system calls. For historical reasons, primarily security, our programs should not directly access computer hardware. This is why operating systems exist: they serve as intermediaries. When our programs require hardware resources, it is the operating system's responsibility to allocate those resources.

Consider a hello world program. Although you might think this is one of the simplest programs you can write, you'll be surprised by everything happening under the hood to display those two words on the screen. For instance, printf in C is just another function: it formats the string according to the format specifier provided (that's what the f at the end of the function name stands for, by the way), but then it passes the formatted string to other functions. Depending on the implementation, things can vary from here, but if you follow the flow of the formatted string, you'll find that at some point it is passed to a function which is a wrapper for a system call. In this case, the system call instructs the operating system to output characters to the standard output stream, usually the terminal. System calls are the APIs provided by the operating system, and we use them to request services that only the operating system can perform.
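To make that flow concrete, here is a minimal sketch, assuming a POSIX system such as Linux: the formatting happens entirely in user space, and only the final call to write, the thin wrapper around the system call, asks the operating system to put the bytes on standard output. This is only an illustration of the idea, not the actual implementation of printf.

```c
#include <stdio.h>    /* snprintf: formatting happens entirely in user space */
#include <string.h>   /* strlen */
#include <unistd.h>   /* write: thin wrapper around the OS system call */

int main(void) {
    char buf[64];

    /* Step 1: build the formatted string in our own memory. No system call yet. */
    snprintf(buf, sizeof buf, "hello, %s\n", "world");

    /* Step 2: hand the finished bytes to the operating system.
       write() performs the actual system call that sends them to standard output. */
    write(STDOUT_FILENO, buf, strlen(buf));

    return 0;
}
```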
What interests us about all this is that system calls can be expensive in terms of performance. When a program is executed, it receives a new name: a process. Our process always has a state, and at a very low level it is the CPU that keeps that state through its registers: the program counter and others. If you're not familiar with registers, think of them as built-in variables within the CPU; they are essentially the hardware's way of representing variables right in the circuitry. Some registers are versatile and can be used for various tasks, while others are designed for specific purposes. Because they hold values that change as our process runs, we can say that, as a set, registers carry the state of our process.

When our process makes a system call, a request is made to the operating system, and because the operating system is also software, it needs the CPU to serve that request. However, the operating system can't simply start using the CPU, because that implies manipulating the registers our process was using, altering its state. As we don't want to lose the state our program had just before making the system call, that state needs to be captured, sort of like taking a snapshot. This captured information is known as the execution context of the process, and it needs to be stored in memory so it can be retrieved later, once the operating system finishes serving the system call. Saving the current state of a process and then loading the state of a different process so that it can continue execution is known as a context switch. A context switch incurs overhead, because saving and restoring the execution context takes time and resources. Notice that, apart from the cost of the context switch, it also takes time for the operating system to handle the system call itself, which means there's even more overhead.

Okay, but why do we care about this? Well, as you might guess, whenever a process needs to request memory from the operating system, a system call is needed. In my previous video I mentioned that one of the reasons the stack is fast is that all its memory is pre-allocated at the beginning of our program's execution, so there's no need to request memory from the operating system on the fly. For simplicity, in that video I couldn't explain why requesting memory is so expensive, but now you know why: it requires a system call and its associated overhead.

Let's establish clear distinctions between the memory regions utilized by our process. Upon compilation, our code is translated into binary machine code, a sequence of instructions that needs to be loaded into memory when our program is launched, so it's beneficial to identify this region of memory as containing solely instructions; typically this region is called the text segment. Additionally, we designate a specific region for storing initialized and uninitialized global and static data. Then there's the stack, a concept we're already familiar with. Notice that these regions share something in common: their size is fixed. Don't let the stack confuse you, by the way; its size never changes, what changes is the amount of data stored on it. Lastly, we define a special memory region capable of expanding as needed. This region is known as the heap, and unlike the stack, its size can be dynamically adjusted through system calls. This is what we know as the memory layout of a process. The memory layout of a typical C process looks like this, but I consider this layout a little hard to explain, so let's keep things simple. Also, due to screen size and to make the animations easy to understand, this is how I'm going to illustrate the stack and the heap for the rest of the video.

Something very important to mention: when we request memory via a system call, the operating system doesn't respond with exactly the amount of memory we need. Instead, it responds by giving us a memory chunk. This behavior has implications which we'll discuss shortly. The ability to request as many memory chunks as we want makes the amount of memory that can be allocated on the heap virtually unlimited, and that makes it perfect for solving the stack's limitations.

The first limitation we are going to solve is the stack not being able to store large amounts of data. We can capitalize on the fact that every byte of memory is accessible via an address. Since addresses are essentially numerical values, we can represent them as integers, which have a fixed size known at compile time. So instead of directly pushing data onto the stack, we opt to store it in the heap; in the stack we only store the memory address in the heap where that data resides. Numeric values representing memory addresses are commonly referred to as pointers in low-level languages; the term is quite self-explanatory. Also, it's important to understand that a pointer only represents a memory address and does not contain information regarding the length or size of the data it points to. Hence it is our responsibility to manage and track the memory sub-regions within the heap, distinguishing between those that are occupied and those that are not. In the current example, after storing data in the heap, we have one utilized sub-region and two available sub-regions.

It's worth noting that storing data in the heap doesn't always require requesting additional memory. This is where the impact of the operating system returning memory chunks comes into play: if there's enough free space left from previous allocations, we can simply write our value there, and in such a scenario no additional system calls are required. Now we have two utilized sub-regions and three available sub-regions.
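A minimal C sketch of this pattern, using the standard library allocator: the data itself lives on the heap, while the only things on the stack are a fixed-size pointer to it and the length, since the pointer alone carries no size information. The 1 MiB figure and the variable names are arbitrary choices for the illustration.

```c
#include <stdio.h>
#include <stdlib.h>   /* malloc, free */

int main(void) {
    /* A buffer this large would be risky to place on the stack,
       so we ask for heap memory instead. */
    size_t n = 1024 * 1024;              /* 1 MiB, arbitrary example size */
    unsigned char *data = malloc(n);     /* 'data' (just a pointer) lives on the stack;
                                            the n bytes it points to live on the heap */
    if (data == NULL) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    /* The pointer carries no length information, so we keep n around ourselves. */
    data[0] = 42;
    printf("first byte: %u, size: %zu bytes\n", data[0], n);

    free(data);                          /* hand the sub-region back when done */
    return 0;
}
```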
If you followed my previous video, you might now be concerned about fragmentation. If you are not familiar with this concept, let me illustrate it for you. Let's say we now want to store an array that contains 10 unsigned 8-bit integers, which is 10 bytes long. The problem here is that even though there are 15 free bytes in the heap in total, they are not contiguous, so the array doesn't fit. This problem is known as external fragmentation. In this case, the only way to fit the array is to request another chunk from the operating system, but to do this our program has to pay the price of making a system call. Fragmentation can be avoided by keeping data as compact as possible; in this example, if we had organized our values from the very beginning, no system call would have been needed.

Now, at first glance it seems like all we have to do to avoid fragmentation in the heap is store our values as close together as possible. Unfortunately, with the heap that is not the case. Pay attention to this part, because what I'm about to say is key to understanding the fundamental differences between the stack and the heap. In the stack, only the last value pushed can be removed, and this predictable behavior guarantees that data is always compacted: fragmentation is impossible in the stack. The issue with the heap lies in its unpredictability: even if efforts are made to store values close together, unlike the stack, there's no certainty regarding the order in which values will be removed.

To understand this, let's consider a server application scenario. The primary function of this server is to receive images from clients and then return them to the clients with a black-and-white filter applied. This server is designed to handle multiple clients concurrently. During its development, the developer faced uncertainty regarding the sizes of the images that clients would send, so to prevent potential overflows caused by receiving excessively large images that might not fit on the stack, the decision was made to store each received image on the heap. When the server is deployed, six clients simultaneously send requests, and the server spawns six worker threads to handle each client's request. Upon receiving an image from a client, each thread stores the image on the heap in a compact way. But due to variations in the image sizes sent by different clients, processing times vary: smaller images are processed more quickly. As soon as these smaller images are processed and the server responds to the respective clients, the memory allocated for those requests is no longer necessary, allowing those segments to become free again. As you can see, any value stored in the heap can be removed at any moment, making it impossible to organize data in a predictable way as we do with the stack. Fragmentation on the heap is potentially inevitable.

But now that we understand that the heap will contain these holes all over the place, it is logical to utilize them whenever possible. Whenever we need to allocate space for a value in the heap, we should check whether any of these holes is sufficiently large, because if so, we can bypass the overhead of invoking a system call. Then we need to determine which one of these holes, gaps, or available regions we are going to utilize. There are three common strategies: pick the first hole that is large enough to accommodate our value, select the smallest hole that is still large enough for our value, or opt for the largest available hole where our value can fit. Be aware that none of these strategies avoids fragmentation. The choice between them depends on several factors: first fit is generally faster but is not the best at reducing fragmentation, while best fit and worst fit can reduce fragmentation in certain scenarios but may not be as fast. Whichever strategy we opt for, it involves a trade-off between various factors.

To finish this section of the video, let's organize what we've learned up to this point. We'll define two functions, alloc and free. The alloc function takes a parameter n representing the size in bytes of the value we want to store on the heap. Internally, alloc employs one of the discussed strategies to locate an available sub-region in the heap that is at least n bytes in size. If no suitable sub-region is found, alloc invokes a system call to request more memory from the operating system. Once the required space is identified, alloc updates the data structure managing the heap's state and returns a pointer to the allocated region. The free function is used when the allocated memory is no longer needed: it receives the pointer to the region, searches for it in the table, and marks it as available again. Additionally, if the freed sub-region is adjacent to another free sub-region, free is responsible for merging both sub-regions.

Now, if you feel like this is too much information, the good news is that you usually don't do any of this yourself. In C, for example, all of this is already implemented in the standard library: everything you have to do when you need to use the heap is include the library and start using its functions, and that's it.
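Still, seeing a stripped-down version helps connect the ideas. Below is a deliberately simplified sketch of what an alloc/free pair along these lines might look like. It requests a single chunk from the operating system with mmap (a POSIX system call; choosing it here is my assumption, since the video doesn't name one), tracks sub-regions with small headers, searches with the first-fit strategy, and merges a freed sub-region with the following one when that is also free. The names my_alloc and my_free are made up for the illustration, alignment is ignored for simplicity, and real allocators such as the one behind malloc are far more sophisticated.

```c
#define _DEFAULT_SOURCE   /* expose MAP_ANONYMOUS on glibc */
#include <stddef.h>       /* size_t, NULL */
#include <sys/mman.h>     /* mmap: the system call used to request a chunk from the OS */

#define CHUNK_SIZE (64 * 1024)   /* ask the OS for 64 KiB up front (arbitrary) */

/* Each sub-region of the chunk starts with a small header that records
   its size and whether it is currently in use. */
struct header {
    size_t size;            /* payload size in bytes */
    int    used;            /* 0 = available, 1 = occupied */
    struct header *next;    /* next sub-region in the chunk */
};

static struct header *heap_start = NULL;

/* First fit: walk the list of sub-regions and take the first free one
   that is large enough. The chunk is requested from the OS only once. */
void *my_alloc(size_t n) {
    if (heap_start == NULL) {
        /* System call: ask the operating system for a chunk of memory. */
        void *chunk = mmap(NULL, CHUNK_SIZE, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (chunk == MAP_FAILED)
            return NULL;
        heap_start = chunk;
        heap_start->size = CHUNK_SIZE - sizeof(struct header);
        heap_start->used = 0;
        heap_start->next = NULL;
    }

    for (struct header *h = heap_start; h != NULL; h = h->next) {
        if (!h->used && h->size >= n) {
            /* Split the sub-region if there is room left for another header. */
            if (h->size >= n + sizeof(struct header) + 1) {
                struct header *rest = (struct header *)((char *)(h + 1) + n);
                rest->size = h->size - n - sizeof(struct header);
                rest->used = 0;
                rest->next = h->next;
                h->next = rest;
                h->size = n;
            }
            h->used = 1;
            return h + 1;   /* pointer to the usable bytes, just past the header */
        }
    }
    return NULL;   /* no hole is big enough; a real allocator would request another chunk */
}

/* Mark a sub-region as available again and merge it with the next one if free. */
void my_free(void *p) {
    if (p == NULL)
        return;
    struct header *h = (struct header *)p - 1;
    h->used = 0;
    if (h->next != NULL && !h->next->used) {        /* coalesce with the next hole */
        h->size += sizeof(struct header) + h->next->size;
        h->next = h->next->next;
    }
}

int main(void) {
    char *a = my_alloc(100);   /* carves 100 bytes out of the chunk */
    char *b = my_alloc(200);
    my_free(a);                /* leaves a 100-byte hole at the start */
    char *c = my_alloc(50);    /* first fit reuses that hole: no new system call */
    (void)b; (void)c;
    return 0;
}
```

Switching to best fit or worst fit would only change the search loop: instead of returning the first hole that is large enough, it would scan the whole list and remember the smallest or largest suitable one.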
So we've solved one limitation of the stack: if values are too large to fit on the stack, we store them on the heap, and on the stack we just maintain a reference to the value. We still have one limitation to address: dynamically sized values. As mentioned at the beginning of this video, because everything in the stack is compacted, values cannot grow or shrink without overwriting other values. In the comments somebody asked me why we can't just extract the data above our value of interest, adjust its size, and then push the data back. Well, this is indeed possible, but if you think about it, doing so would require allocating extra memory to temporarily store that data and then push it back onto the stack. This process might well need a system call to request that extra memory, and even if no system call were needed, copying data around in memory takes additional time. At that point the stack wouldn't be as fast anymore. The hilarious part here is that the heap itself does not resolve this issue: storing an array on the heap may still involve overwriting other values when adding elements, similar to what occurs on the stack.

In this scenario, our solution is data structures. The simplest option is a linked list. Since pointers have a fixed size known at compile time, they can be used as members of custom types. On the stack we keep a pointer to the first node, and every node stored in the heap contains an element and a pointer to the next node. When we need to add an element, we allocate memory on the heap for a new node and make the last node of the linked list point to it, so that it becomes the new last node. And that's it: problem solved.
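Here is a minimal sketch of that structure in C: each node is its own heap allocation holding one element plus a pointer to the next node, so the list grows one node at a time without needing contiguous memory. The names Node and push_back are just for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>   /* malloc, free */

/* One node: the element itself plus a pointer to the next node on the heap. */
struct Node {
    int value;
    struct Node *next;
};

/* Append an element: allocate one node on the heap and make the
   current last node point to it. Returns the new node, or NULL on failure. */
struct Node *push_back(struct Node **head, int value) {
    struct Node *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->value = value;
    node->next = NULL;

    if (*head == NULL) {               /* empty list: the new node becomes the head */
        *head = node;
    } else {
        struct Node *last = *head;
        while (last->next != NULL)     /* walk to the current last node */
            last = last->next;
        last->next = node;             /* it now points to the new last node */
    }
    return node;
}

int main(void) {
    struct Node *head = NULL;          /* on the stack: just a pointer to the first node */

    for (int i = 1; i <= 5; i++)
        push_back(&head, i * 10);      /* each call allocates one more node on the heap */

    for (struct Node *n = head; n != NULL; n = n->next)
        printf("%d ", n->value);
    printf("\n");                      /* prints: 10 20 30 40 50 */

    /* Free every node once the list is no longer needed. */
    while (head != NULL) {
        struct Node *next = head->next;
        free(head);
        head = next;
    }
    return 0;
}
```

Note that adding an element never moves the existing nodes; each new element gets its own small allocation wherever the allocator finds room.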
With linked lists we don't need large amounts of contiguous memory; in fact, they take advantage of the holes that appear all over the heap. Linked lists do have drawbacks, though, one of which is that nodes are scattered throughout the heap, leading to a lower probability of cache hits. If our aim is to keep our list compact, what we need is an array list, commonly referred to as a vector in low-level programming languages. The array list implementation deserves its own video, so we won't get into details here.

The video is getting quite long anyway, so let's get into the final part: why is the heap so feared? Well, managing the heap is considerably more complex than managing the stack. Storing values on the stack is relatively straightforward thanks to the stack pointer; the heap operates quite differently. Searching for available sub-regions within the heap is a time-consuming process, and in the worst-case scenario no available sub-region is large enough, compelling us to request memory via a system call and incurring a significant performance penalty. In essence, excessive allocations on the heap can drastically slow down our program. But that isn't all: because of its nature, manually manipulating the heap is very error-prone. In languages like C there's a whole world of common errors related to manual memory management, a world that is hidden from high-level developers because garbage collectors keep them safe. We'll talk about those errors in more detail, with animations and everything, in an upcoming episode.

And finally, I just want to clarify that the heap can be your best ally if you use it responsibly. When you hear someone saying that the heap is slow, that's very misleading, so yeah, I clickbaited you. What's slow is the whole process of allocating memory on the heap, but once you get that memory, if you're smart and keep in mind concepts like caching so you can choose a good data structure, accessing memory on the heap can be as fast as the stack.

And that wraps it up for now. If you enjoyed the video, hit the like button, and consider subscribing if you want to learn more; I'm currently working on several videos. By the way, follow me on Twitter if you want to suggest or vote for which topics I should cover next. Also, in less than two months we've already surpassed 11,000 subscribers, so a big thanks to all of you.
Info
Channel: Core Dumped
Views: 122,307
Keywords: Programing, C programing, C programing language, Rust, The Stack, The Heap, Memory management, Malloc, Low Level Programing, Computer Science, Developer, Coding
Id: ioJkA7Mw2-U
Length: 17min 52sec (1072 seconds)
Published: Sat Feb 24 2024