WHY IS THE STACK SO FAST?

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
in my previous video I talked about how providing the size of your variables in your code can help compilers speed up your programs I ended that video by mentioning that you might want your types to grow or Shrink dynamically at runtime but that's a problem because in low-level programming languages compilers complain when you are not explicit about sizes so you can't declare custom types with arrays unless you give the array a specific size but that means your array will have that size during the entire lifetime of your program by the end of this video you will understand the reasons behind this limitation hi friends my name is George and this is core dumped today we are going to take a look at the stack that memory region that everybody tells you is super fast and you should always try to put data on it but almost nobody tells you why a stack is a data structure that follows the last in first out principle this means that the last element added to the stack is the first one to be removed usually its size is limited so if we push too many elements and run out of space we say the stack has overflowed which may sound familiar to you although recently the site is dead due to the rise of large language models like me I mean like gp4 Anyway the stack in this video is not a software stack although there is a relationship as you'll see what we are studying right now is the execution stack sometimes called the program stack or even the hardware stack when we run a program theoretically there is nothing preventing us from writing our values anywhere in memory even though we are specifying the size of our values randomly placing them could result in wasted space to comprehend this it's crucial to consider the role of the operating system you might be familiar with the concept that the operating system always acts as an intermediary between programs and Hardware well memory is no exception when a program is executed it cannot simply begin storing data anywhere in physical memory as other programs might already be utilizing some of that space therefore before it must request memory from the operating system the operating system in turn will search for available spaces where our program can write if we try to read from or write to a memory address that the operating system hasn't allocated to our program it will promptly terminate the program's execution for safety reasons this is when we encounter the infamous error known as segmentation fault core dumped from which this channel derives its name the important part here is that the operating system doesn't allocate memory precisely to the amount we need instead it provides a memory chunk where we can freely read and write however if we don't organize our data and just place it anywhere we might face this issue where there is sufficient free memory but we can't store data due to the lack of contiguous space this problem is commonly referred to as external fragmentation in this case the only way to store our value in memory is to request another chunk if we had organized our data more efficiently there would be no requirement to request additional memory it's worth noting that asking for more memory can be considerably expensive in terms of performance as a brief preview this is why many recommend minimizing the use of the Heap we'll delve deeper into this topic in my upcoming video about the Heap so consider subscribing for more details furthermore when the operating system allocates memory to our program it doesn't track whether we utilize that memory or not its sole awareness is that the specified memory region belongs to our program if another program requires memory it will have to put its data somewhere else even if there is enough free contiguous memory in our program memory region but memory is finite so what do we do if all the memory is already in use by other programs once again the answer lies with the operating system modern operating systems are specifically designed to address this type of issue in the absence of available memory they leverage storage space as if it were additional memory have you ever seen that partition called swap in Linux systems well now you know what it does this is what we call virtual memory and it is very useful because it not only creates the illusion of having much more memory than is actually present but it also gives the appearance of writing directly to physical memory so as programmers we don't have to worry about all the underlying Concepts I could spend 30 minutes talking about virtual memory but that's out of the scope of this video however it's important to note that relying on storage to extend main memory is a double-edged sword storage is thousands of times slower than RAM and this feature that operating systems provide should be used with with caution in my previous video Someone commented that in modern days the size of our data in memory might not seem crucial due to the virtually unlimited memory available while it's true that memory appears Limitless for us as programmers it's essential to exercise caution and consider that performance still plays a significant role and speaking of performance it turns out that even memory can be a bottleneck despite the immense speed of modern CPUs fetching data from memory incurs a noticeable cost to address this manufacturers integrate a feature called cache into modern chips cache is essentially a small dedicated memory inside the processor where a copy of a memory region is stored this way when the CPU requires data for manipulation or an operation it can promptly retrieve it from the cache instead of searching for it in the main memory when the necessary information is present in the cache we refer to it as a cach hit however if the required data is not in the cache it results in a cache Miss in such cases the CPU is compelled to wait for the data to be fetched from the main memory incurring additional time once again this is completely invisible to us the programmers in this case is the hardware and not the operating system who decides what memory region is kept in Cache if we care about performance the best we can do is to keep our data as compact as possible this this way there is a higher probability that it will fit in the cache this proximity increases the chances of a cash hit this new concept of keeping data as close as possible to the CPU is what low-level people know as locality and there you have the second reason to keep data compact performance now you might be thinking what the hell all of this has to do with the stack well it turns out if we want to keep data compact a good way to organize things is using a stack let's take a look the first thing we can do to speed things up is to pre-allocate memory when our program starts this way every time we need to store some value we already have somewhere to write it since our goal here is to avoid requesting memory from the operating system on the Fly the size of this pre-allocated region remains fixed throughout the entire lifetime of our program therefore we must be careful when determining its size while there's nothing preventing us from allocating gigabytes to this region it's essential to consider the Simplicity of our program if it only uses a couple of variables it may never fully occupy such a large memory space as mentioned earlier this would result in memory wastage as the operating system would Reserve that free space for our program preventing other programs from utilizing it for this reason the size of this pre-allocated region is typically kept small the exact size depends on the compiler we are using but generally it won't exceed a few megabytes this might seem seem insufficient at first glance but if you do the math in only 1 Megabyte you can fit more than 13, 64-bit integers when our program starts the main function serves as our entry point each time a new variable is encountered its value is stacked into this designated region this is where the term the stack becomes apparent the memory address marking the beginning of the stack is referred to as the stack origin while the stack pointer indicates the current topmost data on the stack of course the stack pointer is stored somewhere it can be done in memory but modern architectures incorporate a register exclusively dedicated to holding the stack pointer value directly within the CPU consequently the processor doesn't need to wait for it to be fetched from memory or even from the cache one of the reasons why memory allocation on the stack is highly efficient lies in the constant guidance provided by the stack pointer to write something onto the stack all that's required is to fetch the stack pointer address address add one to that value and the result is the memory address where we can write more data and that's it that's all it takes to allocate memory on the stack as you'll discover in my upcoming video this process is not as straightforward when allocating memory on the Heap in that scenario we must request memory from the operating system wait for it to locate available space and due to the system not returning the exact amount of needed memory we must employ fancy strategies to avoid fragmentation and all of those are extra steps that make the process slower when a function is called all its local values are pushed onto the stack but it's crucial to remember the starting point of these local values because when the function concludes we need to reset the stack pointer to its position just before the function calls these distinct sub regions are known as stack frames when a function being called is expected to return a value such as in this example we allocate a distinct space called the return link following this we start pushing our local values upon the function's conclusion the return value is directly written into the return link subsequently the stack frame of the Cali function is removed except for the returning value which now integrates into the stack frame of the caller function here it's important to note that for the compiler to generate efficient machine code it requires knowledge of the exact size of the return link this is why systems programming languages mandate specifying the size of the Val value a function is returning which you do by providing a return type finally let's consider some important limitations in this example we write some values in the stack a number an array then more numbers and let's say then we want to add some elements to the array see the problem here adding elements to an array on the stack means we might have to overwrite other values that we might need later this is exactly what I talked about in my previous video the compiler will complain if you don't specify the size of any array even if the array is a member of a custom type sorry if I made you wait a lot for this answer but I think everything I've talked about up to this point was necessary to understand this when you specify the size of an array the compiler gains precise knowledge about the amount of space required to accommodate that value in the stack but to the finite size of the stack even with provided sizes exceeding this limit can lead to a program crash resulting in a stack Overflow error this scenario is particularly pertinent when dealing with recursion when a function calls itself new stack frames are successively pushed into the stack for each recursive call if a base case isn't defined to Halt the recursion the program will keep adding stack frames until it overflows furthermore even with a base case a stack overflow can occur if the program doesn't reach it before surpassing the Stack's capacity hence there are instances where using iterative Solutions is recommended over recursive ones it's essential to clarify that the behavior of the stack is influenced by the programming language and its compiler what you've seen in this video might not precisely mirror the implementation in every language therefore it's important to understand that I aim to simplify concepts for the purpose of illustration in these videos in summary today we learned that maintaining compact data is crucial for reducing memory usage by avoiding fragmentation and enhancing performance through an increased probability of cach a hits specifying the size of our variables enables compilers to organize our data more efficiently and predictably in the stack this organization facilitates the handling of function calls and local variables in a seamless and Speedy manner we also delved into the limitations of the stack it possesses a size constraint and lacks the flexibility for types to Dynam dynamically grow or Shrink Additionally the stack operates as a single-threaded entity in the context of multi-threading each thread necessitates its own stack posing a limitation when attempting to share memory between threads all these challenges find a common solution in what is known as the Heap a topic that we will explore in a future video and that is a very very simplified version of how the stack Works hopefully you now have an idea of why it is so fast if you learned something hit that like button that will help me a lot making these videos is kind of timec consuming but I've been enjoying it so consider subscribing if you want to learn more it's free and by the way this is where I say goodbye because as a free AI model I have a token limit of 250
Info
Channel: Core Dumped
Views: 140,948
Rating: undefined out of 5
Keywords: ComputerScience, Computer Science, Programming, Rust, JavaScript, C++, C programming, Developer, The Stack, The Heap, Call Stack, Hardware Stack, Python
Id: N3o5yHYLviQ
Channel Id: undefined
Length: 13min 45sec (825 seconds)
Published: Sat Feb 03 2024
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.