Back To Basics: C++ Containers

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hello and apologies for being completely out of schedule at the moment if you want to know the reasons why you can listen to the cpp cast episode i've just recorded with rob and jason i'll put a link to that down below this video is about c plus plus containers and how they can make your programs more flexible and how they can make you a more fluid programmer arguably in this age of modern c plus programming understanding containers thoroughly is vitally important so let's get started i'm sure that by virtue of watching this channel most of you are familiar with the order of a program it's a sequence of instructions and those instructions operate upon data there's all sorts of different instructions we've got loops we've got branches but all the time we're operating on different pieces of data as the program becomes larger and more sophisticated we start to see that some data is related to others and as programmers it becomes quite inconvenient to have all of these disparate separate pieces of information and as program designers we strive to organize our programs and our data in such a way as to make working with them both effective and flexible whereas the program structure can be influenced by good practice and following design patterns typically we group data into meaningful categories so for example this data and this data and this data they're all red and in contrast this data this data and this data are all green individually this red and green could be the type of the data we've seen that as integers and strings and custom structs and classes but often we also like to group data together particularly if it's of the same type since the dawn of time we've called this grouping of similar data arrays you should be quite familiar with an array here is an array that holds 10 integers in this case we're telling the compiler at compile time that a is an array that must hold 10 integers and so when this program is run and it gets to this point memory to fulfill the storage requirements of our array is allocated on the so-called stack in contrast we could allocate this memory technically at runtime by using the new keyword and this is allocated on the so-called heap when we allocate memory like this we must remember to free it up but when it's allocated on the stack we don't need to free it up because as soon as this goes out of scope well the structure of the program will free that up for us regardless the trustee array has been with us for many many years and has been time proven as the best form of container for all data types well let's just consider that for a second is it really the best what is an array when i've declared an array like this my program is requesting 10 spaces in memory sufficient to store integers this means that somewhere in our memory a pointer is created called a and if you recall from my what our pointers video a pointer is just a variable that holds a memory address we know the amount of memory required because we've told it we want 10 elements and each one of those elements must hold an integer well an integer in c plus is four bytes so to store one integer we would need four bytes arrays also have to be contiguous this means there can't be any spaces or gaps in between the elements when laid out in memory this is important because it makes arrays very fast to access if i for example want to access array element 0. i can take the initial pointer and to that i can add our index multiplied by the size of the element of the array that's easy enough that means we're pointing to this location if i change this to a seven and this to a seven we have zero one two three four five six seven which points to the seventh element of our array if there were gaps in between our elements in memory we couldn't do this and also tools like the compiler wouldn't be able to do many optimizations and tricks now being the fat fingered clumsy programmer that i am it's not uncommon that i would inadvertently type 17 instead of seven in which case we're now pointing to some memory that we don't know what it is and that's entirely legal to do in c plus other than indexing like this or producing interesting pointer arithmetic like this there is no other way to interact with an array there's no defined interface which describes what the elements of the array are and how we could traverse them in order or how we could perform interesting operations upon them it's just a lump of data admittedly it's an incredibly important lump of data but it requires a very manual interaction on behalf of the programmer so rather curiously we're going to start with a container that's not really a container but it certainly smells like one standard array as has just been demonstrated the only realistic way of interacting with an array is through its index and i could quite happily set an invalid index to a value hopefully it goes without saying this is bad recognizing some of the limitations of this basic array c plus provides an alternative if we include array from the standard library so let's have a look to declare a standard array we create the object and we specify with a template what the type is it's going to hold in this case it's integer and it's going to hold 10 of them i'll call this one b and this time i'm also going to try and set index 17 to be equal to 6. i just compile and run this program and we'll use the debugger to step through it and see what happens so here i'm setting the 17th element of array a to 6 no problems at all that's absolutely fine but if i try and set it of my standard array it froze an error we've got some rudimentary error handling on our fundamental array type curiously after what seems like a few days the visual studio static analysis tool can also pull up these errors too but we saw at runtime that the standard array object is looking after us and i shall point out that only applies in debug mode so what are the other advantages of having this sort of pseudo container previously with our a array we don't know what the size is anywhere else in our program we have to maintain its size we've defined it as 10 here we could have a constant somewhere described externally to our program because don't forget this is defined statically we don't know what the size is at runtime but now we have a standard array object just like any class it has methods we can invoke to interrogate the nature of the object so we can specify b.size for example which will return how many elements are in the array if we wanted to get a raw pointer to the contiguous data underneath we can do just that with data but because standard array is trying to emulate what is known as a container it provides other things too an empty container contains no data so we need to add something to it here's some data one of the fundamental principles of a container is that you can traverse the container in a particular direction this would therefore imply that the data within the container some of it exists at the front and some of it exists at the back so if i take my standard array of 10 elements that we had before we know how these are lined out in memory there's some contiguous memory allocation but our container allows us to abstract away from that and we can define one as being the beginning and one as being the end this allows us to create a special object called an iterator and iterators are really just a way of identifying one of the elements in a container so let's say we wanted to initialize some of the values of our array traditionally we could create a for loop and in this for loop we know the size of our array we just calculate the index and we index the array directly setting it to some value well since we're using a standard array we can be a bit smarter about this we needn't hard code in the size we can interrogate it we could also use iterators associated with the container now this will get a little bit wordy here i define an iterator i and i set it to be the beginning of our container and i'm going to increment that iterator until we reach the end of our container the iterator itself is a bit like a pointer so i can set the value that the iterator is representing to well our something now this is quite a mouthful now fortunately in modern c plus we have the auto keyword to handle that for us using iterators like this is useful if we have different begin and end iterators perhaps we want to work on a smaller range of the full array but in this instance and more often than you think you typically operate on the whole array at once so fortunately there is some syntax shortcuts just for doing that i call these little auto for loops but they are known as a ranged for loop it knows that it will automatically iterate through every item in the container because the array is emulating the container interface there is something else we can do too the algorithm library provided as part of the standard is absolutely nuts and it's full of all sorts of bafflingly strange functions it's really worth a look and many of those functions are designed to work on containers so by exploiting containers we've gone from something that's inherently unsafe and very inflexible to something which is proven to be safe and flexible but also less code and easier to read and understand containers also support initialization via initializer lists and i personally find initializer lists are very descriptive when trying to read the code they're also great for situations where you just want to set up some data which you'll be referencing later on you know that data is not going to change and containers as we'll see in the rest of this episode have a variety of advantages and disadvantages depending on the circumstances in which they're used arrays are great but they suffer from one big problem we need to know the size of them in advance this can be quite a limitation so we'll see that the rest of the containers are able to be dynamic in size and the first container that people go to and we use it a lot on this channel is standard vector the unfortunately named standard vector is often the source of confusion for new c plus programmers because you naturally assume that it is somehow related to an x y coordinate it is not but instead refers to a structure or a container which is like an array but can grow in size when needed yet again here is our memory but this time there are other things in it which don't necessarily belong to us we can't touch them when we declare a standard vector we only specify the type of data that it's going to store so in this case it's going to be an integer we don't specify a size when this vector object is created well it's got to exist somewhere so it's given a pointer that represents the start because it's a container we add things to the vector using the push back or in place back functions recall that containers have a front and a back so pushing to the back of the vector well that puts it at the end so as we start adding integers to our vector it starts occupying the memory and the nice thing is these are contiguous so if we have our vector here defined as a we can index our vector just like an array and we can read and write to it but let's keep adding more and more integers to our vector so the vector is growing in size each time and here we hit a problem we can't keep growing and remain contiguous and being contiguous is vitally important to a vector that's what gives us this fast indexable access we can access any location with minimal overhead so what can it do well it can identify that it can't expand anymore it's got to go and find some memory which is larger than the memory it currently occupies that's okay so it copies the entirety of its contents over to this larger space and then as the user starts adding more elements to the vector that's fine it can grow again now big problem here and it's one i've pointed out on the channel a couple of times before let's assume just for a second you had a pointer to one of these vector elements and then you added some stuff to your vector and it was in copied in its entirety to somewhere else in memory well of course this pointer is now completely invalid it's pointing to junk so what we can assume is that when we add things to vectors we can't assume that any pointers to the vector remain valid after the fact however if we know that our vector is never going to change in size then this is perfectly acceptable too and in fact you can initialize a vector to a known size to begin with and treat it just like an array now i've created a completely silly program to try and explore what these containers are doing behind the scenes and to be fair this program is a perfect example of why containers exist in the first place they're there to avoid needing to do lots of pointer arithmetic and black magic so if looking at some of this syntax is scaring you off don't worry about it it is not relevant in order to use containers but might appeal to those nerdy enough to worry about how they work behind the scenes so i'll just quickly run through this bit of code i've got a small lambda function here that rolls a die it returns a random value between one and six inclusive and here i've created my container i've called it vector and it stores integers by default i add one dice roll to my container and then i start doing strange things i grab the pointer to where the first element of the container is when the container was created and an item added to it and i also create an incredibly wordy duration because i'm going to record how much time it takes to add an item to that container then i sit in a do while loop each time i'm waiting for the user to press the enter key in the command prompt and each time the user does press the enter key we're going to add another dice roll to the container and we're going to display its contents and various aspects of the memory associated with that container things to note here is that we're using iterators of the container we're using the size method of the container and we're using these little auto for loops to iterate through all of the items of the container so let's take a look to begin with the program will automatically add one dice roll to the container here it is content six and we can see that the memory offset from where the original is created is zero and the offset of this particular element from the start of the container is zero this is to be expected the vector hasn't moved in memory and we're looking at element zero of the vector so let's press the enter key well adding in another dice roll a rather rubbish random number generator admittedly it's another six well that took three microseconds but what we can see here is that the offset from the start of the container has remained contiguous we've got element one and element zero but where it exists in memory relative to when the program started has completely changed let's try again well here we've added a 5. we can see that the vector has remained contiguous it took three microseconds but yet again the entire vector has moved in memory add another element it's well this random number generator is rubbish and the odds of us actually going back to where we started is quite remarkable but hey ho there we go it took five microseconds but this time it has copied the vector again we've added another six this random number generator truly is terrible but the whole vector has moved in memory now this time rather curiously we've added another five but we can see it only took one microsecond the vector itself has remained contiguous the order of the elements is zero one two three four five but if we look at the previous iteration we can see that the memory addresses are the same as the memory addresses this time all we've got this time is an additional 61 at the end so this is telling us it didn't move the whole vector in order to add this one element to add a few more this time it did five microseconds this time it didn't and here it didn't again this time it did move everything this time it didn't move everything didn't move everything didn't move everything but this time it had to so the predictability of the time requirements of the vector is well somewhat unstable vectors don't grow element by element that would be too inefficient instead it makes an educated guess that actually if i'm having things added to me then maybe i should grow by four or five elements this time and maybe next time it'll be 10 or 12 elements the actual policy of growth is determined by the implementation so this is why we see on certain insertions into the vector that there is basically no time overhead at all so standard vector is the probably the most useful jelly bean container that we have it's great for random access and indexing we can access any part of the vector and we know predictably how much time that's going to take the downside is that we don't know what's going to happen when we add things to the vector that can become quite unpredictable not only in terms of time but also anything that we've got externally that's relying on the data in the vector however if we initialize our vectors at the point of creation with a specific size we can treat them just like arrays which is a very useful thing next we'll look at what we can consider to be the opposite of a vector standard list in the standard library a list represents a doubly linked list and when we add our first element to the list container it gets put somewhere in memory each element in the list is associated with a node unlike the vector we can add things to the front or the back of the list so let's say we push to the back of the list another integer it goes somewhere else in memory there is no guarantee of anything being contiguous and it creates another node that node holds the value that we're storing and so in our case it's probably going to be an integer nodes are joined together via a couple of pointers one that points to the node that immediately comes after it and another which points to the node that immediately precedes it so let's add a third item to the back of our list again it gets allocated somewhere random in memory the memory could be filled up with all sorts of other things so it just has to go and find some space that's appropriate and again we create some pointers adding things to the front of the list is just as simple too it goes and finds an appropriate place in memory creates the node and updates the pointers in fact we can keep adding and adding and adding to the list and we know that the location of the thing in memory is not going to change it doesn't need to it just finds available memory each time we add something in if we remove something from the list let's remove this item which removes this node well our pointers effectively just join together so insertion and removal from lists are very cheap to do there's no reorganization of memory like there was with the vector however one of the problems we have got is identifying how many things that are in the list the only way we can work this out is to go to the front of the list using our begin function and count the individual items until we get to the location where there is no next node with a vector we could determine the size instantaneously it's stored as part of the vector but with the list we have to iterate through all of the items and of course with a large list that's a lot of things to iterate through we also have no way of directly indexing the list we can't use brackets for example to give us the address of a specific node because there's nothing contiguous about a list we can't perform that calculation so the only way to get to a known node in your list is to walk the list from the beginning or the end so lists are useful if you're frequently adding and removing things to them but don't need to access them in a specific way we can use the same program as before to analyze the list all we need to do is change the type of our container because containers share this common interface everything else will just work let's take a look so here is our list it contains one element and the address of that element is zero we had another element it took two microseconds so no time at all and gave us some memory address 56 and as we add elements what we see is that their addresses never change but the cost of inserting the element is also always very low it's a shame that there isn't a data structure that has the insertion speed of a list but the indexable speed of a vector well actually there is and oddly it's called a double ended queue but i call them a deck standard deck represents the point where containers go from rather trivial implementations of data storage into a more sophisticated hybrid structure rather crudely one could consider a deck to be a linked list of arrays each of these blocks represents a small array of information well it's the elements that are deck store it could be a very small number such as four or eight elements per block therefore if we wanted to add something to the back of our deck it needs to make a decision about either creating a new block and inserting the element at the front of it or using up the existing space in the block at the end likewise for inserting something at the front of the deck too unlike the vector and just like the list this means that there is no requirement to copy the contents of the deck should it run out of space inserting something into the middle of a deck can incur some overhead but it stops as soon as it's finished copying the contents of the local blocks i.e it doesn't need to relocate its entire contents in order to satisfy that insertion a deck also contains an index which does the required calculations to convert a direct access index like this n into the correct location required depending on the block and the element within that block this means that decks actually have a significant memory overhead before we even start to use them so where is a vector in a list while they contain nothing just a few bytes a deck does actually require some setup in advance but once it's set up it does allow us this random access indexing it's not quite as fast as a vector because it has to go through these calculations and lookup tables and the deck has very cheap insertion at both ends of its container and just to show that a deck will fit into our program let's include it here and again all i'm going to do is change the container type to deck let's iterate through the program for a few steps so we're now filling up the first block of our deck we can see that the items are contiguous but now we've got to this fifth item they've started to fill up a different block it looks like the block size for this particular implementation of the deck contains four elements per block so the standard deck seems like the wonder container we should all be using and there's probably some truth to that but just remember that to index an element in the deck is twice as costly as that of indexing it for a standard vector the next two container types are a little different and don't necessarily fit the mold that we've seen so far we're going to look at set an unordered set in many ways the standard set is analogous to its mathematical counterpart it is a set of unique objects so whereas the containers we've seen before are just glorified massive data stores of some description a set does actually imply some rules onto what contents it can contain and in principle this rule is it can only have one of any particular type of item i.e they must all be unique so let's include set into our program and change our container type we have to make a few changes here as we can see visual studio has highlighted some errors and that's because the set doesn't have a front and a back in fact alongside only containing unique objects the set will also sort the objects too so instead of pushing to the back we simply insert items into the set let's take a look this time now our rubbish random number generator may actually prove to be quite useful we know that it's going to generate sixes and fives for the first few entries our set to begin with contains one element six well we know that the first random number it generates is another six so our set still only contains one element then it rolled a five and it contains the two elements then we know it rolls another five but it still only contains the two elements as we were trying to add non-unique elements to the set it didn't take very much time at all but adding a new element to the set that was unique took more time the set needed to rearrange its contents if we keep pressing the enter key eventually we get to the point where we've filled it up with all of the possibilities that our system allows it's a six-sided dice so we can only store the numbers one to six and we can see that they're stored in order that's because our set is sorted another useful property of the set is that the items within it are always in the same place in memory alongside set is unordered set now if we don't want the set to be put in the correct order whereas we saw before it was one two three four five six we can actually gain some performance we saw that typically it took six to eight microseconds in order to add items to the set when they were new by not worrying about the order of the items we can insert items into the set and retain its properties of exclusivity but on the whole perform faster and in this case we saw that we never really went above 5 microseconds for the insertion but the contents of the set is not in order i find set useful when the user is collecting lots of data all at once and they may have duplicates of data for example if you've got an array of tiles and you want to select a whole bunch of these tiles as they're waving the mouse cursor around you can just simply throw those tile coordinates at a set and let the set do the heavy lifting of working out whether it's a duplicate or not the final pair of containers i want us to look at today are the map containers and these don't necessarily fit into the mold like the others like set we have map and unordered map unfortunately one of the things i can't do here is just change the container type and that's because maps are fundamentally quite different we can see it's thrown an error here a map requires two types specified as part of its template typically maps are used to store key and value pairs and so the first argument in the template is going to be the type of our key i'm going to be a bit different and use a string we can see now that visual studio in the compiler is just unhappy with a lot of the code that we've got here and that's because maps use another layer of typing above what they're fundamentally storing i've moved to the end of the program here to talk about maps because i can upload this file to the github afterwards and for now i've just commented out the preceding program as i mentioned maps are just like containers in terms of having a beginning and an end but they are used to create a correlation between one type and another for example i could map the word one onto the value one and i could create a mapping for well as many different words as i like just as before i can iterate through all of the items in the map but there's a slight difference in this instance the type returned in this case is actually a standard pair where the first element of the pair is the key value and the second element of the pair is the well value value let's take a look well this was an unordered map and we can see that it's actually stored things one two six three five four yes it's certainly unordered if we change that to a regular map it's well it still looks rubbish it stored things in well what order is this now well it stored it in alphabetical order because that's how it knows to compare the key values so here we've got the f's o s t t so yes that is now stored in alphabetical order on the whole but not always what we will see is that map and set will perform slower than the unordered counterparts this is because the unordered counterparts hash the key value or the element itself in the case of a set which is a way of creating a unique identifying number from the data that's being stored there it's far quicker to do things with a large unique number than it is with a complex data component it becomes even more complicated when you need to sort things because you need to see is one larger than the other and that can mean all sorts of things in the context of set and map and the objects that you're storing within them in this video today i've shown very basic types that are built into the standard library strings and integers if you're using your own custom objects then you'll need to provide special utilities that actually perform the equality and the comparisons of your objects the compiler certainly has no idea whether your custom struct is larger than another custom struct when it's trying to evaluate the order that they should be stored in those containers so this video has been a whirlwind look at the containers and why they're useful and why knowing about them will make you a better and more fluid programmer anyway if you've enjoyed this video a big thumbs up please have a think about subscribing come and have a chat on the discord i'll upload this file to github even though it's not that useful and i think the next video will be the next part in the networking series until then take care
Info
Channel: javidx9
Views: 74,683
Rating: 4.9883285 out of 5
Keywords: one lone coder, onelonecoder, learning, programming, tutorial, c++, beginner, olcconsolegameengine, command prompt, ascii, game, game engine, pixelgameengine, olc::pixelgameengine
Id: 6OoSgY6NVVk
Channel Id: undefined
Length: 31min 41sec (1901 seconds)
Published: Mon Apr 05 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.