CS50 2021 - Lecture 5 - Data Structures (pre-release)

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
all right this is cs50 and this is already week five which means this is actually our last week in c together in fact in just a few days time what has looked like this and much more cryptic than this perhaps is going to be distilled into something much simpler next week when we transition to a language called python and with python we'll still have our conditionals and loops and functions and so forth but a lot of like the low-level plumbing that you might have been wrestling with struggling with frustrated by over the past couple of weeks especially now that we've introduced pointers and it feels like you probably have to do everything yourself in python and in a lot of higher level languages so to speak more modern more recent languages you'll be able to do so much more with just single lines of code and indeed we're going to start leveraging libraries all the more code that other people wrote frameworks which is collections of libraries that other people wrote and on top of all that will you be able to make even better grander more impressive projects that actually solve problems of particular interest to you particularly by way of your own final project so last week though in week four recall that we focused on memory and we've been treating this memory inside of your computer as kind of like a canvas right at the end of the day it's just zeros and ones or bytes really and it's really up to you what you do with those bytes and how you interconnect them how you represent information on them and arrays were like one of the simplest ways we started playing around with that memory just contiguous chunks of memory back to back to back but let's consider for a moment some of the problems that pretty quickly arise with arrays and then today focus on what more generally are called data structures using your computer's memory as a much more versatile cam canvas to create even two-dimensional structures to represent information and ultimately to solve more interesting problems so here's an array of three of size three maybe the size of three integers and suppose that this is inside of a program and at this point in the story you've got three numbers in it already one two and three and suppose whatever the context you need to now add a fourth number to this array like the number four well instinctively where should the number four go if this is your computer's memory and we currently have this array one two three from what left to right where should the number four just perhaps naively go yeah what do you think sorry oh okay so you could replace number one i don't really like that though because i'd like to keep number one around but that's an option but i'm losing of course information so what else could i do if i want to add the number four over there yeah so i mean it feels like if there's some ordering to these which seems kind of a reasonable inference that it probably belongs somewhere over here but recall last week as we started poking around a computer's memory there's other stuff potentially going on and if we sort of fill that in ideally we'd want to just plop the number four here if we're maintaining this kind of order but recall in the context of your computer's memory there might be other stuff there some of these garbage values that might be usable but we don't really know or care what they are as represented by oscar here but there might actually be useful data in use like if your program has not just a few integers in this array but also a string that says like hello world it could be that your computer has plopped the h-e-l-l-o w-o-r-l-d right after this array why well maybe you created the array in one line of code and filled it with one two three maybe the next line of code used get string or maybe just hard-coded a string in your code for hello world and so you kind of painted yourself into a corner so to speak now i think you might claim well let's just overwrite the h but that's kind of problematic for the same reasons we don't want to do that so where else could the four go or how do we solve this problem if we want to add a number and there's clearly memory available because those garbage values are junk that we don't care about anymore so we could certainly reuse those where could the four and perhaps this whole array go any instincts here your thoughts yeah okay so i'm hearing we could move it somewhere maybe replace some of those garbage values and honestly we kind of have a lot of options we could use any of these garbage values up here we could use any of these down here or even further down the point is there is plenty of memory available as indicated by these oscars where we could put four maybe even five six or more integers the catch is that we sort of chose poorly early on or we just got unlucky and one two three ended up back to back with some other data that we care about all right so that's fine let's go ahead and assume that we'll abstract away everything else and we'll plop the new array in this location here so i'm going to go ahead and copy the one over the two over the three over and then ultimately once i'm ready to fill the four i can throw away essentially the old array at this point because i have it now entirely in duplicate and i can populate it with the number four all right so problem solved that is a correct potential solution to this problem but what's the trade-off and this is something we're going to start thinking about all the more what's the downside of having solved this problem in this way yeah yeah i'm adding a lot of running time it took me a lot of effort to copy those additional numbers now granted it's a small array three numbers who cares it's going to be over in the blink of an eye but if we start talking about interesting data sets sort of uh web application data sets mobile app data sets where you have not just a few but maybe a few hundred a few thousand a few million pieces of data this is probably kind of a sub-optimal solution to just oh move all your data from one place to another because who's to say that we're not going to paint ourselves into a new corner and it would feel like you're wasting all of this time moving stuff around and ultimately just costing yourself a huge amount of time in fact if we put this now into the context of our big o notation from a few weeks back what might the running time now of search be for an array let's start simple a throwback a couple of weeks ago if you're using an array to recap what was the running time of a search algorithm in big o notation so maybe in the worst case if you've got n numbers three in this case or four but n more generally big o of what for search yeah what do you think big o of n and what's your intuition for that okay yeah so if we go through each element for instance from left to right then search is going to take us big o notation a big o running time if though we're talking about these numbers specifically and now i'll explicitly stipulate that yeah they're sorted does that bias anything what would the big o notation be for searching an array in this case be it of size 3 or 4 or n more generally feel free to just shout this one out big o of not n but rather log n right because we could use per week zero binary search on an array like this we'd have to deal with some rounding because there's not a perfect number of elements at the moment but you could use binary search go to the middle roughly and then go left or right left or right until you find the element you care about so search remains in big o of log n when using arrays but what about insertion now if we start to think about other operations like adding a number to this array or adding a friend to your contacts app or google finding another page on the internet so insertion happens all the time what's the running time of insert when it comes to inserting into an existing array of size n how many steps might that take any thoughts here you've got n elements already just make more attention if i was this a hand no or just a scratch yes okay we'll go with it say again say once more there's a little noise over here big o of n it would be indeed n why because in the worst case where you're sort of out of space you have to allocate it would seem a new array maybe taking over some of the previous garbage values but the catch is even though you're only inserting one new number like the number four you have to copy over all the darn existing numbers into the new one so if your original array is size n the copying of that is gonna take big o of n plus one but we can throw away the plus one because the math we did in the past so insert now becomes big o of n and that might not be ideal because if you're in the habit of inserting things frequently that could start to add up and add up and end up and this is why computer programs and websites and mobile apps could be slow if you're not being mindful of these kinds of trade-offs so what about just for good measure uh omega notation and maybe the best case well just to recap here we could get lucky and search could just take one step because you might just get lucky and boom the number you're looking for is right there in the middle if using binary search or even linear search for that matter and insert two if there's enough room and we didn't have to move all of those numbers one two and three to a new location you could get lucky and we could have someone suggested just put the number four right there at the end but of course we could get lucky and lucky or not and if we don't get lucky it might take n steps if we do get lucky it might just take the one or constant number of steps so what now knowing what your memory is what can we now do with it that solves that kind of problem because the world is going to get really slow and our apps and our phones and our computers are going to get really slow if we're just constantly wasting time moving things around in memory what could we perhaps do instead well just just one new piece of syntax today that builds on these three pieces of syntax from the past recall that we've looked at struct which is a key word in c that just lets you invent your own structure your own variable if you will in conjunction with typedef which lets you say a person has a name and a number or something like that or a candidate has a name and some number of votes you can encapsulate multiple pieces of data inside of just one using struct what did we use the dot notation for now a couple times what does the dot operator do in c to access what perfect to access the field inside of a structure so if you've got a person with a name and a number you could say something like person.name or person.number if person is the name of one such variable star of course we've seen now in a few ways like way back in week one we saw it as like multiplication uh last week we began to see it in the context of pointers whereby you use it to declare a pointer like in star p or something like that but we also saw it in one other context which was like the opposite which was the dereference operator which says if this is an address that is if this is a variable like a pointer and you put a star in front of it then with no int or no char no data type in front of it that means go to that address and it dereferences the pointer and goes to that location so it turns out that using these three building blocks you can actually start to now use your computer's memory almost any way you want and even next week when we transition to python and you start to get a lot of features for free like a single line of code will just do so much more in python than it does in c it boils down to those basic primitives and just so you've seen it already it turns out that it's so common in c to use this operator to go inside of a structure and this operator to go to an address that there's shorthand notation for it aka syntactic sugar that literally looks like an arrow so recall last week i was in the habit of pointing even with the big foam finger this arrow notation a hyphen and an angled bracket denotes going to a uh an address and looking at a field inside of it but we'll see this in practice in just a bit so what might be the solution now to this problem we saw a moment ago whereby we wanted we had painted ourselves into a corner and our memory a few moments ago looked like this we could just copy the whole existing array to a new location add the four and go about our business what would another perhaps better solution longer term be that doesn't require constantly moving stuff around maybe hang in there for your instincts if you know the sort of buzz phrase we're looking for from past experience hang in there but if we want to avoid moving the one two and the three but we still want to be able to add endless amounts of data what could we do yeah so maybe create some kind of list using pointers that just kind of point at a new location right in an ideal world even though this uh piece of memory is being used by this h in the string hello world maybe we could somehow use a pointer from last week like an arrow that says after the three oh i don't know go down over here to this location in memory and you just kind of stitch together these integers in memory so that each one leads to the next it's not necessarily the case that it's literally back to back that would have the downside it would seem of costing us a little bit of space like a pointer which recall takes up some amount of space typically 8 bytes or 64 bits but i don't have to copy potentially a huge amount of data just to add one more number and so these things do have a name and indeed these things are what generally would be called a linked list a linked list captures exactly that kind of intuition of linking together things in memory so let's take a look at an example here's computers memory in the abstract suppose that i'm trying to create an array no let's generalize it as a list now of numbers an array has a very specific meaning it's memory that's contiguous back to back to back at the end of the day i as the programmer just care about the data 1 2 3 4 and so forth i don't really care how it's stored until i don't care how it's stored when i'm writing the code i just want it to work at the end of the day so suppose that i first insert my number one and who knows it ends up up there at location ox123 for the sake of discussion all right maybe there's something already here and heck maybe there's something already here but there's plenty of other options for where this thing can go and suppose that for the sake of discussion the first available spot for the next number happens to be over here at uh location ox456 for the sake of discussion so that's where i'm gonna plop the number two and where might the number three end up oh i don't know maybe down over there at ox789 the point being i don't know what is or really care about everything else that's in the computer's memory i just care that there are at least three locations available where i can put my one my two and my three but the catch is now that we're not using an array we can't just naively assume that you just add one to an index and boom you're at the next number add two to an index and boom you're at the next next number now you kind of have to leave these little bread crumbs or use the arrow notation to kind of lead from one to the other and sometimes it might be close a few bites away maybe it's a whole gigabyte away in an even bigger computer's memory so how might i do this like where do these pointers go as you proposed right all i have access to here are bytes i've already stored the one the two and the three so what more should i do okay yeah so let me you put the pointers right next to these numbers so let me at least plan ahead so that when i ask the computer like malloc recall from last week for some memory i don't just ask it now for space for just the number let me start getting into the habit of asking matlock for enough space for the number and a pointer to another such number so it's a little more aggressive of me to ask for more memory but i'm kind of planning ahead and here's an example of a trade-off almost any time in cs when you start using more space you can save time or if you try to conserve space you might have to lose time it's being that kind of trade-off there so how might i solve this well let me abstract this away and either next to or below i'm just drawing it uh vertically just for the sake of discussion so the arrows are a bit prettier i've asked malloc for now twice as much space prac it would seem than i previously needed but i'm going to use this second chunk of memory to refer to the next number and i'm going to use this chunk of memory deferred to the next essentially stitching this thing together so what should go in this first box well i claim the number ox456 and it's written in hex because it represents a memory address but this is the equivalent of sort of drawing an arrow from one to the other as a little uh check here what should go in this second box if the goal is to stitch these together in order one two three feel free to just shout this out okay okay that worked well so ox 789 indeed and you can't do that with hands because i can't count that fast so ox789 should go here because that's like a little breadcrumb to the next and then we don't really have terribly many possibilities here this has to have a value right because at the end of the day it's gotta use its 64 bits in some way so what value should go here if this is the end of this list so it could be 0x123 the implication being that it would kind of be a cyclical list which is okay but potentially problematic if any of you have accidentally sort of lost control over your uh code space because you had an infinite loop this would seem a very easy way to give yourself the accidental uh probability of a an infinite loop what might be simpler than that and ward that off say again so just the null character not n-u-l confusingly which is at the end of strings but n-u-l-l as we introduced it last week which is the same as ox0 so this is just a special value that programmers decades ago decided that if you store the address zero that's not a valid address there's never going to be anything useful at ox0 therefore it's a sentinel value just a special value that indicates that's it there's nowhere further to go it's okay to come back to your suggestion of making a cyclical list but we'd better be smart enough to maybe remember where did the list start so that you can detect cycles if you start looping around in this structure otherwise all right but these addresses who really cares at the end of the day if we abstract this away it really just now looks like this and indeed this is how most anyone would draw this on a whiteboard if having a discussion at work talking about what data structure we should use to solve some problem in the real world we don't care generally about the addresses we care that in code we can access them but in terms of the concept alone this would be perhaps the right way to think about this all right let me pause here to see if there's any questions on this idea of creating a linked list in memory by just storing not just the numbers like one two three but twice as much data so that you have little bread crumbs in the form of pointers that can lead you from one to the next any questions on these linked lists any questions no all right oh yeah over here this does take more memory than an array because i now need space for these pointers and to be clear i technically didn't really draw this to scale thus far in the class we've generally thought about integers like 1 2 and 3 as being 4 bytes or 32 bits i made the claim last week that on modern computers pointers tend to be eight bytes or 64 bits so technically this box should actually be a little bigger it was just going to look a little stupid in the picture so i abstracted it away but indeed you're using more space as a result can the computer store i'm not sure i follow i did hear you though the bits i'm not sure say that ask it a different way sorry how does the computer identify useful data from uh used data so for instance garbage values or non-garbage values for now think of that as the job of malloc so when you ask malloc for memory as we started to last week malloc keeps track of the addresses of the memory it has handed to you as valid values the other type of memory you use not just from the heap because recall we briefly discussed that malloc uses space from the heap which was drawn at the top of the picture pointing down there's also stack memory which is where all of your local variables go and where all of the memory used by individual functions go and that was drawn in the picture is working its way up that's just an artist's rendition of direction the compiler essentially will also help keep track of which values are valid or not inside of the stack or really the underlying code that you've written will keep track of that for you so it's managed for you at that point all right good question sorry it took me a bit to catch on so let's now translate this to actual code how could we implement this idea of let's call these things nodes and that's a term of art and cs whenever you have some kind of data structure that encapsulates information node n-o-d-e is the generic term for that so each of these might be said to be a node well how can we do this well a couple of weeks ago we saw how we could represent something like a student or a candidate and a student or rather a person we said well has a name and a number and we used a few pieces of syntax here one we use the struct keyword which gives us a data structure we use typedef which defines the name person to be our new data type representing that whole structure so we probably have the right ingredients here to build up this thing called a node and just to be clear what should go inside of one of these nodes do we think it's not going to be a name or a number obviously but what should a node have in terms of those fields perhaps yeah so a number like a number and a pointer in some form so let's translate this to actual code so let's rename person to node for to capture this notion here and the number is easy if it's just going to be an int that's fine we can just say int number or int n or whatever you want to call that particular field the next one's a little non-obvious and this is where things get a little weird at first but in retrospect it should all kind of fit together let me propose that ideally we would say something like node star next and i could call the word next anything i want next just means what comes after me is the notion i'm using in it so a lot of cs people would just use next to represent the name of this pointer but there's a catch here c and c compilers are pretty naive recall they only look at code top to bottom left to right and anytime they encounter a word they have never seen before bad things happen like you can't compile your code you get some cryptic error message or the like and that seems to be about to happen here because if the compiler is reading this code from top to bottom it's going to say oh inside of this struct should be a variable called next which is of type node star what the heck is a node because it literally does not find out until two lines later after that semicolon so the way to avoid this which we haven't quite seen before is that you can temporarily name this whole thing up here struct node and then down here inside of the data structure you say struct node star and then you leave the rest alone this is kind of a work around this is possible because now you're teaching the compiler from the first line that here comes a data structure called struct node down here you're shortening the name of this whole thing to just node why it's just a little more convenient than having to write struct everywhere but you do have to write struct node star inside of the data structure but that's okay because it's already come into existence now as of that first line of code so that's the only fundamental difference between what we did last week with a person or a candidate we just now have to use this this struct workaround syntactically all right yeah question why is the next variable a struct node star pointer and not an int star pointer for instance so think about the picture we are trying to draw technically yes each of these arrows i deliberately drew is pointing at the number but that's not alone they need to point at the whole data structure in memory because the computer ultimately and the compiler in turn needs to know that this chunk of memory is not just an int it is a whole node inside of a node is a number and also another pointer so when you draw these arrows it would be incorrect to point at just the number because that throws away information that would leave the compiler wondering okay i'm at a number where the heck is the pointer you have to tell it that it's pointing at a whole node so it knows a few bytes away is that corresponding pointer good question yeah really good question it would seem that just as copying the array earlier required twice as much memory because we copied from old to new so technically twice as much plus one for the new number here too it looks like we're using twice as much memory also and to my comments earlier it's even more than twice as much memory because these pointers are eight bytes and not just four bytes like a typical integer is the differences are these in the context of the array you were using that memory temporarily so yes you needed twice as much memory but then you were quickly freeing the original array so you weren't consuming long term more memory than you might need the difference here too is that as we'll see in a moment it turns out it's going to be relatively quick for me potentially to insert new numbers in here because i'm not going to have to do a huge amount of copying and even though i might still have to follow all of these arrows which is going to take some amount of time i'm not going to have to be asking for more memory freeing more memory and certain operations in a computer anything involving asking for or giving back memory tends to be slower so we get to avoid that situation as well there's going to be some downsides though this is not all upside but we'll see in a bit just what some of those trade-offs actually are all right so from here if we go back to the structure in code as we left it let's start to now build up a linked list with some actual code how do you go about in c representing a linked list in code well at the moment it would actually be as simple as this you declare a variable called list for instance that itself stores the address of a node that's what node star means the address of a node so if you want to store a linked list in memory you just create a variable called list or whatever else and you just say that this variable is going to be pointing at the first node in a list wherever it happens to end up because malloc is ultimately going to be the tool that we use just to go get at any one particular node in memory all right so let's actually do this in pictorial form when you write a line of code like i just did here and i do not initialize it to anything with the assignment operator an equal sign it does exist in memory as a box as i'll draw it here called list but i've deliberately drawn oscar inside of it why to connote what exactly why have i done this here it's a garbage value i have been allocated the variable in memory called list which is going to give me 64 bits or 8 bytes somewhere drawn here with this box but if i myself have not used the assignment operator it's not going to get magically initialized to any particular address for me it's not going to even give me a node this is literally just going to be an address of a future node that exists so what would be a solution here suppose that i'm beginning to create my linked list but i don't have any nodes yet what would be a sensible thing to initialize list two perhaps yeah again so just null right when in doubt with pointers generally it's a good thing to initialize things to null so at least it's not a garbage value it's a known value invalid yes but it's a special value you can then check for with the conditional or the like so this might be a better way to create a linked list even before you've inserted any numbers into the thing itself all right so after that how can we go about adding something to this linked list so now the story looks like this oscar is gone because inside of this box is all zero bits just because it's nice and clean and this represents an empty linked list well if i want to add the number one to this linked list what could i do well perhaps i could start with code like this borrowing inspiration from last week let's ask malloc for enough space for the size of a node and this kind of gets to your question earlier like what is it i'm manipulating here i don't just need space for an ant and i don't just need space for a pointer i need space for both and i gave that thing a name node so sizeof node figures out and does the arithmetic for me and gives me back the right number of bytes uh this then stores the address of that chunk of memory in what i'll temporarily called n just to represent a generic new node and it's of type node star because just like last week when i asked matlock for enough space for an int and i stored it in an int star pointer this week if i'm asking for memory for a node i'm storing it in a node star pointer so technically nothing new there except for this new term of art and data structure called node all right so what does that do for me it essentially draws a picture like this in memory i still have my list variable from my previous line of code initialized to null and that's why i've drawn it blank i also now have a temporary variable called n which i initialized to the return value of malloc which gave me one of these nodes in memory but i've drawn it having garbage values too because i don't know what int is there i don't know what pointer is there it's garbage values because malloc does not magically initialize memory for me there is another function for that but malloc alone just says sure use this chunk of memory deal with whatever's there so how could i go about initializing this to known values well suppose i want to insert the number one and then leave it at that a list of size one i could do something like this and this is where you have to think back to some of these basics my conditional here is asking the question if n does not equal null so that is if malloc gave me valid memory and i don't have to quit altogether because my computer's out of memory if n does not equal null that is it equals a valid address i'm going to go ahead and do this and this is cryptic looking syntax now but does someone want to take a stab at translating this inside line of code to english in some sense how might you explain what that inner line of code is doing star and dot number equals one uh uh let me go further back no okay over here yeah perfect the place that n is pointing to set it equal to one or using the vernacular of going there go to the address in n and set its number field to one however you want to think about it that's fine but the star again is the dereference operator here and we're doing the parentheses which we haven't needed to do before because we haven't dealt with pointers and data structures together until today this just means go there first and then once you're there go access number you don't want to do one thing before the other so this is just enforcing order of operations the parentheses just like in grade school math all right so this line of code is cryptic it's ugly it's not something most people easily remember thankfully there's that syntactic sugar that simplifies this line of code to just this and this even though it's new to you today should eventually feel a little more familiar because this now is shorthand notation for saying start at n go there as by following the arrow and when you get there change the number field in this case 2 1. so most people would not write code like this it's just ugly it's a couple extra keystrokes this just looks more like the artist's renditions we've been talking about and how most cs people would think about pointers as really just being arrows in some form all right so what have we just done the picture now after setting number to one looks a little something like this so there's still one step missing and that's of course to initialize it would seem the pointer in this new node to something known like null so i bet we could do this like this with a different line of code i'm just going to say say if n does not equal null then set n next field to null or more pedantically go to n follow the arrow and then update the next field that you find there to equal null and again this is just doing some nice bookkeeping technically speaking we might not need to set this to null if we're going to keep adding more and more numbers to it but i'm doing it step by step so that i have a very clean picture and there's no bugs in my code at this point but i'm still not done there's one last thing i'm going to have to do here if the goal ultimately was to insert the number one into my linked list what's the last step i should perhaps do here just been english is fine yeah yes i now need to update the actual variable that represents my linked list to point at this brand new node that is now perfectly initialized as having an integer and a null pointer yeah technically this is already pointing there but i described this deliberately earlier as being temporary i just needed this to get it back from malloc and sort of clean things up initially this is the long-term variable i care about so i'm going to want to do something simple like this list equals n and this seems a little weird that list equals n but again think about what's inside this box at the moment this is null because there is no linked list at the beginning of our story n is the address of the beginning and it turns out end of our linked list so it stands to reason that if you set list equal to n that has the effect of copying this address up here or really just copying the arrow into that same location so that now the picture looks like this and heck if this was a temporary variable it will eventually go away and now this is the picture so kind of an annoying number of steps certainly to walk through verbally like this but it's just malloc to give yourself a node initialize the one the second the two fields inside of it update the linked list and boom you're on your way i didn't have to copy anything i just had to insert something in this case let me pause here to see if there's any questions on those steps and we'll see before long it all in context with some larger code yes i i drew them separately just for the sake of the voiceover of doing each thing very methodically in real code as we'll transition to now i could have and should have just done it all inside of one conditional after checking if n is not equal to null i could set number to a value like one and i could set the pointer itself to something like null in fact let me go ahead and do this let me go ahead and let me find a good entry point here give me one second how about we do something like this let me switch over to some code here let me start to make a program called list dot c and in list.c let's start with the old way so we kind of follow our the breadcrumbs we've laid for ourselves as follows so in this list.c i'm going to include standardio.h int main void as usual then inside of my code here i'm going to go ahead and give myself the first version of memory so int list3 is now implemented at the moment in an array so we're rewinding for now to week two style code and then let me just initialize this thing at the first location will be one at the next location will be two and at the last location will be three so the array is zero indexed always i for just the sake of discussion though and putting in the numbers one two three like a normal person might all right so now let's just print these out four int i get zero i less than three i plus plus let's go ahead now and print out using printf percent i backslash n list bracket i so very simple program kind of inspired by what we did in week two just to create and then print out the contents of an array so let's make list so far so good dot slash list and voila we see one two three now let's start to practice some of what we're preaching with this new syntax so let me go in now and get rid of the array version and let me zoom out a little bit to give ourselves some more space and now let's begin to create a list of size three so if i'm going to do this now dynamically so that i'm allocating these things again and again let me go ahead and do this let me give myself a list that's of type in star equal the return value of malloc of three times whoops three times the size of an int so what this is going to do for me is give me enough memory for that very first picture we drew on the board which was the array containing one two and three but laying the foundation to be able to resize it which was ultimately the goal so my syntax is a little different here i'm going to use malloc and get memory from the so-called heap as we called it last week instead of using the stack by just doing the previous version where i said int list 3. that is to say this line of code from the first version is in some sense identical to this line of code in the second version but the first line of code puts the memory on the stack automatically for me the second line of code that i've left here now is creating an array of size three but it's putting it on the heap and that's important because it was only on the heap and via this new function last week malloc that you can actually ask for more memory and even give it back when you just use the first notation int list 3 you have permanently given yourself an array of size three you cannot add to that in code so let me go ahead and do this if list equals equals null something went wrong the computer's out of memory so let's just return one and quit out of this program there's nothing to see here so just a good error check there now let me go ahead and initialize this list so list bracket zero will be one again list bracket one will be two and this bracket two will be three so that's the same kind of syntax as before and notice this equivalence recall that there's this relationship between chunks of memory and arrays and arrays are really just doing pointer arithmetic for you where the square bracket notation is so if i've asked myself here in line five for enough memory for three integers it is perfectly okay to treat it now like an array using square bracket notation because the computer will do the arithmetic for me and find the first location the second and the third if you really want to be kind of cool and hacker like well you could say list equals one list plus one equals two list plus two equals three that's the same thing using very explicit pointer arithmetic which we looked at briefly last week but this is atrocious to look at for most people it's just not very user friendly it's longer to type so most people even when allocating memory dynamically as i did a second ago would just use the more familiar notation of an array all right so let's go on now suppose uh time passes and i realize oh shoot i really wanted this array to be of size 4 instead of size 3. now obviously i could just rewind and like fix the program but suppose that this is a much larger program and i've realized at this point that i need to be able to dynamically add more things to this array for whatever reason well let me go ahead and do this let me just say all right list should actually be the result of asking for four uh chunks of memory for malloc and then i could do something like this uh list bracket three equals four now this is buggy potentially in a couple of ways but let me ask first what's really wrong first with this code the goal at hand is to start with the array of size three with the one two three and i want to add a number 4 to it so at the moment in line 17 i've asked the computer for a chunk of 4 integers just like the picture and then i'm adding the number 4 to it but i kind of have skipped a few steps and broken this somehow yeah yeah i don't necessarily know where this is going to end up in memory it's probably not going to be immediately adjacent to the previous chunk and so yes i even though i'm putting the number four there i haven't copied the one the two or the three over to this chunk of memory so well let me fix well that's actually indeed really the essence of the problem i am orphaning the original chunk of memory if you think of the picture that i drew earlier let me scroll up the line of code up here on line 5 that allocates space for the initial three integers this code is fine this code is fine but as soon as i do this i'm clobbering the value of list and saying no no don't point at this chunk of memory point at this chunk of memory at which point i've forgotten if you will where the original chunk of memory is so the right way to do something like this would be a little more involved let me go ahead and give myself a temporary variable and i'll literally call it temp tmp kind of like i did last week so that i can now ask the computer for a completely different chunk of memory of size four i'm going to again say if temp equals null i'm going to say oh bad things happened here so let me just return one and you know what just to be tidy let me free the original list before i quit because remember from last week anytime you use malloc you eventually have to use free but this chunk of code here is just a safety check if there's no more memory there's nothing to see here i'm just going to clean up my state and quit but now if i have asked for this chunk of memory now i can do this for int i get whoops for int i gets 0 i is less than 3 i plus plus what if i do something like this temp bracket i equals list bracket i that would seem to have the effect of copying all of the memory from one to the other and then i think i need to do one last thing temp bracket 3 gets the number 4 for instance again i'm kind of just hard coding the numbers for the sake of discussion after i've done this what could i now do i could now set list equals to temp and now i have updated my link list properly so let me go ahead and do this four int i gets zero i is less than four i plus plus let me go ahead and print each of these elements out with percent i using list bracket i and then i'm going to return zero just to signify that all is successful now so to recap we initialize the original array of size three and plug in the values one two three time passes and then i realized wait a minute i need more space and so i asked the computer for a second chunk of memory this one of size four just as a safety check i make sure that temp doesn't equal null because if it does i'm out of memory so i should just quit altogether but once i'm sure that it's not null i'm gonna copy all the values from the old list into the new list and then i'm going to add my new number at the end of that list and then now that i'm done playing around with this temporary variable i'm going to remember in my list variable what the address is of this new chunk of memory and then i'm going to print all of those values out so at least aesthetically when i make this new version of my list except for my missing semicolon let me try this again when i make list okay what i do this time implicitly declaring a library function malloc dot dot what's my mistake anytime you see that kind of error yeah library so up here i forgot to do include standard lib.h which is where malloc lives let me go ahead and again do make list there we go so i fixed that dot slash list and i should see one two three four but there's still a bug here does anyone see the uh bugger question oh sorry say again i forgot to free the original list and we could see this even if not just with our own eyes or intuition if i do something like valgrind of dot slash list remember our tool from this past week let me increase the size of my terminal window temporarily the output is crazy cryptic at first but notice that i have definitely lost some number of bytes here and indeed it's even pointing at the line number in which some of those bytes were lost so let me go ahead and back to my code and indeed i think what i need to do is before i clobber the value of list pointing it at this new chunk of memory instead of the old i think i now need to first proactively say free the old list of memory and then change its value so if i now do make list and do dot slash list the output is still the same and if i cross my fingers and run valgrind again after increasing my window size hopefully here oh still a bug so better it seems like less memory is lost what have i now forgotten to do i forgot to free it at the very end too because i still have a chunk of memory that i got from malloc so let me go to the very bottom of the program now and after i'm done sort of uh sort of senselessly just printing this thing out let me free the new list and now let me do make list dot slash list it still works visually now let's do valgrind of dot slash list enter and now hopefully all heap blocks were freed no leaks are possible so this is perhaps the best kind of output you can see from a tool like valgrind i used the heap but i freed all the memory as well so there were two fixes needed there or any questions then on this array-based approach the first of which is statically allocating an array so to speak by just hard-coding the number three the second version now is dynamically allocating the array using not the stack but the heap but it too suffers from the slowness we described earlier of having to copy all those values from one to the other okay uh hand was over here good question why did i not have to free the temp i essentially did eventually because temp was pointing at the chunk of four integers but on line 33 here i assigned list to be identical to what temp was pointing at and so when i finally freed the list that was the same thing as freeing temp in fact if i wanted to i could say free temp here and it would be the same but conceptually it's sort of wrong because at this point in the story i should be freeing the actual list not that temporary variable but they were the same at that point in the story good question and long story short everything we're doing thus far is still in the world of arrays the only distinction we're making is that in version one when i said int list bracket 3 close bracket that was an array of fixed size so-called statically allocated on the stack as per last week this version now is still dealing with arrays but i'm kind of flexing my muscles and using dynamic memory allocation so that i can still use an array per the first pictures we started talking about but i can at least grow the array if i want so we haven't even now solved this even better in a sense with linked list that's going to come next yeah how am i able to free list i freed the original address of list i then changed what list is storing i'm moving its arrow to a new chunk of memory and that is perfectly reasonable for me to now manipulate because now list is pointing at the same value of temp and temp is what was given the return value of malloc the second time so that chunk of memory is valid uh so oh so it's fine so these are just um you know squares on the board right there's just pointers inside of them so what i'm technically saying is i'm not pointing i'm not freeing list per se i am freeing the chunk of memory that begins at the address currently in list therefore if a few lines later i change what the address is in list totally reasonable to then touch that memory and eventually free it later because you're not freeing the variable per se you're freeing the address in the variable good distinction uh questions down here oh sorry sorry again sorry i'm having trouble hearing oh someone already answered okay so then that was fruitful all right so let me back up here and just now make one final edit so let's finish this with one final improvement here because it turns out there's a somewhat better way to actually resize an array as we've been doing here and there's another function in standard lib that's called reallock for reallocate and i'm just going to go in and make a little bit of a change here so that i can do the following let me go ahead and first comment this now just so we can keep track of what's been going on this whole time so dynamically allocate an array of size 3. assign three numbers to that array time passes allocate new array of size four copy numbers from old array into new array and add fourth number to new array free old array remember if you will new array using my same list variable and now print new array free new array hopefully that helps and we'll post this code online after two which tells a more explicit story so it turns out that we can reduce some of the labor involved with this um not so much with the printing here but with this copying turns out c does have a function called reallock that can actually handle the resizing of an array for you as follows i'm going to scroll up to where i previously allocated a new array of size 4 and i'm instead going to say this resize old array to b of size 4. now previously this wasn't necessarily possible because recall that we had painted ourselves into a corner with the example on the screen where hello world happened to be right after the original array but let me do this let me use reallock for reallocate and pass in not just the size of memory we want this time but also the address that we want to resize which again is this array called list all right the code thereafter is pretty much the same but what i don't need to do is this so reallock is a pretty handy function that will do the following if at the very beginning of class when we had one two three on the board and someone's instinct was to just plop the four right at the end of the list if there's available memory realic will just do that and boom it will just grow the array for you in the computer's memory if though it realizes sorry there's also there's already a string like hello world or something else there reallock will handle the trouble of moving that whole array from one chunk of memory originally to a new chunk of memory and then relock will return to you the address of that new chunk of memory and it will handle the process of freeing the old chunk for you so you do not need to do this yourself so in fact let me go ahead and get rid of this as well so reallock just condenses a lot of what we just did into a single function whereby realic handles it for you all right so that's the final improvement on this array-based approach but what none of this code changes is the reality that we've been trying to actually solve this problem a little differently by actually creating a different data structure altogether so before we get there why don't we go ahead here and take a five minute break and when we resume we'll now redo this code using this new technique see you in five i want won't say all right so the example code we've just seen is an implementation of starting with one an array of fixed size statically sized so to speak on the stack and that was the int numbers bracket 3 semicolon then we transitioned to dynamically allocating space for an array which was using our new friend malloc and then we demonstrated how you could resize an array once you're using malloc and in turn the heap because the heap allows you to ask for more and give back memory the stack does not let you do that you have to decide in advance with those square brackets how much memory you want we then introduced reallock which just simplifies the process of letting the library deal with the process of finding more memory for you copying the old memory into the new memory and then giving you the address of the new chunk of memory but suppose instead we want to move past mirror arrays because at the end of the day whether it's me writing the code or reallock doing the code it's going to copy all of those values back and forth and back and forth and eventually you could imagine having such a long list of numbers such a long list of contacts in your address book that there's plenty of space available like lots of oscar the grouches over here over here over here but if by bad luck there's not enough chunks of memory contiguously you could imagine the technique we just used ultimately failing that is to say there might be in total enough memory available for a bigger list but if it's not back to back to back that is contiguous you're just out of luck and so that would seem to be a shame when you could just stitch together a data structure known as instead a linked list so this linked list recall allows us to say put the number one over here the number two over here the number three over here each at their respective addresses like ox123 we proposed ox456 and ox789 and if we allocate a little more space now for each of these numbers thinking of them now not as individual numbers but nodes that term of art that describes a collection of values in memory we can now use these additional values to store the address of the next node the next node and then down here we could put just 0 aka null to indicate the end of the list but there too we can kind of abstract away like we did last week and just think of these addresses as being pointers leading like breadcrumbs from one node to another in memory so we're still using more space but now that space can be anywhere and we never again have to move the one the two or the three we can put the four the five the six or any other number wherever happens to be available in the computer's memory so if we were to build this structure up we need of course a struct to recap whereby instead of having a person today we might have a struct called node inside of which would be not only the number but also a pointer to the next node which we might draw like this but again recall that c is very literal so if you don't give this thing a name at the top it's not going to be able to compile so if we instead call this thing sort of self-referentially a struct node next can we now build up a more sophisticated node that allows us to link one to the other and where we left off then is building up a linked list as follows we redefined list as being not int star list but node star list we might initialize that initially to null so that we know that the list is initially of size zero and then if we allocate a new node to store say the number one we might use code like this give me enough space for a node store the address that comes back from malloc in a temporary variable called n which itself is of type node star because that just means it's the address of a node and then using a few different lines of code we can change what are initially garbage values to for instance the number one and recall this syntax here just means go to the address n and access its number field set it to one or more abstractly just use this new arrow notation syntactic sugar for the same that updates the picture to look like this and then we can go ahead and initialize the their oscar to just a null value by saying start at n go there and access the next field and set it equal to null and if we repeat this process a few different times we can insert more and more nodes into the list so for instance suppose that after updating the list by setting list equal to that new node n we want to actually now insert like the number two what might happen next well we might allocate code like this and now i'm going to combine it all into one expression one conditional i'm going to allocate enough space for another node i'm going to store the var the address of it in the same temporary variable i'll again call it n here a little safety check if n does not equal null that means all is well so i'm going to go ahead and update number to be 2 and next to be null using again this arrow notation but now assuming i want a sorted list how do i go about inserting this number two into that same structure not at the beginning but really at the end of this list well the picture now just looks like this i've allocated another temporary node the address of which is over here i've initialized the number field to 2 and the next field to null as indicated by the blank box here and so in english if i want to insert 2 at the end of this list what needs to point to what just to be clear what needs to point to what yeah oh the one needs to point to the two sorry the no the one needs to point to a two so we somehow need an arrow coming off of this thing pointing to the two and then at the end of the day it doesn't matter what end continues to point to because it was meant to just be temporary so how could we do this we need to somehow get at not this pointer like last time but this pointer well if list is the variable that stores the beginning of my linked list if i do list arrow something that's going to follow this first arrow this field of course is called next because this field is called number so i think if i set something like list arrow next that'll allow me to update this rectangle here to equal whatever n itself equals so let me propose code like this literally list arrow next equals n because n is the address of the new node so list arrow next means go ahead and start at the list follow the arrow go to the next field and set it equal to whatever n equals and that's why at the moment in the story we have two otherwise identical arrows pointing at the same new node this thing is temporary so it doesn't matter if eventually it disappears like if the function returns or whatever and so now the picture looks like this and let's do maybe one more insertion here suppose i wanted to insert one more number here like let's and let's skip one deliberately here what if i want to allocate um ultimately how about for this one the number three so let's not skip just yet the number three well then we might use the same code as before just changing the number this time to a three this picture now looks like this because i'm reusing n it's my temporary node i think if i continue the same volunteers logic we now want to update the two to point to the three so how might i express this well in code i could just explicitly say start it list follow the arrow and go to the next field follow another arrow go to the next field and set this equal to the value of n so you wouldn't typically write code that gets this long and stringy as we're about to see because we're going to eventually do this kind of thing in a loop or something more dynamic but for now if i'm just building this thing manually just to show you the syntax this too is correct start it list follow the arrow and go to the next field follow that arrow and go to the next field set that next field equal to the address in n and again this is starting to get a little poorly designed and if you had another arrow and another arrow you're probably doing something wrong but i'm just doing this at the moment for the sake of discussion to sort of manually stitch this thing together just like we did the same for the array a moment ago so at this point in the story you've got arrows coming off of each of these nodes including the two pointing at the three just as n is but again n is temporary eventually it's going to go away so now the picture might look like this and again the point here was just to demonstrate some of this new syntax and now let's consider some of the trade-offs we're using more memory than we used to for the array approach because now we need all this darn space for the pointers we are using you know kind of the same amount of time roughly because every time to get to the end of the list that's big o of n right to start at the left go all the way to the end and add one more node so it hasn't necessarily sped things up but if we really want to be picky we haven't bothered now reallocating memory asking the operating system for more or less and i claim that tends to be a slow operation so it'd be nice to avoid that if we can but more compellingly these nodes one two three and anything else can be anywhere in the computer's memory so even if there isn't a huge chunk of memory back to back contiguously you can just start using the memory much more efficiently in some sense because you can fill in all of the gaps so even if your memory is very fragmented with lots of variables here here here here you can use all of it ultimately by just squeezing these nodes wherever is available malloc figures out where is available but you don't have to worry about finding a bigger and a bigger and a bigger contiguous array of memory so we've still solved at least that problem or any questions then on this example here any questions or comments on how we got here no all right well let's translate then this into some similar code that allows us to build up a linked list now using code similar and spirited before but now using this new primitive so i'm going to go back into bs code here i'm going to go ahead now and delete the entirety of this old version that was entirely array-based and now inside of my main function i'm going to go ahead and first do this i'm going to first give myself a a list of size 0 and i'm going to call that node star list and i'm going to initialize that to null as we proposed earlier but i'm also now going to have to take the additional step of defining what this node is so recall that i might do something like typedef struct node inside of this struct node i'm going to have a number which i'll call number of type int and i'm going to have a structure called node with a star that says the next pointer is called next and i'm going to call this whole thing more succinctly node instead of struct node now as an aside for those of you wondering what the difference really is between struct and node technically i could do something like this not use typedef and not use the word node alone this syntax here would actually create for me a new data type called verbosely struct node and i could use this throughout my code saying struck node struct node it just gets a little tedious and it would be nicer just to refer to this thing more simplistically as a node so what type def has been doing for us is it again lets us invent our own word that's even more succinct and this just has the effect now of calling this whole thing node without the need subsequently to keep saying struct all over the place just fyi all right so now now that this thing exists in main let's go ahead and do this let's add a number to list and to do this i'm going to give myself a temporary variable i'll call it n for consistency i'm going to use malloc to give myself the size of a node just like in our slides and then i'm going to do a little safety check if n equals equals null i'm going to do the opposite of the slides i'm just going to quit out of this program because there's nothing useful to be done at this point but most likely my computer's not going to run out of memory so i'm going to assume we can keep going with some of the logic here if n does not equal null and that is it's a valid memory address i'm going to say n bracket i'm going to build this up backwards j well let's do that's okay let's go ahead and do this n bracket number equals 1 and then n bracket net or arrow next equals null and now update list to point to new node list equals n so at this point in the story we've essentially constructed what was that first picture which looks like this this is the corresponding code by which we built up this node in memory suppose now we want to add the number two to the list so let's do this again add number uh add a number to list how might i do this well i don't need to re-declare n because i can use the same temporary variables before so this time i'm just going to say n equals malloc and the size of a node i'm again going to have my safety check so if n equals equals null then let's just quit out of this all together but but but i have to be a little more careful now technically speaking what do i still need to do before i quit out of my program to be really proper free the memory that did succeed a little higher up so i think it suffices to free what is now called list way at the top all right now if all was well though let's go ahead and say n bracket number equals two and now n bracket uh sorry n arrow next equals null and now let's go ahead and add it to the list uh if i go ahead and do uh list arrow next equals n i think what we've just done is build up the equivalent now of this in the computer's memory by going to the list fields next field which is synonymous with the one node's bottom most box and store the address of what was n which a moment ago looked like this and i'm just throwing away in the picture the temporary variable all right one last thing to do let me go down here and say add a number two list n equals malloc let's do it one more time size of node and clearly in a real program we might want to start using a loop and do this dynamically or a function because it's a lot of repetition now but just to go through the syntax here this is fine if n equals equals null out of memory for some reason let's return one but but but we should return we should free the list itself and even the second node list bracket next but i've deliberately done this poorly all right this is a little more subtle now and let me get rid of the highlighting just so it's a little more visible if n happens to equal equal null and something really just went wrong there out of memory why am i freeing two addresses now and again it's not that i'm freeing those variables per se i'm freeing the addresses at in those variables but there's also a bug with my code here and it's subtle let me ask more pointedly this line here 43 what is that freeing specifically can i go to you i'm freeing not not so that's okay i'm not freeing lists two times technically i'm freeing list once and list next ones but let me just ask the more explicit question what am i freeing with line 43 at the moment which node i think node number one why because if one is at the beginning of the list list contains the address of that number one node and so this frees that node this line of code you might think now intuitively okay it's probably freeing the node number two but this is bad and this is subtle valgrind might help you catch this but by eyeing it it's not necessarily obvious you should never touch memory that you have already freed and so the fact that i did this in this order very bad because i'm telling the operating system i don't know i don't need the list address anymore do with it what you want and then literally one line later you're saying wait a minute let me actually go to that address for a moment and look at the next field of that first node it's too late you've already sort of given up control over the node so it's an easy fix in this case logically but we should be freeing the second node first and then the first one so that we're doing it in essentially reverse order and again valgrind would help you catch that but that's the kind of thing one needs to be careful about when touching memory at all you cannot touch memory after you've freed it but here is my last step let me go ahead and update the number field of next to a number field of n to b three the next note of n to be null and then just like in the slide earlier i think i can do list next next equals n and that has the effect now of building up in the computer's memory essentially this data structure very manually very pedantically like in a better world we'd have a loop and some functions that are automating this process but for now we're doing it just to play around with the syntax so at this point unfortunately suppose i want to print the numbers it's no longer as easy as int i equals 0 i less than 3 i plus plus because you cannot just do something like this because pointer arithmetic uh no longer comes into play when it's you who are stitching together the data structure in memory in all of our past examples with arrays you've been trusting that all of the bytes in the array are back to back to back so it's perfectly reasonable for the compiler and the computer to just figure out oh well if you want bracket zero that's at the beginning bracket one it's one location over bracket two it's one location over this is way less obvious now because even though you might to go to the first element in the linked list or the second or the third you can't just jump to those arithmetically by doing a bit of math instead you have to follow all of those arrows so with linked lists you can't use this square bracket notation anymore because one node might be here over here over here over here you can't just use some simple offset so i think our code's gonna have to be a little fancier and this might look scary at first but it's just an application of some of the basic definitions here let me do a for loop that actually uses a node star variable initialized to the list itself i'm going to keep doing this so long as temp does not equal null and on each iteration of this loop i'm going to update temp to be whatever temp arrow next is and i'll rewind in a moment and explain in more detail but when i print something here with printf i can still use percent i because it's still a number at the end of the day but what i want to print out is the number in this temporary variable so maybe the ugliest for loop we've ever seen because it's mixing not just the idea of a for loop which itself was a bit cryptic or weeks ago but now i'm using pointers instead of integers but i'm not violating the definition of a for loop recall that a for loop has three main things in parentheses what do you want to initialize first what condition do you want to keep checking again and again and what update do you want to make on every iteration of the loop so with that basic definition in mind this is giving me a temporary variable called temp that is initialized to the beginning of the loop so it's like pointing my finger at the number one node then i'm asking the question does temp not equal null well hopefully not because i'm pointing at a valid node that is the number one node so of course it doesn't equal null yet null won't be until we get to the end of the list so what do i do i start at this temp variable i follow the arrow and go to the number field therein what do i then do the for loop says change temp to be whatever is at temp by following the arrow and grabbing the next field that then has the result of being checked against this conditional no of course it doesn't equal null because the second node is the number two node null is still at the very end so i print out the number two next step i update temp one more time to be whatever is next that then does not yet equal null so i go ahead and print out the number three node then one last time i update temp to be whatever temp is in the next field but after one two three that last next field is null and so i break out of this for loop all together so if i do this in pictorial form all we're doing if i now use my finger to represent the temp variable i initialize temp to be whatever list is so it points here that's obviously not null so i print out whatever is that temp follow the arrow in number and i print that out then i update temp to point here then i update temp to point here then i update temp to point here wait that's null the for loop ends so again admittedly much more cryptic and then our familiar int i equals zero and so forth but it's just a different utilization of the for loop syntax yes good question how is it that i'm actually printing numbers and not printing out addresses instead the compiler is helping me here because i taught it in the very beginning of my program what a node is which looks like this here the compiler knows that a node has a number field and an x field down here in the for loop because i'm iterating using a node star pointer and not an int star pointer the compiler knows that anytime i'm pointing at something i'm pointing at the whole node doesn't matter where specifically in the rectangle i'm pointing per se it's ultimately pointing at the whole node itself and the fact that i then use temp arrow number means okay adjust your finger slightly so you're literally pointing at the number field and not the next field so that's sufficient information for the computer to distinguish the two good question other questions then on this approach here yeah in back how would i use a for loop to add elements to a linked list you will do something like this if i may in problem set five we will give you some of the scaffolding for doing this but in this coming weeks materials will we guide you to that but let me not spoil it just yet fair question though yeah ok good question is line 49 acceptable even if we freed it earlier we didn't free it in line 43 in this case right you can only reach line 49 if n does not equal null and you do not return on line 45. so that's safe i was only doing those freeing if i knew on line 45 that i'm out of here anyway at that point good question and yeah correct if you're asking about temp because it's in a for loop does that mean you don't have to free it you never have to free pointers per se you should only free addresses that were returned to you by malloc so i haven't finished the program to be fair but you're not freeing variables you're not freeing like fields you are freeing specific addresses whatever they may be so the last thing and i was kind of stalling on showing this just because it too is a little cryptic here is how you can free now a whole linked list in the world of arrays recall it was so easy you just say free list you return 0 and you're done not with a linked list because again the computer doesn't know what you have stitched together using all of these pointers all over the computer's memory you need to follow those arrows so one way to do this would be as follows while the list itself is not null so while there's a list to be freed what do i want to do i'm going to give myself a temporary variable called temp again and it's a different temp because it's in a different scope it's inside of the while loop instead of the for loop a few lines earlier i am going to initialize temp to be the address of the next node just so i can get one step ahead of things why am i doing this because now i can boldly free the list itself which does not mean the whole list again i'm freeing the address in list which is the address in the address of the number one node that's what list is it's just the address of the number one node so if i first use temp to point at the number two slightly in the middle of the picture then it is safe for me on line 61 at the moment to free list that is the address of the first node now i'm going to say all right once i freed the first list the first node in the list i can update the list itself to be literally temp and now the loop repeats so what's happening here if you think about this picture temp is initially pointing at not the list but list arrow next so temp represented by my right hand here is pointing at the number two totally safe and reasonable to free now the list itself aka the address of the number one node that has the effect of just throwing away the number one node telling the computer you can reuse that memory for you the last line of code i wrote updated list to point at the number two at which point my loop proceeded to do the exact same thing again and only once my finger is literally pointing at nowhere the null symbol will the loop by nature of a while loop as i'll toggle back to break out and there's nothing more to be freed so again what you'll see ultimately in problem set five more on that later is an opportunity to play around with just this syntax but also these ideas but again even though the syntax is admittedly pretty cryptic we're still using basics like these for loops or while loops we're just starting to now follow explicit addresses rather than letting the computer do all of the arithmetic for us as we previously benefited from at the very end of this thing i'm going to return zero as though all is well and i think then we're good to go our questions on this linked list code now and again we'll walk through this again in the coming weeks spec yeah sure can we explain this this while loop here for freeing the list so notice that first i'm just asking the obvious question is the list null because because if it is there's no work to be done however while the list is not null according to line 58 what do we want to do i want to create a temporary variable that points at the same thing that list arrow next is pointing at so what does that mean here's list list arrow next is whatever this thing is here so if my right hand represents the temporary variable i'm literally pointing at the same thing as the list is itself the next line of code recall was free the list and unlike in our world of arrays like half an hour ago where that just meant free the whole darn list you now have taken over control over the computer's memory with a linked list in ways that you didn't with the array the computer knew how to free the whole array because you malloc the whole thing at once you are now mallocking the linked list one node at a time and the operating system does not keep track of for you where all these nodes are so when you free list you are literally freeing the value of the list variable which is just this first node here then my last line of code which i'll flip back to in a second updates list to now ignore the freed memory and point at two and the story then repeats so again it's just a very pedantic way of using this new syntax of star notation and the arrow notation and the like to sort of do the equivalent of walking down all of these arrows following all of these breadcrumbs but it does take admittedly some getting used to syntax you only have to do one week but again next week in python we begin to abstract a lot of this complexity away but none of this complexity is going away it's just that someone else the authors of python for instance will have automated this kind of stuff for us the goal this week is to understand what it is we're going to get for free so to speak next week all right questions on these linked lists all right just oh yeah in back fair question let me summarize as could we have freed this with a for loop absolutely it just is a matter of style it's a little more elegant to do it in a while loop according to me but other people will reasonably disagree anything you can do with a while loop you can do with a for loop and vice versa do while loops recall are a little different but they will always do at least one thing but four loops and while loops behave the same in this case sure other questions all right well let's just vary things a little bit here just to see what some of the pitfalls might now be without getting into the weeds of code indeed we'll try to save some of that for problems at five's exploration but instead let's imagine that we want to create a list here of our own um i can offer in exchange for a few volunteers uh some foam fingers to bring to the next game perhaps could we get maybe just one volunteer first come on up you will be our linked list from the get-go what's your name pedro come on up all right thank you to pedro and if you want to just stand roughly over here but you are a null pointer so just point sort of at the ground as though you're pointing at zero all right so pedro is our linked list of size zero which pictorially might look a little something like this for consistency with our past pictures now suppose that we want to go ahead and malloc oh how about uh the number two can we get a volunteer to be on camera here okay you kind of jumped out of your seat do you want to come up okay you really want the foam finger i say all right round of applause sure okay and what's your name say again caleb caleb sorry all right so here is your number two for your number field and here is your pointer and come on let's say that there was room for caleb like right there that's perfect so caleb got malloc if you will over here so now if we want to insert caleb and the number two into this linked list well what do we need to do i already initialized you to two and pointing as you are to the ground means you're initialized to null for your next field pedro what you should do perfect what's your pedro dupe point that's fine too so pedro is now pointing at the list so now our list looks a little something like this so so far so good all is well so the first couple of these will be pretty straightforward let's insert one more if anyone really wants another foam finger here how about right in the middle come on down and just in anticipation how about let's malloc someone el okay your friends are pointing at you do you want to come down too preemptively this is a pool of memory if you will what's your name hannah all right hannah you are number four and hang there for just a moment all right so we've just uh mallocked hannah and hannah how about hannah suppose you ended up over there in just some random location all right so what should we now do if the goal is to keep these things sorted how about so pedro do you have to update yourself no no all right caleb what do you have to do okay and hannah what should you be doing i would just you're oh it's just you for now so point at the ground representing null okay so again demonstrating the fact that unlike in past weeks where we had our nice clean array back to back to back contiguously these guys are deliberately all over the stage so let's mount another how about number five what's your name jonathan all right jonathan you are our number five and pick your favorite place in memory okay all right so jonathan's now over there and hannah's over there so five we want to point hannah at number five so you of course are going to point there and where should you be pointing down to represent null as well okay so pretty straightforward but now things get a little interesting and here we'll use a chance to without the weeds of code point out how order of operations is really going to matter suppose that i next want to allocate say the number one and i want to insert the number one into this list yes this is what the code would look like but if we sort of act this out could we get one more volunteer how about on the end there in the sweater yeah come on down we have what's your name lauren okay lauren come on down and how about lauren why don't you go right in here in front if you don't mind here is your number here is your pointer so i've initialized lauren to the number one and your pointer will be null pointing at the ground uh where do you belong if we're maintaining sorted order looks like right at the beginning what should happen here okay so pedro has presumed to point now at lauren but how do you know where to point pedro's undoing what he did a moment ago so this was delivered and that was perfect that pedro presumed to point immediately at lauren why you literally just orphaned all of these folks all of these chunks of memory why because if pedro was our only variable pointing at that chunk of memory this is the danger of using pointers and dynamic memory allocation and building your own data structures the moment you point temporarily if you could to lauren i have no idea where he's pointing to i have no idea how to get back to caleb or hannah or anyone else on stage so that was bad so you did undo it so that's good i think we need lauren to make a decision first who should you point at so pointing at caleb why because you're pointing at literally who pedro is pointing at pedro now what are you safe to do good so order of operations there matters and if we had just done this line of code in red here list equals n that was like pedro's first instinct bad things happen and we orphaned the rest of the list but if we think through it logically and do this as lauren did for us instead we've now updated the list to look a little something more like this let's do one last one we got one more foam finger here for the number three how about on the end yeah you want to come down all right one final volunteer all right what's your name miriam all right so here is your number three here's your pointer if you want to go maybe in the middle of the stage in a random memory location so here too the goal is to maintain sorted order so let's ask the audience who or what number should point at whom first here so we don't screw up and orphan some of the memory and this is if we do orphan memory this is what's called again per last week a memory leak your mac your pc your phone can start to slow down if you keep asking for memory but never give it back or lose track of it so we want to get this right who should point at whom or what number say again three should point at four so three do you want to point at four and not uh so okay good and how did you know miriam whom to point at perfect okay so copying caleb why because if you look at where this list is currently constructed and you can cheat on the board here two is pointing to four if you point at whoever caleb number two is pointing at that indeed leads you to hannah for number four so now what's the next step to stitch this together our voice in the crowd two to three so two to three so caleb i think it's now safe for you to decouple because someone is already pointing at hannah we haven't orphaned anyone so now if we follow the breadcrumbs we've got pedro leading to one to two to three to four to five we need the numbers back but you can keep the foam fingers thank you to our volunteers here thank you can just put the numbers here thank you to all so this is only to say that when you start looking at the code this week and in the problem set it's going to be very easy to sort of lose sight of the force for the trees because the code does get really dense but the ideas again really do bubble up to these higher level descriptions and if you think about data structures at this level if you go off and program after a class like cs50 and you're whiteboarding something with a friend or a colleague most people think at and talk at this level and they just assume that yeah if we went back and look at our textbooks or class notes we could figure out how to implement this but the important stuff is the conversation and the ideas up here even though via this week will we get some practice with the actual code so when it comes to analyzing an algorithm like this let's consider the following what might be now the running time of operations like searching and sorting and searching and inserting into a linked list we talked about arrays earlier and we had some binary search possibilities still as soon as it's an array but as soon as we have a linked list these arrows like our volunteers could be anywhere on stage and so you can't just assume that you can jump arithmetically to the middle element to the middle element to the middle one you pretty much have to follow all of these breadcrumbs again and again so how might that inform what we see well consider this too even though i keep drawing all these pictures with all of the numbers exposed and all of us humans in the room can easily spot where the one is where the two is where the three is the computer again just like with our lockers and arrays can only see one location at a time and the key thing with a linked list is that the only address we've fundamentally been remembering is what pedro represented a moment ago he was the link to all of the other nodes and in turn each person led to the next but without pedro we would have lost some of or all of the linked list so when you start with a linked list if you want to find an element as by a search you have to do it linearly following all of the arrows following all of the pointers on the stage in order to get to the node in question and only once you hit null can you conclude yep it was there or no it was not so given that if a computer essentially can only see the number one or the number two or the number three or the number four or the number five one at a time how might we think about the running time then oops spoiler of cert it is indeed big o of n but why is that well in the worst case the number you might be looking for is all the way at the end and so obviously you're gonna have to search all of the n elements and i drew these things with boxes on top of them because again even though you and i can immediately see where the five is for instance the computer can only figure that out by starting at the beginning and going there so there too is another trade-off it would seem that overnight we have lost the ability to do a very powerful algorithm from week zero known as binary search right it's gone because there's no way in this picture to jump mathematically to the middle node unless you remember where it is and then remember where every other node is and at that point you're back to an array linked list by design only remember the next node in the list all right how about something like insert in the worst case perhaps how many steps might it take to insert something into a linked list someone else someone else yeah say again n squared fortunately it's not that bad it's not as bad as n squared that typically means doing n things n times and i think we can stay under that but not a bad bad thought yeah why would it be n okay so you're to summarize you're proposing end because to find where the thing goes you have to traverse potentially the whole list because if i'm inserting the number six or the number 99 that uh numerically belongs at the very end i can only find its location by looking for all of them at this point though in the term and really this point in the story you should start to question these kinds of very simplistic questions to be honest because the answer is almost always going to depend right if i've just got a linked list that looks like this the first question back to someone asking this question would be well does the list need to be sorted right i've drawn it as sorted and it might imply as much so that's a reasonable assumption to have made but if i don't care about maintaining sorted order i could actually insert into a linked list in constant time why i could just keep inserting into the beginning into the beginning into the beginning and even though the list is getting longer the number of steps required to insert something between the first element is not growing at all you just keep kind of inserting inserting if you want to keep it sorted though yes it's going to be indeed big o of n but again these kinds of now assumptions are going to start to matter so let's for the sake of discussion say it's big o of n if we do want to maintain sorted order but what about um in the case of not caring it might indeed be big o of one and now these are the kinds of decisions that will start to leave to you what about in the best case here if we're thinking about big omega notation then frankly we could just get lucky in the best case and the element we're looking for happens to be at the beginning or heck we just blindly insert to the beginning irrespective of the order that we want to keep things in all right so besides that how can we improve further on this design we don't need to stop at linked lists because honestly it's not been a clear win like linked lists allow us to use more of our memory because we don't need massive growing chunks of contiguous memory so that's a win but they still require big o of n time to find the end of it if we care about order we're using at least twice as much memory for the darn pointer so that seems like you know a side step it's not really a step forward so can we do better here's where we can now accelerate the story by just stipulating that hey even if you haven't used this technique yet we would seem to have an ability to stitch together pieces of memory just using pointers and anything you could imagine drawing with arrows you can implement it would seem in code so what if we leverage a second dimension instead of just stringing together things laterally left to right essentially even though they were bouncing around on the screen what if we start to leverage a second dimension here so to speak and build more interesting structures in the computer's memory well it turns out that in a computer's memory we could create a tree similar to a family tree if you've ever seen or drawn a family tree with grandparents and parents and siblings and so forth it's kind of this uh uh you know so inverted branch of a tree that grows uh typically when it's drawn downward instead of upward like a typical tree but that's something we could translate into code as well specifically let's do something called a binary search tree which is a type of tree and what i mean by this is the following notice this this is an example of an array from like week two when we first talked about those and we had the lockers on stage and recall that what was nice about an array if one it's sorted and two all of its numbers are indeed contiguous which is by definition of an array we can just do some simple math for instance if there's seven elements in this array and we do seven divided by two that's what three and a half round down through truncation that's three zero one two three that gives me the middle element arithmetically in this thing and even though i have to be careful about rounding using simple arithmetic i can very quickly with a single line of code or math fine for you the middle of the left half of the left half of the right half or whatever that's the power of arrays and that's what gave us binary search and how did binary search work well we looked at the middle and then we went left or right and then we went left or right again sort of as implied by the this color scheme here wouldn't it be nice if we somehow preserved the new upsides today of dynamic memory allocation giving ourselves the ability to just add another element add another element add another element but retain the power of binary search because log of n was much better than n certainly for large data sets right even the phone book demonstrated as much weeks ago so what if i kind of draw this same picture in two dimensions and i preserve the color scheme just so it's obvious what came where what do these things kind of look like now maybe like things we might now call nodes right a node is just a generic term for like storing some data what if the data these nodes are storing are numbers so still integers but what if we kind of connected these cleverly like an old family tree whereby every node has not one pointer now but as many as two maybe zero like in the leaves at the bottom there in green but other nodes on the interior might have as many as two like having two children so to speak and indeed the vernacular here is exactly that this would be called the root of the tree or this would be a parent with respect to these children the green ones would be grandchildren respect to these the green ones would be siblings with sorry these green ones would be siblings with respect to each other and over there too so all the same jargon you might use in the real world applies in the world of data structures and cs trees but this is interesting because i think we could build this now this kind of data structure in the computer's memory how well supposed that we defined as node to be no longer just this a no a number in a next field what if we sort of give ourselves a bit more room here and give ourselves a pointer called left and another one called right both of which is a pointer to a struct node so same idea as before but now we just make sure we think of these things as pointing this way and this way not just this way not just a single direction but two so you could imagine in code building something up like this with a node that creates in essence this diagram here but why is this compelling suppose i want to find the number three i want to search for the number three in this tree it would seem just like pedro was the beginning of our linked list in the world of trees the root so to speak is the beginning of your data structure you can retain and remember this entire tree just by pointing at the root node ultimately one variable can hang on to this whole tree so how can i find the number three well if i look at the root node and the number i'm looking for is less than notice i can go this way or if it's greater than i can go this way so i've preserved that property of the phone book or just a sorted array in general what's true over here if i'm looking for three i can go to the right of the two because that number is going to be greater if i go left it's going to be smaller instead and here's an example of actually recursion recursion in a physical sense much like the mario's pyramid which was kind of recursively defined notice this i claim this whole thing is a tree specifically a binary search tree which means every node has two or maybe one or maybe zero children but no more than two hence the buy in binary and it's the case that every left child is smaller than the root and every right child is larger than the root that definition certainly works for 2 4 and 6 but it also works recursively for every sub tree or branch of this tree notice if you think of this as the root it is indeed bigger than this left child and it's smaller than this right child and if you look even at the leaves so to speak the grandchildren here this root node is bigger than its left child if it existed so it's sort of a meaningless statement and it's it's less than its right child or it's not greater than certainly so that's meaningless too so we haven't violated the definition even for these leaves as well and so now how many steps does it take to find in the worst case any number in a binary search tree it would seem so it seems too literally and the height of this thing is actually three and so long story short especially if you're a little less comfy with your logarithms from yesteryear log base two is just like the number of times you can divide something in half and half and half until you get down to one this is kind of like a logarithm in the reverse direction here's a whole lot of elements and we're having we're having until we get down to one so the height of this tree that is to say is log base 2 of n which means that even in the worst case the number you're looking for maybe is all the way at the bottom in the leaves doesn't matter it's going to take log base 2 of n steps or log of n steps to find maximally any one of those numbers so again binary sorry binary search is back but we've paid a price right this isn't a linked list anymore it's a tree but we've gained back binary search which is pretty compelling right that's where the whole class began on making that distinction but what price have we paid to retain binary search in this new world yeah it's no uh it's no longer sorted left to right but this is a claim sorted according to the binary search tree definition where again left tree is left child is smaller than root and right child is greater than root so it is sorted but it's sorted in a two dimensional sense if you will not just one but another price paid was there a hand in back to yeah way back exactly every node now needs not one number but two three pieces of data a number and now two pointers so again there's that kind of trade-off again where well if you want to save time you've got to give something if you start giving space and you start using more space you can speed up time like you've got it there's always a price paid and it's very often in space or time or complexity or developer time uh the number of bugs you have to solve i mean all of these are sort of finite resources that you have to juggle among so if we consider now the code with which we can implement this here might be the node and how might we actually use something like this well let's take a look at maybe one final program in c here before we transition to higher level concepts ultimately let me go ahead here and let me just open a program i wrote here in advance so let me in a moment copy over a file called tree dot c which we'll have on the course's website and i'll walk you through some of the logic here that i've written for code dots for tree dot c all right so what do we have here first so here is a implementation of a binary search tree for numbers and as before i've just kind of played around and i've inserted the no the numbers manually so what's going on first here is my definition of a node for a binary search tree copied and pasted from what i proposed on the board a moment ago here are two prototypes for two functions that i'll show you in a moment that allow me to free an entire an entire tree one note at a time and that also allow me to print the tree in order so it's even though they're not sorted left to right i bet if i'm clever about what child i print first i can reconstruct the idea of printing this tree properly so how might i implement a binary search tree here's my main function here is how i might represent a tree of size 0. it's just a null pointer called tree here's how i might add a number to that list so here for instance is me mallocking space for a node storing it in a temporary variable called n here is me just doing a safety check make sure n does not equal null and then here is me initializing this node to contain the number two first then initializing the left child of that node to be null and the right child of that null node to be null and then initializing the tree itself to be equal to that particular node so at this point in the story there's just one rectangle on the screen containing the number two with no children all right let's just add manually to this a little further let's add another number to the list by mallocking another node i don't need to re-declare n as a node star because it already exists at this point here's a little safety check i'm gonna not bother with my let me do this free memory here just to be safe do i want to do this um we're going to free memory 2 which i've not done here but i'll save that for another time here i'm going to initialize the number to one i'm going to initialize the children of this no no to null and null and now i'm going to do this initialize the tree's left child to be n so what that's essentially doing here is if this is my root node the single rectangle i described a moment ago that currently has no children neither left nor right here's my new node with the number one i want it to become the new left child so that line of code on the screen there tree left equals none sorry tree left equals n is like stitching these two together with a pointer from two to the one all right the next line of co next lines of code you can probably guess are me adding another number to the list just the number three so this is a simpler tree with two one and three respectively and this code let me wave my hands is almost the same except for the fact that i'm updating the tree's right child to be this new and third node let's now run the code before looking at those two functions let me do make tree dot slash tree and voila one two three so it sounds like the data structure is sorted to your concern earlier but how did i actually print this and then eventually free the whole thing well let's look at the definition of first print tree and this is where things get kind of interesting print tree returns uh nothing so it's a void function but it takes a pointer to a root element as its sole argument nodestar root here's my safety check if root equals equals null there's obviously nothing to print just return that sort of goes without saying but here's where things get a little magical otherwise print your left child then print your own number then print your right child what is this an example of even though it's not mentioned by name here what programming technique here yeah so this is actually perhaps the most compelling use of recursion yet it wasn't really that compelling with the mario thing because we had such an easy implementation with a for loop weeks ago but here is kind of a perfect application of recursion where your data structure itself is recursive right if you take any snip of any branch it all still looks like a tree just a smaller one that lends itself to recursion so here is this leap of faith where i say uh print my left tree or my left sub-tree if you will via my child at the left then i'll print my own root node here in the middle then go ahead and print my right subtree and because we have this base case that just makes sure that if the tree the root is null there's nothing to do you're not going to recurse infinitely you're not going to call yourself again and again and again infinitely many times so it just kind of works out and prints the one the two and the three and notice what we could do too if you wanted to print the tree in reverse order you could do that print your right tree first the greater element then yourself then your smaller subtree and if i do make tree here and dot slash tree while on now i've reversed the order of the list and that's pretty cool you could do it with a for loop in an array but you can also do it even with this two-dimensional structure let's lastly look just at this free tree function and this one's almost the same order doesn't matter in quite the same way but it does still matter here's what i did with free tree well if the root of the tree is null there's obviously nothing to do just return otherwise go ahead and free your left child and all of its descendants then free your right child and all of its descendants and then free yourself and again free literally just freeze the address in that variable doesn't free the whole darn thing it just freeze literally what's at that address why was it important that i did line 72 last though why did i free the left child and the right child before i freed myself so to speak exactly if you free yourself first if i had done incorrectly this line higher up you're not allowed to touch the left child subtree or the right child subtree because the memory address is no longer valid at that point you would get some kind of memory error perhaps the program would crash valgrind definitely wouldn't like it bad things would otherwise happen but here then is an example of recursion and again just a recursive use of an actual data structure and what's even cooler here is relatively speaking suppose we wanted to search something like this binary search actually gets pretty straightforward to implement too for instance here might be the prototype for a search function for a binary search tree you give me the the root of a tree and you give me a number i'm looking for and i can pretty easily now return true if it's in there or false if it's not how well let's first ask a question if if if battery died okay so if there we go if tree equals equals null then you just return false because if there's no tree there's no number so it's obviously not there return false else if the number you're looking for is less than the tree's own number which direction should we go okay left how do we express that well let's just return the answer to this question search the left sub-tree by way of my left child looking for the same number and you just assume through the beauty of recursion that ah you're kicking the can and let yourself figure it out with a smaller problem just that snipped left tree instead else if the number you're looking for is greater than the tree's own number go to the right as you might infer so i can just return the answer to this question search my right subtree for that same number and there's a fourth and final condition what's the fourth scenario we have to consider explicitly yeah if the number itself is right there so else if the number i'm looking for equals the tree's own number then and only then should you return true and if you're you're thinking quickly here there's an optimization possible better design opportunity think back to even our scratch days what could we do a little better here you're pointing at it exactly and else suffices because if there's logically only four things that could happen you're wasting your time by asking a fourth gratuitous question and else here suffices so here too more so than the mario example a few weeks ago there's just this elegance arguably to recursion and that's it this is not pseudo code this is the code for binary cert on a binary search tree and so recursion tends to work in lockstep with these kinds of data structures that have this kind of structure to them as we're seeing here are any questions then on binary search as implemented here with a tree yeah uh good question so uh when returning a boolean value true and false are values that are defined in a library called standard bool s-t-d-b-o-o-l dot h with the header file that you can use um it is the case that true is probable uh it's not well defined what they are but they would map indeed yes to zero and one essentially but you should not compare them explicitly to zero and one when you're using true and false you should compare them to each other ah sorry so if i am in my own code from earlier in a void function it is totally fine to return you just can't return something explicitly so return just means that's it quit out of this function you're not actually handing back a value so it's a way of short circuiting the execution if you don't like that and some people do frown upon having code returned from functions prematurely you could invert the logic and do something like this if the root does not equal null do all of these things and then indent all three of these lines underneath that's perfectly fine too i happen to write it the other way just so that there was explicitly a base case that i could point to on the screen whereas now it's kind of implicitly there for us only but a good observation too all right so let's ask the question as before about like running time of this it would look like binary search is back and we can now do things in logarithmic time but we should be careful is this a binary search tree just to be clear and again a binary search tree is a tree where the root is greater than its left child and smaller than its right child that's the essence so you're shake you're nodding your head you agree okay i agree so this is a binary search tree is this a binary search tree okay i'm hearing yeses or i'm hearing just my delay changing the vote it would seem so this is kind of one of those trick questions this is a binary search tree because i've not violated the definition of what i gave you right is there any example of a left child that is greater than its parent or is there any example of a right child that's smaller than its parent that's just the opposite way of describing the same thing no this is a binary search tree unfortunately it also looks like albeit at a different axis what a linked list but you could imagine this happening right suppose that i hadn't been as thoughtful as i was earlier by inserting two and then one and then three which kind of nicely balanced everything out suppose that instead because of what the user's typing in or whatever you can drive in your own code suppose you insert a one and then a two and then a three like you've kind of created a problem for yourself because if we follow the same logic as before going left or going right this is how you might implement a binary search tree accidentally if you just blindly keep following that definition i mean this would be better designed as what if we kind of like rotated the whole thing around and that's totally fine and those kinds of trees actually have names there's trees called avl trees in computer science there are red black trees in computer science there are other types of trees that additionally add some logic that tell you when you got to pivot the thing and rotate it and kind of snip off the root and fix things in this way but a binary search tree in and of itself does not guarantee that it will be balanced so to speak and so if you consider the worst case scenario of even using a binary search tree if you're not smart about the code you're writing and you just blindly follow this definition you might accidentally create a crazy long and stringy binary search tree that essentially looks like a linked list because you're not even using any of the left children so unfortunately the literal answer to the question here is what's the running time of search well hopefully log n but not if you don't maintain the balance of the tree both insert and search could actually devolve into instead of big o of log n literally big o of n if you don't somehow take into account and we're not going to do the code for that here sort of a higher level thing you might explore beyond down the road it can devolve into something that you might not have intended and so now that we're talking about two dimensions it's really the onus is on the programmer to consider what kinds of perverse situations might happen where the thing devolves into a structure that you don't actually want it to devolve into all right any questions then on these binary search trees no all right we've got just a few structures to go let's go ahead and take one more five minute break here when we come back we'll talk at this level about some final applications of this see you in five all right so we are back and as promised we'll sort of operate now at this higher level where if we take for granted that even though we haven't had an opportunity to play with these techniques yet you have the ability now in code to kind of stitch things together both in a wonder dimension and even two dimensions to build things like lists and trees so if we have these building blocks things like now arrays and lists and trees what if we start to kind of amalgamate them such that we build things out of multiple data structures can we start to get some of the best of both worlds by way of for instance something called a hash table so a hash table is sort of a swiss army knife of data structures in that it's so commonly used because it allows you to associate keys with values so to speak so for instance it allows you to associate a username with a password or a name with a number or anything where you have to take something as input and get as output a corresponding piece of information a hash table is often a data structure of choice and here's what it looks like it's actually looks like an array at first glance but for discussion sake i've drawn this array vertically which is totally fine it's still just an array but it allows you a hash table to jump to any of these locations randomly that is instantly so for instance there's actually 26 locations in this array because i want to for instance store initially uh names uh of people for instance and wouldn't it be nice if the person's name starts with a i have a go-to place for it maybe the first box and if it starts with z i put them at the bottom so that i can jump instantly arithmetically using a little bit of ascii or unicode fanciness exactly to the location that they wanna they need to go so for instance here's our array 0 index 0 through 25. if i think of this though as a through z i'm going to think of these 26 locations now in the context of a hash table is what we'll generally call buckets so buckets into which you can put values so for instance suppose that we want to insert a value one name into this data structure and that name is say albus so albus starting with a albus might be go at the very beginning of this list all right and then we want to insert another name this one happens to be zacharias starting with z so it goes all the way at the end of this data structure in location 25 aka z and then maybe a third name like hermione and that goes at location h according to that position in the alphabet so this is great because in constant time i can insert and conversely search for any of these names based on the first letter of their name a or z or h in this case let's fast forward and assume we put a whole bunch of other names that might look familiar into this hash table it's great because every name has its own location but if you're thinking of names you don't yet see it on the screen we eventually encounter a problem with this right when could something go wrong using a hash table like this if we wanted to insert even more names what's going to eventually happen yeah there's already someone with the first letter right like i haven't even mentioned harry for instance or hagrid and yet hermione's already using that spot so that sort of invites the question well what happens maybe if we want to insert harry next do we maybe cheat and put him at location i but then if there's some location i where do we put them and it just feels like the situation could very quickly devolve but i've deliberately drawn this data structure that i claim as a hash table sort of in two directions an array vertically here but what might this be hinting i'm using horizontally even though i'm drawing the rectangles a little differently from before maybe another array to be fair but honestly arrays are such a pain with the allocating and reallocating and so forth these kind of look like the beginnings of a linked list if you will where the name is where the number used to be even though i'm drawing it horizontally now just for discussion's sake and this seems to be like a pointer that isn't pointing anywhere yet but it looks like the array is 26 pointers some of which are null that is empty some of which are pointing at the first node in a linked list so that's really what a hash table might be in your mind an amalgam of a an array whose elements are linked lists and in theory this kind of gives you the best of both worlds right you get random access with high probability right you get to jump immediately to the location you want to put someone but if you run into this perverse situation where there's someone already there okay fine it starts to devolve into a linked list but it's at least 26 smaller length lists not one massive linked list which would be big o of n and quite slow to solve so if harry gets inserted and hagrid yeah you have to kind of chain them together so to speak in this way but at least you're not you've not painted yourself into a corner and in fact if we fast forward and put a whole bunch of familiar names in the data structure starts to look like this so the chains not terribly long and some of them are actually of size 0 because there's just some unpopular letters of the alphabet among these names but it seems better than just putting everyone in one big array or one big linked list we're sort of trying to balance these trade-offs a little bit in the middle here well how might we represent something like this here's how we could describe this thing a node in the context of a linked list could be this i have an array called word of type char and it's big enough to fit the longest word in the alphabet plus one and the plus one why probably the null character so i'm assuming that longest word is like a constant defined elsewhere in the story and it's something big like 40 100 whatever whatever the longest word in the uh harry potter universe is or the english alphabet or english dictionary is longest word plus one should be sufficient to store any name in the story here and then what else does each of these nodes have well it has um a pointer to another node so here's how we might implement the notion of a node in the context of storing not integers but names instead like this but how do we decide what the hash table itself is well if we now have a definition of a node we could have a variable in main or even globally called hash table that itself is an array of node star pointers that is an array of pointers to nodes the beginnings of linked lists number of buckets is kind of up to me i proposed verbally that it be 26 but honestly if you get a lot of collisions so to speak a lot of h names trying to go to the same place well maybe we need to be smarter and not just look at the first letter of their name but maybe the first and the second so it's h a and h e but wait no then harry and hagrid still collide but we start to at least make the problem a little less impactful by tinkering with something like the number of buckets in a hash table like this but how do we decide where someone goes in a hash table in this way well it's an old school problem of input and output the input to the problem is going to be something like the name and the algorithm in the middle as of today is going to be something called a hash function a hash function is generally something that takes as input a string a number whatever and produces this output a location in our context like a number 0 through 25 or 0 through 16 000 or whatever the number of buckets you want is it's going to just tell you where to put that input at a specific location so for instance albus according to the story thus far gave me back zero as output zacharias gave me 25. so the hash function in the middle of that black box is pretty simplistic in this story it's just looking at like the ascii value it seems of the first letter in their name and then subtracting off what capital a is 65 so like doing some math to get back a number between zero and 25. so that's how we got to this point in the story and how might we then resolve the problem further and use this notion of hashing more generally well just for demonstration sake here here's actually some buckets literally and we've labeled and advanced these buckets with the suits from a deck of cards so we've got some spades and we've got diamonds here and we've got what else here uh clubs and hearts so we have a deck of cards here for instance right and this is something you yourself might do instinctively if you're sort of getting ready to start playing a game of cards you're just kind of cleaning up or you want things in order like here is literally a jumbo deck of cards what would be the easiest way for me to sort these things well we've got a whole bunch of sorting algorithms from the past so i could go through like here's the three of diamonds and i could here let me throw this up on the screen just so if you're far and back so here's uh you know diamonds i could put this here three four i could do this in order here but a lot of us honestly if given a deck of cards and you just want to kind of clean it up and sort it in order you might do things like this well here's my input three of diamonds let's put it in this bucket four of diamonds this bucket five of diamonds this bucket and if you keep going through the cards here's seven of hearts hearts bucket eight bucket uh queen of spades over here and it's still going to take you 52 steps but at the end of it you have hashed all of the cards into four distinct buckets and now you have problems of size 13 which is a little more tenable than doing one massive 52 card problem you can now do four 13 size problems and so hashing is something that even you and i might do instinctively taking as input some card some name and producing as output some location a sort of temporary pile in which you want to uh stage things so to speak but these collisions are kind of inevitable and honestly if we kept going through the harry potter universe some of these chains would get longer and longer and longer which means that instead of getting someone's name quickly by searching for them or inserting them might start taking a decent amount of time so what could we do instead to resolve situations like this if the problem fundamentally is that the first letter is just too darn popular h we need to take in more input not just the first letter but maybe the first two letters so if we do that we can go from a through z to something more extreme like maybe h a h b h c h d h e h f and so forth so that now harry and hermione end up at different locations but you know darn it hagrid still collides with harry so it's better than before the chains aren't quite as long but the problem isn't fundamentally gone and in this case here anyone know how many buckets we just increased to if we now look at not just a through z but a a through z z roughly yeah so okay good so these answer 2 26 squared or 676 so that's a lot more buckets and this is why i only showed a few of them on the screen so that's a lot more and it spreads things out specific uh in particular what if we take this one step further instead of h a we do like h a a h a b h a c h z z and so forth well now we have an even better situation because hermione has her one spot harry has his one spot hagrid has his one spot but there's a trade-off here the upside is now arithmetically we can find their locations in constant time maybe technically three steps but three is constant no matter how many other names are in here it would seem but what's the downside here sorry say again memory so significantly more we're now up to seventeen 17 576 buckets which itself isn't that big a deal right computers have a lot of memory these days but as you can kind of infer you know i can't really think of someone whose name started with heq for instance in the harry potter universe and if we keep going definitely don't know of anyone whose name started with zzz or aaa there's a lot of sort of not useful combinations that have to be there mathematically so that you can do a bit of math and jump 2 randomly so to speak the precise location but they're just going to be empty so it's a very sparsely populated array so to speak so what does that really mean for performance ultimately well let's consider again in the context of our big o notation it turns out that a hash table technically speaking is still just going to give us big o of n in the worst case why if you have some crazy perverse case where everyone in the universe has a name that starts with a or starts with h or starts with z you just get really unlucky and your chain is massively long well then at that point it's just a linked list it's not a hash table it's like the perverse situation with the tree where if you insert it without any mind for balancing it keeping it balanced it just evolves but there's a difference here between sort of a theoretical performance and an actual performance if you look back at the tree the hash table here this is absolutely in practice going to be faster than a single length list you know mathematically asymptotically big o notation sure it's all the same big o of n but if what we're really caring about is real humans using our software there's something to be said for crafting a data structure that technically if this data were uniformly distributed is 26 times faster than a linked list alone and so there's this tension too in between like systems uh types of cs and theoretical cs where yeah theoretically these are all the same but in practice for making real world software you know improving this speed by a factor of 26 in this case let alone 576 or more might actually make a big difference but there's going to be a trade-off and that's typically some other resource like giving up more space all right how about another data structure we could build let me fast forward to something here called a try so a try sort of a weird name and pronunciation short for retrieval pronounced try typically a try is a tree that actually gives us constant time lookup even for massive data sets what do i mean by this in the world of a try you create a tree out of arrays so we're really getting into like the frankenstein territory just building things up with like spare parts of data structures that we have here but the root of a try is itself an array for instance of size 26 where each element in that try points to another node which is to say another array and each of those locations in the array represents a letter of the alphabet like a through z so for instance if you wanted to store the names of the harry potter universe not in a hash table not in a linked list not in a tree but in a try what you would do is hash on every letter in the person's name one at a time so a try is like a multi-tier hash table in a sense where you first look at the first letter then the second letter then the third and you do the following for instance each of these locations represents a letter a through z suppose i wanted to insert someone's name into this that starts with the letter a h like hagrid for instance well i go to the location h i see it's null which means i need to malloc myself another node or another array and that's depicted here then suppose i want to store the second letter in hagrid's name in a so i go to that location in the second node and i see okay it's currently null there's nothing below it so i allocate another node using malloc or the light and now i have h a g and i continue this with r i d and then when i get to the bottom of this person's name i just have to indicate here in color but probably with a boolean value or something like a true value that says a name stops here so that it's clear that the person's name is not h-a-h-a or h-a-g or h-a-r h-a-g-r it's h-a-g-r-i-d and the d is green just to indicate there's like some other boolean value that just says yes this is the node in which the name stops and if i continue this logic here's how i might insert someone like harry and here's how i might insert someone like hermione and what's interesting about the design here is that some of these names share a common prefix which starts to get compelling because you're reusing space you're using the same nodes for names like h a g and h a r because they share h and an a in common and they all share an h in common so you have this data structure now that itself is a tree each node in the tree is itself an array and we therefore might implement this thing using code like this every node is containing i'll do it in reverse order a an array i'll call it children because that's what it really represents up to 26 children for each of these nodes size of the alphabet so i might have used the constant for number 26 to give myself 26 letters of the alphabet and each of those arrays stores that many node stars that many pointers to another node and here's an example of the bull this is what i represented in green on the slide a moment ago i also need another piece of data just a zero or one a true or false that says yes a name stops in this node or it's just a path to the rest of the person's name but the upside of this is that the height of this tree is only as tall as the person's longest name h-a-g-r-i-d or h-e-r-m-i-o-n-e and notice that no matter how many other people are in this data structure there's three at the moment if there were three million it would still take me how many steps to search for hermione h e h-e-r-m-i-o-n-e so eight steps total no matter if there's two other people two million ten million other people because the path to her name is always on the same path and if you assume that there's uh there's a maximum limit on the length of names in the human world maybe it's 40 100 whatever whatever the longest name in the world is that's constant maybe it's 40 100 but that's constant which is to say that with a try technically speaking it is the case that your lookup time big o of n a big o notation would be big o of one it's constant time because unlike every other data structure we've looked at with a try the amount of time it takes you to find one person or insert one person is completely independent of how many other pieces of data are already in the data structure and this holds true even if one name is a prefix of another i don't think there was a daniel or danielle in the harry potter universe that i could think of but d-a-n-i-e-l could be one name and therefore we have a true there in green and if there's a longer name like danielle then you keep going until you get to the e so you can still have with a try one name that's a substring of another name so it's not as though we've created a problem there that too is still possible but at the end of the day it only takes a finite number of steps to find any of these people and again that's what's particularly compelling that you effectively have constant time look up so that's amazing right we've gone through this whole story for weeks now of like linear time and then it went up to like n squared and then log n and now constant time what's the price paid for a data structure like this this so-called try what's the downside dude there's got to be a catch and in fact tries are not actually used that often amazing as they might sound on some cs level here memory what would why in what sense exactly if you're storing all of these darn arrays it's again a sparse sparsely populated data structure and you can kind of see it here grant that there's only three names but most of those boxes most of those pointers are going to remain null so this is an incredibly wide data structure if you will it uses a huge amount of memory to store the names but again you got to pick a lane either you're going to minimize space or you're going to minimize time it's not really possible to get truly the best of both worlds you have to decide where the inflection point is for the device you're writing software for how much memory it has how much expensive it is and again taking all of these kinds of things into an account so lastly let's do one further abstraction so even higher level to discuss something that are generally known as abstract data structures it turns out we could spend like all day all week talking about different things we could build with these data structures but for the most part now that we have arrays now that we have linked lists or their cousins trees which are two dimensional and beyond that there's even graphs where the arrows can go in multiple directions not just down so to speak now that we have this ability to stitch things together we can solve all different types of problems so for instance a very common type of data structure to use in a program or even our human world are things called cues a queue being a data structure like a line outside of a store where it has what's called a fifo property first in first out which is great for fairness at least in the human world and if you've ever waited outside of tasty burger or salsa fresca or some other restaurant nearby presumably if you're queuing up at the counter you want the store to maintain a fifo system first and first out so that whoever's first in line gets their food first and gets out first so a fight a cue is actually a computer science term too and even if you're still in the habit of printing things on paper there are things you might have heard called printer cues which also do things in order the first person to send their essay to the printer should ideally be printed before the last person to send their essay to the printer again in the interest of fairness but how can you implement a cue well you typically have to implement like two fundamental operations nq and dq so adding something to it and removing something from it and the interesting thing here is that how do you implement a cue well in the human world you would just have like literally physical space for humans to line up from left to right or right to left same in a computer like a printer queue if you send a whole bunch of jobs to be printed a whole bunch of essays or documents well you need a chunk of memory like an array all right well if you use an array what's a problem that could happen in the world of printing for instance if you use an array to store all of the documents that need to be printed it could be filled right so if the programmer decided hp or whoever makes the printer decides oh you can send like a megabyte worth of documents to this printer at once at some point you might get an error message which says sorry out of memory wait a few minutes which is maybe a reasonable solution but a little annoy or hp could write code that maybe dynamically resizes the array or so forth but at that point maybe they should just use a linked list and they could so there too you could implement the notion of a queue using a linked list instead you're going to spend more memory but you're not going to run out of space in your array which might be more compelling you know this happens even in the physical world you go to the store and you know you start having to line up outside and down the road and like for a really busy store they kind of run out of space so they they make do but in that case it tends to be more of an array just because of the physical notion of humans lining up but there's other data structures too if you've ever gone to the dining hall and picked up like a harvard or a yale tray right you're typically picking up the last tray that was just cleaned not the first tray that was cleaned why because these uh cafeteria trays stack up on top of each other and indeed a stack is another type of abstract data structure in the physical world it's literally something physical like a stack of trays which have what we would call a lifo property last in first out so as these things come out of the washer they're putting the most recent ones on the top and then you the human are probably taking the most recently cleaned one which means in the extreme no one on campus might ever use that very first tray which is probably fine in the world of trays but would really be bad in the world of like tasty burger lining up for food if lifo were the property being implemented but here too it could be an array it could be a linked list and you see this honestly every day if you're using gmail and your gmail inbox that is actually kind of a stack at least by default where your newest message last in are the first ones at the top of the screen that's kind of a lifo data structure and it means that you see your most recent emails but if you have a busy day you're getting a lot of emails it might not be a good thing because now you're kind of ignoring the people who wrote you way earlier in the day or the week so lifo and fifo are just properties that you can achieve with these very specific types of data structures and the parlance in the world of stacks is to push something onto a stack or pop something out these are here for instance as an example of like why might you always wear the same color well if you're storing all of your clothes in a stack you might not ever get to like the different colored clothes at the bottom of the list and in fact to paint this picture we have a a couple minute uh video here just to paint this here made by a faculty member elsewhere let's go ahead and dim the lights for just a minute or two here so that we can take a look at jack learning some facts once upon a time there was a guy named jack when it came to making friends jack did not have the neck so jack went to talk to the most popular guy he knew he went up the loo and asked what do i do lou saw that his friend was really distressed well lou began just look how you're dressed don't you have any clothes with a different look yes said jack i sure do come to my house and i'll show them to you so they went off the jacks and jack showed lou the box where he kept all his shirts and his pants and his socks luce said i see you have all your clothes in a pile why don't you wear some others once in a while jack said well when i remove clothes and socks i wash them and put them away in the box then comes the next morning and up i hop i go to the box and get my clothes off the top lou quickly realized the problem with jack he kept clothes cds and books in a stack when he reached for something to read or to wear he chose a top book or underwear then when he was done he would put it right back back it would go on top of the stack i know the solution said a triumphant loo you need to learn to start using a queue lou took jack's clothes and hung them in a closet and when he had emptied the box he just tossed it then he said now jack at the end of the day put your clothes on a left when you put them away then tomorrow morning when you see the sun shine get you close from the right from the end of the line don't you see said lou it will be so nice you'll wear everything once before you wear something twice and with everything in queues in his closet and shelf jack started to feel quite sure of himself all thanks to lou and his wonderful cue just to help you realize that these things are everywhere in the world even in our human world if you've ever lined up at this place anyone recognize this okay so sweet green a little salad place in the square this is if you order online or in advance your food ends up according to the first letter in your name which actually sounds awfully reminiscent of something like a hash table and in fact no matter whether you implement a hash table like we did with an array and linked lists or with like three shelves like this this is actually an abstract data type called a dictionary in a dictionary just like in our human world has keys and values words and their definitions this just has uh letters of the alphabet and salads as their value but here too there's a real world constraint at what in what kind of scenario does this system at sweet green devolve into a problem for instance because they too are using only finite space spinite storage what could go wrong yeah yeah if they run out of space on the shelf and there's a lot of people whose names start with d or e or whatever and so they just pile up and then maybe they kind of overflow into the e's or the f's and you know they probably don't really care because any human is going to come by and just eyeball and figure it out anyway but in the world of a computer you're the one coding and have to be ever so precise we thought we would lastly do one final thing here um in advance we prepared a a linked list of sorts in the audience since this has become a bit of a thing i am starting to represent the beginning of this linked list and so far as i have a pointer here with seat location g9 uh whoever's in g9 would you mind standing up and what letter is on your sheet there okay so you have s15 and your letter say again f15 so i see you're holding a c in your node you are pointing to if you could physically f15 f15 what do you hold you have an s and who should you be pointing at f5 could you stand up f5 you're holding uh a five i see what would address f12 big finale f12 if you'd like to stand up holding a zero and null which means that was cs50 all right we'll see you next time you
Info
Channel: CS50
Views: 20,409
Rating: 4.973856 out of 5
Keywords: cs50, harvard, computer, science, david, j., malan
Id: F1N5JAayRo4
Channel Id: undefined
Length: 172min 5sec (10325 seconds)
Published: Mon Oct 04 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.