GTAC 2.6: Implementing a Graph Data Structure in Python

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
you probably only have to work with graphs for a very short time before you immediately start thinking about algorithms on graphs and there's a couple reasons for this one is that many of the basic facts that you're going to prove in graph theory are proven by actual constructions and those constructions are algorithms and then you prove the correctness of the algorithm to prove the correctness or the proof the truth of a of a theorem and so algorithms come up all the time um but the other way they come up is that well there are many uh concepts in computer science that are naturally modeled by graphs and you immediately end up with graph algorithms when you try to solve problems related to whatever those concepts might be so um there are a lot of algorithmic problems and if you want to get at least started actually writing code to solve some of these problems the first thing you're going to need is data structure so let me walk through i'm going to do kind of a live coding um uh it somewhat prepared but it'll somewhat off the top of my head here just kind of how you might go about constructing a data structure for a graph that more it's pretty efficient for representing the main ideas we've seen so far in graph theory and this is also going to give us a good way to solidify our main definitions so you really know you understand the definitions if you can implement them in some programming language and we're going to use python here so let me pull up a code window all right all right so this is python so i'm going to make a class for our graph so if i'm going to make a data structure generally i want it as a class and i need to initialize it and well the definition of a graph is just what a set of vertices and a set of pairs of vertices so instead of vertices and edges so let's just pass vertices and edges to my graph and before i actually even write out how what i will do inside it let's um let's get an idea of what we're actually going to do and do with it when we have it right so if you write code to use the code before you actually write the code that you want to use this is a great way to get an idea of what code you actually need to write so i'm just putting this little standard python idiom if name equals equals main this way if we if we like this library and we want to import it later this code will not get run it will get run when i run it here so what i'd like to do is i'd like to be able to create a graph just like this and pass it some uh vertices like one two three and some edges and i guess the edges will be like tuples is the easiest way to express some pairs so one two and two three so this is a graph with three vertices and two edges it's actually just a path of length two okay and i can run this and it ran which is maybe not surprising but it's also not so interesting let me see if i can get the bottom here to show can you see that there we go so you can see the bottom there it says it finished in 0.05 seconds so great it ran this code but it doesn't really do anything interesting yet all right so i want to store the vertices and edges so the simplest thing i could do is something like this [Music] this is not so good let me give you a couple reasons one is that um well the vertex set might not actually be a set um someone could have given me a list here in fact i should be okay with that like if it gives me a list i should be okay with this too um but i kind of want the internal represent representation to be a set because if i want to add a vertex or move a vertex i just want to or even just tell if a vertex is in the graph i won't be able to do that quickly and so having a set makes the most sense so the only assumption i'm going to make is that it gave me some iterable collection right something i could loop over and so that can turn that into a set this way and i you might think i'd want to do the same thing for e but you know there's a bigger problem with the edge set because the edges are actually supposed to be unordered pairs and what i gave it was ordered pairs now you might think oh we'll just replace those tuples those parentheses with curly braces and make them sets if you think that's a good idea i can do it here but many of you know that i'm going to get a little bit of i'm going to run into some trouble here because when i run that it gives me a syntax error oh no syntax error that's not the one i expected i expected a whole different error here we go unhashable type set right you can't have a set of sets so let's back that up let's go back here um to this set of tuples but we're going to have to kind of uh turn those ordered pairs into unordered pairs but we still want them in a set so there is a way to do this you just make a frozen set which is a python data structure that you know that store is a set but since the set can't be changed it's immutable it can also be hashable and so you can put frozen sets inside a set okay so um this will this will not work yet actually i think if i run it i'll get an error because it says i passed too many arguments right because this has to be some kind of uh some kind of iterable collection itself and i run it okay there we go so what i've done here is that i gave it some vertices some edges and it just took the vertices and edges and sort of repackage them up internally which is a good thing to do generally so if i have a a graph like the easiest kind of simplest questions you might ask to be able to like the information you'd like to be able to get out of the graph might be something like maybe the degree of a vertex let's try this so let's implement degree so i want to know the degree of a vertex v and right now i think what i'd have to do is something weird right like i have to iterate over all the edges and add up the ones that contain v right and if i wanna um so i can like add one [Music] for uh i guess for edges in self dot e if uh if v is in that edge this is like uh is that too fancy that might be too fancy and i i feel okay writing this kind of weird um fast uh one-liners that they're a little bit cryptic um because we're gonna throw that away because it's actually not such a good way to to implement this but let's do it as a first cut those slightly just make sure see if it works what do we expect to see if i get the degree of vertex 2 i think i expect it to print a 2 here let's see it does good all right so what about degree of one [Music] yep prints that and degree of three this should also be one and there you go all right so it seems like it works but like i said it's not so nice there's it's not just not nice because uh i've written a somewhat cryptic one-liner here it's also not nice because in order to get the degree i actually had to loop over every edge in the entire graph which is obviously a bad thing um so if i wanted to do this quickly i probably want to represent things a little differently internally and you know many of the things we do with graphs because the graph is is representing local connections we often just want to make local jumps like we'd like to take a vertex and ask what other vertices are adjacent to that vertex that is we want to know the neighbors of a vertex and if we could just get the neighbors then maybe we could easily get the degree right we could just count them and this turns out to be a really nice way to represent the graph so let me just add this in so even though i initialized the graph with just a vertex set and an edge set internally i'm going to store all these adjacencies and the way i'll do it is i'll make a little dictionary and the dictionary is going to map vertices to sets and every vertex will be mapped to its neighbor set so a set of neighbors so that's just a dictionary like this and so for each one of the vertices and i guess well really what i want is that for each one of the vertices i'm going to initialize this neighbors v it's just going to be initialized to be an empty set and and then for every one of the edges i'm going to have to add two entries into this dictionary right so um where i'm going to add for every for each end of the of the edge i'm going to add the other one into its neighbor set so it looks something like this right so for view v in the edges um we're going to have self.neighbors view add v and self dot neighbors v add u right so the neighbors of u is a set and i've added v to it and the neighbors of v is a set and i added u to it so i have um here then my um new representation here and now i think i can simplify this to be something more like um let's just return the length of that set so um if i run it okay i got one that's what i expected in fact if these are expectations let me just write them as assertions so i'm going to assert that the degree of of 1 should be equal to 1 cert degree of 2 should be 2 and this one assert that the degree of 3 should also be back to 1. and i like to do the following thing if i put a bunch of assertions in here really i should wrap them up into some tests or something but i like to print something so i know that they actually got through all the assertions and it actually ran and so everything seems to work now okay so this is actually a very basic and functional completely functional uh implementation of a graph and you can get pretty far with this actually um although um in terms of using it from the outside that is like if i create an instance of a graph and i want to then run other algorithms on it it would be pretty handy to have access to the neighbors right now i put the little underscore there because that data structure now is kind of internal to the class it's private as far as things can be private in python and uh but i wanna i wanna give access to that without actually giving access to the data structure so what i'll do is i'll just actually add a method here so i'll call it neighbors and so if i want the neighbors of a vertex it can return could just return the set of neighbors um this makes me a little uncomfortable i kind of am a very private person when it comes to my data structures and i kind of prefer that if i if a user the data structure shouldn't be able to mess around with the internals so i might actually instead of giving them the set i might just give them an iterator over the set so they can't actually change anything um it's sometimes for their own protection okay and now i'm going to also i want to clean this up a little because we have some operations that we would want to do on a graph like building it one step at a time like adding one vertex at a time or adding one edge at a time and um it's pretty clear to me at least that as soon as i have this method even before i've written any other code i have a use for it so if i want to add a vertex va to a graph then i know that i'm just going to have to add it into this vertex set um so it would be like self.v dot add v big v to l little v is being added to the big v and then i need uh to create um some neighbors for it right so this is what we did here and um and i could do the following now is that instead of doing this for every vertex that got passed in i could just add the vertex okay all right so i this almost works but there's something a little scary about it as well because you know someone might add the same vertex twice um and if they later add a vertex again um i kind of don't want it to destroy like eliminate all the neighbors that were already stored in this dictionary so i kind of want to make sure i don't overwrite things maybe we'll get to that but there's something else awkward about this so i have a set with all the vertices in it but i also have a dictionary whose keys are all the vertices so um this is really redundant i actually don't need both if i want the vertices i would just look at the keys in this neighbor dictionary so i can get rid of this which seems a little nuts because uh man if i'm going to store the vertices you would think that in the status structure i'd have the vertices but instead i have them just as the keys in a dictionary which map to the neighbors and so this kind of idea of storing just the adjacency is called the adjacency sometimes called the adjacency set it's a variation of the adjacency list there's also adjacency mapping but there's many of these kind of data structures where you just store the adjacencies usually in some kind of dictionary like object i'm not going to fix this problem with adding vertices more than once you'll notice that adding edges more than once is really okay because it's a set the edges are a set and the frozen sets you know if i add the edge more than once to the same set i still only have one copy of it because there are no duplicates in a set and the neighbors are stored as sets too so if it gets added more than once as a as a neighbor it still only appears one time because again sets don't have duplicates um so we can get pretty far not thinking too much about [Music] those corner cases here but really this kind of stuff should be also packaged up i think in a separate method which we'll call add edge uv okay and let's do that so add edge uv oops we need to include self right so we pass the graph object itself and the two vertices that we're going to connect u and v and that's it right so we add them in here now one weird thing about this is technically i can call this add edge method even if i haven't added the vertices and uh i could check to make sure that the vertices are there because if if i try to add an edge and the two end end points of the edge are not already vertices in here i'm going to get a key error because it's going to look up for the neighbors to try to add it add a new neighbor and it will get a key or won't find that vertex so i could do something like this [Music] like if it's not in there then i can add it and i think this is kind of the right behavior i think if if you try to add an edge and you haven't already added the vertex you might as well go ahead and add the vertices because it's much more likely that that's the the desired behavior here and i can do this again for v um but it's uh maybe let's just go ahead and re-add the vertices and and let's figure out what to really do about this case of um of adding in a new vertex if it's not already been added and i think the easiest thing to do oh and i can get rid of this this line too right this line is done because we we don't even have self.v so i'll just make sure if i haven't already added it i'll add it and if it's already been added i can just skip it right so so if it's in there it'll be it'll be in the dictionary already um so actually if it's not in there then i'll i'll create it otherwise i'll just move on piece of cake um there is a there's a built-in python method on the dictionary to do this kind of uh default setting um it does essentially the same thing but that's all so i think this will do let's just make sure that at least the little bit of code i have here at the bottom actually runs it doesn't it says it's looking for v i must have left in one of my ah yes right here um this should just iterate over the input let's try that boom okay good to go all right so what can we do with this we have a graph now it stores the vertices and edges i can get access to the degrees and i should in theory be able to get access to the neighbors of any vertex let's make sure i can actually do that so let's just see what do i get if i print g dot neighbors of 2 it gives me a set iterator object oh that's right because i only gave me an iterator over it so let's just put it in a list and print it out boom okay one and three so if the neighbors of two are one and three the neighbors of one should just be two the neighbors of 3 should just be 2 also good so there we go that is that is most of the public interface for a graph that we might want to use [Music] i'll just put this one in here in fact let me put it as a set that way i can check equality and i don't have to worry about the orders um and of course those are not the right neighbors this should be one and three all right so um so there we go um maybe this is good enough for now um i can create a graph i can add a vertex i can add an edge i can get the degrees um any other good stuff we would like to do i guess i would like to be able to know how many vertices and edges i have uh for a lot a lot of times you may see cases where this um this edge set is not explicitly stored like i can actually get away with not storing this self.e so the graph doesn't have a special set of edges somewhere but but i kept it around actually for exactly this one reason which is that i would like to be able to get the number of edges and the edges are almost always called m um and so i can return the length of self.e right so this data structure right here tells me how many edges if i didn't have that and i had to figure out how many edges there were just from the neighbor dictionary i think i would have to iterate over all the vertices add up all the s the degrees and then divide by two and that takes way too long to just give me the number of edges so here i have this and actually if i'm going to use it um i'm going to make this a property so that i don't have to put parentheses every time i ask for it right g dot m is going to be the number of edges and similarly i'm just going to do the same for vertices which is should just be called n because n is always going to be used to be the number of vertices in the graph and let's try that this should be what length of this neighbor dictionary right because the number of keys in the dictionary there's one key for every vertex so that should do it let's check i believe what should be true here we should have uh g dot n equals three and uh g dot m is equal to two so okay all right so um i can get number of vertices number of edges i can add edges i can add vertices i can also remove stuff let's do let's do that as well [Music] if i want to remove a vertex or an edge which one is easier i think removing an edge is way easier and the reason is that when you remove a vertex it also has to remove some edges so it'll be much cleaner for us to be able to already remove edges before we get to the vertices so if i want to remove an edge okay [Music] what do i do i guess i want to take the edge set and remove i gotta package it back up into uv and i guess i need this as well okay now that's not enough because in u i've got the neighbors of u contains v and the neighbors of v contain u so i have to remove both of those as well okay so i would go into the neighbors of u and i'm going to remove v and i go into the neighbors of v and i'm going to remove you okay um this is okay we could test it at least see if it makes sense um for instance right here i could do g dot remove edge uh one two and um it runs without error i think what i expect to see here now is that the number of vertices should still be three but the number of edges should be one okay and that's true the thing that's i don't know right now is what if i remove an edge that's not actually there um i think this should fail right because um i get a key error it looked for that it looked for the edge in the set when i was trying to remove it but didn't find it so we might as well check it first and i think maybe we could just do this in two steps here let's package up this edge there it is the frozen set let's call it e [Music] and if e is an edge then i could just check it this way and i'll only do the other stuff if it was actually was if it actually was an edge if it wasn't already an edge then i got nothing to do i can just return and so there you go now i can remove an edge and if i want to remove a vertex as well now what i'll do is i'm going to just iterate over all its neighbors which i can do because i haven't i can actually use the public interface here i can just iterate over the neighbors and for each one i'm going to remove that edge and then finally i'm going to remove the vertex so this should be straightforward [Music] so for every vertex in self dot let's do let's do it like this neighbors of u i could also use the i could use the private dictionary here and write this as neighbors of you um but i think it's a good practice whenever possible to try to use the public interface [Music] so let's do that so neighbor is a view because the public interface tends to be more stable so if things change you're much more likely to have fewer changes to make all right so for every vertex that's a neighbor i can just remove that edge let's do that remove edge u to v there we go so this should take the vertex and just remove all the edges coming out of it and now it'll be an isolated vertex and now we're just going to have to pop that out of the dictionary [Music] and to remove something from a dictionary i think you have to do this so let's do so dot neighbors u now um you might think why not just do this one here right because if it removes that it removes everything all its neighbors [Music] we would still have to go back um and remove the edges coming in right so from all the neighbors v of u we have to still remove u from their neighbor list so this is kind of just a clean way to do it it's not the most efficient because it's sort of removing one by one all the neighbors of you and then just deleting the whole set but in terms of being readable and hopefully idiomatic and clear that's i think this is pretty straight straightforward all right so um okay so i i uh i didn't actually call this function yet so let's do it let's do uh oh let's actually so first of all let's check uh removing edge one three actually is still okay um i don't know if you caught that but it runs now because it didn't uh didn't actually do anything it still has um it still has this one one edge and three vertices in fact let's put the edge back in the one we took out one two and then right so we put it back in and we should check it should have two edges again uh no it doesn't um [Music] interesting doesn't have two edges why not what are the edges uh oh it only has the edge two three so add edge is not doing what it's supposed to why not um ah yeah add edge should probably add it to um to this uh this set e here so let's do that let's make uh edge uv and add it in there okay and again we don't we don't have to think too hard about this right now if it was already in there um we could just skip it um and if we add it again it's a set it won't it won't cost us anything um all right so now it runs it has the right number of edges here and so i've added i'm back to a path it should go one to two two to three and i think if i did g dot remove vertex i want to remove a vertex with more than one edge coming out so let's do that and i think what i should have now if i remove that vertex 2 i should have two vertices 1 and 3 and no edges at all so g dot n should be 2 and m should be 0. and oh it's no good oh no i've done something terrible i actually uh i actually manipulated a set while i was iterating over it so you see this runtime error set change size during iteration yeah can't do that so where did i where did i go wrong i think it's right here you see this remove edge function actually [Music] changes self.neighbors and neighbors is iterating over it um so we have to be a little bit more careful here let's um i could just pull them out like this [Music] so i can make um just a list of of vertices to remove those of neighbors and then i can iterate over that instead all right so that way i did one iteration to get all the neighbors i've got them and now i'm hopefully [Music] this will work now okay so you do have to be careful about uh when you're editing sets that you're not also iterating over that set at the same time um all right this is uh pretty good all right this is recap so we we can build a graph from vertices and edges we can add an edge we can add a vertex we can remove a vertex remove an edge we can get the number of vertices number of edges there's n and m and we can get the degree of a vertex and we can also get the number uh or this an iterator over the neighbors of any one vertex it turns out that this gets you uh most of the way um to to being able to implement most of the kind of standard data structures you'd like to be able to implement and another video i'll actually show you how to implement some basic ones um and some fun ones um so we'll get to that pretty soon
Info
Channel: Don Sheehy Lectures
Views: 5,621
Rating: 5 out of 5
Keywords:
Id: uFaZY1dVnGs
Channel Id: undefined
Length: 31min 17sec (1877 seconds)
Published: Mon Aug 31 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.