6 basic data structures explained in one video

Video Statistics and Information

Captions
When data is stored in a linked list, it's stored in a sequence of nodes. A linked list is often compared to an array, but instead of putting the values in boxes that sit next to each other, a linked list puts each value in a node, and the nodes are linked to one another. By linked I mean that each node contains an address indicating where the next node is.

Let's compare this to an array. Each box in an array is indexed, and the boxes are physically next to each other. This means that if you want to change a value in an array, say change the fourth element from 15 to 16, you can simply go to the fourth box, which is index three, and change 15 to 16. That's very simple. With a linked list, though, we first need to go to the head and then follow the links until we reach the fourth node, and only then can we change that value. That's much slower: if the linked list has 1,000 nodes and you want to change the last one, you have to repeat that step a thousand times.

So why would we use a linked list at all? With a linked list, adding or deleting an element at the beginning of the list is very simple. We say it takes constant time: the time to add something at the beginning is the same whether we're dealing with two nodes or a thousand nodes, so it's very fast, and in some applications that can be very useful. Remember how I said that the locations in an array are physically next to each other and the addresses are fixed? Because of that, when we want to add something at the beginning of an array, all the values have to shift over, which is more complicated than in a linked list. That's an example of where a linked list is better than an array.

Here you can see a linked list called a doubly linked list. What we saw before was a singly linked list, where each node only holds the address of the next node. In a doubly linked list each node holds the address of the next node and also of the previous one, and in that case adding an element at the end of the list is easier than with a singly linked list.

Like the linked list we learned about in the last video, stacks and queues are also very basic linear data structures. They both have flexible sizes, which means they grow as you add more data and shrink as you take data out. The main difference between a stack and a queue is the way data is removed. A stack is a last in, first out data structure, also known as LIFO: the last item you put in is the first one you take out. That's why it's called a stack; it's like a stack in real life. For example, if there's a stack of books and you want to remove one, you have to take the book on top, and that book was placed there last. When we put an item into a stack we call it push, and when we take one out we call it pop, so push and pop are the two main operations we use when handling a stack.
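As an illustration (not part of the video), here is a minimal Python sketch of a singly linked list where both push and pop happen at the head in constant time, which is exactly the LIFO behavior of a stack. The Node and LinkedListStack names are just made up for this example.

```python
# A minimal sketch (not from the video): a singly linked list where
# pushing and popping at the head both take constant time, which is
# the LIFO behavior of a stack.

class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node  # address of the next node


class LinkedListStack:
    def __init__(self):
        self.head = None  # empty list

    def push(self, value):
        # Insert at the head: O(1), no shifting like an array would need.
        self.head = Node(value, self.head)

    def pop(self):
        # Remove from the head: also O(1).
        if self.head is None:
            raise IndexError("pop from empty stack")
        value = self.head.value
        self.head = self.head.next
        return value


stack = LinkedListStack()
stack.push(15)
stack.push(16)
print(stack.pop())  # 16 -- last in, first out
```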
For queues, it's FIFO: first in, first out, which works like a normal queue in real life. Just think of a line at a supermarket cashier: the first person to join the line is the first one to be served. For queues, when we add an item we say we enqueued it, and when we take one out we say we dequeued it, which means we took out the item that has been there the longest.

Okay, so as you can see, this is a tree structure. We call the top node the root, and the nodes directly under a node are called its children, so the node above is the parent. If you look at the bottom of the tree, there are nodes with no children; these are called leaves. Here you can see that there are at most two children coming from each parent in this diagram. In this case we call it a binary tree: a tree in which no node has more than two children. Each parent has a left and a right child, and when the smaller child is placed on the left and the larger one on the right, we call it a binary search tree. In most cases a binary tree won't actually look as perfect as this, though; some of the left or right children will be empty, so it will look more like this.

In a tree like this, let's talk about how we insert a node. Nodes are always inserted as a leaf. Say we want to insert a node with 56 as its key. We start at the top and compare 56 to the root: 56 is smaller than 100, so go left; it's larger than 52, so go right; it's smaller than 76, so go left. But that left spot is empty, so 56 is placed there, and that is how a node is added to the tree as a leaf.

Searching for a node in this tree is similar. Say we want to find the node with 24. We start at the root at the top and compare 24 to 100: it's smaller, so go left; it's smaller than 52, so go left; it's larger than 19, so go right; and it's there, so we found it. We can say that 24 is in this tree. So inserting and searching work in very similar ways.

The advantage of using a binary search tree is speed. When the tree is well balanced, searching or inserting takes O(h), where h is the height of the tree, which works out to O(log n) and is better than O(n). If the tree is completely unbalanced, which is the worst case, it degrades to linear time, equivalent to a linked list. So a binary search tree is pretty useful when you're handling a large amount of data.
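To make the insert and search walk-through concrete, here is a minimal Python sketch (not from the video) of a binary search tree; the keys roughly follow the diagram, and the TreeNode name is just illustrative.

```python
# A minimal sketch (not from the video) of a binary search tree with
# insert and search. Smaller keys go to the left child, larger keys to
# the right, and new nodes are always inserted as leaves.

class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    if root is None:
        return TreeNode(key)           # empty spot found: new leaf
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def search(root, key):
    if root is None:
        return False                   # hit an empty spot: not in the tree
    if key == root.key:
        return True
    if key < root.key:
        return search(root.left, key)
    return search(root.right, key)


root = None
for k in [100, 52, 19, 76, 24]:        # keys similar to the diagram
    root = insert(root, k)
root = insert(root, 56)                # 56 ends up as a leaf under 76
print(search(root, 24))                # True
print(search(root, 57))                # False
```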
This time we're going to look at tries. A trie is a tree structure that is mostly used for storing words. Each node represents a word or part of a word, and by connecting the path from the root to that node we can store words in this structure. We do this because it lets us search for words very fast. One application for tries is autocomplete: you know how, when you're searching for something in Google, as you type a search word into the search bar, Google tries to guess what you're trying to type, and the more letters you type, the closer it gets. Tries can also be used to suggest what the next word should be.

Look at this trie over here. It is currently holding the words aragorn, aragog, aragon, eragon, oregon, oregano and oreo. A trie has a root node at the top, which has one child node for each possible letter. There are 26 letters in the alphabet, so each node can have 26 children. To keep the diagram simple I didn't draw all the empty nodes, but if you actually look inside a node, it has an array where each index represents one of the possible characters of the alphabet, and the indices that represent actual children hold a pointer to that child node. For example, look at the node that holds the r in aragorn and aragog. This node has an array of size 26 whose entries are all nil except for the ones that represent a and g.

Each node needs to hold this array containing the addresses of the next nodes, but it also needs one more piece of information: a boolean value indicating whether the node is the end of a word or not. For example, in the word oreo, the nodes for o, r and e have false as their is-end value, but the last node of oreo, with the letter o, has true, indicating that it is the last letter of the word. If we set the is-end value of the r node in oreo to true, we would be adding the word or, and or would be stored in the trie along with the others.

Now that we know the basic structure of tries, let's see how we can insert and search for words in a trie. Say we want to add a new word, orc. First we start at the root and look into the array that holds the addresses of its children, and check whether there is something at the index that represents the first letter of orc, which is o. Yes, there is a pointer at that index that points to the child node for o, so let's go to that node. This is also a node just like any other node, including the root, so it also has an array holding pointers to its children, and this time we check whether there is something at r, because that's the second letter of orc. There is, which means we have a node for it. The same goes for the next node: we check whether there is something at the index for c, the third letter of orc, and this time there isn't, so this is where we create a new node and add its address to the array index for c. But this is not the end: once we have finished adding the letters of orc, we also need to set the is-end value of the last node to true. That's the end of inserting a word.

Now that we've successfully added the new word orc, let's try to search for it in the trie. The process is similar to inserting: go to the root and check whether there is a match for the first letter, then follow that pointer to the next node and check for the second letter, r, which is there, and then on to the next node to check for the third letter, c, which is also there. So we have o, r and c in the trie, but there is one more step: we need to check that the is-end value of the node for the last character, c, is true. If we search for a word that is not in the trie, for example orb, it will immediately let us know the word is not there as soon as it finds an empty index in the children array.

Now let's talk a bit about the time complexity of a trie; the big O tells us how fast and efficiently inserting or searching in a trie can be. Say the length of the keyword we're searching for or inserting is m. In that case the time complexity is O(m), because we have to follow the tree down m times. This is not always the case, though: there can be a better scenario when we're searching, because if the search hits an empty spot for the next character it immediately stops and returns false, so in that case it's better than O(m). With tries we can search for words very fast, but a trie can take a lot of space: each node has 26 children, and the required space grows very quickly as we add longer and longer words to the trie.
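Here is a minimal Python sketch (not from the video) of a trie along the lines described above: each node holds a 26-slot children array and an is_end flag, and insert and search walk down the tree one letter at a time. The class names are just illustrative, and it assumes lowercase a-z input.

```python
# A minimal sketch (not from the video) of a trie node with a 26-slot
# children array and an is_end flag, plus insert and search.

class TrieNode:
    def __init__(self):
        self.children = [None] * 26   # one slot per letter a-z
        self.is_end = False           # True if a word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            i = ord(ch) - ord('a')    # index of this letter
            if node.children[i] is None:
                node.children[i] = TrieNode()   # create the missing node
            node = node.children[i]
        node.is_end = True            # mark the last letter as end of word

    def search(self, word):
        node = self.root
        for ch in word:
            i = ord(ch) - ord('a')
            if node.children[i] is None:
                return False          # empty spot: the word is not here
            node = node.children[i]
        return node.is_end            # a full word only if is_end is True


trie = Trie()
for w in ["oreo", "oregon", "oregano"]:
    trie.insert(w)
trie.insert("orc")
print(trie.search("orc"))   # True
print(trie.search("orb"))   # False
```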
This is why tries are known to have a trade-off between time and space.

The heap data structure was originally introduced as part of an algorithm called heapsort, but it turned out to be useful for many other applications as well. Heaps are especially good for implementing priority queues: in a normal queue we take out the item that came in first, but in a priority queue we take out the item with the highest priority, and that's where heaps come in handy.

A heap is a data structure that can be expressed as a complete tree. A complete tree means that all the levels of the tree are full except possibly the lowest level, where some nodes can be empty but all the existing nodes sit to the left. We're looking at a binary heap here, where each node has up to two children. In a max heap the largest key is stored at the root, and for every node in the tree, each parent has a larger key than its children. A max heap is useful when we need to repeatedly remove the largest key, because we already know the largest key is at the root; that's why getting the largest key is so fast, regardless of the size of the heap. The opposite of a max heap is a min heap, where the root is the smallest key, so in a min heap each parent node has a smaller key than its children. In this tutorial we're only going to talk about max heaps, because a min heap is just a max heap done the opposite way.

As we saw in the diagram, heaps can be visualized as a tree, but behind the scenes the keys are actually stored in an array, where each node of the tree corresponds to an index of that array. So strictly speaking, a heap is just an array that operates like a tree. This is possible because you can easily calculate the indices of the children from the parent's index and vice versa. With a simple formula we can get the indices of the children for any index in the heap: for example, if we want the left child of index 39, then 39 times 2 plus 1 gives us 79, which is the index of the left child.

Okay, now that we know the structure of heaps, let's talk about how we insert keys into a heap. Whenever we do an insert, we add the new key at the bottom right of the tree; in the array, that bottom-right node is the last index. After just placing the new node there, we need to rearrange the nodes so that we maintain the heap property, which is to keep each parent's key larger than its children's. We compare the new node to its parent and swap them if the new node is larger, and we keep repeating this process up the tree until the node reaches its place. We call this process of rearranging the nodes heapify.
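Here is a minimal Python sketch (not from the video) of inserting into a max heap stored as a plain list, using the index formula mentioned above (left child = 2i + 1) together with parent = (i - 1) // 2; the function name is just illustrative.

```python
# A minimal sketch (not from the video) of inserting into a max heap
# stored as a plain Python list, using the index formulas
# left = 2*i + 1, right = 2*i + 2, parent = (i - 1) // 2.

def insert(heap, key):
    heap.append(key)                  # place the new key at the last index
    i = len(heap) - 1
    # Heapify up: swap with the parent while the new key is larger.
    while i > 0:
        parent = (i - 1) // 2
        if heap[i] <= heap[parent]:
            break                     # heap property restored
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent


heap = []
for key in [34, 48, 50, 12]:
    insert(heap, key)
print(heap[0])   # 50 -- the largest key is always at the root (index 0)
```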
The heapify process is also needed after taking an item out of the tree, so let's see how that works. To extract the highest key in a heap we can just take out the root, because we know the root is the largest key. After that we need to rearrange the nodes to fill in the empty root while maintaining the heap property. The first step, right after losing the root, is to fill the empty root with the last node: 34, the last key, goes to the root position. Then the swapping process starts from the top. Look at the three nodes at the top and ask which of the three should be the parent. It should be 50, because 50 is the largest of the three, so 50 becomes the parent by swapping positions with 34. In other words, the parent node 34 swaps places with its larger child. The same goes for the next round: 48 is the larger child, so we compare 34 to 48. Is 34 larger than 48? No, so 34 doesn't get to stay there and has to swap with 48.

The time complexity of extracting from and inserting into a heap depends on the heapify process. Simply adding an element at the end of the array, or taking a key out of the first index, is not the dominant part of the cost, so we can ignore it. The number of operations needed to heapify up or down, though, depends on how high the tree is, so that's what determines the time complexity. We can express insert and extract as O(h), where h is the height of the tree. If we want to express the time complexity in terms of n, the size of the array, we can replace h with log n, because the height and the number of indices have a logarithmic relationship. So the time complexity of inserting into and extracting from a heap is O(log n). You can see that heapifying is the tricky part: adding and taking out an element is easy, but rearranging the heap to maintain the heap property will be our main focus when we get to the coding.
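And here is the matching sketch (again not from the video) for extracting the maximum: take out the root, move the last key into its place, and heapify down by repeatedly swapping with the larger child.

```python
# A minimal sketch (not from the video) of extracting the maximum from a
# max heap stored as a list: take the root, move the last key into its
# place, then heapify down by swapping with the larger child.

def extract_max(heap):
    if not heap:
        raise IndexError("extract from empty heap")
    largest = heap[0]
    heap[0] = heap[-1]                # move the last key into the root
    heap.pop()
    i = 0
    n = len(heap)
    # Heapify down: swap with the larger child while a child is bigger.
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        biggest = i
        if left < n and heap[left] > heap[biggest]:
            biggest = left
        if right < n and heap[right] > heap[biggest]:
            biggest = right
        if biggest == i:
            break                     # heap property restored
        heap[i], heap[biggest] = heap[biggest], heap[i]
        i = biggest
    return largest


heap = [50, 48, 45, 34, 20]           # already a valid max heap
print(extract_max(heap))              # 50
print(heap[0])                        # 48 is the new root
```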
Let's look at this tree over here. In this tree we have nodes and we have edges, and in a tree the edges point from the parent to the children. But what if we add a little more to this? What if these edges got a little crazy and started to point to other things? For example, a child node starts to point at its parent, and then they point to each other, or even point to nodes in other branches, making cycles, or even point to themselves. This doesn't look like a tree anymore, does it? We can't tell which node is the root now. This is what we call a graph. In a graph we call these vertices, and these are called edges. When the graph has many edges we call it a dense graph, and when it has relatively few edges we say the graph is sparse.

A graph is an abstract data type, so there are many different ways to implement it. You can make it a directed or an undirected graph; an undirected graph means every edge is bidirectional, it goes both ways. You can also allow cycles or not, and you can put weights on the edges. For example, you can use weights to represent the time it takes to travel a certain path from point A to point B, like finding a route in Google Maps and calculating how long it would take. You might also have noticed that a tree is itself a type of graph, just with more restrictions or rules: children cannot point to a parent, no cycles are allowed, and if it's a binary tree there can be at most two children per node, and so on. So there are a lot of different implementations of a graph, and which one you choose depends on what it needs to do.

Okay, so now we know what a graph looks like and we've seen different kinds of graphs. You can tell that the graph we're looking at here is a directed graph, it allows cycles, and it's unweighted. Now let's think about how to express a graph in code. There are two common ways to represent a graph: one is the adjacency list and the other is the adjacency matrix.

For an adjacency list, the graph is expressed as a list of vertices, and each vertex has a list holding its neighboring vertices, so it looks like a list of lists. These lists can be linked lists, arrays, dynamic arrays, anything that can implement a list. We call the integer values of the vertices keys, and since they represent a set of information they should be unique. Okay, so let's start filling in the adjacency list. If you look at the graph, vertex 1 is pointing to two vertices, 2 and 4, so those two should be in the adjacency list for 1. The same goes for vertex 2: it's pointing to 1, so vertex 1 should be in the adjacency list of vertex 2. Do you see how 1 and 2 are pointing to each other? If this were an undirected graph instead of a directed graph, all the vertices would look like this: if 1 has 2, then 2 should have 1, and if 1 has 4, then 4 should have 1. When we want to add a new edge to the graph, we just add a new element to the adjacency list, and when we want to add a new vertex, we just add a new vertex to the graph list. That is how you can represent a graph with an adjacency list.
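As a rough illustration (not from the video), here is a Python sketch of a directed graph stored as an adjacency list, using a dictionary of neighbor lists and the example edges 1→2, 1→4 and 2→1.

```python
# A minimal sketch (not from the video) of a directed graph stored as an
# adjacency list: a dictionary mapping each vertex key to the list of
# vertices it points to.

adjacency_list = {}

def add_vertex(v):
    # Adding a vertex just adds a new (empty) neighbor list.
    adjacency_list.setdefault(v, [])

def add_edge(src, dst):
    # Adding an edge appends to the source vertex's neighbor list: O(1).
    add_vertex(src)
    add_vertex(dst)
    adjacency_list[src].append(dst)


for v in [1, 2, 3, 4, 5]:
    add_vertex(v)
add_edge(1, 2)
add_edge(1, 4)
add_edge(2, 1)
print(adjacency_list[1])   # [2, 4]
print(adjacency_list[2])   # [1]
```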
Now let's look at the adjacency matrix. We can also express a graph structure with a two-dimensional array: we put the vertices of the graph as rows and columns, representing the from vertex and the to vertex. For example, if we have five vertices in the graph, we need a five-by-five matrix. Here there is an edge from vertex 1 to vertex 2, so the entry from 1 to 2 is set to 1, and the same goes for every other edge in this graph. If this were a weighted graph, the entries would hold the weight instead of just 1, and if it were an undirected graph the matrix would be symmetrical, because in an undirected graph all the edges are bidirectional. Going back to the directed, unweighted graph, let's see how we can add things to it. If you want to add an edge, you just fill in the entry for its from and to vertices. If you want to add a vertex, you have to add a row and a column, but you can't simply add dimensions to a matrix: you actually have to make a new one and copy all the elements into it, and we'll talk about that a bit more later on.

Now let's compare the two, adjacency list and adjacency matrix. First of all, which one do you think takes less space? If the graph is sparse, the adjacency list lets us save space, but an adjacency matrix always needs space for V squared entries, where V is the number of vertices; no matter how sparse the graph is, you need that much space. What about looking up an edge, like checking whether two vertices are connected, or checking who follows whom? In that case the adjacency matrix is much faster, because we just do an array lookup, which takes constant time. If you try to do that with an adjacency list, you have to traverse the list of adjacent vertices, which in the worst case takes V steps.

Let's think about some other operations. Adding a vertex is easy with an adjacency list, because we just add an element to the list, but for an adjacency matrix we have to copy the whole array to add new dimensions to the matrix, which is O(V squared). The same goes for removing a vertex: we have to create a new matrix with smaller dimensions and copy all the elements over. When you remove a vertex from an adjacency list, you also need to remove all the related edges, and in the worst case all the edges could be attached to that one vertex, so that would be O(E), where E is the number of edges. What about adding or removing edges? An adjacency matrix handles this easily: adding or removing a value in the matrix takes constant time. For the adjacency list, adding is constant time, since we just add an element to the list of neighbors, but removing an edge takes O(V), because we have to traverse the list to find it.

So overall, if your graph doesn't have too many edges, the adjacency list is better than the adjacency matrix when it comes to space, and for the operations, if you need to do a lot of lookups and not many modifications, the adjacency matrix is better. Okay, so now we've learned the basics of graph representation and some operations; let's go and see how this would look in code.
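As a rough illustration of that (not from the video), here is a Python sketch of the adjacency matrix version, with constant-time add, remove and lookup of edges; it uses vertices 0 through 4 rather than the 1 through 5 in the diagram.

```python
# A minimal sketch (not from the video) of a directed, unweighted graph
# stored as an adjacency matrix: a V-by-V grid of 0s and 1s where
# matrix[src][dst] == 1 means there is an edge from src to dst.

V = 5                                  # number of vertices
matrix = [[0] * V for _ in range(V)]   # always needs V*V space

def add_edge(src, dst):
    matrix[src][dst] = 1               # constant time

def remove_edge(src, dst):
    matrix[src][dst] = 0               # constant time

def has_edge(src, dst):
    return matrix[src][dst] == 1       # constant-time lookup


add_edge(0, 1)    # an edge from vertex 0 to vertex 1
add_edge(0, 3)
add_edge(1, 0)
print(has_edge(0, 1))   # True
print(has_edge(3, 0))   # False
```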
Info
Channel: Junmin Lee
Views: 3,980
Keywords: data structures, data structures and algorithms, dsa, ds&a, linked lists, stacks, queues, stacks and queues, tries, trie, trees, binary search trees, heaps, graphs, graph representation
Id: _9swtAph9jM
Length: 26min 31sec (1591 seconds)
Published: Thu Feb 04 2021