Hello, everyone, and welcome to an introduction
to data structures. My name is Steven, and over the next three hours, we'll be tackling the
topic of data structures in relation to computer science. More specifically, we'll be talking
about what they are, going over some different types of data structures, and discussing how we
can use each of them effectively with examples. This course will be a general overview of
data structures, and so we won't be confining ourselves to one specific programming language.
However, this series will require you to have a basic understanding of programming. If you are
completely new to computer science, I would recommend our introduction to programming series,
which will be linked in the description below. That video will give you all the information
you need to traverse these topics comfortably. Now, we're almost ready to hop into things.
However, there are a few housekeeping items that I want to go over beforehand for those
wishing to enhance their learning experience. If you're not interested in that and want to skip
right to the content, go to the time shown on your screen now. For those of you staying, though, there are just a few things I would like to mention. Firstly, in the description, you will find
timestamps for each major topic covered in this video, as well as timestamps taking you to every
smaller section contained within those topics. So please feel free to skip around if you're
already comfortable with one topic or only interested in a certain data structure.
Next, we've decided to include the script and visuals used for the series also in
the description below. That way you can follow along if you like or simply read the
script if my soothing voice isn't your style. Additionally, for each segment, I'll be including
the references and research materials used to write the script for each topic. That way, if
you ever feel as if I haven't explained a topic well enough, or would simply like more
information about a certain topic, you'll have a readily made list of articles and websites,
which you can use to supplement this series. If you feel like you have any questions throughout
the series about maybe something I said or a certain visual, please ask your questions in the
comments below. I'll try to be as responsive as possible for the first few weeks on questions
you guys may have about the series and such. And finally, in terms of housekeeping, I'd just like to say if you're not already subscribed to our channel, NullPointerException, then consider doing so if you enjoy this type of content, as my co-host Sean and I regularly post videos in this
style. We're trying to hit 1000 subscribers before the fall semester of college begins. So check
us out if you end up enjoying the video. With my shameless plug out of the way, we're finally
ready to tackle the topic of data structures. In this introduction segment, we'll simply cover
a general overview of what data structures are, and then go over what type of information will
be covered throughout the duration of the series. So the obvious question to start with
is what exactly is a data structure? Well, in computer science, a data structure is a
way to store, organize and manage information or data in a way that allows you the programmer to
easily access or modify the values within them. Essentially, it's a way for us to store
a set of related information that we can use for our programs. Data structures, and the algorithms used to interact with, modify, and search through them, provide the backbone for many of the programs that you'll end up writing. I can almost guarantee that in 99% of your programs, a data structure will be involved. Each of the data structures that I'll be talking about is designed for the sole purpose of storing information and allowing the
end user to access and manipulate that information in an efficient, effective way. But each one
differs in the manner that they accomplish this. If you have a basic understanding of programming,
you probably know about a few different data structures already, such as the array and the
array list, also known as the list in Python. But if you're going to be pursuing a career
in computer science, just knowing these two is definitely not going to cut it. While basic data structures such as the array are used frequently by companies throughout their code, more advanced data structures are being put to use all around you: the undo/redo button
in Google Docs, Google Maps on your phone, even the autocomplete feature through your
text messages all require the use of more advanced data structures, which makes them
extremely useful to learn and comprehend. Okay, now, I can't see you per se, but I can just
tell that you're jumping up and down in your seat hooked on data structures and how useful they
can be. But before we jump too far into things, let's discuss the information that we'll be
covering in this series. Now, before we talk about any specific data structures, we're going
to have a brief talk about efficiency. We'll discuss the metrics used to judge the speed and
efficiency of different data structures, which will then give you a better understanding of why
one data structure might be used over another one. From there, we'll start by diving headfirst into
what I believe are the basic data structures, those being arrays and ArrayLists. While you may
already have a good understanding of what these are, I would highly suggest that you still watch
these segments, because we'll be going into a little bit more depth as to why they're so useful
based on differences in how they're stored in the computer's memory. After that, we'll move on to
what I'll call the intermediate data structures. These are a little bit more complicated than the
basics and have a few special attributes which make them stand out from the rest. We'll begin
by taking a look at stacks, which are the backbone for all recursive processes in your computer. Then we'll look at the antithesis of the stack: the queue. Moving on from there, we'll be covering
linked lists and their evolved form in the doubly linked list. Before moving on to the
final of our intermediate data structures, the dictionary which includes
a mini lesson on hash tables. Then we'll wrap up the series talking
about trees and tree based data structures, less linear and more abstract data structures
beginning with the tree itself. We'll then move on to tries, a very useful data structure used for a lot of word-processing algorithms. And finally, we'll end off the series with a discussion on heaps and
graphs. So in total, I'll be taking you through four different segments containing 12 of the most
common data structures that are practically used, as well as providing examples of where and when
to use them. Section one is on efficiency, section two on basic data structures, section three on intermediate data structures, and we wrap it up with section four on tree-based data structures.
With this knowledge, you'll be able to take charge of your programming career and hopefully gain a
competitive edge by implementing them when needed. Like I said, however, before we jump straight
into the different ways to store information, I'd like to have a quick discussion on how we
score the efficiency of these data structures using what is known as Big O
notation. So let's jump right in. Okay, so before we talk about all these crazy
data structures, like maps, and heaps, we want a quantifiable way to measure how efficient a
certain data structure is at the different tasks we might ask of it. If we're going to be
storing extremely large amounts of data, being able to search through, modify, or access
the information within a data structure needs to be fast and efficient. As we briefly mentioned
before, the industry standard for this kind of measurement is Big O notation. So what exactly
is big O notation? And how do we measure it for a specific data structure? Well, that's what we'll
be covering in this segment. For most of the basic and intermediate data structures in this series,
we're going to be spending some time talking about their efficiency using Big O notation. So this
is definitely a topic you're not going to want to skip. With that being said, let's begin. So
because there are so many different ways to store information, like we talked about in the previous
segment, programmers have developed this idea of big O notation as a way to basically score a
data structure based on a few different criteria. These criteria can change based on who you
ask, but for the purposes of this video, we will be using four criteria representing
the most common functions you might want from a data structure. The ability to access a
specific element within the data structure, search for a particular element
within the data structure, insert an element into the data structure and
remove an element from the data structure. Now by measuring how efficiently a certain
data structure can do these four things, we can basically create a report card of sorts,
which measures how efficient a certain data structure is. At a glance, this gives us a
pretty good overview of what a certain data structure is good at, and what it is bad at.
And it can help us decide which one to use. If we need to store data that is easily accessible
to the end user. For example, we might choose a data structure which can access elements the
quickest, and vice versa, if accessing elements isn't the most important thing to us, but we
need a data structure which can be easily added to and deleted from, we would go for one which is
most efficient at that specific functionality. By looking at a data structure's
report card, if you will, we can get a quick sneak peek at what
they're good at and what they're bad at. Now, if we use big O notation to basically create
a report card, like I said, there must be some way to actually grade each of these functions. And
there is: the four criteria mentioned, accessing, searching, inserting, and deleting, are all scored using Big O notation time complexity equations. Now what is a Big O notation time complexity
equation? Well, I'm glad you asked. Besides being an annoyingly long phrase, a Big O notation time complexity equation, or just time complexity equation, works by inserting the size of the data set as an integer n and returning the number of operations that need to be conducted by the computer before the function can finish. The integer n is simply the size, or amount of elements,
contained within the data set. So for example, if we have an array, one of the data structures
we'll get into soon, with the size of 10, we would place 10 into the different efficiency
equations for accessing, searching, inserting, and deleting that represent the array, and returned back to us would be the number of operations that need to be conducted by the computer before completion of that function. It does get a little bit more complicated than this. But for the sake
of keeping this series simple, all you need to know is that these equations help represent
efficiency amongst different data structures. Also, an important thing to note here is that we
always use the worst case scenario when judging these data structures. This is because we always
want to prepare for the worst and know which data structures are going to be able to perform under
the worst conditions. You'll see what this means in greater detail a little later on once we
start putting this into practice. But for now, just keep in mind that when judging data
structures, we always use the worst case scenario. Moving on, the reason that it's called
Big O notation is because the syntax for these particular equations includes
a big O and then a set of parentheses. Inside these parentheses will be some function which correctly returns the number of operations needed to be run by the computer.
So for example, let's say we have a fake function, it can really be anything, the purpose in this
case is irrelevant. Now, for this fake function, let's say its time complexity equation was the one shown on your screen now. We would pronounce this time complexity equation as O of 2, meaning it takes two operations from the computer before our make-believe function can finish. If the time complexity equation was O with a 5 in the parentheses instead, it would be O of 5, and so on. Now, these are examples of time complexity
equations, which run in constant time, meaning no matter the size of our data set, it will always
take the same number of instructions to run. This is usually not going to be the case though,
when it comes to the four criteria we mentioned for our data structures. Most of the time, our integer n, which again contains the amount of elements within the data set, is going to have some effect on how many operations it takes to, say, search through our data structure. This makes sense: a larger data set means it's going to take more operations from the computer to search through that entire data set. Now, we score these four functionalities in number
of operations performed, because measuring by how long the function takes to run would be silly.
If we measured by a metric such as time taken to completion, our results would be highly biased by the hardware used to run the function. A supercomputer used by Google is obviously going to be able to search through a data structure much faster than a laptop. Time complexity equations essentially level the playing field by instead returning the number of operations, eliminating
the bias in processing power that exists. So to sum up everything that we've learned so far, we measure the efficiency or speed of a
data structure based on how well it can perform four basic tasks: accessing, searching for, inserting, and deleting elements within itself. Each of these criteria is modeled by an equation which takes in the size of the data structure in number of elements, n, and returns back the number of operations needed to be performed by the computer to complete that task. By measuring these four metrics, we can get a pretty good understanding of what the data structure is good at, and what the data structure is bad at. Now, it's important to note that this isn't
the end all be all for deciding on which data structure to use in your program. As you'll
see, as this video progresses, many of the data structures are built with specific quirks or features which separate them from the rest and provide additional functionality. Big
O notation is incredibly useful, and something that you should definitely take into consideration
when determining which data structure to implement into your program, but it should not be the only
thing that you use. Cool. Now that we know a little bit about how we measure the efficiency of
a data structure using time complexity equations, along with the four criteria actually used to measure them, let's dive straight into what the actual equations mean in terms of efficiency.
That way, when we start grading data structures, you'll have a good idea as to how good or
bad each one is at that particular metric. Basically, we're going to cover the six
most common time complexity equations from most efficient to least efficient. Now, the
absolute best that a data structure can score on each criterion is a time complexity equation of O of 1. This essentially means that no matter what the size of your data set is, the task will be completed in a single step. If your data set has one element, the computer will finish the task in one step. If your data set has 100 elements, one step. 1 million elements, one step. 800 quadrillion elements, it does not matter; the computer will finish the task in a single step. This is why, when we look at the graph of volume of data versus instructions required for O of 1, the line remains constant at one. No matter the volume of data stored within the data structure, the computer can complete that task in a single instruction. Whether it be accessing, searching, inserting, or deleting, O of 1 is the gold standard, absolute best, top-of-its-class time complexity equation. It is basically the Michael Jordan of big O notation when it comes to time
complexity equations. Now, the next fastest type of time complexity equation is O of log n. While not as fast as instantaneous time, a function having a time complexity of O of log n will still provide you with very fast completion. Now, if you don't fully understand logarithms entirely, just know that this efficiency is slower than instantaneous time, O of 1, and faster than the next level of efficiency, known as O of n. Additionally, because the volume of data
versus time graph follows a logarithmic curve, the larger the data set that you use, the more
bang for your buck that you're going to get. Basically, as the amount of data you try to
perform one of our four criteria on increases, the rate of change of the amount of operations it takes to complete that certain task decreases. So as you can see, at the beginning of the
graph here, when we increase the volume of data, our number of operations skyrockets. But
when we do the same for larger and larger sets of data, our number of operations
increases much more slowly than before. A good example of a non-data-structures-related function which has a time complexity of O of log n is the binary search. If you know what a binary search is, then you'll understand how it is that when the data set gets larger, the number of instructions needed to find the item you're searching for doesn't increase at the same rate.
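As a rough illustration, here is a minimal Java sketch of a binary search; it assumes the array is already sorted, and the names are just placeholders.

```java
// Binary search over a sorted int array: each pass halves the remaining
// range, which is where the O of log n behavior comes from.
static int binarySearch(int[] sorted, int target) {
    int low = 0;
    int high = sorted.length - 1;
    while (low <= high) {
        int mid = (low + high) / 2;
        if (sorted[mid] == target) {
            return mid;            // found the target at index mid
        } else if (sorted[mid] < target) {
            low = mid + 1;         // keep searching the upper half
        } else {
            high = mid - 1;        // keep searching the lower half
        }
    }
    return -1;                     // target is not in the array
}
```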
If you'd like a fuller explanation of binary search in order to better understand O of log n, a link in the description below will take you to that point in our introduction to programming series where you can learn about it. If not, let's move on to the next equation. O of n is the next common time complexity equation
that's going to come up frequently during this lecture. The graph of volume of data versus
instructions needed for this function is linear, meaning that for every element you add to the
data set, the amount of instructions needed to complete that function will increase by the
same amount. So to perform a function with the time complexity of O of n on a data set with 10 elements, it will take 10 instructions; 50 elements will take 50 instructions; 1,000 elements, 1,000 instructions; and so on. O of n is really the last good time complexity equation that exists. Anything above this is considered inefficient and not very practical when it comes to data structures in computer science. The next type of equation that will come up is O of n log n. This equation is the first that
is relatively bad in terms of efficiency. The graph of volume versus operations shows a somewhat linear but steepening curve, meaning that unlike O of log n, it won't get better in terms of efficiency as the size of the data set increases. Instead, the slope actually
increases as the volume of data does. The last two types of equation are O of n squared and O of 2 to the n. These are both incredibly inefficient equations which should be avoided if at all possible. Because they both grow so quickly, which can be seen from their graphs of volume versus operations, the larger the data set that you use, the more inefficient they will become. While there are more time complexity equations
that exist, such as O of n factorial, which is even worse than the two I just mentioned, the data structures we'll be talking about will never have time complexity equations that exist outside of the six we've just covered. So we're going to stop there. Now, the final thing I want to say is
just a reiteration of something I said before, these time complexity equations are not the only
metric you should be using to gauge which data structure to use. As we get deeper and deeper into
this series, you'll see that we might have some data structures which don't seem that efficient
at all based on their time complexity equations, but provide some other functionality or feature,
which makes them extremely useful for programmers. Okay, now that we have some knowledge
on how we actually grade these data structures in terms of efficiency, let's
hop into our very first data structure. The first data structure we're
going to be covering is the array. Now I understand the array is a pretty common
data structure taught in most programming classes, and that many of you might already know
about the properties and uses of an array. But in this segment, we'll be going into
more detail on the array as a data structure, including its time complexity equations,
storage methods, and more. So check the description for a timestamp to skip ahead to that section if you already know the basics. If you're not as versed in the array,
though, or just want a general review, then stick around, because right now we'll
be discussing the basics of an array. An array is fundamentally a list of similar
values grouped together in a central location, which makes sense because if you
remember, we use data structures to store sets of similar information, so
that we can easily use that information. Basically, arrays can be used to store anything,
usernames for an app, high scores for a video game, prices for an online shop, pretty much any list of values which are fundamentally similar, meaning of the same type. So integers, strings, floats, objects, the list goes on and on. Any primitive or advanced data type you
can think of can be stored within an array. Every item contained within the array is
referred to as an element of that array. And we call the collective total of elements,
the array. This is going to be true for almost all data structures we talk about, by the way, so just keep that in mind. An array will also have three attributes associated with
it, the first being a name for the array, the second a type for the array, and the third
a size for the array. Let's start with the name. The name, if you couldn't tell, is just simply a
name for the array, which can be used to reference and interact with it. Say for example, we had
an array called names, which contains the list of company employees. And then we also had an
array called salaries which contained a list of salaries for those employees. If we wanted to
search through the list of salaries, we would do so by referencing the salaries array with its
name. This way the computer knows that you want to search through that specific array containing
salaries, and not the names array. Note that this works vice versa as well, of course, if we wanted to search through the names array instead. Now, another thing to note really quickly while we
have this up is that these two arrays are also an example of parallel arrays. Parallel arrays are
two or more arrays containing the same number of elements and have the property wherein
each element in one array is related to the element in the other array, or arrays, at the same position. So from our example, we can say that the first element in our names array, John Smith, corresponds to the first element in the salaries array, the integer 10,000. This would mean that John Smith has a salary of 10,000. This is really good because, as you'll see later
on, we can't store different types of variables in the same array. So we couldn't have the salary
integers be stored in the same place as the name strings, making parallel arrays extremely useful
for storing different types of data about the same entity. Alright, tangent over, let's get back
to the attributes of arrays, and the second one we'll be talking about is the array's type. Now, an array's type is simply what type of information is stored or will be stored within that array. One of the biggest stipulations about arrays is that an array has to hold all the same type of information. So you cannot have an array containing both integers and strings; it would have to be either an array of
only integers or only strings. This works for any type of information you would want to store within
the array, meaning every element or spot within the array must be of the same type. The third
and final attribute that an array has is a size. This size is a set integer that is
fixed upon the creation of the array. It represents the total amount of elements
that are able to be stored within the array. An array's size cannot be changed. I'll repeat that again because it's extremely important: an array's size cannot be changed once created. Once you write the line of code which instantiates an array, you give it a defined size or fill it with elements until it has a defined size. And then, at that point, the array's size cannot be
changed by conventional methods. This is actually pretty rare for a data structure. And it may seem
really counterintuitive and useless. But later on, when we talk about an array's efficiency, you'll see that this is for a specific reason, and not just to be annoying to the programmer. So there you have it, the three attributes of an array: a name to reference it, a type to fill it with, and a size to control it. Let's now shift focus and talk about how we actually define an array ourselves, how we reference and change values within an array, and then dive into
the idea of two dimensional arrays. There are actually two different ways to
instantiate an array in most languages, you can either populate the array with elements
that you want contained within it right then and there. Or you can set a specific size for the
array, and then slowly populate it later on as the program runs. We'll be discussing how
to do both, starting with the first creating the array with the elements already stored.
Now defining and filling an array as soon as you create it is mainly used for when you already
know which values are going to be held within it. For example, let's say you're creating an
array, which is going to hold the salary of 10 different workers at a company. And that
information is being read in from a text file. Well, you already know the 10 salaries
for those workers from the text file, and so you're able to immediately
populate the array when you create it. This way, you'll already have the information
available to you as the program runs. The way you do this varies amongst
different programming languages, but it's shown here in three different languages, Java, Python, and C#, to give you a basic gist of how this is accomplished. All three lines of code on your screen now will create an integer array of size
three, containing the numbers one, two, and three. You can see that all of them involve
the declaration of an array name, which is then set equal to the values you want
to include within the array encapsulated by some sort of brackets or braces. This isn't 100% how
it's going to be for all languages, but for the most part, it will follow this skeletal outline.
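For instance, here is a minimal Java version of that skeletal outline; the array name is just a placeholder.

```java
// Declares an integer array of size three, already populated
// with the values one, two, and three.
int[] numbers = {1, 2, 3};
```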
As you can see, Java and C sharp require you to define the array type, whereas Python does not.
But for the most part, the syntax is the same. Again, that is name of array set equal to list of
comma-separated values encapsulated by brackets or braces. This also sets the size of the array for you automatically. So since we populate the array with the numbers one, two, and three, the array would automatically have a size of three, as it now contains three integers. If we were to, say, go back and add in a fourth number where we define the array, the size would dynamically increase to four the next time we run the code. Now, the second way in which we can instantiate an
array is by setting an initial size for our array, but not filling it with any elements and then
slowly populating it as the program runs. This would be used in the case that you
don't actually know what information you're going to store in the array,
but will as the program runs through. The most common case in which this happens
is in the case where the end user will enter information into the program. And then that
information will be stored inside the array. Since the information varies based on which user
runs the code and what they input during that particular run, we have no way of populating
the array as soon as it's instantiated, forcing us to add the information later.
Shown on your screen now are two ways in which we can do this in Java and C#. Now, you'll notice that there is no Python section, because creating populate-later arrays is not conventionally possible in the base version of Python without the use of more advanced data structures or packages. You can see, however, with Java and C#, that just as with the other example, there are slight differences in the syntax on how we do this, but it generally follows some set pattern: the type of the array, followed by a name for the array, which is then set equal to a size for the array.
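As a hedged Java sketch of that pattern (the name and the value filled in here are made up for illustration):

```java
// Creates an integer array with a fixed size of 10, to be populated later,
// for example as salaries are read in from a text file.
int[] salaries = new int[10];
salaries[0] = 50000; // filling in the first slot once the data is available
```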
Remember, this size is final and cannot be changed outside of this initial call. If you end up creating an array with the size 10, but then later in the program find that you need more space for it, you can always go back to where you instantiated it and change the size. But after that line of code runs, there's no conventional way to increase it. You may also be wondering what the brackets mean when instantiating an array, and that's just a way to signify to the computer that we are indeed trying to create an array and not just a variable. Now that we know the two different ways
to instantiate an array, we need to know how we, as the programmer actually get information that is
stored within the array so that we're able to use it. And the simplest way to answer that question is through the means of a numerical index. An index is simply an integer which corresponds
to an element within the array. Now, the most important thing to know about indexes is that
for a certain array, they begin at zero instead of one. So if we have an array of 10 elements,
the first index would actually be index zero, the second would be index one, and so
on all the way up to the ninth index. It's a little bit weird at first,
but trust me, you get used to it. Now, to retrieve information from a certain
position within the array, you would reference it using both the arrays name, and then the
index number of the element you wish to retrieve. Say we have our array called numbers here, which
contains the integers one through 10. To print out the element at the fifth index, in this case, the
number six, we would reference the array name, in this case numbers. And then in a set of brackets,
we would place the index we want to retrieve, in this case, the number five. What this piece of
code is basically telling the computer is to print out the fifth index of the numbers array, which
again is the integer six. Now because indexes start at zero instead of one, it's very important
to note that to get the 10th element of the array, you would actually need to reference it using
the number nine since there is no 10th index. If you do end up making this mistake, it
will result in an array out of bounds error, since there is no 10th index in
the bounds of the numbers array. Referencing an array's index number is how we actually replace elements within the array as well. Continuing with our numbers array, let's say we wanted to change the last element, the integer 10, to be the integer 11 instead. What we would do is reference the ninth index of the numbers array, the one which currently contains the integer 10, and set it equal to 11. This tells the computer to take the element at index nine and replace it with the integer 11, essentially swapping 10 for 11.
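Putting those two ideas together, here is a small Java sketch using the numbers array from the example:

```java
int[] numbers = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

// Accessing: prints the element at the fifth index, the integer 6.
System.out.println(numbers[5]);

// Replacing: swaps the 10 stored at the ninth (last) index for an 11.
numbers[9] = 11;
```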
Now, the last thing I want to talk about before we jump into the time complexity equations for an array is the practice of putting arrays inside of arrays. An array with an array at each element
is known as a two dimensional array. Think of a two dimensional array as a
matrix. It's similar to any other array, except for the fact that instead of a primitive data type like an integer housed at each element, we would instead have a whole other array with its own size and indexes. Two dimensional arrays are useful for a variety of purposes, such as if you were programming a chess board, a bingo board, or even an image where each element in the two dimensional array contains an RGB value, which when combined creates an image. Now, referencing an element within a two dimensional array is mostly
the same as with one dimensional arrays, except you now need two indexes, the first number
would be the index of the column that you want to reference, and the second number would be the
index of the row that you want to reference. By combining these two numbers, we can single
out an element within the two dimensional array that will then be returned to us. As an example,
let's create a two dimensional array with 16 names contained within four rows and four
columns. Now to access the name Carl, you'd first single out the column, which
contains the name you're looking for. In this case, Carl is in the third
column, meaning it's in the second index. Next, you would single out the row which contains
the name you're looking for. In this case, the name Carl is in the third row. So we would
again use the second index to reference it. Combining these two pieces of information gives us
the index location two comma two. And if you look, Carl is indeed at index location two
comma two. Just as another example, Adam is in the first column, two rows down. And so to reference Adam, you would simply call upon the element at index location zero comma one.
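To make that concrete, here is a hypothetical Java version of the grid; only Carl's position comes from the example above, the other names are placeholders, and whether the first index means the row or the column is simply a convention you pick and stick with.

```java
// A made-up 4x4 grid of names, with "Carl" sitting at index [2][2].
String[][] names = {
    {"Ann",  "Ben",  "Cleo", "Dev"},
    {"Eve",  "Finn", "Gus",  "Hana"},
    {"Ira",  "Jo",   "Carl", "Lou"},
    {"Mia",  "Ned",  "Omar", "Pia"}
};

// Two indexes single out one element of the two dimensional array.
System.out.println(names[2][2]); // prints "Carl"
```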
Two dimensional arrays are just the beginning. You can also have three dimensional arrays, four dimensional arrays, and so on, for containing large amounts of advanced relational data. But two dimensions is where we're going to stop for today. All right, that concludes the
background information on arrays. Now that we know what they are, let's actually
talk about arrays as a data structure. This is where we're finally going to be using
our big O notation knowledge from the previous segment to judge the array as a data structure. We'll be going through the four criteria we talked about previously, accessing elements within an
array, searching for an element within an array, inserting an element into an array, and deleting
an element from an array, and scoring it based on how effectively it can complete these four tasks using time complexity equations. Now, accessing an element within an array can be conducted in instantaneous O of 1 time. This means that for any element within your
array that you want to know the contents of, you can do so immediately by simply
calling upon its index location. This has to do with the way that
an array is stored within memory. For example's sake, let's look at a fake portion of memory in the computer. You can see we have some information near the
beginning, maybe some integers or strings that we've initialized, that's not important. What's
important is that when we get to the part in our code, which instantiates an array, we already know
exactly how large the array is going to be. Either because we've populated it with the elements
needed, or given it a definitive final size. This means we also know exactly how much space
we need to reserve in memory for that array. Combining these two pieces of knowledge means
we can instantaneously figure out the location of every single element within the array
by simply taking the starting location of the array in memory, and then adding to it the
index of the element that we're trying to find. So let's say our array contains three elements,
each of them integers. And we want to know what the element at the first index is. Well, we
know that the array starts storing data here.
the array is stored contiguously together in memory. And so by taking the starting location
for the array in memory, and adding one to it, we now know that that's where the first index
of our array is going to be, and can return that stored value appropriately. This is the main
reason why array sizes cannot be changed: all of the information within an array must be stored in this one place so that we're able to do this. The contiguous structure of an array prevents you from adding space to it after it's been initialized, because it literally can't grow without breaking up the data and adding the space to store new elements down the road, away from the initial array. This would make accessing elements instantaneously not possible. Hopefully now you can see why both
accessing an element within an array is O of one, as well as why array sizes are final, because
not many other data structures allow you to have instantaneous accessing power, meaning arrays have
a huge advantage on the others in this metric. Next up, searching through an array is O of n. This is because, for the most part, you will be working with unsorted arrays, those being arrays which are not in alphabetical, numerical, or some sort of quantifiable order, meaning that to find an element within the array,
you may have to search through each element before you can find it. This is known as a linear
search. And if you want to know more about it, click the card in the top right corner,
and it will take you to that section in our introduction to programming series. A link
will also be provided in the description below. Now while there are faster ways to search
through an array, those only work when the array is sorted. And so for a basic array, in
worst case scenario, searching through it to find a particular element is going to be O of n.
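Here is what that worst case looks like in a minimal Java sketch of a linear search (the names are illustrative):

```java
// Linear search over an unsorted array: in the worst case we check all
// n elements before finding the target or giving up, hence O of n.
static int linearSearch(int[] array, int target) {
    for (int i = 0; i < array.length; i++) {
        if (array[i] == target) {
            return i;  // found it at index i
        }
    }
    return -1;         // not in the array
}
```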
Now, sure, there are going to be cases where the element you're searching for is the first element
in the array, or even somewhere in the middle. But remember that when working with big O
notation, we always use the worst case scenario. And in this case, the worst case scenario is that
the item that you're searching for ends up being the last element within the array, which for an
array of size n would take n operations to reach. Finally, inserting and deleting elements from an
array both have a time complexity equation of O of n. For inserting, this is because adding an element into the array requires you to shift every element that's after the index you want to insert the value at to the right one space. For example, if you have an array of size five, currently filled with four numbers, and you want to insert the number one into the array at index zero, you would first need to shift every element from the zeroth index onward to the right one, essentially freeing up the space for the new integer, while also retaining all of the
information currently stored within the array. Then, and only then can you actually insert it.
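A rough Java sketch of that shifting process, assuming the last slot of the array is the free one:

```java
// An array of size five currently holding four numbers, with the last slot free.
int[] numbers = {2, 3, 4, 5, 0};

// Shift every element one space to the right, starting from the back.
for (int i = numbers.length - 1; i > 0; i--) {
    numbers[i] = numbers[i - 1];
}
numbers[0] = 1; // the array now holds 1, 2, 3, 4, 5
```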
This requires you to traverse through the whole array of size n, which is why the time complexity
equation is O of n. Now, sometimes you might not have to shift the whole list. Say, in the case you had the same array of numbers, only this time the fourth index was free, and you wanted to insert the number five in that open spot. Since you don't have to move everything to the right one in order to make space for this new element, does that make inserting into an array have a time complexity equation of O of 1? Well, no, because again, when we talk about a
amount of operations that we're going to have to conduct before we can insert an element into an
array is n, the size of the list, making it's time complexity equation O of n. Deleting an element
from an array follows mostly the same idea, you shift every element to the right of the one, you
want to delete down one index. And essentially, you have deleted that element. Again, let's say
we had an array of numbers one through five, only this time, we want to delete
the number one at the zeroeth index, what we would do is set the zeroeth index
to whatever is contained at the first index, in this case, the integer two. And then we
simply repeat this process until we get to the end of the array, finishing it off by setting
the last element in the array to be some no value. This deletes the element from the array by
basically replacing it with a value to the right. And then this process is repeated into it
frees up one space at the end. Essentially, we're just moving the entire array down one.
Now worst case scenario, we're going to have to shift every element in the array of size n, making
deleting from an array also have a time complexity equation of O of n. So there you have it the
four time complexity equations for an array, accessing is O of one and searching,
inserting and deleting are all O of n. Arrays are a really good data structure for
storing similar contiguous data. Its ability to access any item in constant time makes it
extremely useful for programs in which you would like to be able to have instant access to the
information contained within your data structure. It's also a lot more basic than some of the other
data structures that we'll end up talking about, making it both easy to learn and reducing the
complexity of your code. Now, some disadvantages of the array are the fact that the size of the
array cannot be changed once it's initialized. Inserting and deleting from an array can take
quite a while if you're performing the function on larger data sets. And if you have an array
which is not full of elements, you're essentially wasting memory space until you fill that index
location with a piece of data. Overall, the array is a pretty reliable data structure. It has some
flaws as well as some advantages. A program that you write could use the array if need be, and
it would work just fine. But sometimes you might want some extra functionality. And that's where
more advanced data structures come into play. One of those more advanced data structures
is what's known as the ArrayList. The ArrayList fundamentally can
be thought of as a growing array. We just finished talking about the array and how one of its major flaws was the fact that, once initialized, an array's size could not be changed using conventional methods. Well, in contrast, an ArrayList's size expands as the programmer needs. If you take a look at the ArrayList on your screen now, full of four elements, and you decide to add one to it, it will simply expand its size to fit five elements. As you can probably tell, this is extremely useful, which begs the question of why not just always use ArrayLists. I mean, in comparison, the ArrayList seems to provide all the functionality of an array and then some. Well, that's definitely a valid question, and one we will get to later on in this section. But before we can do that, we need to cover
the basics of an ArrayList including some of the properties and methods associated with it.
Alright, let's hop in. So as we said before, an ArrayList is simply a resizeable array,
making them extremely similar in structure. This is furthered by the fact that an ArrayList
is actually backed by an array in memory, meaning that behind the scenes of your code, the
ArrayList data structure uses an array as its scaffolding system. For this series, we need not
go further than that. But for us, it just means that a lot of the functionality will be the same.
Now, this doesn't mean everything is going to be the same, which you'll see later on. But it's
still important to make note of before we get too deep into things. The next thing I want to do
before we talk about functionality is actually go over how we initialize an ArrayList. How to do this, again, is going to vary based on which language you're using. So shown on your screen now are two different ways to do so, in Java and C#. You may notice, again, that there is no Python on the screen, and this is because, in the base version of Python,
arrays and ArrayLists are actually not separate entities. They are frankensteined together into a single data structure called lists. Lists take some functionality from arrays and some from ArrayLists. It's a lot more complicated than that, but for this series, that's all you're going to need to know as to why we are not including Python in this section. We discussed initializing Python lists in the previous section, so if you're
interested, you can go back and look at that. Now going back to ArrayList initializations.
Another thing you may notice is that it looks a little bit awkward in the way
that these statements are structured. And that's mainly due to the fact that the
ArrayList is actually its own separate class outside of the base version
of these two languages. Now, I'm not going to get into class hierarchy
and object oriented programming right now, because that's a whole other topic with
big concepts and even bigger words. For right now, this just means that to create a
new ArrayList, we have to invoke the ArrayList class when defining it, which as you can see is
done at the beginning of both initializations. After that, you can see that we give it a name,
which is then set equal to new ArrayList with a set of parentheses. Now, in these parentheses you have a few options: you can either enter an integer, which will then be used to define a size for the ArrayList, or you can just leave it blank. Leaving the parentheses blank like that will
automatically define a preset size of 10 for the ArrayList. Again, just as a reminder, this
can be dynamically increased as time goes on if we add enough elements, but
it's just meant as a base size. Now, you may be wondering if we can actually populate the ArrayList with elements when initializing it, as we could with arrays. But ArrayLists actually do not support this type of declaration.
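Here is a hedged Java sketch of both initialization options; the generic type and the names are just illustrative.

```java
import java.util.ArrayList;

// Blank parentheses: the ArrayList starts with the default base capacity of 10.
ArrayList<Integer> defaultList = new ArrayList<>();

// Passing an integer: the ArrayList starts with a capacity of 4, like our exampleAList.
ArrayList<Integer> exampleAList = new ArrayList<>(4);
```

Moving on, let's talk about functionality. Now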
the array list can be thought of as pretty much the evolved form of an array. It's a bit beefier
has a little bit more functionality and is overall more powerful than array. That's certainly not
to say that it's better in every case. But for the most part, the ArrayList is going to be
thought of as the older sibling amongst the two. This is attributed to the fact that it belongs
to the prebuilt ArrayList class, which we talked about earlier. The fact that the ArrayList belongs
to a class means it's going to come with pre built functions that are already at our disposal from
the moment we define and instantiate an ArrayList. More specifically, the ArrayList comes with methods we can use to access, change, add to, or delete from one easily. If you were using an array, you would have to program most if not all of these methods by hand, and so having them pre-built into the ArrayList class makes it especially useful. You'll see that this is the case with a lot of the data structures down the road, where having a data structure belong to its own class cuts out a lot of programming time you would have to spend making appropriate methods for the functionality that you might want. Now, the types of methods that you're
going to get will vary based on language. For example, in Java, you'll have a variety of
methods to use, including ones to add elements to the ArrayList, remove them from the
ArrayList, clear the ArrayList entirely, return it size, etc, etc. as well as
tons of other more specific functions. And another language though, such as C sharp,
you'll have some of the same methods as the Java ArrayList. But you might also have some methods so
that the Java version does not and vice versa. The C sharp version might not have some of the methods
that the Java version does. The same is going to apply for any other language you use, which
might implement the ArrayList data structure. Because of the variability surrounding the ArrayList amongst languages, in this series we're simply going to be covering six of the most common methods that are both useful and can be found in 99% of ArrayList classes. These six methods are the add method, the remove method, the get and set methods, the clear method, and the toArray method. This may look like a lot, but remember, all of these are pre-programmed for you, and so you don't have to make them all by hand; all you have to do is call them on a pre-made ArrayList and you're set to go. Speaking of pre-made ArrayLists, before we dive
into each of these functions and how they work, let's first create an example ArrayList we can use to manipulate and show how they work. Let's call it exampleAList and give it a size of four so that it's not preset to 10, and we're set to go. Now, the add method actually
comes in two different types. One, which takes in only an object to add to the end of the ArrayList.
And one which takes in both an object to add to the ArrayList, as well as an index value
representing the index to insert the object at. Let's start with the simpler of the two, one,
which simply takes in an object. This method is for more basic cases where you don't care about where in the ArrayList the object you wish to add is going to end up; it will simply append the object you pass in as an argument to the end of the ArrayList. So let's take our example ArrayList
and run the add method on an integer of two. Now, normally, ArrayLists only hold objects, not primitive types like the integer 2 that we're trying to pass in. However, the computer will automatically convert our primitive integer into an Integer object with a value of two, so that we can still add it easily. This is known as autoboxing, and it's going to be used throughout the
rest of the series to modify data structures with primitive types. So I just thought I'd mention
it now so that you're not confused later on. Okay, so when we run our code, since the ArrayList is empty and we ran the add method which doesn't care about location, it's going to add the integer 2 at the first index, index zero. Now, if we run another add method, and this time pass in the integer 5 as an argument, since the zeroth index is already taken, it will be slotted into the first available open index, that being the first index. So as you can see, it's pretty simple: it takes in an object and will put it at the first open location available. Moving on to the second type of add method, one
which takes in an object to add to the ArrayList, as well as an index to place it. This
one works very similarly to the previous, only it makes sure that the object that you're adding is appended at the index provided. Again, let's say now we want to add the number one to our example ArrayList, but we want to make sure it's placed in numerical order, in this case at the zeroth index. What we will do is call the add method, providing the integer 1 as an argument in addition to the index zero we want to add it at in the ArrayList. Once the code has run, the ArrayList will automatically shift our integers two and five to the right
one, in order to make space for the integer one.
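In Java, the calls made so far would look roughly like this; this is a sketch continuing the hypothetical exampleAList from before.

```java
import java.util.ArrayList;

ArrayList<Integer> exampleAList = new ArrayList<>(4);

exampleAList.add(2);    // appends 2 at the first open spot, index zero
exampleAList.add(5);    // appends 5 at the next open spot, index one
exampleAList.add(0, 1); // inserts 1 at index zero, shifting the 2 and 5 right one
// exampleAList now holds 1, 2, 5
```

This works for any index contained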
within the bounds of the array list. So if we wanted to do it again, only this time,
insert the number three at the second index, so that the list remains in numerical order.
We'd call example, a list dot add, and then in the parentheses passing the integer three and
the index location two. After the code has run, you'll see that we have added
the integer into our ArrayList. Now it's important to note that because there
are two integers being passed in as arguments, you must know which one your computer's treating
as the integer and which one the computer is treating as the index location. mixing these up
could cause the computer to try and insert the attempted index location at a different location
than the one you were attempting to insert at. So just be careful and knowledgeable as
to the order your methods arguments are in. Now the next method that comes pre packaged in
the ArrayList class is the Remove method. And this one also comes with two different types. The first
takes in an integer as an argument, and just as the name suggests, will remove the object if there
is one at the index location provided and return it back to the user. The second takes in an object
and will simply remove the first instance of that object within the ArrayList If present, and will
then return true or false whether an object was removed. So if we wanted to remove the number five
from our ArrayList, we would have two different options. We could call example, a list dot remove,
and inside the parentheses placed the index of the value we want to remove, in this case three, and
the program will remove the object at index three. The other option would be to run another
remove method, only this time passing an integer object of five. It has to be an integer
object because if we were to just use five, the computer would try to remove the fifth index
of the ArrayList, which doesn't even exist. By creating an integer object, we
can ensure that when the code runs, the computer knows that we want to remove
the first instance of the number five in our ArrayList. And running this will return
true and remove the integer from our list. Now, if there is no integer five in the ArrayList,
the Remove method will simply return false. Now I quite like the number five, so I'm
actually not going to permanently remove it from the ArrayList just yet. Up next is
the get method. Now the get method is pretty much the same as referencing an index for an
array, it takes in an index location, and will return back to you the value at that location. So
example a list dot get with an argument of zero would return one example a list dot get with
an argument of two would return three, and so on. The next method is the set method, which
is how we replace elements within an ArrayList. Much like the name implies, it takes
in an index and an object as arguments. And we'll set the index location of the index
you passed in to the object you also passed in. So if we wanted to set the number five in our
ArrayList, to be four instead, so that it matches nicely with the other integers, what we would do
is call example, a list dot set, and within the parentheses pass in the index location of the
element, we want to set, in this case three, and then also the object we want to replace
at that index. In this case, the integer for this method call will override the element
at position three to be four instead of five. Now, you should be really careful when you're
using this method, because you don't want to accidentally override an important element within
the ArrayList. Next up is the clear method for when you simply just hate your ArrayList. This
is perhaps the simplest of them all. It has not taken any arguments, and will simply clear
the ArrayList deleting every element entirely. Calling example a list clear on our ArrayList
would delete all the objects within it. But I don't really want to do that right now, especially
with one more method to go. So for the sake of this series, let's just keep the ArrayList
filled with the values that it currently has. The final method that we'll be covering in this
section is a little bit different from the rest. And that's the to array method, which is
used to convert an ArrayList to an array. I thought I would include this one because it's really useful for combining the strengths
and weaknesses of arrays and array lists. The two array method takes in no arguments, and
will simply convert the array list into an array. Now for this to work, of course, you need to set
it to be equal to the creation of a new array, like shown on your screen now. But if it is done
correctly, you'll end up with a brand new array, which contains all of the contents that used to be
in the old array list. You may notice though, that instead of an array of integers, it's actually
an array of objects. This mostly has to do with that object oriented programming stuff we talked
about in the beginning. But for now, it won't make too much of a difference, we can still treat
it as a regular array, printing out indexes to the console, placing elements within it. Typical
array functionality. The only thing that changes is that it now contains integer objects instead
of primitive integer types. So there they are the six major functions that come with any given
version of the ArrayList class. Having these at your disposal will account for much of the functionality you might use an ArrayList for, making them extremely valuable tools. Let's now
Let's now move on to the ArrayList as a data structure. Again, we're going to be looking at its four time complexity equations for accessing, searching, inserting, and deleting. Now, if you remember back to the beginning of this segment, we mentioned that the ArrayList is backed by an array in memory. And this means that, just like the array, it too will have O of 1 accessing power. Essentially, this means that when we use our get method, which takes in an index location, it will return to us the value at the index provided in instantaneous time. Now you might be wondering,
how is this possible, since the data stored within an ArrayList is certainly not contiguous? This is actually due to a really interesting reason. So interesting that before scripting the series, I actually had no idea it was the case. So, because it is my series, I'm going to take a moment to talk about it. So if we pull up our example ArrayList in memory,
you can see that it looks a little bit different. Let's break down what's going
on. Instead of storing the actual objects contained within it, an ArrayList actually stores references, or pointers, to the locations of those objects in memory. So the zeroth index of the ArrayList is stored at the 87th location in memory, which is currently storing the integer one. Checking back to our example ArrayList, you'll remember that that is indeed what was stored at the zeroth index of the example ArrayList. And this goes for every element within the ArrayList. The first is stored at the 91st memory
location, the second at the 100th, and so on. So as you can see, while the actual data is not
stored contiguously the references to that data are. So when we run our get command, all it
has to do is return the value that's stored at whatever the index location points towards.
It's a bit more complicated than that, especially the way that these references get stored. But
that covers the gist of it. This is the reason our ArrayList can have instantaneous accessing
power without having the data stored contiguously. Alright, tangent over, let's get back to the big O notation time complexity equations for the ArrayList. So we know accessing is going to be O of 1. And searching is going to be O of n. This is for the same reason that arrays were O of n: if we want to find out if an object is contained within our ArrayList, and that
object is at the last index of the array list, we're going to have to search through
each and every element within it of size n to make sure because remember, we
always go based on the worst case scenario. Now, inserting into the ArrayList is going to be O of n, because worst case scenario, if we are inserting an element at the beginning of the ArrayList, we need to shift every element after the index we're inserting at to the right by one, just like we needed to do for the array. This requires a number of operations equal to the size of the array, making inserting O of n. Deleting is O of n for the exact same reason. If we want to delete the first element within the ArrayList, we would then have to shift every element down one to save space. Additionally, if we want to delete an object contained at the last index, we have to search through the whole list to find it. Either way, it will be O of n.
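If it helps to see that shifting spelled out, here's a rough Java sketch of what an insert into a fixed backing array conceptually involves; it's just an illustration of the idea, not the actual ArrayList source code.

```java
import java.util.Arrays;

public class InsertShiftSketch {
    // Insert value at index by copying and shifting: every element at or after
    // the index slides one slot to the right, which is the O(n) work.
    static int[] insertAt(int[] backing, int index, int value) {
        int[] result = new int[backing.length + 1];
        for (int i = 0; i < index; i++) {
            result[i] = backing[i];          // copy everything before the index
        }
        result[index] = value;               // place the new value
        for (int i = index; i < backing.length; i++) {
            result[i + 1] = backing[i];      // shift the rest right by one
        }
        return result;
    }

    public static void main(String[] args) {
        int[] backing = {2, 3, 4};
        System.out.println(Arrays.toString(insertAt(backing, 0, 1))); // [1, 2, 3, 4]
    }
}
```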
Alright, so there are your four time complexity equations: accessing is O of 1, and searching, inserting, and deleting are all O of n. If that sounds kind of familiar, that's because these are the same as the array's time complexity equations. See, I told you they were similar. Now, this does bring back up the question we posed at the
beginning of the episode: why even use an array in the first place? In comparison to the ArrayList, the array just does not seem as useful. Well, let's get into that by sizing these two data structures up against each other, mano a mano. Let's compare. An array is a collection with
a fixed size, meaning it cannot be changed, whereas an array list has a dynamic size, which
can be updated to fit the needs of the programmer. Arrays can store all types of data, meaning both
primitives and advanced types, whereas ArrayLists can only store objects, not primitives. Now, this problem is mostly solved through the autoboxing situation I talked about previously, but the fact still stands. Moving on, an array is built into most languages, meaning it doesn't have any methods pre-made for you to interact with or modify it, whereas an ArrayList is its own class, meaning it
comes with useful methods to help you utilize it. Finally, an array is very primitive in nature and
doesn't require a lot of memory to store or use. Whereas an array list is again a class, meaning
it requires more space and time to use than an array will. Hopefully now you can see that while
the array list is more powerful, it still does have some drawbacks which make using an array
sometimes more appropriate. In general, you want to use arrays for smaller tasks where you might not be interacting with or changing the data that often, and ArrayLists for more interactive programs where you'll be modifying the data quite a bit.
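If you'd like to see that trade-off spelled out in code, here's a small Java sketch; the sizes and values here are completely arbitrary.

```java
import java.util.ArrayList;

public class ArrayVersusArrayList {
    public static void main(String[] args) {
        // An array: fixed size, can hold primitives, no helper methods.
        int[] fixed = new int[3];
        fixed[0] = 1;
        fixed[1] = 2;
        fixed[2] = 3;
        // fixed[3] = 4; // would throw ArrayIndexOutOfBoundsException -- the size can't grow

        // An ArrayList: dynamic size, stores objects (Integer, not int), comes with methods.
        ArrayList<Integer> dynamic = new ArrayList<>();
        dynamic.add(1);
        dynamic.add(2);
        dynamic.add(3);
        dynamic.add(4); // no problem, it simply grows to fit
        System.out.println(dynamic.size()); // 4
    }
}
```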
So that's the ArrayList. As a review, it is a dynamically increasing array which comes with a slew of methods to help you work with it. The ArrayList is a hefty topic in computer science, so if you feel I didn't do a good enough job explaining some of the concepts in the video, the sources used to write the script for this video will be linked in the description below. But if not, let's move on
to our next data structure. The next data structure we'll be talking about
is the stack. Now at this point in the video, we'll be diverging from what are known as random
access data structures, ie arrays and array lists, and diving into sequential access data structures.
What's the difference? Well, if you remember back to our discussion on arrays and array lists, we
were able to access any element within the data structure by calling upon its index location. This
would then result in the computer instantaneously returning the value at that location. Each element
was independent of the one before or after it, meaning obtaining a certain element did not
rely on any of the others contained within the data structure. That basically describes
the gist of a random access data structure, one where each element can be accessed
directly and in constant time. A common, non-computer-science example of random access would be a book. Getting the information contained on a certain page doesn't
depend on all the other pages within that book. And getting an element contained within a
random access data structure doesn't depend on all the other elements contained within that data
structure. In contrast, elements in a sequential access data structure can only be accessed in
a particular order. Each element within the data structure is dependent on the others and may
only be obtainable through those other elements. Most of the time, this means that accessing
a certain element won't be instantaneous. A common non computer science example of
sequential access would be a tape measure. To get to the measurement of 72 inches, you would
first have to go through inches one through 71. There's no way to just instantly get to 72 inches without first breaking every law of thermodynamics
and probably the space time continuum. These are also sometimes called limited access
data structures, because unlike random access, the data is limited by having to obtain it through
a certain way. So there you have it, random access data structures, which allow instantaneous
accessing power and independent elements such as the array and array list. And then
you have sequential access data structures, which only allow accessing through a
particular order with dependent elements. For the next few segments, we'll be covering a few
of the popular sequential access data structures, such as stacks, queues, linked lists, and so
on. Beginning of course, with the stack, we've already danced around the subject long enough. So
let's finally talk about what a stack actually is. Now, the stack, by definition, is a sequential
access data structure in which we add elements and remove elements according to the LIFO principle. The LIFO principle, which stands for Last In, First Out, means that whichever element we added to the stack last will be the first one we retrieve. Hence, last in, first out. Think of this as a stack of books.
you do so by stacking it on top of the others. Then, if we want to get a book in the middle
of that stack, we would first have to take off all the books on top of it, we can't just
grab that book from the stack and expect the entire thing not to fall in on itself like some sort of literary Jenga. The same goes for the stack as a data structure:
we add elements to the top of the stack and retrieve them by taking them off the top of the
stack. There's only one way in and one way out for the data. We can't simply access or modify an
element within the stack all willy nilly like we were able to do with the array and the array list.
This might seem like a weakness, limiting the flow of data to one single point. But remember what I said during the segment on time complexity: many of these data structures have a built-in functionality which gives them an edge over the others, and the stack with its LIFO property is a prime example of that. Next up, let's talk about some of the
methods commonly associated with the stack. Now, the stack class will always come with two methods pre-built into it, those being push and pop. These are the fundamental methods which we use to interact with and modify the stack, making them extremely important to comprehend. In addition to those two, I'm also going to cover two more methods, peek and contains, which are both useful and can be found in the stack class associated with most programming languages. Now, just like we had an example ArrayList, let's also create an example stack to help us show off these
methods. And just like that, we're ready to roll. Now push is a method which pushes an object
onto the top of the stack. All it takes in as an argument is an object to add to the stack
and will return no value. When we do this, that object becomes the forefront of the stack, and its size is dynamically increased by one. Let's start running some push commands on a variety of different strings, and you'll see that the stack slowly gets built: "now", "channel", "this", "to", "subscribe", a series of completely random words seamlessly pushed onto the stack. As you can see, we're only adding information at one spot, the top; there's no insert method where we can just
jam a string into the middle of the stack. Moving on, the pop method is used to remove
an element from the top of the stack, it does not take in any argument and will return
the element that is popped off the stack back to the user. Once a pop method has run, the element
that was at the top of the stack is removed and returned back to the programmer. And the element
which was second from the top becomes the new top of the stack. And so on our example stack, if we
popped off each element, you'd see that each time one of these completely random strings is taken
from the stack and returned back to the user until we are left with a sad, empty stack with no
shameless promotion. Let's add the strings back, obviously, just for the sake of the
next two methods that we need to cover, and not for continuing the free advertising. So push and pop are how we add and remove the
data within our stack. So they're fundamentally the backbone of the data structure. The next
two I want to talk about, peek and contains, are more so used to interact with the data inside
the stack without actually changing it. Not as useful for modifying data, but still extremely
important. The first of these is the peek method. The peek method allows you to get the value at the top of the stack without actually removing it. We talked about before that the only way to access elements in a stack was through the top, and this method is simply a way to look at what the top value is without having to pop it off. It takes in no arguments and will just return back the contents of the top element. In our case, if we ran it on our example stack, "subscribe" would be returned back to the user. Now, if we popped off the top element and ran it again, "to" would be returned; you get the idea. Again, let's push "subscribe" back on at the top for educational purposes. Now the final method I'll be
covering is the contains method. This one is used for searching throughout the
stack. It takes in an object as an argument and will return a Boolean of whether or not that
item is contained within the stack. Essentially, this method allows us to search through the stack
without popping off every element until we find the one that we're looking for, as the contains
method does not modify the stack in any way. So, for example, exampleStack.contains with an argument of "subscribe" would return true, exampleStack.contains with an argument of "this" would also return true, but exampleStack.contains with an argument of "hello" would return false. So there they are: four common stack functions which are going to be vital if you ever want to construct a stack-based program.
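As a quick sketch, here's roughly how those four calls might look using Java's built-in Stack class; the variable name exampleStack and the strings being pushed are just the placeholders from the visuals.

```java
import java.util.Stack;

public class StackDemo {
    public static void main(String[] args) {
        Stack<String> exampleStack = new Stack<>();

        // push adds an element to the top of the stack
        exampleStack.push("now");
        exampleStack.push("channel");
        exampleStack.push("this");
        exampleStack.push("to");
        exampleStack.push("subscribe");

        System.out.println(exampleStack.peek());          // "subscribe" -- look without removing
        System.out.println(exampleStack.pop());           // "subscribe" -- removed and returned
        System.out.println(exampleStack.peek());          // "to" is the new top
        System.out.println(exampleStack.contains("this"));  // true
        System.out.println(exampleStack.contains("hello")); // false
    }
}
```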
Moving on to time complexity: for accessing, the stack has a time complexity equation of O of n. This is because in order for us to reach a certain element within the stack, we first have to pop off every element that's above it. Think of it like
this, if we had a stack of stones, and needed to get to the bottom one, we'd first have to take off
every stone on top of it. So worst case scenario, if the element we want is at the bottom of
the stack, we first have to pop off every element above it. This would require a number
of operations equal to the size of the stack, making the time complexity equation O of n. This is one of the major drawbacks to using stacks. With arrays and ArrayLists, we could easily
access any element within the data structure instantaneously. And with a stack, that's just not
possible because of the way that it's structured. Searching is going to be O of n for the exact same
reason. Worst case scenario, if we're searching for an element that's at the bottom of the stack,
we have to go through the whole thing just to find it. Now, inserting and deleting make up for this
by having time complexity equations of O of 1. This essentially boils down to the fact that using our push and pop methods really only requires one operation. Since the data only flows in and out of a single point, inserting or removing an object from that point can be done immediately. For the push method, we simply add it to the top of the stack, and for the pop method, we just take it off from the top. It's not rocket science. Actually, it's computer science, but that's besides the point. There's no need to rearrange data or move elements around like there was for the array and ArrayList, because we don't have to; the data that we're modifying is right there on top. So there you go, the time
complexity equations for the stack. Now, you might be wondering if there are even uses for a last in, first out data structure, because it seems kind of out there, I mean, limiting your data to a single entry point. But you would actually be mistaken. Stacks are used everywhere, both in the actual writing of other code as well as in real world situations.
In fact, one of the most fundamental processes of computer science uses stacks as a way of
keeping track of active functions or subroutines. Of course, I'm talking about recursion. Now,
I won't get into recursion too much here. But basically, it's the process of functions repeatedly calling themselves. When a function calls itself, that call is
added to a stack of processes. And when that stack finally reaches what's known as a base case,
the functions are all then popped off one by one. It goes much, much deeper than that. But we don't
have time for a full-blown lesson on recursion right now. If you want that, you can click the
card in the top right corner of your screen, or the link in the description below, which will
both take you to that part in our introduction to programming series where we cover it. Either
way, stacks are the backbone for recursion. And recursion is the backbone for a plethora
of computer science related functionality, such as traversing data structures, keeping
track of active subroutines in code, and much, much more. Some examples of stack based functions
outside of computer programming, which you use every day, include the undo/redo button in word processors and the back-paging in web browsers. Both use stacks similarly: they continually add tasks you've completed to a stack, either web pages that you've visited or words that you've typed. And then, when you press undo or the back button in a web browser, all we have to do is pop off whatever the last task was from the stack, and bam, you're right back to where you were a second ago. It's like magic, but better in a way.
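As a loose sketch of that undo idea, here's how a word processor might keep typed words on a stack and pop one back off when undo is pressed; the variable names here are made up purely for illustration.

```java
import java.util.Stack;

public class UndoSketch {
    public static void main(String[] args) {
        Stack<String> typedWords = new Stack<>();

        // every completed action gets pushed onto the stack
        typedWords.push("Hello");
        typedWords.push("world");

        // pressing undo just pops the most recent action back off
        String undone = typedWords.pop();
        System.out.println("Undid: " + undone); // Undid: world
    }
}
```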
As you can see, the stack has a lot of real world applications, both on the consumer side and the client side. You interact with and use stacks every day without even realizing it, and you'll use stacks frequently as you continue along your computer science journey. And so, by learning them, you're opening up a world of opportunities. That
concludes our discussion on the stack. To review: it is a sequential access data structure in which we use the LIFO principle to add and remove elements, and again, LIFO stands for Last In, First Out. Up next, we'll be talking about an equally useful data structure that
functions very differently than the stack while also working very similarly. And
that data structure is known as the queue. So now that we've talked about the stack,
a sequential access data structure which follows the LIFO principle, we need to cover the opposite, which in computer science we obviously call a queue. Now, by definition, a queue is a sequential access data structure which follows the FIFO principle, or first in, first out. The stack and queue are very much a dynamic duo when it comes to the world of comp sci. So
you'll notice a lot of similarities between the two and the way that they're structured and
how they work. In this segment, we'll cover a lot of the same topics as we did with the stack, only for the queue. Now, timestamps can be found in the description correlating to these topics. But if you're sticking around, let's dive deeper into what the queue actually is. Well, the queue, like the stack, is a sequential access data structure, meaning we can only access the information contained within it a certain way. Now, if you remember back to stacks, this certain way was through the LIFO methodology, or last in, first out, where the last element pushed onto the stack would always be the first one we popped off, similar to a stack of books
that we add to and remove from. Now in contrast, the queue follows what's known
as the FIFO methodology, or first in first out, where the first element added to the queue
will always be the first one to be removed. We can think of this as a line to your favorite
amusement park ride. Mine as a kid was always the Ring of Fire. So let's just use that one
as an example. The first one to get there, assuming we don't have any cutters, will always be the first one who gets to go on the ride. The later you show up, the longer you have to wait. This is the strategy used for queues when adding and removing objects: the first element to be added will also be the first one to be removed. Another big difference between stacks and queues
is the location we add and remove elements from. You might remember that with the stack, we added
and removed elements from one spot, the top. With the queue, however, this is different. We add elements to the back, also known as the tail, and we remove them from the front, also known as the head. This allows us to make sure that we 100% follow the FIFO methodology. So there's your background information on the queue: sequential access, FIFO methodology, add elements to the back and remove them from the front. Got it? Good. Now, let's dive headfirst into how we can actually use
a queue by jumping into some common queue methods. So just like the stack, we're going to
have two methods used to add and remove elements from the queue. For the stack, we added
elements with push and removed them with pop. In contrast, with a queue, we add elements using enqueue and remove them using dequeue. In addition to these two methods, we're also going to cover peek and contains, which, if you watched the previous segment, should look pretty familiar. Alright, let's start. Enqueue is the first method, and the one we use to
add elements to the tail of the queue. It takes in an object to put at the end of the queue, and
simply adds that object, while also increasing the size of the queue by one. Let's pull up an example queue, which you can see currently has a size of zero. But say we call enqueue on a completely random string, let's say the string "now"; that would be added to the tail of the queue, and the size would increase by one. Now, because there's only one element in the queue at this point, the string "now" is acting as both the head and tail for us. We can of course fix that by enqueueing a few more completely random strings. If we add the string "video", the size goes to two; we can add "this" and it goes to three; "like" makes it four; you get the idea. And now we have a fully functional queue. As you can see, "like" is acting as the tail, being that it was the last string added, and "now" is being treated as the head, which makes sense considering it was the first string to be added. Moving on, dequeue is the method we use to actually remove elements from the head
of our queue. It takes in no arguments and will return the element that was removed from the queue
back to the user. So if we ran a dequeue command on our example queue, you'd see that "now" is both returned back to the user and also removed from the queue. Additionally, the size of our queue has been dynamically decreased by one. If we run it again, "video" is returned and removed from the queue and the size goes down by one yet again; you get the idea. We can keep doing this
for every element in the queue until it's empty. But the next methods we're going to talk
about need some information to work with. So for now, let's refill the queue back to its
original four elements. The next method that I'll discuss is peek. Now, we've actually covered peek just a few moments ago in our segment on stacks. But if you forgot or just didn't watch that particular part, peek returns the object that's at the forefront of our queue. It doesn't take in any arguments and simply returns the foremost object of the queue without actually removing it, the key word there being without. This method allows you to look at the head of the queue before you actually dequeue. There are a multitude of reasons that you might want to do this, maybe to make sure that the element that you're about to dequeue is the correct one, or to check to see if an element you need is still there, etc., etc. Whatever the case is, we can use the peek method to fulfill our needs. If we were to run it on our example queue, you'd see that the string "now" is returned. But if we dequeue the first two elements and run it again, you'll see that the string "this" is returned. Pretty simple, but extremely effective. Again, let's add "video" and "now" back into the queue for our next and final method. That method, of course,
being the contains method, the name pretty much says it all. The contains method will take an
object and will return a Boolean of whether or not the queue contains that object. Running it on our example queue with an argument of "queue" would return false, because as you can tell, there is no "queue" string contained within our queue. However, if we ran it on a string such as "video", it would return true, because as you can see, the string "video" is indeed in the queue. And there they are, all together now: enqueue, dequeue, peek, and contains, four methods which will help you utilize a queue to its maximum efficiency.
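As a quick sketch, here's how those operations might look in Java, which doesn't literally name its methods enqueue and dequeue; its built-in Queue interface uses offer and poll for the same ideas, and the example strings are just the placeholders from the visuals.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class QueueDemo {
    public static void main(String[] args) {
        Queue<String> exampleQueue = new ArrayDeque<>();

        // "enqueue": add elements to the tail of the queue
        exampleQueue.offer("now");
        exampleQueue.offer("video");
        exampleQueue.offer("this");
        exampleQueue.offer("like");

        System.out.println(exampleQueue.peek());           // "now" -- the head, not removed
        System.out.println(exampleQueue.poll());            // "dequeue": removes and returns "now"
        System.out.println(exampleQueue.poll());            // removes and returns "video"
        System.out.println(exampleQueue.peek());            // "this" is now the head
        System.out.println(exampleQueue.contains("queue")); // false
        System.out.println(exampleQueue.contains("like"));  // true
    }
}
```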
Speaking of efficiency, that takes us perfectly into the time complexity equations for the queue. Now, accessing an element within a queue is going to be O of n. Let's say you had a queue full of three elements. If you want the object at the tail, you first have to dequeue every element off the front until the one you're looking for is the head of the queue. Then, and only then, can you actually get the value contained. Since this may require you to go through the entire queue of size n, accessing is going to be O of n. Remember, now,
queues are sequential access data structures and not random access, meaning we can't just grab an object from the middle of the queue. That's just not how things work. Searching is
going to be O of n for the exact same reason, trying to find an element contained at the tail
of a queue requires you to iterate across the entirety of that queue to check for it. So in that
scenario, we have to check every element within that queue of size n, making the time complexity equation O of n. Inserting into and deleting from a queue are both going to be an instantaneous O of 1. This is because, just like with the stack, we're only ever enqueuing at a single point, and we're only ever dequeuing at a single point. This means that no matter how large the size of the queue is, it will always take the same number of operations to either insert or remove an element. And there they are, in all their glory, the time
complexity equations for the queue. You may notice that they're identical to the stack's, which, if
you've been paying attention, is the truth for most of the properties of a queue. They're very much a yin and yang, one and the same type of deal, differing slightly in terms of the methodology you use to add and remove objects from them. You'll often see these two data structures talked
about together frequently, just because of how similar their functionality is. The final thing
I want to cover on the topic of queues are just some common uses for them within programming.
What are these things actually used for? And the answer is, quite honestly, a lot. On your computer right now, queues are being used for what's known as job scheduling, the process through which the computer determines which tasks to complete for the user and when, like opening up a web page or a computer program. It's used many times in printers to keep track of when multiple print jobs come in, determining whose documents get printed first. Heck, if you're looking for real world examples, Google even uses queues in their Pixel phones to enable what's known as zero shutter lag, in which they strategically use queues to eliminate the time between when you take a picture and when the phone actually captures it. So yeah,
in terms of functionality, the queue can be used in a variety of fields. So it's good now that you
know what they are and how to use them. This also concludes our discussion on the queue. To review: the queue is a sequential access data structure which follows the FIFO principle, or first in, first out, where we add elements to the back and remove elements from the front. Up next, we'll
continue on the sequential access data structures train and talk about one of
my personal favorite data structures. Next, we'll be covering the linked list, and
it's a good one. So strap into your seats. Let's just jump into things by talking about
the elephant in the room. What exactly is a linked list? Well, to answer that a linked list
is a sequential access linear data structure in which every element is a separate object
called a node. Each node has two parts, the data and the reference, also called the pointer,
which points to the next node in the list. Wow, that is a pretty big sentence with a lot of ideas
within it. So let's break it down part by part. The sequential access part of that statement means
that the information contained within the linked list data structure can only be obtained through a certain methodology. We talked about this in our segment on the stack. If you remember, during that segment, we compared it to a tape measure, because similarly, you can only get measurements from a tape measure through a specific method. The specific method for a linked list will be covered a little bit later. For now, moving on: the linear part of the definition
simply means that the information or data is organized in a linear fashion in which
elements are linked one after the other. Now when we state that every element is a separate
object called a node, this means that unlike an array or an ArrayList, where every element
is just let's say a number, each element in a linked list will actually be an object which
can have multiple attributes or variables. And I won't dive too far into object
oriented programming right now. If you want a good introduction to that, you
can check out the link in the description below, which will take you to our introduction
to object oriented programming lecture. For this series, we essentially mean that the
objects or nodes stored within each element of our linked list will hold two separate pieces
of information. These two pieces of information come together to form the node. And these
nodes are what make up our linked list. More specifically, those two pieces of information
are the actual data or information stored within that node, and the reference or pointer to the
next node in the linked list. The data is where our strings or integers or Boolean values are
kept the actual contents of our data structure. And the other piece of the puzzle is the pointer.
This reference or pointer points to where the next node in the linked list is stored in memory.
This helps link all of the nodes in a linked list together to form one long chain of information.
Kind of like a computer science conga line. You should now be able to understand what a linked list is. Again, it's a sequential access linear data structure in which every element is a separate object called a node, and each node contains both data and a reference to the next node in the linked list. This cumulatively creates a string of nodes which we call the linked list. Feel free to rewatch this explanation if you're still confused, because I know it can get pretty hectic pretty fast.
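And if seeing it in code helps, here's a bare-bones Java sketch of what one of these nodes might look like; the class name and fields are just illustrative, not something from a standard library.

```java
// A minimal singly linked list node: one piece of data plus a pointer
// (reference) to the next node in the chain.
public class Node {
    int data;   // the actual information stored in this node
    Node next;  // reference to the next node, or null if this is the tail

    Node(int data) {
        this.data = data;
        this.next = null; // a brand new node starts out pointing at nothing
    }
}
```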
The next thing I want to do in this section is just visualize how one of these linked lists is set up. That way, you can actually see what I mean when I say nodes and pointers and everything else. So every linked list starts with
what's known as the head node of the list. This is just an arbitrary label, which represents
a node containing some form of data, and also a reference to the next node in the linked list. For
now, let's just store the integer one in our head node. Now, since this is the only node so far in our linked list, the head node will simply point towards a null value, which is just to say it's not pointing anywhere. Essentially, it's pointing to nowhere and just being used as a placeholder until we actually give it something to point towards. Let's do that by adding another node to the linked list and storing the integer two inside of it. Instantly, you can see that now our head node points to this second node instead of a null value. You'll also notice that this new node, which I'll call the two node, points to a null reference value, just like the one node used to do, as it is now the last node in the linked list. This is a pattern you'll notice as we continue
adding nodes. The last node in the linked list, also known as the tail node will always point
towards a null value. This is our way of telling the computer that we reached the end of our linked
list and that there are no more nodes to traverse towards. Anyways, let's now create our third node and inside store the integer three. This node now becomes the tail and points towards a null value. And the second node that we created, the one that used to point towards a null value, now points to this new node which contains the integer three. Essentially, we're just adding a node to the tail
of a linked list and then setting the references of the previous node to point towards that new
node. And if you can understand that concept, you pretty much comprehend the gist of how these nodes
interact. The two main takeaways are that every node has both information and a pointer. And the
last node in a linked list points towards a null value. That's pretty much the setup for a linked
list. Definitely more abstract and fluid than, say an array or ArrayList, but hopefully not too
complicated. Next up, we're going to be discussing how we add and remove these nodes from a linked
list. Unfortunately, for us, adding and removing elements from a linked list isn't going to be as
simple as it was with the other data structures such as the stack or the queue. This has to do
with the simple fact that we are actually able to insert and remove elements easily within a linked
list at any location. With a stack or a queue, we weren't actually able to do this because the
data could only flow in and out of a few specified points. This made it so we couldn't remove an
element from the middle of the stack or jam an element in the center of a queue. The way that
they're structured just didn't allow for this. Using linked lists, however, we can easily remove elements from the middle or jam elements in the center. And so today, we'll be covering the different methods used to do so. More specifically, we'll be covering three different ways to both insert and remove nodes from a linked list: adding to and
removing a node from the head, the middle and the tail of a linked list. These are all
going to revolve around those pointers we talked about at the beginning of the episode. Because
whenever we change up a node in a linked list, we also have to change its pointers. And that can
get pretty complicated pretty quickly. Luckily, that's why I'm here. I've set up a basic linked
list on your screen now with three nodes that we can use to play around with. Each node has a value
of course representing the data inside the node, and also a pointer which points to the next node. The green and red coloring on the nodes with the integers one and three simply indicate which node is the head node and which node is the tail node: green means head node, and red means it's the tail node. Now, in an actual linked list, these pointers would be locations in memory, but for the purpose of this series, we'll be representing the pointers visually. Perfect. So let's cut
the chitchat and just jump right into it. The first method we're going to be covering is
adding and removing nodes from the head of a linked list. Lucky for us, this is pretty simple.
To add a new node to the head of a linked list, literally, all we have to do is make that new
node's pointer point to whatever the current head node of the linked list is. By doing this, we simply take away the title of head node from the current head and bestow it upon this new
node that we're adding, it looks a little bit something like this. Let's take a node with the
integer zero and add it to the head of the linked list. All we have to do to do this is set its
pointer to point towards the current head node. Now, you'll see that none of the other nodes have
changed in the slightest. The only thing that has changed is that this new node with integer zero is the head node, and it now points towards the old head node. Extremely simple, and the
best part is that it works in reverse as well. Removing a node from the head of
a linked list is just as simple. All we have to do is set the head nodes pointer
to some null value. Once we do, the head node will get cut off from the flow of information
essentially removed from the linked list. If we did this on our example linked list, you'd see that once we set the zero node's pointer to null, because it no longer points to the one node and no node points towards its location, it has been cut off from the rest of the nodes and exiled from the linked list. The old head node regains its position, and it's as if this integer zero node never even existed. Moving on, the next methods we're going to cover are inserting and deleting
a node from the middle of the linked list. These two methods are definitely the most
difficult of the bunch. And this is because we need to insert the node in such a way that the
pointers get readjusted accordingly without losing any of the information. If we accidentally set
the pointers wrong, or do things out of order, the data could get lost forever. But luckily,
I'm here to teach you how to do it the right way. We'll start with adding a node
to the middle of a linked list. Adding to the middle of a linked
list is a two step process. We first make the pointer of the new node point to
the node after the location we want to insert at. Then we set the node before the location we want to insert at to point towards the new node. So if we wanted to insert a node with the double
value 1.5 after the node containing the integer one, but before the node containing the integer
two, what we would first do is set the new nodes pointer to point to the node containing
the integer two, then we would make the node containing the integer one point towards
our new node which contains the double 1.5. By adjusting the pointers of the nodes before and after the location we want to insert at, we can strategically jam this node into the correct place without moving or modifying the entire list. As you can see, now we have a linked list of length four where the link has not been broken and the list is still fully connected. Removing a node
from the middle of a linked list is even simpler. All we have to do is make the pointer of the node
previous to the one we're removing to now point to the node after the one we're removing. Then, if we set the pointer of the node we want to remove equal to a null value, we again cut the node off from the linked list and it is removed. Simple as that. So, following these steps, if we now wanted to delete our 1.5 node, we'd make the one node again point towards the two node instead of the 1.5 node. Then, if we delete the pointer of the 1.5 node by setting it equal to null, it gets cut off from the flow of the linked list. No changes are externally made to the rest
of the list. Just one lonely node removed. The final type of insertion and deletion we'll be
covering is inserting to and deleting from the end of a linked list. Doing this simply requires us to modify the tail node of the linked list, the one which currently points towards some null value. For adding a node to the tail, you simply make the current tail's pointer, which is currently set to null, point towards the node you want to add. So if we wanted to add a node with the integer four, we would just make the three node point towards this new node. Then, by setting the four node's pointer to point towards null, we've made this new four node the tail of the linked list, and the old tail node with integer three now points to the new tail. Removing the tail of a linked list is just as simple. If we want to remove the tail, we just set the node previous to the tail to point towards a null value instead of the current tail. This leaves the current tail node with no connection to the linked list, isolating it from the pack. Doing this on our list would look like making the three node point towards null. And now, because no node points to our tail node containing the integer four anymore, it gets cut off from the linked list and essentially removed, making the old tail node, the one containing the integer three, the current tail node once again.
And so, after a bit of pointer readjustment, hopefully now you can understand the pseudocode behind inserting and removing elements from a linked list with ease.
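If you'd like to see those pointer moves written out, here's a rough Java sketch of the head and middle operations we just walked through, reusing the illustrative Node class from the sketch earlier; it's a simplified demonstration, not a complete linked list class.

```java
public class LinkedListOps {
    // Add a new node to the head: the new node points at the old head
    // and becomes the new head of the list.
    static Node addToHead(Node head, int value) {
        Node newNode = new Node(value);
        newNode.next = head;
        return newNode; // the new node is now the head
    }

    // Insert a new node after a given node (somewhere in the middle):
    // point the new node at the node after the insertion point,
    // then point the previous node at the new node.
    static void insertAfter(Node previous, int value) {
        Node newNode = new Node(value);
        newNode.next = previous.next;
        previous.next = newNode;
    }

    // Remove the node after a given node: the previous node skips over it,
    // and the removed node's pointer is cleared.
    static void removeAfter(Node previous) {
        Node removed = previous.next;
        if (removed != null) {
            previous.next = removed.next;
            removed.next = null; // cut the removed node off from the list
        }
    }
}
```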
Up next are the time complexity equations for a linked list. Now, accessing an element within a linked list is going to be O of n. This is because linked lists, again, are sequential access data structures.
This should be all too familiar if you watch the sections on stacks and queues. But
if not, let's just do a simple review. Sequential access data structures can only be accessed through a particular way, meaning we can't get any element we want instantaneously. Resulting from this is the stipulation that if we want to get a certain element within a linked list, we need to start at the beginning of the list and cycle through every node contained within it before we can finally access the one that we want. We do this by using the pointers as a little map for the computer: first go to node one; node one's pointer gives you the location of node two in memory; and node two's pointer will take you to the memory location which contains the node with the information that you want. It's like a computer science treasure map, in a way. For a linked list of size n, this means you could have to traverse the entire linked list before finding your information, making its time complexity equation O of n. Searching is O of n for the exact same reason: we
check a node, and if it's not the one that we want, we use that node's pointer to take us to the next node and check that one. We do this for every node until we either find the node containing the value we want, or get to a node which points towards null, indicating that we've reached the end of the linked list and that the value we're searching for just isn't contained within that particular linked list. Inserting and deleting from a linked list
is a little bit complicated, since linked lists usually only store the head node's, and sometimes the tail node's, location in memory permanently. If we want to insert or delete an element at the beginning or end of a linked list, we can do so instantaneously using the methodology we talked about previously. However, if we want to insert a node within the linked list, things become a little bit more complicated. In order to insert a node within the linked list, we must first traverse to the location we want to insert it at. This means following that treasure map until we reach the insertion point, and then, and only then, can we actually follow the instructions to insert or delete a node. So, depending on how and where you want to insert or delete a node, its time complexity equation will be either O of n or O of 1. A little confusing, yes, but a necessary distinction to mention for when it inevitably comes up. So, in review: accessing, searching, and sometimes inserting and deleting are all going to be O of n, and other times inserting and deleting will be instantaneous. Cool. Now let's finish up by talking about some
real world applications of the linked list. Now, something I haven't talked about before, but
would be scolded by people in the comments for not mentioning is that linked lists can actually
be used in the backing of other data structures. What do I mean by this? Well, basically,
we can use linked lists to make stacks, queues and some other data structures we haven't
talked about. This is in the same vein as earlier when I mentioned that the ArrayList uses the
array as a backing support system in memory. Now, this goes a little bit above an introductory series, but it's one of, if not the most important uses of a linked list in computer science, so I thought I'd mention it here. Basically, because of the way it's structured, having a stack or queue use the nodal methodology that we talked about during this episode, the one that comes with the linked list, as the back end of its structure makes a lot of sense. If you're curious, I'll have an article linked below explaining this idea in better detail. I just thought it was a little bit above the complexity for an introductory course.
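Just to hint at the idea without going too deep, here's a tiny sketch of a stack backed by linked nodes, again reusing that illustrative Node class; it's only meant to show why the nodal structure fits, not to be a production implementation.

```java
// A stack backed by linked nodes: pushing and popping both happen at the
// head of the chain, so no shifting or resizing is ever needed.
public class NodeBackedStack {
    private Node top; // the head of the chain acts as the top of the stack

    void push(int value) {
        Node newNode = new Node(value);
        newNode.next = top; // new node points at the old top
        top = newNode;      // and becomes the new top
    }

    int pop() {
        int value = top.data; // assumes the stack isn't empty, for brevity
        top = top.next;       // the next node down becomes the new top
        return value;
    }
}
```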
Now, that's not to say linked lists can't be useful in other ways. Take a queue on Spotify, for instance, where each node in the queue doesn't contain just an integer or a string, but an entire song with WAV data, a title, a length, etc. Then, when the track is done, it automatically points to the next song in the playlist. Another
example could be a photo viewing software where each node is an image, and the pointers
simply point to the next photo in the list. Like I said, the main functionality of a linked
list might be to back other data structures, but it also has tons of uses
in other areas of expertise. In review, a linked list is a sequential access
linear data structure, in which every element is a separate object called a node containing
two parts, the data and the reference pointer which points to the next node that comes in the
linked list. These pointers allow us to easily shift around each node, adding or removing it
without having to move massive amounts of data like we would have to do in the case of
an array or some other data structure. One thing I haven't mentioned yet about linked
lists is that this pointer system does come with one big drawback. With a normal linked list, we
can only ever go forward with our pointers, never backwards. From the computer's eyes, once we follow a pointer to a certain node's location in memory, there's no way to go back or undo to the previous one. Much like a shark, we can and only will ever go forward. This problem is fixed, however, through the use of a data structure known as the doubly linked list, which coincidentally is the subject of the next segment in this video. Small world. All jokes aside, next up, we're going to be covering the doubly linked list, so strap into your seats. A doubly linked list
is almost exactly the same as a linked list. That is to say it's a sequential access data
structure which stores data in the form of nodes, except with doubly linked lists, there's one small difference. With the doubly linked list, we're able to traverse both forwards to
the next node in the list and backwards to the previous node in our list. Again,
using pointers. How? Well, let me explain. With regular linked lists, each element was a node composed of both a data section and then a pointer which would take you to the memory location of the next node in the list. Using this, we were able to traverse a linked list easily to search for information, access data within the list, or add and remove nodes from within the list with ease. A doubly linked list simply builds upon this by also having a pointer which points to the previous node's location in memory. It's an undo button of sorts, which allows you to fluidly go through the linked list in either direction, instead of just limiting yourself to going in one direction. This is great since it allows us
to jump around the linked list and have a lot more flexibility when
it comes to modifying information. Now because this can get pretty confusing
very quickly, I want to do a visualization. Before that though, let's use some lingo to help
consolidate the terminology that I'll be using. When I refer to a node's next, I'm referring to that particular node's pointer which points to the next object in the list, whether that be another node or a null value. Similarly, when I refer to a node's previous, abbreviated to prev on the visuals, I'm talking about its pointer which points to the previous object in the linked list, again, either another node or a null value. Doing this just helps keep things a little bit more simple, because as you'll see, having both a previous and a next pointer makes things a little bit more complicated. Now, on to what
these doubly linked lists actually look like. Just like with a regular linked list, every doubly linked list is going to start with a head node. Since it's the first node in the list, both its previous pointer and its next pointer will point towards a null value. This is, of course, because it can't point to information which isn't there. We can fix this, though, by adding another node. Once we do, you can see that a few things have changed. The head node's next pointer now points towards this new node instead of null, the new node's previous pointer now points to the head node, and the new node's next pointer now points towards a null value. As you can see, it's a little bit more complicated than adding a node to a regular linked list, in which we only had to worry about one set of pointers as opposed to two, but still manageable. Let's add one
more node for demonstration purposes. And you'll see the same thing happens again. The
second node now points to this new third node, and this third node gets initialized with pointers which point both back to the previous second node and also forward to some null value. You can see that with any two nodes, the next pointer of the first and the previous pointer of the second come together to form sort of a cyclical connection which ties the two nodes together. Then, at the head and tail of the list, there's a connection which ties them both towards some null value. Most of this should be common sense, but it's still good to go over. Doubly linked lists are a little scary at first, but once you break them down, you realize that they're just an evolved form of linked lists. Up next, we're going to be talking about adding
and removing nodes from a doubly linked list. Again, using the three different methods
that we talked about in the previous segment, adding and removing from the head, the middle,
and the tail. I've also set up a basic doubly linked list with three nodes containing three
strings to help us out with this next segment. Finally, just like the last segment, the green on
top of a node signifies that it's the head node, and the red signifies it's the tail node. Alright,
with all that being said, let's get into it. Now adding to the head of a doubly
linked list is quite simple. The first step is to take our new node that we want to insert and set its previous to null. Second, we set the new node's next to point towards the current head node of the linked list. Then, all we do is set the current head's previous to point towards this new node instead of a null value, and we're set to go. Doing this rearranges the pointers in such a way that the new node we want to add becomes the head of the doubly linked list. Pretty simple. So, if we wanted a new node with the string "Abe" to be the head of our doubly linked list, we would just follow the steps. First, we set the Abe node's next to point towards the Adam node, then we set its previous to point towards a null value. And finally, by setting the Adam node's previous to point towards the Abe node, we have successfully added that node to the head of the list, making it the head node. Removing a node from the head is even simpler: we
first set the head node's next to point towards a null value, and then, by setting the second node's previous to also point towards a null value, we can easily remove the head node from the list. This is how it would look in our example: we'd first set the Abe node's next to null, then we would set the Adam node's previous to also be null, and then we're done. Now, because the Abe node doesn't have anything to point towards, nor does it have anything pointing towards it, it will be deleted from the list. The Adam node will regain its position as the head node and the program will carry on. Up next is inserting and deleting from
the middle of a doubly linked list. This is where things get really tricky,
so make sure you're paying attention. To insert a node into the middle of a doubly
linked list, we follow a three step process. The first step is to take the node we want to add
and set its previous to point towards the node previous to the position we want to insert at.
Then we take that same node, the one we want to add, and set its next pointer to point towards the node after the position we want to insert at. Finally, we set the next on the node at the position before the one we're inserting, and the previous on the node after the position we're
inserting to both point towards this new node. That is a lot of words, some of which might not
even make sense to you. So let's open up the space on our screen real quick and do an example with
our list. Let's take a new node with the string Chris and add it between the Adam node and the Ben node by simply following our steps. First, we want to set the new node's previous to point towards the node previous to the position we want to insert at. This means setting the Chris node's previous to point towards the Adam node. Simple enough. Second, we set the new node's next to point towards the node after the position we want to insert at. This entails setting the Chris node's next to point towards Ben. So far, so good. Then, the last step is to set the next on the node before the one we're inserting and the previous on the node after the one we're inserting to both point towards this new node. So, in this case, the Adam node's next and the Ben node's previous both get set to the new Chris node. This completes the addition into the list. Obviously, this is a little
bit more complicated than inserting or deleting from the head. But as you can see, we've now
inserted this new node into its rightful place. It may be hard to see since the pointers are a
little bit messy, but the flow of the list has remained constant without any breaks in the
pointers, which is the most important part. Removing a node from the middle of a doubly
linked list is also a three step process. First, we set the next of the node before the one we want to remove to point towards the node after the one we want to remove. Then, we set the previous of the node after the one we want to remove to point towards the node before the one we want to remove. The final step is to set both pointers of the
node we want to remove to point towards a null value. Again, a little complicated, so let's
take it to our example list and try it out. Let's delete the Chris node, just for the sake of keeping things consistent. Following the steps that we've laid out, we would first set the next of the node before the one we want to remove to point towards the node after the one we want to remove. So, in this case, we set the Adam node's next to point towards the Ben node, essentially skipping over the Chris node. Then, we set the previous of the node after the one we want to remove to point towards the node before the one we want to remove. So we set the Ben node's previous to point towards the Adam node, again skipping over the Chris node. Finally, we have to set the Chris node's pointers to point towards null values. Now that we've done this, because no nodes point towards it, and it doesn't point towards any nodes, the Chris node gets deleted from the doubly linked list. The list is back to its original form with no
breaks or errors. Adding a node to the tail of a doubly linked list is also a three step process.
Step one entails setting the next pointer of the current tail to point towards the new node we want to become the tail. Step two is setting the previous of the new node that we're adding to point towards the current tail of the list. And step three is making the new node's next point
towards a null value. Again, let's do an example where we add a new node containing the string
Ethan to the tail of the doubly linked list. Following our steps, we first set the next pointer of the current tail, the Carl node, to point towards the Ethan node. Then we make the Ethan node's previous point towards Carl, and its next point towards a null value, and we're all set. Ethan has been successfully added as the new tail of the list in three quick and easy steps. Removing from the tail of the list is even easier and only requires two steps. We first set the tail node's previous to point towards null, and then we set the second to last node's next to also point towards null. On our example list, it looks like this: we first set the Ethan node's previous to point towards null, then we set the Carl node's next to also point towards null. This isolates any pointers going to or from the Ethan node and deletes it from the list, making the Carl node the tail once again. Adding and removing information from a doubly
linked list might sound like a hassle. But remember, we only have to use this pseudocode to
program these functions once in whatever language we're implementing them in. And then we're
able to reuse them an infinite number of times.
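And just so that pseudocode isn't left completely abstract, here's a hedged Java sketch of a doubly linked node and the three-step middle insertion; the class name DoublyNode and the String data are made up for this illustration.

```java
// A doubly linked node: data plus pointers to both neighbors.
public class DoublyNode {
    String data;
    DoublyNode prev; // pointer to the previous node, or null at the head
    DoublyNode next; // pointer to the next node, or null at the tail

    DoublyNode(String data) {
        this.data = data;
    }

    // Insert a new node between two existing, adjacent nodes,
    // following the three steps from the walkthrough above.
    static void insertBetween(DoublyNode before, DoublyNode after, String value) {
        DoublyNode newNode = new DoublyNode(value);
        newNode.prev = before;  // step 1: new node's previous points at the node before
        newNode.next = after;   // step 2: new node's next points at the node after
        before.next = newNode;  // step 3: stitch the neighbors back to the new node
        after.prev = newNode;
    }
}
```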
Now, for time complexity equations: since the doubly linked list is really just an extension of the linked list, its time complexity equations are going to be exactly the same as they were for the linked list, and for the exact same reasons. O of n for all four cases, and sometimes O of 1 for both inserting and
deleting. If you didn't watch the linked list segment and are wondering how we arrived at these
equations, you can check the description for a timestamp which will take you to the discussion
we had on linked list time complexities. Finally, in terms of doubly linked lists, I just want to talk about some of the uses for a doubly linked list, because there are a ton. The back and forth functionality of a doubly linked list lends itself to being implemented in a lot of stack-like functionality, i.e. cases where you want to go back and forth between information that you're storing for the user. Some examples of this could be the browser caches which allow you to go back and forth between web pages, the undo/redo functionality in many programs, applications which allow you to utilize an open-recent functionality; the list goes on and on. Basically, any case in
which you need to store a list of objects with multiple attributes, a doubly linked list
is going to be a safe bet to get it done. linked lists and their evolved
form and the doubly linked list are a great way to store information because
of the adaptability of the nodal structure. Since we're not working with raw information
like primitive types, and we're storing All information inside of a shell, it makes it
a lot easier to move the information around. This combined with the pointer system allows for
non contiguous but still fast operating speed, making these two data structures a
staple in computer science. Up next, we'll be covering dictionaries. And with that a
little mini lesson on hash tables. This is the last of what I referred to at the beginning of the
lecture as the intermediate data structures, and it is honestly one of my personal favorites. So let's
not wait any longer and just jump into things. Before we get too far, though, we should probably
clarify that when we say dictionary, we're not referencing that thick book you probably have
lying around your house and haven't used in years. Actually, dictionaries are one of the most
abstract data structures which exist in computer science and can be used for a variety of purposes.
Another thing I'll clarify really quickly is that dictionaries are also sometimes called
both maps and associative arrays by the computer science community. This has more to do with
language specifics and preferences and not any functional differences, since all of them work in
almost identical ways. But for this series, we're going to be referring to them as dictionaries,
just to keep things simple and organized. Now a dictionary in computer science, by
definition is an abstract data structure, which stores data in the form of key value pairs. This
means we have two main parts to each dictionary element, the key and the value. Each value within
a dictionary has a special key associated with it. And together they create a pair which is
then stored in the dictionary data structure as an element. Think of a key value
pair like a social security number. Each social security number is a key,
which is then paired with a value, that value being an individual person. These
social security number key value pairs then come together to form a dictionary of every
human being living in the United States. This is very different from many of the data
structures we've talked about previously, because we index dictionaries using
these keys instead of a numerical index. For example, with an array, we would index each
element within the data structure according to a numerical value, which started at zero and ran the
length of the array. With a dictionary, however, we index each element by using its key instead
of some arbitrary integer and obtain information through that index instead. So what exactly
are these key value pairs going to look like? Well, they can be just about anything. The keys
in a key value pair can be any primitive data type that you can think of: we can have a
dictionary which has integers as the keys, one with strings as the keys, one with doubles
as the keys; there's really a lot of flexibility. As for the values, we have even more
flexibility with those. We can have the keys in our dictionary correspond to pretty much
anything: strings, sure; booleans, of course; another dictionary with its own set of key value
pairs, you can even do that too. This allows us to have tons of combinations in the way that we store
our data, making dictionaries extremely powerful. As powerful as they are, though, there are two
extremely important restrictions that we have to cover when it comes to dictionaries. And they
are as follows. Every key can only appear once within the dictionary, and each key can only
have one value. Let's start with the first one. Each key has to be unique and cannot
be replicated, duplicated, cloned, copied or anything else that would cause there
to be two keys of the same value in a dictionary. Think of it like this: when you create a key value
pair, the computer creates a little locked box in memory to store the value. Then the computer
spends days and weeks creating a customized, handcrafted, one of a kind key that
corresponds directly to that locked box. This key cannot be replicated in the slightest.
And that's the same mindset you should use when working with dictionaries, no two keys can
be the same. If you were to try and make a dictionary with two identical keys, you
would be thrown an error by the computer. The second stipulation is that each key can
only have one value. Think back to our custom key example. It wouldn't make sense for this one
of a kind key to be able to open multiple boxes. The same is true for our keys in computer
science: they can only correspond to one value. Now, one rule that we don't have to follow is
that there can be duplicates of values within a dictionary, meaning we can have two separate
keys both point towards equivalent values. As long as the keys are different, the computer does not care.
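Here's a quick illustration of those rules in Python, whose built-in dict type is a dictionary in exactly this sense; the names and values are just made up for the example.

```python
# keys are unique; each key maps to exactly one value
ages = {"Steven": 18, "Shawn": 19, "Carl": 21}

print(ages["Shawn"])   # index by key, not by numerical position -> 19

ages["Steven"] = 20    # re-using an existing key replaces its value
ages["Dana"] = 21      # duplicate *values* under different keys are perfectly fine

# keys and values can be a mix of types, including another dictionary
profile = {"name": "Steven", 42: True, "nested": {"inner_key": [1, 2, 3]}}
```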
Alright, now that we know what dictionaries are and how they work, let's jump into the time
complexity for a dictionary, or at least try to. Let me explain. Now, for a dictionary, the time
complexity equations are a little bit funky. Previously, we talked about linked lists and how
they are sometimes used as the backing of other data structures in memory. For example, a stack
might implement the linked list structure as its way of storing information. Well, the most common
way to initialize a dictionary is to implement it with what's known as a hash table. Hash tables are
a little more advanced than the content I wanted to cover in this series. But they are a huge part
of dictionaries, especially when it comes to time complexity. So we're going to give a little mini
lesson on them today. Doing this will help you better understand the time complexity equations
for a dictionary, as well as give you some background on advanced data structures. Alright,
let's begin with a little thought experiment. Imagine this scenario for a second. say we
have a dictionary which contains a few key value pairs like shown on your screen. Now,
let's just assume that to store all this information in the computer's memory, we use an
array like how it is shown on your screen now, where the position that a cell is in the table
also corresponds to the key in the key value pair stored in our dictionary. So the zeroth
cell would contain a key with the integer zero, which leads to a value. The fourth cell
would contain a key with the integer four, which leads to another value, and so on. Any cell
which we don't have a key for is empty, or what computer scientists refer to as nil. Why does
this not work? Why can't we just use an array like this to back our dictionary in memory, since it
looks like actions such as accessing and searching would run in constant time, since all the keys
correspond to their table values in memory, all we'd have to do to get a value is simply
reference the key at the index we're looking for? Well, this actually isn't the case, because it's
based upon the simple assumption that the size of the array is not too large. Sure, this might work
great for this dictionary where we have 10 or so elements. But let's say instead of 10 elements
evenly spaced out, we want to have one which has 10 values, which are sporadically placed from one
to a billion. We want to keep the keys as being in the same index position as their values, so we
can know exactly where they are and reference them easily. But by my count that is 999,999,990
nil values, just taking up space in memory. And that is not good whatsoever. There's
got to be a better way to do things, right? Well, this is where our hash tables come
in. Hash tables are fundamentally a way to store this information in such a way that we're able
to cut down the amount of nil values, while also allowing for the information to be stored in such
a way that it is easily accessible. Basically, by using hash table, we'd be able to store these
10 keys in our dictionary, which range from one to a billion in a table with only 10 elements,
while also being able to keep constant time. How is this possible? Well, through the use of
a trick programmers use known as hash functions. A hash function will take all the keys
for a given dictionary and strategically map them to certain index locations in an array,
so that they can eventually be retrieved easily. Essentially, by giving a hash
function both a key and a table, it can determine what index location to store
that key at for easy retrieval later. These hash functions can be pretty much anything which
takes in a value and returns an index location. The goal of a good hashing function is to take in
a key, whatever that may be, and reliably place it somewhere in the table so that it can be accessed
later by the computer. So with that being said, let's just jump into an example. Because I know
a lot of you might be confused. Let's say we have our dictionary which contains keys in the form of
integers from one to a billion by a factor of 10. So the keys are 1, 10, 100, 1,000, 10,000, 100,000, and so
on. A good hash function for this data set would be to take the key divided by itself, and then
multiply the result by the number of digits in the key minus one. So to find out where to store
the value corresponding to the 1 million key, we would take a million divided by itself, which
yields one, and then multiply that by the number of digits in that integer minus one, in this
case seven minus one, or six. This means we'd store the 1 million key at index location six.
If we do this for every key in the dictionary, we can see that each key in the key value pair
is stored at some index from zero to nine. We've consolidated the 10 keys from one to
a billion into 10 index slots instead of a billion. A pretty good improvement, in my opinion.
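As a rough sketch, that toy hash function from the example might be written in Python like this. It only behaves sensibly for this particular set of keys; it's an illustration, not a general-purpose hash.

```python
def toy_hash(key: int) -> int:
    # key / key is always 1; multiplying by (number of digits - 1)
    # maps 1 -> 0, 10 -> 1, 100 -> 2, ..., 1_000_000_000 -> 9
    return (key // key) * (len(str(key)) - 1)

table = [None] * 10                       # ten slots instead of a billion
for key in (10 ** i for i in range(10)):  # keys 1, 10, 100, ..., 1,000,000,000
    table[toy_hash(key)] = f"value for {key}"

print(toy_hash(1_000_000))   # -> 6, so the 1 million key lives at index 6
```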
You might think this is a little overcomplicated for storing 10 elements. But remember, sometimes
we might be working with thousands of keys at a time, ranging in the billions, or even strings
as keys, which is a whole other story. Now for the best part about these hash functions:
let's say that we now want to get the value which is paired with the 1 million key. All we need to
do is put 1 million back into our hash function, and it will tell us where the value tied
to that key is stored at within the table, in this case at the sixth index. Doing this
allows us to pretty much instantaneously find where any key value pair is stored at within
the table. All of this information may be pretty confusing, and if you still aren't 100% sure, I
would recommend watching this part over again. However, in layman's terms, and for
this series, all I want you to be able to comprehend is that dictionaries are built
upon these hash tables, and the keys in our key value pairs are stored in these hash tables at
indexes which are determined by hash functions. And if you can understand that, you've got a
pretty good base for understanding hash tables. Now the final thing I want to talk to
you guys about in regards to hash tables, and the reason dictionary time complexity
equations are so funky, has to do with one small problem: what happens when we run two
different dictionary keys into a hash function, and the computer tells us to store them at the
same index location? For example, let's say we have two dictionary keys with the strings Steven
and Shawn, and when we run both of them through our hash function, the computer instructs us to
store both of them at the ninth index location. This should be impossible. How are we supposed
to put both of them at index location nine? This little problem is what's known as a hash
collision and can be solved one of two ways, either open addressing or closed addressing. With
open addressing, we just put the key in some other index location separate from the one returned to
us by the hash function. This is usually done by looking for the next nil value in the table,
i.e. the closest location which contains no key. So with our example, the key Steven would
get hashed to the index location nine, mostly because it's a better name, and the inferior
key Shawn would get stored at index location 10, because it's the closest open location available.
This does make it harder to interact with the data in our dictionary later on, and can lead to
problems if we eventually have a key which hashes to the index location 10, which is why
computer scientists developed closed addressing. Closed addressing uses linked lists to chain
together keys that result in the same hash value. So the keys Steven and Shawn would get stored
together in a linked list at index location nine. The main drawback to this is that whenever we
want to interact with the value stored within a key value pair for either Steven or Shawn, we end
up having to look through the linked list for that piece of data that we want. If there are 200 keys
hashed to one index, that's 200 keys we might have to search through to find the one that we want. And that's not very good.
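To make closed addressing a bit more concrete, here is a minimal Python sketch of a hash table that chains colliding keys together in a bucket. A plain list stands in for the linked list, and everything here, including the table size of 10, is just illustrative.

```python
class ChainedHashTable:
    def __init__(self, size=10):
        self.buckets = [[] for _ in range(size)]  # one chain per index location

    def _index(self, key):
        return hash(key) % len(self.buckets)      # hash function -> index location

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for pair in bucket:                       # key already present? replace its value
            if pair[0] == key:
                pair[1] = value
                return
        bucket.append([key, value])               # otherwise chain a new key value pair

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:  # walk the chain at that index
            if k == key:
                return v
        raise KeyError(key)

table = ChainedHashTable()
table.put("Steven", "host")
table.put("Shawn", "co-host")   # if these two collide, they simply share a chain
print(table.get("Shawn"))
```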
And with that concludes our mini lesson on hash tables. Now we can get
back into dictionaries, and finally talk about their time complexity equations. Now remember back
to the segment on time complexity really quickly. Back then I mentioned that for
these time complexity equations, we generally measure a data structure based
on its worst case scenario. But when it comes to dictionaries, if we are to assume the worst
case scenario, things get a little outrageous. We basically end up assuming that our hash
function makes it so that every key value pair ends up at the same index, meaning that each
of our keys gets stored in a linked list, assuming that closed addressing is
being used. Then, worst case scenario, we have to assume that every operation functions
how it would for accessing, searching for, inserting, or deleting from a linked list, which,
if you remember, is O of n for all four. This is of course preposterous and would probably
never happen with even a bare minimum decent hash function, which is why in addition to the worst
case scenario time complexity equations, I'll also go over the average time complexity equations
for these four operations. Now, lucky for us, they're all going to be O of 1. This has to do
with these hash functions we talked about before. To access, search for, insert, or delete a key value
pair from our dictionary, all we need to do is run that key through our hash function, and it will
tell us what index in the hash table to go to in order to perform that operation. There's no time
wasted looking through preset indexes, because we, the programmers, generate the indexes ourselves. And
that's the power of the hash table in the flesh. Overall, dictionaries are a very useful data
structure when it comes to computer science for a few reasons. They differ from the rest
in quite a few big ways: the fact that they don't use a numerical index to retrieve values,
but rather a preset key; the notion that the keys in the key value pairs can be a range of types,
from strings to integers to chars, and so on; and the implementation of the hash table, which allows
for super quick utilization that is sure to come in handy in any program that you write. In
conclusion, and overall, the dictionary is just a solid data structure that can be used to fill
in the gaps in your data structures knowledge. And with that, we have now completed all of
the intermediate data structures if you will. Next, we'll be moving on to trees and tree based
data structures. This is the point in this series where we will be dropping our discussion on time
complexity equations, because if you thought that the dictionary time complexity equations were
complicated with hash tables and hash functions, trees and their many variants only further
complicate things, to the point where I no longer feel comfortable including them in an introductory
course. We might mention that certain tree based data structures are good for accessing or
searching. But as for discussions on all four time complexity equations, dictionaries are where
we leave things. With that being said, let's move on to the final few data structures as we round
out this introduction to data structures series. Next up on our list of data structures
is the tree. No, not those big green things that inhabit the outside world and
that you can climb on and get fruit from, rather the computer science data structure,
which is way more fun in my opinion. Now, before getting into trees, we need
to talk about data structures in general. Every data structure we've covered up until this
point has been stored linearly. Arrays, stacks, linked lists, all of these had a definitive start
and end to their data, and you could point them out easily. Even the dictionary could be laid
out in such a way to be represented linearly. Everything was so nice and neat and easy to
visualize. On the other hand, trees store data hierarchically as opposed to linearly. What does
that even mean? Well, besides being an extremely hard word to pronounce, it's an alternative way
to store data that we're going to be using for the remainder of this series. It's a little hard to
visualize. So I'm going to start with an example. The most common real world example of
hierarchical data would be a family tree, each person would be an element of the family
tree, and connections wouldn't follow a simple linear fashion. Rather, connections
would be more abstract and can lead to multiple paths or branches, instead of just a single data
point. Each generation is ordered on a hierarchy, which is where the name comes from. And
as you can see, there's no definitive end to the family tree. Sure, there are the ones
at the bottom, which you could argue are the ends of the family tree, but which
one of them is the definitive end? There is none. Another example could be
a file structure system on your computer, you'd have a base folder, such as your desktop.
And then inside of that, you might have multiple other folders for different types of information.
And then inside of those, you might have more folders representing more types of information.
And then finally, you would have your documents. Again, it sets up a network of those files on
different levels, just like how we had with the family tree. Trees, and the tree's many variants,
are all dependent on storing data hierarchically, which of course finally begs the question of what
exactly a tree is. Well, a tree is an abstract data structure which consists of a series of linked
nodes connected together to form a hierarchical representation of data. The actual information or
data is stored in these nodes. And the collection of nodes along with the connections between them
is what's known as the tree. This is sort of like a linked list, only instead of each node only
pointing to one location, it has the option of pointing towards multiple, and also branching
off on its own or pointing to no other nodes. Each of the nodes in a tree can also be called
a vertex, and the connections between vertices, the things which link two nodes together, are called edges.
Now the free flowing structure of a tree lends itself to a lot of different configurations. And
with that comes a plethora of terminology about certain nodes, vertices, and just the structure
of a tree in general. So what we're now going to do is go over a lot of the terms associated
with specific nodes based on where they are on the tree and how they're connected to other nodes
while also visualizing the general structure of a tree at the same Time. Okay, let's start. Let's
begin with the things we've already talked about. A vertices is a certain node in a tree,
and an edge is a connection between nodes. The first new thing is that every tree starts with
what's known as a root node. This is always going to be the topmost node of a tree. Let's add a node
with the integer 50 to serve as our root node. Next up, let's connect two vertices
to our root node using two edges. These two nodes we just added are what's known as
child nodes, since they are both connected to the node containing the integer 50. Child nodes can
thus be defined as a certain node which has an edge connecting it to another node one level
above itself. Vice versa, the root node, the vertex containing the integer 50, is now what's
called a parent node to these two child nodes. Thus, we can define a parent node as any node
which has one or more child nodes. Think back to our family tree. If we were using people
instead of integers, it makes perfect sense that the nodes directly connected to each
other have some sort of familial relationship. Let's continue on by adding two child nodes to the
node containing the integer 30. When we do this, the 30 node becomes a parent node, and the
two nodes we've just added become child nodes. We have now branched off from the 20 node
completely, in that the two child nodes containing the integers 10 and 15, respectively,
share no direct association with it. Speaking of the 20 node, since
it does not have any children, it is what's known as a leaf node, or a node
in a tree which doesn't have any child nodes. In this context, the 10 and 15
nodes would also be leaf nodes. Finally, let's add one more node as a child of
the 10 node and there's our tree in all its glory. As a quick review, the 50 node is the root node,
the 30 and 20 nodes are children of that root node and the root node is thus the parent of
the 30 and 20 node, then the 10 and 15 nodes are children of the 30 node. And the 30 node is
thus a parent node to both the 10 and 15 nodes. The five node is a child of the 10 node and
the 10 node is a parent to the five node. And finally, the 5, 15, and 20 nodes are all leaf
nodes because they do not have any children. As you can see, one node or vertex on
our tree can have many different titles depending on where it is in the tree and what
other nodes are connected to it. For example, the 30 node is both a parent node to the 10
and 15 nodes, but also a child of the 50 node. The 15 node is both a child of the 30 node,
and also a leaf node as it has no children. This terminology really comes in handy when
we start talking about trees which contain 1000s of vertices and edges, and the data
becomes very complicated to order and maintain. Moving on the next two pieces of
terminology I want to go over with you guys are the height and depth of a tree. The
height is a property of a particular tree in and of itself, and the depth is a property of each
individual node contained within a tree. Let's start with the height. The height of a tree
is the number of edges on the longest possible path down towards a leaf. So in our tree
example, since the longest path in our tree which leads to a leaf is from the 50 node to the five
node, and there are three edges in that path, the height of the tree would be three. The depth
of a certain node is the number of edges required to get from that particular node to the root
node. For example, let's take our 30 node. Since there's only one edge connecting it on the
way up to the root node, its depth would be one. For the 15 node, however, since there are two
edges which separate it from the root node, the 15 node would have a depth of two. That's pretty
much all you need to know about the terminology associated with trees in computer science. As a
review, we have the height and depth obviously. Then we have vertices, edges, root
nodes, parent nodes, and leaf nodes. Now there's probably something that's
been scratching at the edge of your mind for quite a while now. And that's why the heck
are these called trees? They look nothing like trees. Regular trees are not upside down like
this data structure would lead you to believe. So who named trees and why? Well,
there's actually a simple answer to this. The tree is said to have been invented during
the 1960s by two Russian inventors. And the last time that a computer scientist got up from
the chair and went outside to actually see a tree is rumored to have happened back in 1954. Ever
since then, they've been grinding away at screens, watching hours upon hours of their favorite YouTube
channel, so please forgive them for their confusion when it came to naming conventions; they
must have just forgotten trees aren't upside down. Now regular trees, like the ones we just created
are great for storing hierarchical data. But their power can really be heightened when you start
messing around with how the actual data is stored within them. By imposing rules and restrictions
on what type of data is stored within a tree, as well as where it is stored, we can effectively use the
tree data structure to its full potential. I could talk about the different types of
trees for a long time, so long that many of them are full segments in this lecture. But for
now, I just want to cover one popular variant. This will be a good introduction to how trees can
vary without simply diving into a huge deviation from what we already know. More specifically,
the tree variant I want to talk to you guys about is the binary search tree. A binary search tree
is a simple variation on the standard tree, which has three restrictions put on it to help
organize the data. The first is that a node can have at most two children. This just helps make
it easier to search throughout the tree, as we don't have to spend time looking through each
of, say, eight children for a particular node. Keeping it limited to two helps us do this. The
second restriction or property is that for any given parent node, the child to the left has a
value less than or equal to itself. And the child to the right has a value greater than or equal
to itself. This might seem weird, but it comes with certain advantages and disadvantages over
using normal trees, which we'll get to in a bit. The final restriction put on binary search trees
is that no two nodes can contain the same value. And this is just to prevent weird things from
happening when we begin searching through the tree. Now, how does imposing these restrictions
on a tree actually help us? Well, the biggest advantage of binary search trees is that we're
able to search through them in logarithmic time. Because there is a natural order to the way that
the data is stored, it makes it extremely easy to search for a given value. Logarithmic time, if you
remember back to the segment on time complexity, is the equation in which we get more bang
for our buck the greater the number of elements or nodes we have in our data structure. It works
like this. All we have to do when searching is tell the computer to go left if the value
we're searching for is less than the current node, and right if it's greater than the current node.
We can then wash, rinse, and repeat this strategy until we find our desired node.
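Here's a minimal Python sketch of that go-left-or-go-right search; the node class and the tiny example tree are just illustrations, not the example tree from earlier in the segment.

```python
class BSTNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left     # child with a smaller value
        self.right = right   # child with a larger value

def bst_search(node, target):
    while node is not None:
        if target == node.value:
            return node          # found the desired node
        elif target < node.value:
            node = node.left     # go left: target is smaller than the current node
        else:
            node = node.right    # go right: target is larger than the current node
    return None                  # target isn't in the tree

# a tiny example binary search tree:   8
#                                     / \
#                                    3   10
root = BSTNode(8, BSTNode(3), BSTNode(10))
print(bst_search(root, 10).value)   # -> 10
print(bst_search(root, 7))          # -> None
```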
This makes binary search trees really popular for storing large quantities of data that need to be easily
searchable. Of course, this also translates to inserting, deleting and accessing elements
within the data structure. But for the most part, searching efficiency is what really sets
the binary search tree apart from the rest. Stepping away now from binary search
trees and into trees in general, let's talk about common uses for them in
computer science. The most common uses for trees in computer science include storing
data with a naturally hierarchical structure. These are like the examples we touched
upon at the beginning of this segment. Data sets such as file structure systems,
family trees, a company's corporate structure, all of these could be stored and implemented
using the tree data structure very easily. These are all general examples though. As I
mentioned before, when we put restrictions on trees, like in the case of the binary search
tree, we can expand its uses even further. A tree's basic structure is incredibly useful,
and it can be modified in so many ways, which only add to its functionality. One of
these ways is through what's known as a trie, and it is the next data structure on our
agenda. So stay tuned for that. Next up, we'll be talking about another one of the
special trees with restrictions. We just finished discussing the binary search tree. And with that
we mentioned how trees usually become even more useful once you start setting restrictions on
how and where the data can be stored within them. Well, a trie is another one of these special trees,
which has special restrictions put in place to help store the data in an effective manner. This
data structure is often overlooked since it's only used in specific situations, but without the use
of tries within those specific situations, some important features of your computing life would
be a whole lot harder. So, we get it, Steven: a trie is a special tree with restrictions. But what are
those restrictions, and how can they help us? Well, let's start with the basics. A trie is
a tree like data structure whose nodes store letters of an alphabet in the form of characters.
We can carefully construct this tree of characters in such a way which allows us to quickly retrieve
words in the form of strings by traversing down a certain path of the trie. These are also sometimes
called digital trees or prefix trees by the computer science community; we'll be calling them
tries for today's lecture. Tries are used in the retrieval of data in the form of words, which is
actually where the name trie comes from, as it's smack dab in the middle of the word retrieval.
Essentially, in layman's terms, we use tries to retrieve words extremely fast by traversing
down a tree of stored characters. This is hard to explain without actually showing you, so let's do
an example of what a trie might actually look like. Now, just like with a tree, every trie is going
to start with a root node only in our case, that root node will be empty with either some null
value or a blank string. Now also stored within this node will be a set of references, all stored
within an array. These references are set to null at the beginning of the trie's existence, but can
be slowly filled with references to other nodes. For example, let's say we were creating a trie
to store words that start with the letter D. The root node would contain an array which
contains a reference to a node containing the character D, signifying the start
of our word. Now imagine for a second that this D node also has an array
containing references. Only this time, it contains references that point towards all
characters in the English alphabet which serve as the first two characters of a word in the
English dictionary. So da serves as the first two characters in a lot of English words, such as dab,
or day, or dad if you really want to go that far. And so the array contained within the D node would
hold a reference to an A node. Now, since there are no English words which start with db that I
know of, we wouldn't have a reference to a B node, since we know we will never have to retrieve
a word from our trie which starts with db. We continue this process for all letters in the
alphabet, only including references to characters which serve as the first two characters in an
English word. But that is a lot to visualize. So for now, let's just put up on the screen two
nodes that this first D node would point towards. More specifically, the node we've already added,
the A node, as well as the E node. Now, to continue building our trie, we would simply repeat this
process for all of the nodes that this D node points towards. So we'd store pointers in the A
node, which would point towards characters in the English alphabet that serve as the first three
characters of a word in the English dictionary. So you'd have a B, a D, a Y, and many, many more
nodes as well. But we're just going to stick with those three for now. For E, you'd have pointers
which point towards nodes containing characters like N and W. Obviously, there'd be more nodes
than the one shown on your screen right now for all levels of nodes. But what we've
created here is a very simple trie. As you can see, it's exactly like I said it was
in the beginning, a tree like data structure, which stores characters that can be used
to make words by traversing down paths. As we traveled down different paths of
our tree, we can make different words, dab, Dad day down the one path,
and then and do down the other. By following the nodes downwards, we
can easily build strings of words, which you'll learn towards the end of
the episode can be extremely useful. We can even take this process further by having
one path contain multiple strings. For example, if we isolate the den path, we could continue the
process to go on to words like dense, or Denver, or dent; the list goes on and on. We wouldn't have
to just stop at the word den, since den is also a prefix for numerous other words, and that's what
makes storing data in character format so useful. One path down a trie can represent multiple words
depending on where you stop along the process. This does provide a bit of a problem
for computer scientists though, how do we know when a word has ended? Let's take
the Denver path as an example. If we wanted to retrieve the word den from this trie, how would
we know to stop at the N node and not continue along to Denver? Well, there's actually a pretty
simple fix to this, and it's known as flagging. We simply mark the end of the word by having it
also point towards a flag to let the computer know that the end of a word has occurred. So in our
Denver example, not only would the N node's array contain pointers to whichever characters follow
it, it would also contain a pointer to a node with a flag to tell the computer to stop there.
In this case, I've just chosen a period as a flag. This way, we can construct tries in such a
way where each word is marked by an ending point, and the different words that
may branch off from that prefix can continue to use that word as a prefix in whichever
retrieval program ends up getting used.
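To tie the whole idea together, here is a small Python sketch of a trie with insertion, word lookup, and an end-of-word flag standing in for the period. A Python dict stands in for the array of references, and none of this comes from a specific library; it's just an illustration.

```python
class TrieNode:
    def __init__(self):
        self.children = {}           # maps a character to the next TrieNode
        self.is_end_of_word = False  # the "flag" marking where a word stops

class Trie:
    def __init__(self):
        self.root = TrieNode()       # the empty root node

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end_of_word = True   # plant the flag at the last character

    def contains(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return node.is_end_of_word   # den counts only if it was flagged

trie = Trie()
for w in ("dab", "dad", "day", "den", "dew", "denver"):
    trie.insert(w)

print(trie.contains("den"))      # True: the flag stops us at the n node
print(trie.contains("denv"))     # False: it's only a prefix, not a flagged word
```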
Okay, so now it's time to talk about the general use cases of a trie. If you remember back to the beginning of this
segment, I said that the use cases for a trie were limited, but extremely effective. And now we're
going to talk about what that statement means. Now, have you ever used the autocomplete
feature on your iPhone or spellcheck on a Google Doc? Because if so, you have already
experienced the overwhelming power of tries, as they are used for both of
these extremely useful features. This mainly has to do with the fact that
for big programs like iOS or Google Docs, they're not just storing a trie containing a
few words, or even all the words starting with a certain letter like we tried to replicate;
they're storing the entire English dictionary. Storing the entire dictionary in a trie seems
like a tall task. But it can and has been done. Each node would simply have 26 nodes connected
to it. And those 26 would have as many nodes connected to them as needed, and so on. And so
now I can hear you asking in the comments: how does this help us with autocomplete and
spellcheck? Well, let's say you're typing out a word. For the sake of this video, let's
say it's the word subscribe. You start with the S. At this point, the computer has already
eliminated 95% of the words that you could be typing. We know it's not going to be a word which starts
with N, or with A, or O, or H, or L, etc., etc. Then you type the letter U, and bam, another
95% of possible options get deleted. With each character you type, you eliminate millions of
possible words that you could be working towards. Using this fact, as well as popularity data,
which could also be stored within the node, the computer can start making educated guesses
using that information, as well as context from previous words, to make a suggestion as to what
word you're trying to type out. The AI that works this feature is immensely complicated, but it
is backed by the trie data structure, and helps autocomplete work easily on your phone. As for
spellcheck, well when you spell a word wrong, a spell checking algorithm can use the root of
your word and compare it against trie data to make an educated guess as to what you were trying to
spell. Slightly misspelled words will usually have a similar path from the root of the trie. If
you accidentally type chockalotta instead of chocolate, the computer can take the first few
characters of the word you typed incorrectly and see where you may have gone wrong. Basically,
by comparing the misspelled word to certain paths of the dictionary trie, it can pretty accurately
detect which word you were meaning to say and correct you accordingly. So there you
have it: the trie. Like I said, it is often underrated in the computer science world since
it can only be used in certain situations. But as I hope this segment has shown you, those
situations are still important nonetheless. We are nearing the end of our introduction to data
structures series. Now, only two more left before we part ways. So stay tuned, because now we're
going to talk about heaps. Now back in our segment on trees, we talked about the binary search tree,
a special type of tree which has a few properties. Each node can only have two children, the child
to the left must have a value less than the parent node, and the child to the right must have a value
greater than the parent node. And also no two nodes could contain the same value. Well, a heap
is sort of like this, but a little bit different. More specifically, by definition, a heap is a
special tree where all parent nodes compare to their children nodes in some specific way,
by being either more extreme or less extreme, i.e. greater than or less than. This specific way
determines where the data is stored, and is usually dependent on the root node's value. There are two
different methodologies generally used in computer science, and they are known as min heaps and max
heaps. In a min heap, the value at the root node of the tree must be the minimum amongst all of
its children, and this fact must be recursively the same for any and all parent nodes contained
within the heap. So each parent node must have a value lower than all of its children nodes. As
you can see from our example on the screen now, 10 is the root node and also the minimum value in
the tree. In addition to this fact, if we pick any parent node on the tree, and look at its children
and their children and so on, the parent node will have the lowest value of them all. Take 14 for
example: its value is less than 26, 31, 44, 35, and 33. This must be the case for every single
subtree which is contained within the heap. Max heaps, on the other hand, are
the exact opposite. In a max heap, the value at the root node of the tree must be
the maximum amongst all of its children. And this fact must be recursively the same for any
and all parent nodes contained within the heap. If you take a look at the example max
heap we have on the screen now, again, you'll see that this is the case 44 is the root
node and also the largest value within the heap. If you take a look at, say, the sub
tree which is parented by the 35 node, you'll see that 35 is the maximum value amongst all
nodes in that subtree, greater than both 19 and 27. When we store data like this in a heap, whether
that be a min heap or a max heap, it makes it extremely easy to insert or remove from. This
lends itself to a lot of useful implementations within computer science. Now to show you
this, I'd like to build an example max heap, the one which has the greatest integer stored
in the root node. For the sake of keeping things simple, let's pull up an array of seven
elements with integers ranging from zero to 100, and simply convert it into a heap one element
at a time. This can be done in a few easy steps. Step one is we add the first integer in as the
root node. So 70 would get inserted into our tree as the root node. Then, for step two, we add another node
to the tree in the bottom-leftmost position available. So we would first insert a node as the
child of the root node to the left; for our heap, this means adding the integer 4 as a child of
the 70 node. The final step is to recursively go up the heap and swap nodes if necessary. Now, when
we say if necessary, we mean that if the node we just added is more extreme, either greater
than or less than the node above it, depending on the type of heap that we're trying to create,
we swap them to maintain order amongst the heap. Since we're building a max heap, and 70 is greater
than four, no swaps are necessary in this case. Now we just simply repeat steps two
and three until we've built our heap. So next, we want to add the integer 90. And since
the leftmost slot on the tree is already taken by our four node, the right slot ends up being the
next available location on our tree. So we insert it there. And then, since 90 is greater than 70, we
actually end up swapping the 90 and 70 nodes. Doing this helps keep the max heap property
intact, where every level is greater than the one below it. Let's keep going. Next, we add 45
to the heap. Now, since we've run out of space, on the second level, we move on to the third
level and add 45 as a child of the four node. Then we compare it to four; since 45 is greater,
we swap the nodes. Now we have to compare this node to the node above it
again. Because remember, we recursively go up the tree until we reach the root or don't
need to swap. Now 45 is not greater than 90, so it stays put where it's at. Next up,
we add 23 as another child of the 45 node, only this time on the right side, we compare and
since it's not greater than 45, we keep it as is. Moving on, we next insert 76 into
the tree as a child of the 70 node, then we compare and swap the 76 and 70
nodes, as 76 is indeed greater than 70. We then compare 76 and 90, and since 90 is greater
than 76, we keep the 76 node in place for now. The next node we add is the 100 node. We compare
it to the 76 node and see that it's greater, so it gets swapped. And then we compare it
again, this time to the 90 node. And since it's also greater than that integer, it gets
swapped yet again to become the root node, signifying it is the greatest node in the heap.
As you can see, it's a pretty simple concept, you add a node and then keep trying to swap up until
the node you've added is in its rightful place.
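Here's a rough Python sketch of that insert-then-swap-up process for a max heap. It uses the common trick of storing the tree in a flat list, where the parent of the node at index i lives at index (i - 1) // 2; the on-screen example builds the heap as a drawing of nodes, so treat this as an illustration, not the exact visual.

```python
class MaxHeap:
    def __init__(self):
        self.items = []                      # the tree, level by level, left to right

    def insert(self, value):
        self.items.append(value)             # step 2: add in the bottom-leftmost open spot
        i = len(self.items) - 1
        parent = (i - 1) // 2
        # step 3: recursively swap upwards while we're greater than our parent
        while i > 0 and self.items[i] > self.items[parent]:
            self.items[i], self.items[parent] = self.items[parent], self.items[i]
            i, parent = parent, (parent - 1) // 2

heap = MaxHeap()
for value in (70, 4, 90, 45, 23, 76, 100):   # the seven integers from the example
    heap.insert(value)
print(heap.items[0])   # -> 100, the greatest integer ends up at the root
```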
Deleting from a heap is also pretty simple, at least in our case. This is because the type of deletion I want
to talk about is just the process of removing the root node from the heap. And you'll see why
later on. To delete the root node from a heap, you follow a three step process. Step one is
actually removing the root node from our heap. So in our case, we delete the 100 node. What
you do with it is up to the programmer: you can return it back to the user, store it somewhere
else, print it out, etc., etc. Either way, step two is then replacing it with the node furthest to
the right, in this case the 76 node. Finally, for step three, we do what's known as
a heapify to fix up the heap. We start with the root node and compare it to its children to see
if we need to swap any values. So for the 76 node, we compare it to its two child nodes. And since
76 is less than 90, we end up swapping those two. Then we wash, rinse, and repeat for every subtree
that we have. So on the right side, we swapped 90 with 76, but since 76 remains the greatest
integer on that particular subtree, it stays in the same spot. On the left side, we didn't change
anything, but 45 is still the greatest integer amongst the three nodes in that subtree. So no
swapping is necessary on that branch of the heap. And so we've completed our heapify, and
all heap properties are still intact.
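Continuing the sketch from above, removing the root and heapifying back down might look something like this, again assuming the flat-list layout, where the children of the node at index i live at indexes 2*i + 1 and 2*i + 2.

```python
def remove_root(heap):
    """Pop the maximum value off a MaxHeap and repair the heap."""
    items = heap.items
    root = items[0]
    items[0] = items[-1]        # step 2: move the furthest-right node up to the root
    items.pop()
    i = 0
    while True:                 # step 3: heapify, sifting the new root back down
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < len(items) and items[left] > items[largest]:
            largest = left
        if right < len(items) and items[right] > items[largest]:
            largest = right
        if largest == i:        # both children are smaller: the heap is fixed
            return root
        items[i], items[largest] = items[largest], items[i]
        i = largest

print(remove_root(heap))        # -> 100, and the heap property is restored underneath
```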
That is inserting and deleting nodes in a nutshell. Now let's talk about how we can
use this to our advantage. Heaps are most
commonly used in the implementation of heapsort. Heapsort is a sorting algorithm which takes
in a list of elements, builds them into a min or max heap, and then removes the root
node continuously to make a sorted list. Because heaps always start with the minimum
or maximum value contained within them at the root node, we're able to simply just
remove the root node over and over again, re-heapifying the data structure after every pass.
And after every element has been removed, we are left with a sorted list. On your screen, you'll
see we have an unsorted list on the left. In the middle, we've inserted all of those integers, and
in doing so created a max heap. And then finally, on the right, we have continuously removed those
elements from the root node into a now sorted list.
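As a quick sketch of that idea, using the MaxHeap and remove_root pieces from the earlier examples: repeatedly pulling the root off a max heap gives the elements in descending order, so we just reverse at the end for an ascending list.

```python
def heap_sort(values):
    heap = MaxHeap()
    for v in values:            # build the heap out of the unsorted list
        heap.insert(v)
    out = []
    while heap.items:           # keep removing the root until nothing is left
        out.append(remove_root(heap))
    return out[::-1]            # roots came out largest-first, so flip for ascending order

print(heap_sort([70, 4, 90, 45, 23, 76, 100]))
# -> [4, 23, 45, 70, 76, 90, 100]
```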
Heapsort is a really cool algorithm, and it will be part of our upcoming series on sorting algorithms, which is kicking off very
soon. So if you're interested in that, make sure you subscribe to our channel so that you don't
miss it, a link will be in the description below. Another extremely popular use of heaps is
through the implementation of priority queues. Priority queues are an advanced data structure,
which your computer uses to designate tasks and assign computing power based on how urgent a
certain matter is. Think of it like a line at the hospital. You wouldn't want your line to follow a
regular queue methodology which implements first in, first out, since then you could have patients
with extremely urgent matters like heart attacks, waiting behind people coming in for a routine
checkup. In the same way, you wouldn't want your computer to update an application before it
finishes rendering a video; otherwise your entire progress would be lost. Priority queues take care
of all the task scheduling done by your computer, and the heap data structure is used as the
backing system for it. And with that ends our discussion on heaps. To review, they are a
special tree in which each level contains nodes with values more extreme either greater than
or less than the nodes on the level above it. Next up is unfortunately the last segment in the
series on yet another tree based data structure the graph. The graph is arguably the most
dynamic data structure in our introduction to data structure series and an absolute banger
to go out on. So let's just hop right into it. Before we get into the nitty gritty details,
I first want to do a short little exercise. Visualize for a second a few of your favorite
places to eat on a map around your town. For me personally, it'd be places like Five Guys,
Chick-fil-A, Panera, Wawa, and Domino's. Now imagine for a second that you are famished, absolutely starving,
and so your plan is obviously to drive around to each of your favorite places to eat and order
an ungodly amount of food from each location. Each food location has a few possible paths
going to and from it, and so we add those to the map as well. Now you can see that what we now
have looks like a network of delicious foods; we can start anywhere, and all we have
to do is make sure to hit all five. You may not know it, but what we've essentially
done here is set up a simple graph. Basically, graphs are composed of pieces of information, like
the restaurants, and the paths that run between them. Of course, this is just generally speaking; by definition,
graphs are a nonlinear data structure consisting of nodes and edges. There are a finite set of
these nodes or vertices, which are connected by edges. Nodes and edges should be familiar
to you if you watch the segment on trees. The big difference between trees and graphs
however, is that with a tree we had a specific starting point. Sure, there were multiple paths
down the tree that branched off from the initial starting point. But you always had to begin at
the root node. In contrast, with a graph there is no specified starting point; we can begin from
any node and traverse to any node, just like how in our food example we are able to start at
any restaurant. Graphs are a huge concept, and escape the bounds of computer science, often being
used in many places you wouldn't even think of. But before we get into anything crazy,
though, like the difference between a directed or undirected graph, or a cyclic versus
an acyclic graph, let's get down the basics. Now, every graph is composed of these nodes or
vertices and the edges that connect them. Let's pull up a sample graph really quickly and
talk about it. We represent graphs visually like this a lot, because it makes it way easier
to comprehend. But notation-wise, a graph actually looks like this, which is much harder
to comprehend. So let's break it down. First, we have the vertices set, which contains a comma
separated list of all vertices within the graph. That's the simple part. Each comma separated
value simply represents a node within the graph. Then we have the edge set, which is a little bit
more complicated. Each element of the edge set is an ordered pair which describes relationships
between nodes. For example, the first one describes a relationship between the six and four
nodes. The fifth indicates a relationship between the five and two nodes, and so on. Using these
two sets, we're able to visualize a graph pretty easily by laying down both the information
and the connections that fall between them. One final thing I want to mention about the
basics of graphs is about the relationships that occur between two nodes. If we have an edge, which
connects two different vertices, they are known as adjacent to one another. So for example,
the five node would be adjacent to the four, two, and one nodes.
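Here's a small Python sketch of that vertex set / edge set notation, along with the adjacency relationships we just described. The exact graph on screen isn't reproduced here, so the sets below are made up, just kept consistent with the relationships mentioned: an edge between the six and four nodes first, a five-two edge fifth, and the five node adjacent to four, two, and one.

```python
# a graph written down as a vertex set and an edge set
vertices = {1, 2, 3, 4, 5, 6}
edges = [(6, 4), (4, 3), (3, 2), (5, 1), (5, 2), (5, 4)]

# build an adjacency view: which nodes is each node adjacent to?
adjacent = {v: set() for v in vertices}
for a, b in edges:
    adjacent[a].add(b)   # undirected, so the relationship
    adjacent[b].add(a)   # goes both ways

print(adjacent[5])   # -> {1, 2, 4}: adjacent to the four, two, and one nodes
```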
Okay, now that we have the basics down, I now want to jump into the different attributes that a particular graph might
have, starting with directed versus undirected. An undirected graph is one in which the direction
you traverse the nodes isn't important. This is most prominently indicated by a lack of arrows
pointing to specific nodes. Such was the case with our first example graph, or even the food
example from the beginning of the episode. We can hop between nodes or even go back
and forth between them without problem. A good way to visualize undirected graphs
is like a network of friends on Facebook, where each edge indicates
a friendship, if you will. Because when somebody accepts
your friend request on Facebook, you are both added to each other's friends lists, the friendship
goes both ways and direction is unimportant. In contrast, a directed graph is one in which the
direction you traverse the nodes is important. This is usually indicated by arrows representing
which nodes a certain node is able to traverse to. The edges could point both ways but they don't
have to. It's very possible the edge only points one way. A good way to visualize directed graphs
is by thinking of them as a network of friends on Instagram. Sure, I can follow famous celebrity
Will Smith, but the odds that he follows me back are fairly low. And so in that case, the
relationship only goes one way. Undirected and directed graphs both have their uses. As
we discussed with the social media example, both provide different functionality which will be
useful to you and your computer science journey. Just like the next type of property a
graph can be classified as either cyclic or a cyclic acyclic graph is one which contains
a path from at least one node back to itself. So you can see by the example on your screen now
that the four node leads to the three node which leads to the two node which leads to the one node,
which finally leads back to the four node, forming a cycle. Essentially, we're able to follow at
least one path that eventually leads back to our starting point. A small thing to note here is that
all undirected graphs end up being cyclic: the bidirectional nature of edges within undirected
graphs theoretically forms a cycle between any two nodes, so judging by that logic, all undirected
graphs end up being cyclic. An acyclic graph is one which contains no path from any one node which
leads back in on itself. This property can really only apply to directed graphs, like we mentioned
previously. Essentially, this means that for any given node, there is no path which will eventually
lead back to itself. Undirected, directed, cyclic, and acyclic are all properties we can use
to classify types of graphs based on their nodes. But the last property I want to talk about
actually applies to the edges of a graph instead, and it's the process of weighting. Weighting the edges
of a graph means associating a numerical value with each edge, often called the cost of that
edge. Each weight represents some property of the information you're trying to convey. For example,
again, going back to our food location scenario, since the information that we're trying to convey
is a good route which takes us to each location, a good weight for our edges in that scenario
could be the distance between nodes. This comes in handy a lot, especially with navigation,
such as the case with our restaurant example, as we of course always want to find the path of
least cost or weight between the different nodes. So there are the major properties of a graph
that the different nodes and edges can have: directed or undirected, cyclic or acyclic, and
weighted or unweighted. There are a couple more obscure ones out there, but those six are what
we will be covering today. Combining these three properties together leaves us with numerous types
of graphs, which all have different strengths and weaknesses. It would take a while to talk about
each of these and their implementations. So for now, I'll just pick out three types of
graphs which are used in the most popular cases. Now probably the most famous
implementation of the graph data structure is through the undirected
cyclic graph with weighted edges. This one gets a lot of use, especially through its
implementation in Dijkstra's shortest path algorithm. This algorithm, given a graph and a source
vertex within that graph, compiles a list of the shortest possible paths from that source
vertex to all other nodes. As you might be able to tell just from its description, this has a
multitude of uses across the entire tech world. Google uses this algorithm for Google Maps.
It's used in the process of IP routing, and it can even be implemented in telephone networks.
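As a rough sketch of how Dijkstra's shortest path algorithm can be written over a weighted graph stored as an adjacency dictionary; the distances below are made up for the food example, and a heap-backed priority queue from Python's standard heapq module handles picking the cheapest node next.

```python
import heapq

def dijkstra(graph, source):
    """Return the least total edge weight from source to every reachable node."""
    distances = {source: 0}
    queue = [(0, source)]                      # priority queue of (cost so far, node)
    while queue:
        cost, node = heapq.heappop(queue)      # always expand the cheapest node next
        if cost > distances.get(node, float("inf")):
            continue                           # stale entry: a shorter path was already found
        for neighbor, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < distances.get(neighbor, float("inf")):
                distances[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return distances

# made-up driving times, in minutes, between a few of the food spots
graph = {
    "Five Guys": {"Panera": 4, "Wawa": 7},
    "Panera":    {"Five Guys": 4, "Wawa": 2, "Domino's": 5},
    "Wawa":      {"Five Guys": 7, "Panera": 2, "Domino's": 1},
    "Domino's":  {"Panera": 5, "Wawa": 1},
}
print(dijkstra(graph, "Five Guys"))
# -> Five Guys 0, Panera 4, Wawa 6, Domino's 7
```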
Another type of graph, which you probably use quite often, is the unweighted
graph, both undirected and directed, as these make up the follower systems of a
majority of social media websites. We already talked about these in the cases of Facebook,
which would use a cyclic representation, as well as Instagram, which would use an acyclic
representation. However, this example encapsulates much more than just those two. Snapchat, Twitter,
TikTok even, all of these platforms can represent your follower and following base through a graph,
and oftentimes do. Facebook even has a Graph API, which you can use to interact with the models that
they use to illustrate each user's web of friends. As you can see, graphs, in their many different
forms, provide a lot of the functionality that you interact with in everyday life, contributing
to almost any facet of the internet. And with that concludes our discussion on graphs.
As a review, a graph is a data structure which is represented by a set of nodes and edges. These
come together to form a visualization of data, whether that data be food locations on a map or
friends on social media. The different types of graphs provide a multitude of implementations
in computer science. And with that concludes our introduction to data structure series. After
around three hours and 12 data structures, you should now have the basic knowledge to take
with you as you continue along in your computer science journey. And if you've made it this long,
you obviously liked what you saw. So stick around for an extra 30 seconds while I pitch to you
why you should check out my personal channel. Now, I introduced myself at the beginning of the
series, but again, my name is Steven. I'm an 18 year old computer science student who runs a
YouTube channel called nullpointerexception with one of my good friends, Shawn. We've been producing
content like this seriously for about four months now. And I've grown a good audience of around 800
subscribers, which we couldn't be happier about. If you'd like to join us, you can check out
our channel linked in the description below and just try out a few of our other
videos. And if you like what you see, you can subscribe. That ends my pitch and with
that I have nothing else to say. I hope you have enjoyed this series as much as I've enjoyed
making it. Thank you so much for watching.