10. Understanding Program Efficiency, Part 1

Captions
The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu. PROFESSOR: OK, folks. Welcome back. Hope you had a nice long weekend with no classes. You got caught up on all those problem sets that have been sneaking up on you. You enjoyed watching the Patriots and Tom Brady come back. Oh, sorry, I'm showing my local bias. Before we talk about today's topic, I want to take a second to set the stage. And I want you to stop and think about what you've seen so far in this course. We're coming up on the end of the first section of the course. And you've already seen a lot. You've certainly learned about fundamentals of computation. You've seen different kinds of data structures, both mutable and immutable, so tuples and lists, dictionaries, different ways of pulling things together. You've seen a range of algorithms from simple linear code to loops, fors and whiles. You've seen iterative algorithms. You've seen recursive algorithms. You've seen classes of algorithms. Divide and conquer. Greedy algorithms. Bisection search. A range of things. And then most recently, you started pulling things together with classes-- a way to group together data that belongs together along with methods or procedures that are designed to manipulate that data. So you've had actually a fairly good coverage already of a lot of the fundamentals of computation. And you're starting to get geared up to be able to tackle a pretty interesting range of problems. Today and Monday, we're going to take a little bit of a different look at computation. Because now that you've got the tools to start building up your own personal armamentarium of tools, we'd like to ask a couple of important questions. The primary one of which is how efficient are my algorithms? And by efficiency, we'll see it refers both to space and time, but primarily to time. And we'd like to know both how fast are my algorithms going to run and how could I reason about past performance. And that's what we're going to do with today's topics. We're going to talk about orders of growth. We'll define what that means in a few minutes. We're going to talk about what's called the big O notation. And we're going to begin to explore different classes of algorithms. Before we do that though, let's talk about why. And I want to suggest to you there are two reasons this is important to be considering. The first question is how could we reason about an algorithm-- something you write-- in order to predict how much time it is going to need to solve a problem of a particular size? I might be testing my code on small scale examples. And I want to know if I ran it on a really big one, how long is it going to take? Can I predict that? Can I make guesses about how much time I'm going to need to solve this problem? Especially if it's in a real world circumstance where time is going to be crucial. Equally important is going the other direction. We want you to begin to reason about the algorithms you write to be able to say how do certain choices in the design of an algorithm influence how much time it's going to take. If I choose to do this recursively, is that going to be different than iteratively? If I choose to do this with a particular kind of structure in my algorithm, what does that say about the amount of time I'm going to need?
And you're going to see there's a nice association between classes of algorithms and the interior structure of them. And in particular, we want to ask some fundamental questions. Are there fundamental limits to how much time it's going to take to solve a particular problem, no matter what kind of algorithm I design around this? And we'll see that there are some nice challenges about that. So that's what we're going to do over the next two days. Before we do though, let's maybe ask the obvious question-- why should we care? Could be on a quiz, might matter to you. The better choice is because it actually makes a difference. And I say that because it may not be as obvious to you as it was in an earlier generation. So people with my gray hair or what's left of my gray hair like to tell stories. I'll make it short. But I started programming 41 years ago-- no, sorry, 45 years ago-- on punch cards-- you don't know what those are unless you've been to a museum-- on a machine that filled half a room and that took about five minutes to execute what you can do in a fraction of a second on your phone. Right. This is to tell you you're living in a great time, not independent of what's going to happen on November 8th. All right. We'll stay away from those topics as well, won't we? My point is yeah, I tell old stories. I'm an old guy. But you might argue look, computers are getting so much faster. Does it really matter? And I want to say to you-- maybe it's obvious to you-- yes, absolutely it does. Because in conjunction with us getting faster computers, we're increasing the sizes of the problems. The data sets we want to analyze are getting massive. And I'll give you an example. I just pulled this off of Google, of course. In 2014-- I don't have more recent numbers-- Google served-- I think I have that number right-- 30 trillion pages on the web. It's either 30 trillion or 30 quadrillion. I can't count that many zeros there. It covered 100 million gigabytes of data. And I suggest to you if you want to find a piece of information on the web, can you write a simple little search algorithm that's going to sequentially go through all the pages and find anything in any reasonable amount of time? Probably not. Right? It's just growing way too fast. This, by the way, is of course why Google makes a lot of money off of their MapReduce algorithm for searching the web, written, by the way, or co-written by an MIT grad and the parent of a current MIT student. So there's a nice hook in there, not that Google pays MIT royalties for that wonderful thing, by the way. All right. Bad jokes aside, searching Google-- ton of time. Searching a genomics data set-- ton of time. The data sets are growing so fast. You're working for the US government. You want to track terrorists using image surveillance from around the world, growing incredibly rapidly. Pick a problem. The data sets grow so quickly that even if the computers speed up, you still need to think about how to come up with efficient ways to solve those problems. So I want to suggest to you while sometimes simple solutions are great, they are the easy ones to write. At times, you need to be more sophisticated. Therefore, we want to reason about how do we measure efficiency and how do we relate algorithm design choices to the cost that's going to be associated with it? OK. Even when we do that, we've got a choice to make.
Because we could talk about both efficiency in terms of time or in terms of space, meaning how much storage do I have inside the computer? And the reason that's relevant is there's actually in many cases a trade-off between those two. And you've actually seen an example, which you may or may not remember. You may recall when we introduced dictionaries, I showed you a variation where you could compute Fibonacci using the dictionary to keep track of intermediate values. And we'll see next week that it actually tremendously reduces the time complexity. That's called a trade-off, in the sense that sometimes I can pre-compute portions of the answer, store them away, so that when I try to compute a bigger version of the answer I can just look up those portions. So there's going to be a trade-off here. We're going to focus, for purposes of this lecture and the next one, on time efficiency. How much time is it going to take our algorithms to solve a problem? OK. What are the challenges in doing that before we look at the actual tools? And in fact, this is going to lead into the tools. The first one is even if I've decided on an algorithm, there are lots of ways to implement that. A while loop and a for loop might have slightly different behavior. I could choose to do it with temporary variables or using direct substitution. There's lots of little choices. So an algorithm could be implemented many different ways. How do I measure the actual efficiency of the algorithm? The second one is I might, for a given problem, have different choices of algorithm. A recursive solution versus an iterative one. Using divide and conquer versus straightforward search. We're going to see some examples of that. So I've got to somehow separate those pieces out. And in particular, I'd like to separate out the choice of implementation from the choice of algorithm. I want to measure how hard the algorithm is, not whether I can come up with a slightly more efficient implementation. So here are three ways I might do it. And we're going to look at each one of them very briefly. The obvious one is we could be scientists-- time it. Write the code, run a bunch of test cases, run a timer, use that to try and come up with a way of estimating efficiency. We'll see some challenges with that. Slightly more abstractly, we could count operations. We could say here are the set of fundamental operations-- mathematical operations, comparisons, setting values, retrieving values. And simply say how many of those operations do I use in my algorithm as a function of the size of the input? And that could be used to give us a sense of efficiency. We're going to see both of those are flawed-- somewhat more in the first case than the second one. And so we're going to abstract that second one to a more abstract notion of something we call an order of growth. And I'll come back to that in a couple of minutes. This is the one we're going to focus on. It's one that computer scientists use. It leads to what we call complexity classes. So order of growth or big O notation is a way of abstractly describing the behavior of an algorithm, and especially the equivalences of different algorithms. But let's look at those. Timing. Python provides a timer for you. You could import the time module. And then you can call, as you can see right down here. I might have defined a really simple little function-- convert Celsius to Fahrenheit. And in particular, I could invoke the clock method from the time module.
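Here is a minimal sketch of that timing pattern, reconstructed from the description on the slide. The function name c_to_f is an assumption, and time.perf_counter stands in for the time.clock call mentioned in the lecture, since clock has been removed from recent versions of Python.

```python
import time

def c_to_f(c):
    # Convert Celsius to Fahrenheit: a multiply, a divide, and an add.
    return c * 9.0 / 5 + 32

# Read the clock, call the function, read the clock again, and report the
# difference -- the pattern described in the lecture.
t0 = time.perf_counter()
c_to_f(100000)
dt = time.perf_counter() - t0
print('c_to_f(100000) took', dt, 'seconds')
```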
And what that does is it gives me a number as the number of some fractions of a second currently there. Having done that, I could call the function. And then I could call the clock again, and take the difference to tell me how much time it took to execute this. It's going to be a tiny amount of time. And then I could certainly print out some statistics. I could do that over a large number of runs-- different sizes of the input-- and come up with a sense of how much time does it take. Here's the problem with that. Not a bad idea. But again, my goal is to evaluate algorithms. Do different algorithms have different amounts of time associated with them? The good news is that if I measure running time, it will certainly vary as the algorithm changes. Just what I want to measure. Sorry. But one of the problems is that it will also vary as a function of the implementation. Right? If I use a loop that's got a couple of more steps inside of it in one algorithm than another, it's going to change the time. And I don't really care about that difference. So I'm confounding or conflating implementation influence on time with algorithm influence on time. Not so good. Worse, timing will depend on the computer. My Mac here is pretty old. Well, at least for computer versions. It's about five years old. I'm sure some of you have much more recent Macs or other kinds of machines. Your speeds may be different from mine. That's not going to help me in trying to measure this. And even if I could measure it on small sized problems, it doesn't necessarily predict what happens when I go to really large sized problems, because of issues like the time it takes to get things out of memory and bring them back into the computer. So what it says is that timing does vary based on what I'd like to measure, but it varies on a lot of other factors. And it's really not all that valuable. OK. Got rid of the first one. Let's abstract that. By abstract, I'm going to make the following assumption. I'm going to identify a set of primitive operations. I kind of get to say what they are, but the obvious one is to say what does the machine do for me automatically. That would be things like arithmetic or mathematical operations-- multiplication, division, subtraction-- comparisons-- something equal to another thing, something greater than, something less than-- assignments-- set a name to a value-- and retrieval from memory. I'm going to assume that all of these operations take about the same amount of time inside my machine. The nice thing here is then it doesn't matter which machine I'm using. I'm measuring how long the algorithm takes by counting how many operations of this type are done inside of the algorithm. And I'm going to use that count to come up with a number of operations executed as a function of the size of the input. And if I'm lucky, that'll give me a sense of what's the efficiency of the algorithm. So this one's pretty boring. It's got three steps. Right? A multiplication, a division, and an addition-- four, if you count the return. But if I had a little thing here that added up the integers from 0 up to x, I've got a little loop inside here. And I could count operations. So in this case, it's just, as I said, three operations. Here, I've got one operation. I'm doing an assignment. And then inside here, in essence, there's one operation to set i to a value from that iterator. Initially, it's going to be 0. And then it's going to be 1. And you get the idea. And here, that's actually two operations.
It's nice Python shorthand. But what is that operation? It says take the value of total and the value of i, add them together-- that's one operation-- and then set that value, or rather, set the name total to that new value. So a second operation. So you can see in here, I've got three operations. And what else do I have? Well, I'm going to go through this loop x times. Right? I do it for i equals 0. And then for i equal 1, and so on. So I'm going to run through that loop x times. And if I put that together, I get a nice little expression. 1 plus 3x. Actually, I probably cheated here. I shouldn't say cheated. I probably should have counted the return as one more operation, so that would be 1 plus 3x plus 1, or 3x plus 2 operations. Why should you care? It's a little closer to what I'd like. Because now I've got an expression that tells me something about how much time is this going to take as I change the size of the problem. If x is equal to 10, it's going to take me 32 operations. If x is equal to 100, 302 operations. If x is equal to 1,000, 3,002 operations. And if I wanted the actual time, I'd just multiply that by whatever that constant amount of time is for each operation. I've got a good estimate of that. Sounds pretty good. Not quite what we want, but it's close. So if I was counting operations, what could I say about it? First of all, it certainly depends on the algorithm. That's great. Number of operations is going to directly relate to the algorithm I'm trying to measure, which is what I'm after. Unfortunately, it still depends a little bit on the implementation. Let me show you what I mean by that by backing up for a second. Suppose I were to change this for loop to a while loop. I'll set i equal to 0 outside of the loop. And then while i is less than x plus 1, I'll do the things inside of that. That would actually add one more operation inside the loop, because I both have to set the value of i and I have to test the value of i, as well as doing the other operations down here. And so rather than getting 3x plus 1, I would get 4x plus 1. Eh. As the government says, what's the difference between three and four when you're talking about really big numbers? The problem is in terms of counting, it does depend. And I want to get rid of that in a second, so it still depends a little bit on the implementation. I remind you, I wanted to measure the impact of the algorithm. But the other good news is the count is independent of which computer I run on. As long as all my computers come with the same set of basic operations, I don't care what the time of my computer is versus yours to do those operations when measuring how much time it would take. And I should say, by the way, one of the reasons I want to do this is not to know whether it's going to take 37.42 femtoseconds, but rather to say if this algorithm has a particular behavior, if I double the size of the input, does that double the amount of time I need? Does that quadruple the amount of time I need? Does it increase it by a factor of 10? And here, what matters isn't the speed of the computer. It's the number of operations. The last one I'm not going to really worry about. But we'd have to really think about what are the operations we want to count. I made an assumption that the amount of time it takes to retrieve something from memory is the same as the amount of time it takes to do a numerical computation. That may not be accurate.
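For reference, a sketch of the two loop versions counted above, with the per-line step counts from the narration written in as comments; the function names are assumptions.

```python
def sum_for(x):
    # Add up the integers from 0 through x with a for loop.
    total = 0                  # 1 step
    for i in range(x + 1):     # 1 step each time around: set i from the iterator
        total += i             # 2 steps: an add, then an assignment to total
    return total               # 1 step
    # about 1 + 3x + 1 steps in all, as counted in the lecture

def sum_while(x):
    # Same algorithm as a while loop: the extra bookkeeping on i adds a step
    # to every pass, so the count comes out higher even though nothing about
    # the algorithm itself has changed.
    total = 0
    i = 0
    while i < x + 1:
        total += i
        i += 1
    return total
```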
But this one could probably be dealt with by just agreeing on what are the common operations and then doing the measurement. So this is closer. Excuse me. And certainly, that count varies for different inputs. And we can use it to come up with a relationship between the inputs and the count. And for the most part, it reflects the algorithm, not the implementation. But it's still got that last piece left there, so I need to get rid of the last piece. So what can we say here? Timing and counting do evaluate or reflect implementations. I don't want that. Timing also evaluates the machines. What I want to do is just evaluate the algorithm. And especially, I want to understand how does it scale? I'm going to say what I said a few minutes ago again. If I were to take an algorithm, and I say I know what its complexity is, my question is if I double the size of the input, what does that do to the speed? Because that's going to tell me something about the algorithm. I want to say what happens when I scale it? And in particular, I want to relate that to the size of the input. So here's what we're going to do. We're going to introduce orders of growth. It's a wonderful tool in computer science. And what we're going to focus on is that idea of counting operations. But we're not going to worry about small variations, whether it's three or four steps inside of the loop. We're going to show that that doesn't matter. And if you think about my statement of does it double in terms of size or speed or not-- or I'm sorry-- time or not, whether it goes from three to six or four to eight, it's still a doubling. So I don't care about those pieces inside. I'm going to focus on what happens when the size of the problem gets arbitrarily large. I don't care about counting things from 0 up to x when x is 10 or 20. What happens when it's a million? 100 million? What's the asymptotic behavior of this? And I want to relate that time needed against the size of the input, so I can make that comparison I suggested. OK. So to do that, we've got to do a couple of things. We have to decide what are we going to measure? And then we have to think about how do we count without worrying about implementation details. So we're going to express efficiency in terms of size of input. And usually, this is going to be obvious. If I've got a procedure that takes one argument that's an integer, the size of the integer is the thing I'm going to measure things in. If I double the size of that integer, what happens to the computation? If I'm computing something over a list, typically the length of the list is going to be the thing I'm going to use to characterize the size of the problem. If I've got-- and we'll see this in a second-- a function that takes more than one argument, I get to decide what's the parameter I want to use. If I'm searching to see is this element in that list, typically, I'm going to worry about what's the size of the list, not what's the size of the element. But we have to specify what it is that we're measuring. And we're going to see examples of that in just a second. OK. So now, we start thinking about that. Sounds great. Certainly, if I'm computing something numeric-- the sum of integers from 0 up to x-- it's kind of obvious x is the size of my problem. How many steps does it take? I can count that. But in some cases, the amount of time the code takes is going to depend on the input. So let's take this little piece of code here.
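A sketch of the code on that slide, reconstructed from the walkthrough that follows; the name linear_search is an assumption.

```python
def linear_search(L, e):
    # Walk down the list; return True as soon as e is found.
    for i in L:
        if i == e:
            return True
    # Got all the way through the loop without finding it.
    return False
```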
And I do hope by now, even though we flash up code, you're already beginning to recognize what it does. Not the least of which, by the clever name that we chose. But this is obviously just a little function. It runs through a loop-- sorry, a for loop-- that takes i for each element in a list L. And it's checking to see is i equal to the element I've provided. And when it is, I'm going to return true. If I get all the way through the loop and I didn't find it, I'm going to return false. It's just saying is e in my input list L? How many steps is this going to take? Well, we can certainly count the number of steps in the loop. Right? We've got to set i. We've got to compare i and potentially we've got to return. So there's at most three steps inside the loop. But it depends on how lucky I'm feeling. Right? If e happens to be the first element in the list-- it goes through the loop once-- I'm done. Great. I'm not always that lucky. If e is not in the list, then it will go through this entire loop until it gets all the way through the elements of L before saying false. So this is sort of a best case scenario. This is the worst case scenario. Again, if I'm a scientist, I'd say well, let's run some trials. Let's do a bunch of examples and see how many steps it goes through. And that would be the average case. On average, I'm likely to look at half the elements in the list before I find it. Right? If I'm lucky, it's early on. If I'm not so lucky, it's later on. Which one do I use? Well, we're going to focus on this one. Because that gives you an upper bound on the amount of time it's going to take. What happens in the worst case scenario? We will find at times it's valuable to look at the average case to give us a rough sense of what's going to happen on average. But usually, when we talk about complexity, we're going to focus on the worst case behavior. So to say it in a little bit different way, let's go back to my example. Suppose you gave it a list L of some length. Length of L, you can call that len if you like. Then my best case would be the minimum running time. And in this case, it will be for the first element in the list. And notice in that case, the number of steps I take would be independent of the length of L. That's great. It doesn't matter how long the list is. If I'm always going to find the first element, I'm done. The average case would be the average over the number of steps I take, depending on the length of the list. It's going to grow linearly with the length of the list. It's a good practical measure. But the one I want to focus on will be the worst case. And here, the amount of time, as we're going to see in a couple of slides, is linear in the size of the problem. Meaning if I double the length of the list in the worst case, it's going to take me twice as much time to find that it's not there. If I increase the length of the list by a factor of 10, in the worst case, it's going to take me 10 times as much time as it did in the earlier case to find out that the element's not there. And that linear relationship is what I want to capture. So I'm going to focus on that. What's the worst case behavior? And we're about ready to start talking about orders of growth, but here then is what orders of growth are going to provide for me. I want to evaluate efficiency, particularly when the input is very large. What happens when I really scale this up? I want to express the growth of the program's runtime as that input grows.
Not the exact runtime, but that notion of if I doubled it, how much longer does it take? What's the relationship between increasing the size of the input and the increase in the amount of time it takes to solve it? We're going to put an upper bound on that growth. And if you haven't seen this in math, it basically says I want to come up with a description that is at least as big as-- sorry-- as big as or bigger than the actual amount of time it's going to take. And I'm going to not worry about being precise. We're going to talk about the order of rather than exact. I don't need to know to the femtosecond how long this is going to take, or to exactly one operation how long this is going to take. But I want to say things like this is going to grow linearly. I double the size of the input, it doubles the amount of time. Or this is going to grow quadratically. I double the size of the input, it's going to take four times as much time to solve it. Or if I'm really lucky, this is going to have constant growth. No matter how I change the input, it's not going to take any more time. To do that, we're going to look at the largest factors in the runtime. Which piece of the program takes the most time? And so in orders of growth, we are going to look for as tight as possible an upper bound on the growth as a function of the size of the input in the worst case. Nice long definition. Almost ready to look at some examples. So here's the notation we're going to use. It's called Big O notation. I have to admit-- and John's not here today to remind me of the history-- I think it comes because we used Omicron-- God knows why. Sounds like something from Futurama. But we used Omicron as our symbol to define this. I'm having such good luck with bad jokes today. You're not even wincing when I throw those things out. But that's OK. It's called Big O notation. We're going to use it. We're going to describe the rules of it, as is the tradition. It describes the worst case, because it's often the bottleneck we're after. And as we said, it's going to express the growth of the program relative to the input size. OK. Let's see how we go from counting operations to getting to orders of growth. Then we're going to define some examples of orders of growth. And we're going to start looking at algorithms. Here's a piece of code you've seen before. Again, hopefully, you recognize or can see fairly quickly what it's doing. Computing factorials the iterative way. Basically, remember n factorial is n times n minus 1 times n minus 2 all the way down to 1. Hopefully, assuming n is a non-negative integer. Here, we're going to set up an internal variable called answer. And then we're just going to run over a loop. As long as n is bigger than 1, we're going to multiply answer by n, store it back into answer, decrease n by 1. We'll keep doing that until we get out of the loop. And we're going to return answer. We'll start by counting steps. And that's, by the way, just to remind you that in fact, there are two steps here. So what do I have? I've got one step up there. Set answer to one. I'm setting up n-- sorry, I'm not setting up n. I'm going to test n. And then I'm going to do two steps here, because I've got to multiply answer by n and then set it to answer. And now similarly, we've got two steps there because I'm subtracting 1 from n and then setting it to n. So I've got 2 plus 2 plus the test, which is 5. I've got 1 outside here. I've got 1 outside there. And I'm going to go through this loop n times.
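A sketch of that iterative factorial, with the step counts from the narration written in as comments (the name fact_iter matches the one mentioned later in the lecture).

```python
def fact_iter(n):
    # Iterative factorial: n * (n-1) * ... * 1.
    answer = 1          # 1 step
    while n > 1:        # 1 step (the test), done each time around
        answer *= n     # 2 steps: multiply, then assign to answer
        n -= 1          # 2 steps: subtract, then assign to n
    return answer       # 1 step
    # 5 steps per pass through the loop, n passes, plus the 2 steps outside
```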
So I would suggest that if I count the number of steps, it's 1 plus 5n plus 1. Sort of what we did before. 5n plus 2 is the total number of steps that I use here. But now, I'm interested in what's the worst case behavior? Well, in this case, it is the worst case behavior because it doesn't have decisions anywhere in here. But I just want to know what's the asymptotic complexity? And I'm going to say-- oh, sorry-- that is to say I could do this different ways. I could have done this with two steps like that. That would have made it not just 1 plus 5n plus 1. It would have made it 1 plus 6n plus 1 because I've got an extra step. I put that up because I want to remind you I don't care about implementation differences. And so I want to know what captures both of those behaviors. And in Big O notation, I say that's order n. Grows linearly. So I'm going to keep doing this to you until you really do wince at me. If I were to double the size of n, whether I use this version or that version, the amount of time-- the number of steps-- is basically going to double. Now you say, wait a minute. 5n plus 2-- if n is 10, that's 52. And if n is 20, that's 102. That's not quite doubling it. And you're right. But remember, we really care about this in the asymptotic case. When n gets really big, those extra little pieces don't matter. And so what we're going to do is we're going to ignore the additive constants and we're going to ignore the multiplicative constants when we talk about orders of growth. So what does O of n measure? Well, we're just summarizing here. We want to describe how the amount of time needed to compute a problem grows as the size of the problem itself grows. So we want an expression that counts that asymptotic behavior. And we're going to focus as a consequence on the term that grows most rapidly. So here are some examples. And I know if you're following along, you can already see the answers here. But we're going to do this to simply give you a sense of that. If I'm counting operations and I come up with an expression that has n squared plus 2n plus 2 operations, that expression I say is order n squared. The 2 and the 2n don't matter. And think about what happens if you make n really big. n squared is much more dominant than the other terms. We say that's order n squared. Even this expression we say is order n squared. So in this case, for lower values of n, this term is going to be the big one in terms of number of steps. I have no idea how I wrote such an inefficient algorithm that it took 100,000 steps to do something. But if I had that expression, for smaller values of n, this matters a lot. This is a really big number. But when I'm interested in the growth, then that's the term that dominates. And you see the idea, or begin to see the idea, here that when I have-- sorry, let me go back there-- when I have expressions, if it's a polynomial expression, it's the highest order term. It's the term that captures the complexity. Both of these are quadratic. This term is order n, because n grows faster than log of n. This funky looking term, even though that looks like the big number there and it is a big number, that expression we say is order n log n. Because again, if I plot out how this changes as I make n really large, this term eventually takes over as the dominant term. What about that one? What's the big term there? How many people think it's n to the 30th? Show of hands. How many people think it's 3 to the n? Show of hands.
Thank you. You're following along. You're also paying attention. How many people think I should stop asking questions? No show of hands. All right. But you're right. Exponentials are much worse than powers. Even something like this-- again, it's going to take a big value of n before it gets there, but it does get there. And that, by the way, is important, because we're going to see later on in the term that there are some problems where it's believed that all of the solutions are exponential. And that's a pain, because it says it's always going to be expensive to compute. So that's how we're going to reason about these things. And to see it visually, here are the differences between those different classes. Something that's constant-- the amount of time doesn't change as I change the size of the input. Something that's linear grows as a straight line, as you would expect. Nice behavior. Quadratic starts to grow more quickly. The log is always better than linear because it slows down as we increase the size. n log n, or log linear, is a funky term, but we're going to see it's a very common complexity for really valuable algorithms in computer science. And it has a nice behavior, sort of between the linear and the quadratic. And exponential blows up. Just to remind you of that-- well, sorry-- let me show you how we're going to do the reasoning about this. So here's how we're going to reason about it. We've already seen some code where I started working through this process of counting operations. Here are the tools I want you to use. Given a piece of code, you're going to reason about each chunk of code separately. If you've got sequential pieces of code, then the rule, called the law of addition for orders of growth, is that the order of growth of the combination is the combination of the orders of growth. Say that quickly 10 times. But let's look at an example of that. Here are two loops. You've already seen examples of how to reason about those loops. For this one, it's linear in the size of n. I'm going to go through the loop n times doing a constant amount of things each time around. By what I just showed, that's order n. This one-- again, I'm doing just a constant number of things inside the loop-- but notice that it's n squared. So that's order n squared. n times n. The combination is I have to do this work and then that work. So I write that as saying that is order of n plus order of n squared. But by this up here, that is the same as saying what's the order of growth of n plus n squared. Oh yeah. We just saw that. It's n squared. So addition, or the law of addition, lets us reason about the fact that this will be an order n squared algorithm. The second one I'm going to use is called the law of multiplication. And this says when I have nested statements or nested loops, I need to reason about those. And in that case, what I want to argue-- or not argue-- state is that the order of growth here is a multiplication. That is, when I have nested things, I figure out what's the order of growth of the inner part, what's the order of growth of the outer part, and I'm going to multiply-- bleh, try again-- I'm going to multiply together those orders of growth to get the overall order of growth. If you think about it, it makes sense. Look at my little example here. It's a trivial little example. But I'm looping for i from 0 up to n. For every value of i, I'm looping for j from 0 up to n. And then I'm printing out A. I'm the Fonz. I'm saying heyyy a lot. Oh, come on.
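Sketches of the two examples being described, reconstructed from the narration (the exact loop bodies and names are assumptions): two sequential loops illustrating the law of addition, and the nested pair of loops illustrating the law of multiplication.

```python
def sequential_loops(n):
    # Law of addition: O(n) work followed by O(n**2) work is O(n + n**2),
    # which is O(n**2) overall.
    total = 0
    for i in range(n):        # O(n): constant work, done n times
        total += 1
    for i in range(n * n):    # O(n**2): constant work, done n*n times
        total += 1
    return total

def nested_loops(n):
    # Law of multiplication: the inner O(n) loop runs once for each of the
    # n values of i, so the whole thing is O(n) * O(n) = O(n**2).
    for i in range(n):
        for j in range(n):
            print('A')
```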
At least throw something, I mean, when it's that bad. Right? Want to make sure you're still awake. OK. You get the idea. But what I want to show you here is notice the order of growth. That's order n. Right? I'm doing that n times. But I'm doing that for each value of i. The outer piece here loops also n times. For each value of i, I'm doing order n things. So I'm doing order of n times order of n steps. And by that law, that is the same as order of n times n, or n squared. So this is a quadratic expression. You're going to see that a lot. Nested loops typically have that kind of behavior. Not always, but typically have that kind of behavior. So what you're going to see is there's a set of complexity classes. And we're about to start filling these in. Order one is constant. Says the amount of time it takes doesn't depend on the size of the problem. It's really rare that you get these. They tend to be trivial pieces of code, but they're valuable. Log n reflects logarithmic runtime. You can sort of read the rest of them. These are the kinds of things that we're going to deal with. We are going to see examples here, here, and here. And later on, we're going to come back and see these, which are really nice examples to have. Just to remind you why these orders of growth matter-- sorry, that's just reminding you what they look like. We've already done that. Here is the difference between constant, log, linear, log linear, squared, and exponential, when n is equal to 10, 100, 1,000 or a million. I know you know this, but I want to drive home the difference. Something that's constant is wonderful. No matter how big the problem is, it takes the same amount of time. Something that is log is pretty nice. Increase the size of the problem by a factor of 10, and it increases by a factor of 2. Another factor of 10, and it only increases by another 50%. It only increases a little bit. That's a gorgeous kind of problem to have. Linear-- not so bad. I go from 10 to 100 to 1,000 to a million. You can see log linear is not bad either. Right? A factor of 10 increase here is only a factor of 20 increase there. A factor of 10 increase there is only a factor of 30 increase there. So log linear doesn't grow that badly. But look at the difference between n squared and 2 to the n. I actually did think of printing this out. By the way, Python will compute this. But it would have taken pages and pages and pages. I didn't want to do it. You get the point. Exponential-- always much worse. Always much worse than a quadratic or a power expression. And you really see the difference here. All right. The reason I put this up is as you design algorithms, your goal is to be as high up in this listing as you can. The closer you are to the top of this list, the better off you are. If you have a solution that's down here, bring a sleeping bag and some coffee. You're going to be there for a while. Right? You really want to try and avoid that if you can. So now what we want to do, both for the rest of today in the last 15 minutes and then next week, is start identifying common algorithms and what their complexity is. As I said to you way back at the beginning of this lecture, which I'm sure you remember, it's not just to be able to identify the complexity. I want you to see how choices in algorithm design are going to lead to particular kinds of consequences in terms of what this is going to cost you. That's your goal here. All right. We've already seen some examples. I'm going to do one more here. But simple iterative loop algorithms are typically linear.
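A quick sketch that reproduces the comparison just described for those input sizes; for 2 to the n only the number of decimal digits is shown, since, as noted, printing the full value would have taken pages.

```python
import math

# Compare the growth of log n, n log n, n**2, and 2**n
# at the input sizes used in the lecture's table.
for n in (10, 100, 1000, 10**6):
    digits_2_n = int(n * math.log10(2)) + 1   # number of decimal digits in 2**n
    print(f"n={n:>8}  log2(n)={math.log2(n):6.1f}  "
          f"n*log2(n)={n * math.log2(n):14.0f}  n^2={n * n:14d}  "
          f"2^n: {digits_2_n} digits")
```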
Here's another version of searching. Imagine I have an unsorted list. Arbitrary order. Here's another way of doing the linear search. Looks a little bit faster. I'm going to set a flag initially to false. And then I'm going to loop for i from 0 up to the length of L. I'm going to use that to index into the list, pull out each element of the list in turn, and check to see is it the thing I'm looking for. As soon as I find it, I'm going to send-- sorry-- set the flag to true. OK? So that when I return out of the loop, I can just return found. And if I found it, found will be true. If I never found it, found will still be false and I'll return it. We could count the operations here, but you've already seen examples of doing that. This is linear, because I'm looping n times, if n is the length of the list over there. And the number of things I do inside the loop is constant. Now, you might say, wait a minute. This is really brain damaged, or if you're being more politically correct, computationally challenged. OK? In the sense of once I've found it, why bother looking at the rest of the list? So in fact, I could just return true right here. Does that change the order of growth of this algorithm? No. It changes the average time. I'm going to stop faster. But remember, the order of growth captures what's the worst case behavior. And the worst case behavior is the element's not in the list-- I've got to look at everything. So this will be an example of a linear algorithm. And you can see, I'm looping length of L times over the loop inside of there. It takes order one to test it. So it's order n. And if I were to actually count it, there's the expression. It's 1 plus 4n plus 1, which is 4n plus 2, which by my rule says I don't care about the additive constant. I only care about the dominant term. And I don't care about that multiplicative constant. It's order n. An example of a template you're going to see a lot. Now, it's order n where n is the length of the list, and I need to specify that. That's the thing I'm after. If you think about it, I cheated. Sorry-- I never cheat. I'm tenured. I never cheat. I just mislead you badly. Not a chance. How do I know that accessing an element of the list takes constant time? I made an assumption about that. And this is a reasonable thing to ask about-- both what am I assuming about the constant operations and how do I know that's actually true? Well, it gives me a chance to point out something that Python does very effectively. Not all languages do. But think about a list. Suppose I've got a list that's all integers. I'm going to need some amount of memory to represent each integer. So if a byte is 8 bits, I might reserve 4 bytes, or 32 bits, to cover any reasonable sized integer. When I represent a list, I could simply have each of them in turn. So what do I need to know? I'm going to allocate out a particular length-- say 4 bytes, 32 bits, 32 sequential elements of memory-- to represent each integer. And then I just need to know where's the first part of the list-- what's the address in memory of the first part of the list. And to get to the i-th element, I take that base plus 4 bytes times i. And I can go straight to this point without having to walk down the list. That's nice. OK? It says, in fact, I can get to any element of memory-- I'm sorry-- any element of the list in constant time. OK. Now, what if the things I'm representing aren't integers? They're arbitrary things and they take up a big chunk of space.
Well, if the list is heterogeneous, we use a nice technique called indirection. And that simply says we, again, have a list. We know the address of this point. We know the address there for the i-th element of this list. But inside of here, we don't store the actual value. We store a pointer to where it is in memory. Just what these things are indicating. So they can be arbitrary size. But again, I can get to any element in constant time, which is exactly what I want. So that's great. OK. Now, suppose I tell you that the list is sorted. It's in increasing order. I can be more clever about my algorithm. Because now, as I loop through it, I can say if it's the thing I'm looking for, just return true. If the element of the list is bigger than the thing I'm looking for, I'm done. I don't need to look at the rest of the list, because I know it can't be there because it's ordered or sorted. I can just return false. If I get all the way through the loop, I can return false. So I only have to look until I get to a point where the thing in the list is bigger than what I'm looking for. What's the order of growth here? Again, the average time behavior will be faster. But the order of growth is I've got to do order of length of the list to go through the loop, order of one to do the test, and in the worst case, again, I still have to go through the entire list. So the order of growth here is the same. It is, again, linear in the length of the list, even though the runtime will be different depending on whether it's sorted or not. I want you to hold on to that idea, because we're going to come back to the sorted list next week to see that there actually are much more efficient ways to use the fact that a list is sorted to do the search. But both of these versions have the same order of growth, order n. OK. So lurching through a list-- right, sorry-- searching through a list in sequence is linear because of that loop. There are other things that have a similar flavor. And I'm going to do these quickly to get to the last example. Imagine I give you a string of characters that are assumed to be composed of decimal digits. I just want to add them all up. This is also linear, because there's the loop. I'm going to loop over the characters in the string. I'm going to cast them into integers, add them in, and return the value. This is linear in the length of the input s. Notice the pattern. That loop-- that iterative loop-- has got that linear behavior, because inside of the loop there's a constant number of things that I'm executing. We already looked at fact iter. Same idea. There's the loop. I'm going to do that n times, with a constant amount of things inside the loop. So looping around it is order n. There's the actual expression. But again, the pattern I want you to see here is that this is order n. OK. Last example for today. I know you're all secretly looking at your watches. Standard loops, typically linear. What about nested loops? What about loops that have loops inside of them? How long do they take? I want to show you a couple of examples. And mostly, I want to show you how to reason about them. Suppose I gave you two lists composed of integers, and I want to know is the first list a subset of the second list. Code's in the handout, by the way, if you want to go run it. But basically, the simple idea would be I'm going to loop over every element in the first list. And for each one of those, I want to say is it in the second list? So I'll use the same kind of trick. I'll set up a flag that's initially false.
And then I'm going to loop over everything in the second list. And if that thing is equal to the thing I'm looking for, I'll set match to true and break out of the loop-- the inner loop. If I get all the way through the second list and I haven't found the thing I'm looking for, when I break out or come out of this loop, matched, in that case, will still be false and I'll return false. But if up here, I found something that matched, match would be true. I break out of it. It's not false. Therefore, I return true. I want you to look at the code. You should be able to look at this and realize what it's doing. For each element in the first list, I walk through the second list to say is that element there. And if it is, I return true. If that's true for all of the elements in the first list, I return true overall. OK. Order of growth. Outer loop-- this loop I'm going to execute the length of L1 times. Right? I've got to walk down that first list. If I call that n, it's going to take n times over the outer loop. But what about in here? In all of the earlier examples, we had a constant number of operations inside of the loop. Here, we don't. We've got another loop that's looping over, in principle, all the elements of the second list. So each iteration is going to execute the inner loop up to length of L2 times, where inside of this inner loop there is a constant number of operations. Ah, nice. That's the multiplicative law of orders of growth. It says if this is order length of L1, and we're going to do that order length of L2 times, the order of growth is the product. And the most common or the worst case behavior is going to be when the lists are of the same length and none of the elements of L1 are in L2. And in that case, we're going to get something that's order n squared-- quadratic-- where n is the length of the list in terms of number of operations. I don't really care about subsets. I've got one more example. We could similarly do intersection. If I wanted to say what is the intersection of two lists? What elements are in both list 1 and list 2? Same basic idea. Here, I've got a pair of nested loops. I'm looping over everything in L1. For that, I'm looping over everything in L2. And if they are the same, I'm going to put that into a temporary variable. Once I've done that, I need to clean things up. So I'm going to write another loop that sets up an internal variable and then runs through everything in the list I accumulated, making sure that it's not already there. And as long as it isn't, I'm going to put it in the result and return it. I did it quickly. You can look through it. You'll see it does the right thing. What I want you to see is what's the order of growth. I need to look at this piece. Then I need to look at that piece. This piece-- well, it's order length L1 to do the outer loop. For each value of e1, I've got to do order of length L2 things inside to accumulate them. So that's quadratic. What about the second loop? Well, this one is a little more subtle. I'm only looping over temp, which is at most going to be length L1 long. But I'm checking to see is that element in a list? And it depends on the implementation. But typically, that's going to take up to the length of the list to do it. I've got to look to see is it there or not. And so that inner loop, if we assume the lists are the same size, is also going to take potentially up to length L1 steps. And so this is, again, quadratic.
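Sketches of the two functions being analyzed here, reconstructed from the walkthrough (the names is_subset and intersect, and the variable names, are assumptions); note the nested loops in both, and the implicit second loop hidden in the `in` test of intersect's clean-up pass.

```python
def is_subset(L1, L2):
    # For each element of L1, walk through L2 looking for a match.
    for e1 in L1:
        matched = False
        for e2 in L2:
            if e1 == e2:
                matched = True
                break            # break out of the inner loop only
        if not matched:
            return False         # e1 never showed up in L2
    return True

def intersect(L1, L2):
    # First nested loop: collect every element of L1 that also appears in L2.
    tmp = []
    for e1 in L1:
        for e2 in L2:
            if e1 == e2:
                tmp.append(e1)
    # Second loop: remove duplicates.  The "e not in result" test is itself
    # an implicit loop over result, which is what makes this pass quadratic too.
    result = []
    for e in tmp:
        if e not in result:
            result.append(e)
    return result
```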
It's actually two quadratics-- one for the first nested loop, one for the second one, because there's an implicit second loop right there. But overall, it's quadratic. So what you see in general-- this is a really dumb way to compute n squared. When you have nested loops, typically, it's going to be quadratic behavior. And so what we've done then is we've started to build up examples. We've now seen simple looping mechanisms, simple iterative mechanisms, nested loops. They tend to naturally give rise to linear and quadratic complexity. And next time, we're going to start looking at more interesting classes. And we'll see you next time.