CS50 2020 - Lecture 2 - Arrays

All right, so this is CS50, and this is week two, where we're going to dive in a little more deeply to this new language. We're also going to take a look back at some of the concepts we looked at last week, so that you can better understand some of the features of C and some of the steps you've been taking to make your code work. So we'll peel back some of the layers of abstraction from last week so that you better understand what's really going on underneath the hood of the computer.

Of course, last week we began with perhaps the most canonical of programs in C, the most canonical of programs you can write in pretty much any language, which is the one that says, quite simply, hello, world. But recall that before we can actually run this program, we have to convert it into the language that computers themselves speak, which we defined last week as binary, zeros and ones, otherwise known as machine language in this context. So we have to go somehow from this source code to something more like this machine code, the zeros and ones that the computer actually understands.

Now you may recall, too, that we introduced a command for this, and that command was called make. Literally via this command, make hello, could we make a program called hello. And make was a little fancy: it assumed that if you want to make a program called hello, it should look for a file called hello.c. That just happens automatically for you. The end result, of course, was an additional file called hello that would end up getting put into your current directory, so you could then do ./hello and be on your way. But it turns out that make is actually automating a more specific set of steps for us that we'll see a little more closely now instead.

So on the screen here is exactly the same code that we wrote last week to say, quite simply, hello, world. And recall that any time you run make hello or make mario or make cash or make credit, any of the problems that you might have
tackled more recently, you see some cryptic output on the screen. Hopefully no red or yellow error messages, but even when all is well, you see this white text, which is indicative of all having been well. Last week we just kind of ignored this and moved on and immediately did something like ./hello. But today, let's actually better understand what it is we've been turning a blind eye to, so that with each passing week there's less and less on your screen that you don't fully understand.

So again, if I do ls here, we'll see not only hello.c but also the executable program called hello that I created via make. But look at this output. There's some mention of something called clang here, and then there are a lot of other words or cryptic phrases, something in computer speak, that have all of these hyphens in front of them. It turns out that what make is doing for us is automating the execution of a command more specifically called clang. Clang is actually the compiler that we alluded to last week, a compiler being a program that converts source code to machine code. We've actually been using clang this whole time. But notice that clang requires a bit more sophistication: you have to understand a bit more about what's going on in order to use it.

So let me go ahead and remove the program called hello. I'm going to use the rm command that we saw briefly last time, I'm going to confirm by hitting y, and if I type ls again, now hello.c is the only file that remains. Well, temporarily, let me take away the ability to use make, and let's now use clang directly. Clang is another program installed in CS50 IDE. It's a very popular compiler that you can download onto your own Macs and PCs as well. But running it is a little different. I'm going to go ahead and say clang and then the name of the file that I want to compile, hello.c being this one. I'm going to go ahead and hit Enter, and now nothing happens, seemingly. But frankly, as you've
probably gleaned already, when nothing bad seems to happen, that implicitly tends to mean that something good happened: your program compiled successfully. But curiously, if I type ls now, you don't see the program hello. You see this weird file name, a.out. This is actually a historical remnant. Years ago, when humans would use a compiler to compile their code, the default file name that every program was given was a.out, for assembler output. More on that in a moment. But this is kind of a stupid name for a program; it's not at all descriptive of what it does. So it turns out that programs like clang can be configured at the command line. The command line, again, refers to the blinking prompt where you can type commands.

So indeed, I'm going to go ahead and remove this file now, rm a.out, and then confirm with y, and now I'm back to where I began, with just hello.c. Let me go ahead now and do something a little different. I'm going to do clang -o hello and then the word hello.c. What I'm doing here is providing what we're going to start calling a command-line argument. Commands like make and rm can sometimes just be run all by themselves: you type a single word and hit Enter. But very often we've seen that they take inputs in some sense. You type make hello, you type rm hello, and the second word, hello, in those cases is kind of an input to the command, now otherwise known as a command-line argument.

So here we have more command-line arguments. We've got the word clang, which is the compiler we're about to run; -o, which it turns out is shorthand notation for output, so please output the following. What do you want to output? Well, the next word is hello. And then the final word is hello.c. So, long story short, this command, more verbose though it is, is saying: run clang, output a file called hello, and take as input a file called hello.c. So when I run this command, after hitting Enter, nothing again seems to
happen, but if I type ls, I don't see that stupid default file name of a.out. Now I see the file name hello. So this is how, ultimately, clang is helping me compile my code: it's kind of automating all of those processes.

But recall that that's not the only type of program we wrote last week. We took code like this and began to enhance it with some additional lines. Version two of hello, world involved prompting the user for input using CS50's get_string function and storing the output in a variable called name. But recall that we also had to add cs50.h at the top of the file. So let me go ahead and do that. Let me remove hello, because that's now the old version. Let me go into my hello.c file and start updating my code: include cs50.h; now get myself a string called name, though we could call it anything; call the function get_string and ask what's your name, question mark, with a space at the very end just to create a gap; and then down here, instead of always printing out hello, world, let me print out hello, %s, which is a placeholder, recall, and output the person's name.

So last week, the way we compiled this program was just make hello, no different from now. But this week, suppose I were to instead get rid of make, only because it's automating steps for me that I now want to understand in more detail. I could compile this program again with clang -o hello hello.c, just a reapplication of that same idea of passing in three arguments: -o, hello, and hello.c. But the catch now is that I'm actually going to see one of these red error messages. Let's consider what this is actually saying. There's still going to be a bunch of cryptic stuff here, but notice, as always, we're hopefully going to see something that's a little familiar: undefined reference to get_string. I don't yet know what an undefined reference is, necessarily. I don't know what a linker command is. But I at least recognize that there's something going on with
get_string, and there's a reason for this. It turns out that when using a library, whether it's CS50's library or others as well, it's sometimes not sufficient only to include the header file at the top of your own code. Sometimes you additionally have to tell the computer where to find the zeros and ones that someone has written to implement a function like get_string. So the header file, like cs50.h, just tells the compiler that the function exists. But there's a second mechanism that up until now has been automated for us, which tells the computer where to find the actual zeros and ones that implement the functions in that header file.

With that said, I'm going to need to add another command-line argument to this command. Instead of doing clang -o hello hello.c, I'm going to additionally, and admittedly cryptically, do -lcs50 at the end of this command, which quite simply means link in the CS50 library. Link is a term of art, and we'll see what it means in more detail in just a moment. But this additional final command-line argument tells clang: you already know that a function like get_string exists; -lcs50 means, when compiling hello.c, make sure to incorporate all of the machine code from CS50's library into your program as well. In short, it's something you have to do when you use certain libraries. So now when I hit Enter, all seems to be well, because nothing bad got printed. If I type ls, I see hello. And voila, I can do ./hello, type in my name, David, and voila: hello, David.

So why didn't we do all of this last week? Frankly, we've made no fundamental progress; all we've done is reveal what's going on underneath the hood. But I'll claim that compiling your code by typing out all of these verbose command-line arguments just gets tedious quickly. And so computer scientists, and programmers more specifically, tend to automate monotonous steps. What's happening, ultimately, with make is that all of this is being automated for us. So when you typed make
hello last week, and henceforth you're welcome to continue using make as well, notice that it generates this extra-long command, some of which we haven't even talked about. I do recognize clang at the beginning, I recognize hello.c here, I recognize -lcs50 here. But notice there's a bunch of other stuff as well: not only the -o hello, but also -lm, which refers to a math library, and -lcrypt, which refers to a cryptography or encryption library. In short, we the staff have preconfigured make to make sure that when you compile your code, all of the requisite dependencies, libraries, and so forth are available to you without your having to worry about all of these command-line arguments. So henceforth, you can certainly compile your code in this way, using clang directly, or you can come back full circle to where we were last week and just run make hello. But there's a reason we run make hello: executing all of those steps manually tends to get tedious quickly.

And so indeed, what we've done here is compile our code, and compiling means going from source code to machine code. But today we reveal that there's a little more going on underneath the hood: this linking that I referred to, and a couple of other steps as well. It turns out that when you compile your code from source code to machine code, there are a few more steps ultimately involved, and when we say compiling, we actually mean these four steps. We're not going to dwell on these kinds of low-level details, but it's perhaps enlightening just to take a brief tour of what's going on when you start with your source code and end up trying to produce machine code.

So let's consider this. This is step one that the computer is doing for you when you compile your code. Step one takes your own source code, which looks a little something like this, and it preprocesses your code, top to bottom, left to right. To preprocess your code essentially means that it looks for any lines that start with a hash
symbol, so #include cs50.h, #include stdio.h. What the preprocessing step does is kind of like a find and replace. It notices, oh, here's a #include line; let me go ahead and copy the contents of that file, cs50.h, into your own code. Similarly, when I encounter #include stdio.h, let me, the so-called preprocessor, open that file, stdio.h, and copy and paste the contents of that file, so that what's in the file now looks more like this. This is happening automatically; you never have to do this manually. But why is there this preprocessing step? If you recall our discussion last week of these lines of code that tend to go at the top of your file, does anyone perceive what the preprocessor is doing for me, and why? Why do I write code that has these hash symbols, like #include cs50.h and #include stdio.h, when this preprocessor is apparently automatically replacing those lines with the actual contents of those files? What are these things here in yellow now? Yeah, Jack, what do you think?

Is it defining all the functions for you that you're using in your code? Otherwise the computer wouldn't know what to do.

Exactly. It's defining all of the functions in my code so the computer knows what to do. Remember that we ran into that sort of annoying bug last week, whereby I was trying to implement a function called, I think, get_positive_int. Recall that when I implemented that function at the bottom of my file, the compiler was kind of dumb, in that it didn't realize that it existed, because it was implemented all the way at the bottom of my file. So, to Jack's point, putting a mention of this function, a hint if you will, at the very top is like training the compiler to know in advance: I don't know how it's implemented yet, but I know get_string is going to exist; I don't know how it's implemented yet, but I know printf is going to exist. So these header files that we've been including for the past week essentially contain all of the prototypes, that is, all of the hints
for all the functions that exist in the library, so that your code, when compiled, knows from the top down that those functions will indeed exist. So the preprocessor just saves us the trouble of having to copy and paste all of these prototypes, all of these hints, ourselves.

So what happens after that step? What comes next? Well, there might very well be other header files, and there might very well be other contents in those files, but for now let's just assume that the only thing in there is the prototype. So now compiling actually has a more precise meaning that we'll define today. To compile your code now means to take this C code and convert it from source code here to another type of source code here. Now, this is probably going to be the most cryptic stuff we ever see, and this is not code you need to understand, but what's on the screen here is what's called assembly code. Long story short, there are a lot of different computers in the world, and specifically there are a lot of different types of CPUs in the world, central processing units, the brains of a computer. A CPU understands certain commands, and those commands tend to be expressed in this language called assembly code. Now, I honestly don't really understand most of this myself; it's certainly been a while since I thought hard about assembly code. But if I highlight a few operative characters here, notice that there's mention of main, get_string, and printf. So this is sort of a lower-level implementation of main, of get_string, and of printf in a different language called assembly. You write the C code; the computer, though, converts it to a more computer-friendly language called assembly code. Decades ago, humans wrote this stuff; humans wrote assembly code. But nowadays we have C, and we have languages like Python, more on that in a few weeks, that are just more user-friendly, even if it didn't feel like that this past week. Assembly code is a little closer to what the computer itself understands. But
there's still another step. There's this step called assembling, and again, all of this is happening when you simply run make and, in turn, this command clang. To assemble your code means to take this assembly code and finally convert it to machine code, zeros and ones. So you write the source code; the compiler preprocesses it, then it compiles it into assembly code, then it assembles it into machine code, until we have the actual zeros and ones.

But there's actually one final step. Just because the code that you wrote has been converted into zeros and ones, it still needs to be linked in with the zeros and ones that CS50 wrote and that the designers of the C language wrote years ago, when implementing the CS50 library in our case and the printf function in their case. This is to say that when you have code like this, which includes the prototypes for functions like get_string and printf at the very top, these lines here in yellow are what are ultimately converted into zeros and ones. We now have to combine those zeros and ones with the zeros and ones from cs50.c, which the staff wrote some time ago, and even a file called stdio.c, which the designers of C wrote years ago. Technically, it might be called something different underneath the hood, but there are really three files that are getting combined when you write your program. The first, as I just claimed, once it's preprocessed and compiled and assembled, is in this form of all zeros and ones. Somewhere in CS50 IDE, there's a whole bunch of zeros and ones representing cs50.c. Somewhere in CS50 IDE, there's another file representing the zeros and ones for stdio.c. So this final, fourth step, aka linking, just takes all of my zeros and ones, all of CS50's zeros and ones, and all of printf's zeros and ones, and links them all together into one big blob, if you will, that collectively represents your program, hello.

So, my God, that's quite a mouthful, and so many steps. And none of the steps
I have described are really germane to your implementing Mario's pyramid or cash or credit, because what we've really been doing over the past week is taking all four of these fairly low-level, sophisticated concepts and, if you will, abstracting them away, so that we just refer to this whole process as compiling. So even though, yes, technically, compiling is just one of the four steps, what a programmer typically does when saying compiling is just, with a wave of the hand, refer to all of those lower-level details. But it is the case that there are multiple steps happening underneath the hood, and this is what make and, in turn, clang are doing for you: automating this process of going from source code to assembly code to machine code and then linking it all together with any libraries you might have used. So no longer take for granted what's happening; hopefully that offers you a bit more of a glimpse of what's actually happening when you compile your own code.

Well, let me pause there, because that's quite a mouthful, and see if there are any questions on preprocessing, compiling, assembling, or linking, aka compiling. And again, we won't dwell at this low level. We'll tend to now just abstract this all away, if we can agree that, okay, yes, there are those steps, but what's really important is the whole process, not the minutiae. Sophia?

I had a question about the first step, when we're replacing all the information at the top. Is that information contained within the IDE? Are there files saved somewhere in that IDE? Where is it getting all this information from?

Yeah, really good question: where are all these files coming from? So yes, when you are using CS50 IDE, or frankly, if you're using your own Mac or your own PC and you have installed a compiler onto your Mac or PC, just like we have into CS50 IDE, what you get is a whole bunch of .h files somewhere on the computer system. You might also have a whole bunch of .c files, or compiled versions thereof,
somewhere on the system. So yes, when you download and install a compiler, you are getting all of these libraries added for you. And we preinstalled an additional library, CS50's library, that additionally comes with its own .h file and its own machine code as well. So all of those files are somewhere in CS50 IDE, or equivalently on your own Mac or PC if you're working locally, and the compiler, clang in this case, just knows how to find them, because one of the steps involved in installing your own compiler is making sure it's configured to know, per Sophia's question, where all those files are. Basley? I'm sorry if I'm mispronouncing it.

So whenever we're compiling hello, for example, is the compiler also compiling, for example, CS50, or does CS50 already exist in machine code somewhere beneath?

Yeah, really good question, too. I was kind of skirting this part of Sophia's question, because technically speaking, cs50.c is probably not installed on the system, and stdio.c is probably not installed on the system. Why? It just doesn't need to be. It would be kind of inefficient, that is, slow, if every time you compiled your own program, you additionally had to compile CS50's program and stdio's program and so forth. So it stands to reason that what computers typically do is precompile all of those library files for you, so that they can more efficiently just be linked in, and you don't have to keep preprocessing, compiling, and assembling third-party code. You only perform those steps on your own code and then link everything together. And indeed that's the case; it's all done in advance. Iris, question from you?

When we replace the header files with prototypes, are we only replacing them with the prototypes that get used, or are all the prototypes technically substituted?

Yeah, so I was kind of sweeping that detail under the rug with my dot dot dot. There's a whole lot of other stuff in those files. You're getting the entire
contents of those files, even if the only thing you need is the prototype. And this is why I alluded, too, to the fact that technically there probably isn't a stdio.c file, because there would be so much stuff in it. There's probably not just one stdio.h file with everything in it, either; there are probably some smaller files that get magically included as well. But yes, there are many more lines of code in those files, and that's okay: your compiler is only going to use the lines that it actually cares about. Good question.

All right, so with that said, this past week undoubtedly was a bit frustrating in some ways, because you probably ran into problems. You ran into bugs, mistakes in your own code. You probably saw one or more yellow or red error messages, and you might have struggled a little bit just to get your code to compile. And again, that's normal; that will go away over time. But honestly, whenever I write C, let's say 20% of the time I still have a compilation error, let alone logical errors in my own code. So this is just part of the experience of writing code. Humans make mistakes in all forms of life, and that's ever more true in the context of code, where again, per our first two weeks, precision is important, as is correctness, and it's sometimes hard to achieve both of those goals.

So let's consider now how you might be more empowered to debug your own code, that is, find problems in your own code. This word actually has some etymology. This isn't necessarily the first bug, but perhaps the most famous bug is this one pictured here, from the research notebook of Grace Hopper, a famous computer scientist, who had discovered that there were some problems with the Harvard Mark II computer, a very famous computer nowadays, which will soon live over in the new engineering school on campus; it used to live in the Science Center. The computer was having problems, and sure enough, when the engineers took a look inside this big mainframe computer, there
was actually a bug, pictured here and taped into Grace Hopper's notebook. So this wasn't necessarily the first use of the term bug, but it is a very well-known example of an actual bug in an actual computer. Nowadays we speak a little more metaphorically: a bug is just a mistake in one's program.

We did give you a few tools last week for troubleshooting bugs. help50 allows you to better understand some of the cryptic error messages; the staff wrote this program to analyze the problem you're having and try to translate it into more human-friendly speak. We saw a tool called style50, which helps you not with your correctness but just with the aesthetics of your code, helping you better indent things and add white space, that is, blank lines or space characters, so that it's a little friendlier for a human to read. And then check50, which of course the staff write so that we can give you immediate feedback on whether or not your code is correct per the problem set or lab specification.

But there are some other tools that you should have in your toolkit, and we'll give those to you today. One, frankly, is a sort of universal debugging tool, called, in the context of C, printf. printf, of course, is just this function that prints stuff out onto the screen, but that in and of itself is a wonderfully powerful tool with which you can chase down problems in your code. Even after we leave C in a few weeks and introduce Python and other languages, almost every programming language out there has some form of printf. Maybe it's called print, maybe it's called say, as it was in Scratch, but there's some ability to display or present information to a human. So let's try to use this primitive, this notion of printf, to chase down a bug in one's code.

Let me go ahead and deliberately write a buggy program. I'm even going to call the file buggy0.c. At the top of this file, I'm going to include stdio.h; no need for the
CS50 library for this one. Then I'm going to do int main(void), which we saw last week and will explain in more detail today. And then I'm going to give myself a quick loop. I just want to print out, oh, I don't know, 10 hashes on the screen. So I want to print a vertical column, kind of like one of those screenshots from Super Mario Bros.; not a pyramid, just a single column of hashes, 10 of them. So I'm going to do something like int i = 0, because I feel like I learned in class that I should generally start counting from 0. Then I'm going to have my condition in this for loop, and I want to do this 10 times, so I'm going to do i less than or equal to 10. Then I'm going to have my increment, which quite simply can be expressed as i++. And then inside this loop, I'm just going to print out a single hash followed by a new line. I'm going to save the program, and I'm going to compile it with clang -o buggy0 buggy0... I mean, no, you don't have to use clang manually in this way. It's a lot simpler to just abstract that away (that's not a command, to abstract that away) and run make buggy0, and make will take care of the process of invoking clang for you. I'm going to go ahead and run it. It seems to be compiling successfully, so no need for help50.
It's already pretty well styled. In fact, if I run style50 on this buggy0, I don't have any comments yet, but at least it looks very nicely indented, so I think I'm okay with that. But let me add that comment, print 10 hashes, just to remind myself of my goal. And now let me go ahead and run this: ./buggy0, Enter. And I see, okay, good: one, two, three, four, five, six, seven, eight, nine, ten, eleven, I think.

All right, so it's a stupid bug, and maybe it's jumped out obviously to some of you, but maybe it's a little more subtle to others of you. But where do you begin? Suppose I were to run check50, and check50 were to say, nope, you printed out 11 hashes instead of 10, but my code looks right to me, at least at first glance. Well, how can I go about debugging this or solving this? Again, printf is your friend. If you want to understand more about your own program, use printf to temporarily print more information to the screen; not information that we want in the final version, not that your TF wants to see, but that you, the programmer, can temporarily see. So before I print this hash, let me print something a little more pedantic, like this: i is now %i\n. I literally want to know, just for my own mental math, what the value of i is at this point, before I print that hash. Now I'm going to go ahead and plug in the value of i: I'm using %i as a placeholder, and I'm plugging in the value of the variable i. I'm going to save my code, I'm going to recompile it with make buggy0, and I'm going to rerun it. Let me increase the size of my window just so we can focus now on the output, and I'm going to run ./buggy0, Enter. Okay, so now I see not only my output but, commingled with that output, some diagnostic output, if you will, some debugging output. It's just more pedantically telling me i is now 0, i is now 1, i is now 2, dot dot dot, i is now 9, i is now 10. Okay, I don't hate the fact that i is 10, but I'm not loving the
fact that if I started at 0 and printed a hash, and I'm hitting 10 and printing another hash, well, obviously there's my problem. It might not have been all that much more obvious than looking at the code itself, but by using printf, you can just be a lot clearer to yourself about what's going on. So now I see, okay, if I start at 0 and go up to 10, I could change my code to be less than 10. Or I could leave that alone and go from 1 through 10. But again, programmer convention would be to go from 0 up to, but not through, 10. So I think I'm good now. And in fact, now I'll go ahead and recompile this: make buggy0. Let me increase the size of the window again just so I can temporarily see this, and do ./buggy0. Okay, I start now at 0, 1, 2, dot dot dot, and now I stop at 9, and that of course gives me 10 hashes. Again, I don't need this in the final output, so I'm going to go ahead and delete this temporary output now. But again, have those instincts: if you don't quite understand why your code is compiling but not running properly, and you want to better see what the computer is clearly seeing in its mind's eye, use printf to just tell yourself what the value of some variable or variables is, anywhere in your code that you want to see a little more detail.

All right, let me pause for just a moment to see if there are any questions on this technique of just using printf to begin to debug your code and to see the values of variables in a way that's a little more explicit. No?

All right, well, let me propose an even more powerful tool that admittedly takes a little getting used to. But this is kind of one of those lessons, trust me if you will, that if you spend a few more minutes, maybe even an hour or so, this week learning the following tool, you will save yourself hours, plural, maybe even tens of hours, over the course of the next many weeks, because this tool can help you truly see what's going on inside of your code. So this tool we're going to add to the list today
is called debug50. And while this one does end with 50, implying that it's a CS50 tool, it's built on top of an industry-standard tool known as GDB, the GNU Debugger, a standard tool that a lot of different computer systems use to provide you with the ability to debug your code in a more sophisticated way than just using printf alone. So let's go ahead and do this. Let me go back to the buggy version of this program, which, recall, had me going from 0 through 10, which was too many steps. A moment ago, I proposed that we just use printf to see the value of i. But frankly, the bigger our programs get, the more complicated they get, and the more output they need to have on the screen, it's just going to get very messy quickly if you're printing out stuff that shouldn't be there. Think back to Mario. Mario's pyramid is this sort of graphical output, and it would very quickly get ugly and kind of hard to understand your pyramid if you're commingling that pyramid with actual textual output from printf as well.

So debug50, and in turn a debugger in any language, is a tool that allows you to run your code step by step and look inside of variables and other pieces of memory inside the computer while your program is running. Right now, pretty much every program we run takes, like, a split second to run. That's way too fast for me, the human, to wrap my mind around what's going on step by step. A debugger allows you to run your program, but much more slowly, step by step, so you can see what's going on.

So I'm going to go ahead now and run debug50 ./hello... no, sorry, debug50 ./buggy0. I write debug50 first, then a space, and then ./ and the name of the program that's already compiled that I want to debug. I'm going to go ahead and hit Enter, and notice that, oh, it was smart: it noticed that I changed my code. And I did: a moment ago I reverted it back to the buggy version. So let me fix this: make buggy0. All right, no errors. Now let me go ahead and run debug50 again. And if
you haven't noticed this already, sometimes I seem to type crazy fast. I'm not necessarily typing that fast; I'm going through my history. In CS50 IDE, using your arrow keys, up and down, you can scroll back in time through all of the commands you've typed over the past few minutes or hours or even days, and this will start to save you keystrokes. So I'm going to go ahead and hit Up, and now I don't have to bother typing this whole command again. It's a helpful way to save time. I'm going to go ahead now and hit Enter, and now notice this error message: I haven't set any breakpoints. "Set at least one breakpoint by clicking to the left of a line number and then rerun debug50." Well, what's going on here? debug50 needs me to tell the computer in advance at what line I want to break into and step through, step by step. So I can do that. I'm going to go to the side of the file here, as it says, and, you know what, the first interesting line is this one here, line 6. So I clicked in the so-called gutter, the left-hand side of the screen, on line 6, and that automatically put a red dot there, like a stop sign.

Now, one last time, I'm going to go ahead and run debug50 ./buggy0 and hit Enter, and now notice this fancy new panel opens up on the right-hand side. It's going to look a little cryptic at first, but let's consider what has changed on the screen. Notice now that highlighted in this sort of off-yellow color is line 6, and that's because what debug50 is doing is running my program, but it has paused execution on line 6. So it's done everything from line 1 through 5, but now it's waiting for me on line 6. And what's interesting over here is this. Let me zoom in on this window over here. There's a lot going on here, admittedly, but let's focus for just a moment not on Watch Expressions, not on Call Stack, but only on Local Variables. And notice I have a variable called i whose initial value is 0, and it's of type int. Now, this is kind of
interesting, because watch what I can do via these icons up here. I can click on this Step Over button and start to step through my code line by line. So let me go ahead and zoom out. Let me go ahead and click Step Over, and watch what happens to the yellow highlighting: it moves down to the next line. But notice, if I zoom in again up here, the value of i has not changed. Now let me go ahead and step over again, and notice the yellow highlighting doubles back. That makes sense, because I'm in a loop, so it should be going back and forth, back and forth. But what happens next in a loop? Every time you go back to the beginning of the loop, remember that your incrementation happens, like the i++. So watch closely now in the top right-hand corner when I click Step Over. Notice that the value of i in my debugger has just been changed to 1. So I didn't have to use printf, I didn't have to mess up the output of my screen; I can literally see in this GUI, this graphical user interface on the right-hand side, what the value of i is.

Now, if I just start clicking a little more quickly, notice that as the loop is executing again and again, the value of i keeps getting updated. And you know what? I bet, even though we started at zero, if I do this enough times, I will see that the value is 10 now, thereby giving me another printf at the bottom, thereby explaining the 11 total hashes that I saw. So I haven't gotten any new information here, but notice I've gotten unperturbed information. I've not messily and sort of sloppily printed out all of these printf statements on the screen; I'm just watching a little more methodically what's happening to the state of my variable over on the top right there. All right, let me pause here too to see if there are any questions on what this debugger does. Again, you compile your code, you run debug50 on your code, but only after setting a so-called breakpoint, where you decide in advance where you want to pause execution of your code. Even though here I did it
pretty much at the beginning of my program, for bigger programs it's going to be super convenient to be able to pause, like, halfway through your code and not have to go through the whole thing. Peter, question? "About the debugger: what's the difference between Step Over and Step Into and Step Out?" Really good question. Let me come back to that in just a moment, because we'll do one other example where Step Into and Step Out actually are germane. But before we do that, any other questions about debug50 before we reveal what Step Into and Step Over do for us as well? All right, well, let's take Peter's question right there. Let me go ahead now and get out of the debugger. And honestly, I don't see an obvious way to get out of the debugger at the moment, but Ctrl-C is your new friend today too. Pretty much any time you lose control of a program, because the debugger's running and you've lost interest in it, or maybe last week you wrote a program that has an infinite loop that just keeps going and going and going, Ctrl-C will break out of that program.

But let's now quickly write another program that this time has a second function, and we'll see one other feature of the debugger today. I'm going to go ahead and create a new file now called buggy1.c. Again, it's going to be deliberately flawed. I'm first going to go ahead and include cs50.h this time, and I'm going to include stdio.h. I'm going to do int main(void), and I'm going to go ahead and do the following: give myself a variable called i, and I'm going to try to get a negative int by calling a function called get_negative_int, and then, quite simply, I'm going to print out this value: %i\n, i, semicolon. Now, there's only one problem: get_negative_int does not exist. So like last week, where we implemented get_positive_int, this week I'll implement get_negative_int, but I'm going to do it incorrectly at first. Now, get_negative_int, as the name implies, needs to return an integer. And even though we only spent
brief time on this last week, recall that you can specify the output of a function, a custom function that you wrote, by putting its so-called return type first on this line. Then you can put the name of the function, like get_negative_int, and then in parentheses you can put the input to the function. But if it takes no input, you can literally write the word void, which is a term of art that just means nothing goes here. I'm going to go ahead now and implement get_negative_int, and frankly, I think it's going to be pretty similar to last week, but my memory is a little hazy, so again, it will be deliberately flawed. I'm going to go ahead and declare a variable called n. Then I'm going to do the following: I'm going to set n equal to get_int, and I'm just going to explicitly ask the user for "Negative integer:" followed by a space, and then I'm going to keep doing this while n is less than zero. And then at the very last line I'm going to return n. So again, I claim that this function will get me a negative int from the user, and it's going to keep doing it again and again until the user cooperates.

However, there is a bug, and there's a couple of bugs, in fact, right now. Let me go ahead and make a deliberate mistake: make buggy1, Enter. And I see a whole bunch of errors here. I could use help50 on this, but based on last week, does anyone recall what the error here might be? "error: implicit declaration of function 'get_negative_int' is invalid in C99." So I don't know all of that, but "implicit declaration of function" is something you're going to start to see more often if you make this mistake. Anyone recall what this means and what the fix is, without resorting to help50? Yeah, Jasmine, what do you think? "So basically, since you declared it after you already used it in your code, it doesn't know what to read that as when it's processing it, so you have to move, like, the first line above where you actually start the code." Perfect. And this is the only time I will claim that copy-paste is acceptable and
encouraged. I'm going to copy the very first line only of that function and, as Jasmine proposed, I'm going to paste it at the very top of the file, thereby giving myself a hint, otherwise known as a prototype. So I'll even label it as such to remind myself why it's there: "Prototype of that function." And here I'm going to go ahead and write "Get negative integer from user," and then this function is left as written. So I now have this prototype at the very top of my file, which I think will indeed get rid of this error. Let me do make buggy1 again. Now I see that it indeed compiled okay. But when I run it now, ./buggy1, let me go ahead and input a negative integer: negative one, negative two, negative three. I feel like the function should be happy with this, and it's obviously not. So there's a bug. I'm going to go ahead and hit Ctrl-C to get out of my program, because otherwise it would run potentially forever, and now I'm going to use debug50.

But debug50 just got really interesting, to Peter's question earlier, because now I have things I can step into. I'm not writing all of my code in main; there's this other function now called get_negative_int. So let's see what happens. Let me go ahead and set a breakpoint on the first interesting line of code, line 10. And it's interesting only in the sense that everything else is kind of boilerplate at this point; you just have to do it to get your program started. I'm going to now go down here and do debug50 ./buggy1, and in a moment it's going to open up that sidebar. And I'm going to focus now not only on Local Variables like I did before (notice that i is again equal to 0 here by default), but I'm also going to reveal this other option here, Call Stack. Call stack is a fancy way of referring to all of the functions that your program at this point in time has executed and not yet returned from. So right now there's only one thing on the call stack, because the only function that is currently executing is, of
course, main. Because why? I set a breakpoint at line 10, which is by definition inside of main. But to Peter's question earlier, I feel like lines 10 and 11, frankly, look pretty correct. It's hard at this point to have screwed up lines 10 and 11, except syntactically, because I'm getting a negative int, I'm storing it in i, and then I'm printing out the value of i on those two lines. But what if instead I'm curious about get_negative_int? I feel like the bug, logically, has got to be in there, because that's the harder code that I wrote. Notice this time, instead of clicking Step Over, let me go ahead and click on Step Into, which is one of the buttons Peter alluded to. And when I click Step Into, notice that you sort of go down the rabbit hole, and debug50 jumps into the function get_negative_int and focuses on the first interesting line of code. So do, in and of itself, really isn't that interesting. int n isn't that interesting, because it's not even assigning a value to it yet. The first juicy line of code seems to be line 19, and that's why the debugger has jumped to that line.

Now, n equals get_int feels pretty correct; it's hard to misuse get_int. But notice now on the right-hand side what has happened under Call Stack. You now see two things: not only main but also get_negative_int. And a stack is like a stack of trays in a cafeteria. The first tray at the bottom is like main; the second tray on the stack in the cafeteria is now get_negative_int. And what's cool about this is that right now I can see my local variable n, and that's indeed the variable I used. So I no longer see i; I see n, because I'm in the get_negative_int function. And now, if I keep clicking Step Over again and again after typing in a number (let me type in negative one here), notice on the top right of the screen you can see in the debugger that n equals negative one. I'm going to now go ahead and click Step Over, and I think I'm going to end up in line 22.
If the human has typed in a negative integer like negative one, obviously that's negative, so let's proceed to line 22. But watch what happens when I click Step Over: it actually seems to be going back to the do loop again and again and again, as it will if I keep providing negative integers. So my logic then should be: well, okay, if n is negative one, and that is by definition a negative integer, but my loop is still running, what should your logical takeaway be? What could be your diagnostic conclusion if the debugger is essentially revealing this hint to you: n is negative one, but the loop is still going? Omar, what would you conclude? "Either the condition is wrong, or maybe some sort of Boolean logic could be flawed." Perfect. So either the condition is wrong or there's something wrong with my Boolean logic, and Boolean logic just refers to true or false. So somewhere I'm saying true instead of false, or I'm saying false instead of true. And frankly, the only place where I have code that's going to make this loop go again and again must logically be on line 21.
So even if you're not quite sure how to fix it yet, just by deduction you should realize that, okay, negative one is what's in the variable, but that's not good enough; the loop is still going. I must have screwed up the loop. And indeed, let me just call it out now: line 21 is indeed the source of the bug. So we've isolated it. Out of 23 lines, we've at least found the one line where I know the solution has to be. What's the solution? How do I fix the logic, now thanks to the debugger having led me down this road? How do I fix line 21 here? What fix would you propose? Yeah, Jacob? "You would have to change it from while n is less than zero to while n is greater than zero." Exactly. So instead of n less than 0, I want to say n greater than 0. And one slight clarification: I think I want to include 0 here, because 0 is not negative, and if I want a negative n, I think what I'm probably going to want to say is: while n is greater than or equal to zero, keep doing the loop. So I very understandably just inverted the logic. No big deal; I'm thinking negatives and I did less than. But the fix is easy. The point is, the debugger led you to this point.

Now, those of you who have programmed before probably saw the bug jumping out at you. Those of you who haven't programmed before probably would in time have figured out what the bug was, because out of 23 lines it's got to be one of those. But as our programs get more sophisticated and we start writing more lines of code, debug50 and debuggers in general will be your friend. And I realize that this is easier said than done, because at first, when using a debugger, you're going to feel like, "Ah, I'm just going to use printf, I'm just going to fight through this," because there's a bit of a learning curve. But you will gain back that time and more by using a debugger as your first instinct when chasing down problems like this. All right, so that's it for debug50,
a new tool in your toolkit in addition to printf, but debug50 is hands down the more powerful of the two.

Now, some of you have wondered over the past couple of weeks why there's this little rubber duck here, and there actually is a reason for this too. There's one final debugging technique that, in all seriousness, we'll introduce you to today too, known as rubber duck debugging. You can Google this; there's a whole Wikipedia article about it. And this is kind of a thing in computer science circles, for computer scientists or programmers to have rubber ducks on their desks. The point here is that sometimes, when trying to understand what is wrong in your code, it helps to just talk it through. In an ideal world, we would just talk to our colleague or our partner on some project, and just in hearing yourself vocalize what it is your code is supposed to do, very often that proverbial light bulb goes off and you're like, "Oh, wait a minute, never mind, I got it," just because you heard yourself speaking illogically when you intended something actually logical.

Now, we don't often all have colleagues or partners or friends with whom we're working on a project, and we don't often have family members or friends who want to hear about our code, of all things. And so a wonderful proxy for that conversation partner would be, literally, a rubber duck. And so here, in healthier times, we would be giving all of you rubber ducks. Here on stage we brought a larger one for us all to share. If you've noticed in some of the wide shots on camera, there's a duck who's been watching this whole time, so that any time I screw up, I literally have someone I can talk to, if non-verbally in this case. But we can't emphasize enough that, in addition to printf, in addition to the more sophisticated debug50, talking through your problems with code is a wonderfully valuable thing. And if your friends or family are willing to hear about some low-level code you're writing and some bug you're
trying to solve, great. But in the absence of that, talk to a stuffed animal in your room, talk to an actual rubber duck if you have one, talk aloud, or even think aloud. It's just a wonderfully compelling habit to get into, because just in hearing yourself vocalize what you think is logical, the illogical will very often jump out at you instead. All right, so with that said, that's been a lot. Let's go ahead here and take a five-minute break, give everyone a bit of a breather, and when we come back, we'll take a look at some of the more powerful features of C, now that we can trust that we can solve any problems with all of these new tools. So we'll be back in five.

All right, we are back. So let's take a look underneath the hood, so to speak, of a computer, because as fancy as these devices are, and as powerful as they seem, they're relatively simple in their capabilities and what they can actually do. And let's reveal as much by way of last week's discussion of types. So recall that C supports different data types: we saw char and string and int and so forth. So to recap, we had all of these. Well, it turns out that each of these data types is defined on a typical computer system as taking up a fixed amount of space, and it depends on the computer, whether it's a Mac or PC, or old or new, just how much space is typically used by these data types. But on CS50 IDE, the sizes of all of these types are as follows. A bool, true or false, uses just one byte. Now, that's actually a little wasteful, because one byte is eight bits, and gosh, for a bool you should only need one bit. But you can't work at the single-bit level easily in C, and so we just typically spend one whole byte on a bool. A char is going to be one byte as well, and that might sound familiar, because last week, when we talked about ASCII, we proposed that the total number of possible characters you can represent with a char was 256, because of eight bits and two to the eighth power. So one char is one byte, and that's fixed in C no matter what. Then
there were all of these other data types. There was float, which is a real number with a decimal point; that happens to use four bytes. A double is also a real number with a decimal point, but it uses eight bytes, which gives you even more precision: you can have more significant digits after the decimal point, for instance. Ints we've used a bunch; those are four bytes, typically. A long is twice as big, and that just allows you to represent an even bigger number, and some of you might have done exactly that on credit when storing a whole credit card number. Strings, for now, are a variable number of bytes: it could be a short string of text, a long string of text, a whole paragraph, so that's going to vary. So we'll come back to this notion of string next time, but today focus on just these primitive types, if you will.

And here is a picture of what is inside of your computer. This is a piece of memory, or RAM, random access memory, and it might be a little smaller or a little bigger depending on whether it's a laptop or desktop or phone or the like. But it's in memory, or RAM, that programs are stored while they're running, and it's where files are stored when they are open. So typically, if you install programs or save files, those are saved on what's generally called your hard drive, or hard disk, or solid-state disk, or CD, or some other physical medium, the upshot of which is that they don't require electricity to store your data long term. RAM is different. It's volatile, so to speak, but it's much faster than a hard disk, or even a solid-state disk. It's much faster because it's purely electronic, and indeed there are no moving parts; it's purely electronic, as pictured here. And so with RAM you have the ability to open files and run programs more quickly, because when you double-click a program to run it, or you open a file in order to view or edit it, it's stored temporarily in RAM. And long story short, if your laptop battery has ever died, or your computer's
gotten unplugged, or your phone dies, the reason that you and I tend to lose data, the paragraph that you just wrote in the essay that you hadn't yet saved, is because RAM is volatile; that is, it requires electricity to continue powering it. But for our purposes, we're only going to focus on RAM, not so much long-term disk space yet, because when you're running a program in C, it is indeed by definition running in your computer's memory. But the funny thing about something as simple as this picture is that each of these black rectangles is kind of a chip, and in those chips are stored all of the zeros and ones, the little switches that we alluded to in week zero. So let's zoom in on just one of these chips. Now, I don't know how big this stick of RAM is. Maybe it's one gigabyte, a billion bytes; maybe it's four gigabytes; maybe it's even smaller or bigger. There's some number of bytes represented physically by this hardware.

So if we zoom in further, let me propose that, all right, I don't know how many bytes are here, but if there's some number of bytes, whether it's a billion or two billion or fewer or more, it stands to reason that we could just number all of these bytes. We could think of this physical device, this memory, as just being a grid, top to bottom, left to right, and each of the squares I've just overlaid on this physical device might represent an individual byte. And again, in reality maybe there are more of them, maybe there are fewer of them, but it stands to reason that no matter how many there are, we can think of each of these as having a location: this is the first byte, second byte, third byte, and so forth. Well, what does it mean then for a char to take up one byte? That means that if your computer's memory is running a program, maybe one that you wrote or I wrote, that's using a char variable somewhere in it, the char you're storing in that variable may very well be stored in the top left-hand corner, physically, of this piece of
RAM. Maybe it's there, maybe it's elsewhere, but it's just one physical square. If you're storing something like an int, which takes up four bytes, well, that frankly might take up all four squares along the top there, or somewhere else. If you're using a long, that's going to take up twice as much space, so representing an even bigger number in your computer's memory is going to require that you use all of the zeros and ones comprising these eight bytes instead. But let's now move away from physical hardware. Let's abstract it away, if you will, and just start to think of our memory as this grid. And technically it's not a two-dimensional structure; I could just as easily draw all of these bytes from left to right, I could just fit fewer of them on the screen. So we'll take the physical metaphor a bit further and just think of our computer's memory as this grid, this grid of bytes. Those bytes are each eight bits, and those bits are just zeros and ones. So what we've really done is zoom in metaphorically on our computer's memory to start thinking about where things are going to end up in memory when you double-click on a program on your Mac or PC, or in CS50 IDE when you do ./hello or ./buggy0 or buggy1. It's these bytes in your computer's memory that are filled with all of your variables' values.

So let's consider an example here. Suppose I had written some code that involved declaring three scores. Maybe it's a class that's got three tests, and you want to average the student's grade across all three of those tests. Well, let's go ahead and write a quick program that does exactly this. In CS50 IDE, I'm going to create a program called scores.c, and in scores.c I'm going to go ahead and include stdio.h. I'm going to then do my int main(void) as usual, and then inside of here I'm going to keep it very simple. I'm going to give myself one int called score1, and just to be a little playful, I'm going to set it equal to 72,
like last week. I'm going to give myself a second score and set it equal to 73, and then a third score, whose value is going to be 33. And then let me go ahead and print out the average of those three values by plugging in a placeholder for a floating-point value. If I add three integers together and divide them by three, I may very well get a fraction, or a real number with a decimal point. So I'm going to use %f instead of %i, because I don't want to truncate someone's grade. Otherwise, if they have like a 99.9, they're not being rounded up to 100; they're going to get the 99 because of truncation, as we discussed last week. So how do I now do the math of an average? Well, it's pretty straightforward: score1 plus score2 plus score3, in parentheses, just like in math, divided by three, semicolon. Let me save that file. Let me do make scores at the bottom. Again, we're not going to use clang manually; no need to, because it's a lot easier to run make. But I did mess up here: "format specifies type 'double' but the argument has type 'int'." So I don't quite understand that, but it's drawing my attention to the %f and the fact that my math looks like this.

So, any thoughts here? I don't think printf is going to help me here, because the bug is within the printf line. I don't think that debug50 is really going to help me here, because I already know what line of code the bug is in. This feels like an opportunity to talk to the physical duck or some other inanimate object, or we can perhaps think about what errors we ran into even last week. Arpan, what do you think? "I think it's telling you this because all the values are of integer type, but you are telling it to be a float." Indeed. So score1, score2, score3 are all integers, and the number three is literally an integer, and so this time the compiler is smart enough to realize: wait a minute, you're trying to coerce an integer result into a floating-
point value, but you haven't done any floating-point arithmetic, if you will. So, you know what? There are a few ways to fix this. Last week, recall, we proposed that you could use a cast, and you could explicitly cast one or more of those values to a float. So I could do this, for instance, or I could cast all of these to floats, or one of these to a float; there are many different possibilities. But frankly, the simplest fix is just to divide, for instance, by 3.0. I can avoid some of the headaches of casting from one type to another by just making sure that there's at least one floating-point value involved in this arithmetic. So now let me recompile scores. This time it compiles okay. Let me do ./scores, and voila: my average isn't so high, 59.3333.

So what is actually going on inside of the computer, irrespective of the floating-point arithmetic, which was again a topic of last week? Well, let's consider these three variables: score1, score2, score3. Where are they actually being stored in the computer's memory? Well, let's consider that grid again. And again, I'm going to start at top left for convenience, but technically speaking, and we'll see this down the road, your computer's memory is just like this big canvas, and values can end up in all different places. But for today we'll keep it clean. The first variable, score1, I claim is going to be here, top left, for simplicity. But what's important about where score1, that is, 72, is being stored is that it's taking up four of these boxes. Each of these boxes, recall, represents one byte, and an integer, recall, in CS50 IDE is four bytes. Therefore I have used four bytes of space to represent the number 72.
The number 73, in score2, is similarly going to take up four boxes, as is score3 going to take up four boxes as well. But what's really going on underneath the hood here? Well, if each of these squares represents a byte, and each of those bytes is eight bits, and a bit is just a zero or one, what's really going on underneath the hood is something like this. Somehow this electronic memory is storing electricity in just the right way so that it's storing this pattern of zeros and ones, aka 72 in decimal; this pattern of zeros and ones, aka 73 in decimal; this pattern of zeros and ones, aka 33 in decimal. But again, we don't have to keep thinking about or dwelling on the binary level. This is only to say that everything we've discussed thus far is coming together now in this one picture, because the computer is just storing these patterns for us, and we are allocating space now, thanks to our programming language, via code like this.

But this code, correct though it may be (indeed, 59.33333 and so forth was my average if my test scores were 72, 73, and 33), I feel like there's an opportunity for better design here. So not just correctness, not just style: recall that design is this other metric of code quality. It's a little more subjective, and it's a little more subject to debate among reasonable people, but I don't really love what I was doing with this naming scheme. And in fact, if we look at the code, there really wasn't much more to my program than these three lines. I worry this program isn't particularly well designed. What rubs you the wrong way, perhaps, about those three lines of code? What could be better? And even if you don't know the solution, especially if you've never programmed before, what kind of smells about those three lines? This is actually a term of art: code smell. It's like something you're not loving for some reason, even if you can't put your finger on it. It's not the best design; the code smells. What's smelly, if you will, about score1, score2, score3? Ryan, what do you think? "If
you're doing an average calculation, you don't need to add them up all together in the code. You can add them up beforehand and store it as one variable." Absolutely. If I'm computing the average, I don't need to keep all three around; I can just keep a sum and divide the whole sum by the total number. I like that instinct. What else might you not like about the design of this code? Score1, score2, score3; score1, score2, score3. Might there be opportunities still for improvement? I feel like any time you start to see this repetition... Maybe Andrew, your thoughts? "Not hard-code the three scores together." Okay, so not hard-code the three scores. And what would you do instead? "Maybe take an input. I wouldn't write out the scores themselves." Yeah, another good instinct. It's kind of stupid that I've written a program, compiled a program, that only computes the average for some student who literally got those three test scores and no others. There's no dynamism here.

Moreover, it's a little lazy too that I called my variables score1, score2, score3. I mean, where does it end after that? If I want to have a fourth test next semester, now I have to go and have score4; if I've got a fifth, score5. That starts to be reminiscent of last week's copy-paste, which really wasn't the best practice. And so let me propose that we clean this up, and it turns out we can clean this up by way of another topic, another feature of C that's also present in other languages, known as arrays. And if you happened to use something called a list in Scratch, arrays are very similar in spirit to Scratch's lists, but we didn't see those in lecture that first week. An array in C, as in other languages, is a sequence of values stored in memory back to back to back: a sequence of contiguous values, so to speak. So in that sense it's like a list of values from left to right, if we use the metaphor of the picture we've been drawing. So how might this be
germane here well it turns out that if you want to store a whole bunch of values but they're all kind of interrelated like they're all scores you don't have to resort to this sort of lazy score one score two score three score four score five up to score 99 depending on how many scores there are why don't you just call all of those numbers scores but use a slightly different syntax and that syntax gives you access to what are called arrays so the syntax here on the screen is an example of declaring space for three integers all at once and collectively referring to all of them as the word scores so there's no more score one two and three all three of those scores are in a variable called scores and what's new here is the square brackets inside of which is a number that literally connotes how many integers do you want to store under the name scores so what does this allow me to do it allows me still to define three integers in that array so this array is going to be a chunk of memory back to back to back that i can put values in and the way i put those values is going to look syntactically like this i still use numbers but now i'm using a new notation and it's similar to what i resorted to before but it's a little more generalized now and dynamic now if i want to update the very first score in that array i literally write the name of the variable scores bracket 0 close bracket and then assign it the value if i want to get at the second score i do scores bracket one if i want the third score it's scores bracket two and the only thing that's a little weird and takes some getting used to is the fact that we are zero indexing our arrays so in past examples like for loops and while loops i've sort of said it's a convention in programming to start counting from zero when it comes to arrays which are contiguous sequences of values in a computer's memory they have to start at zero so otherwise if you don't start counting at zero you're literally going to be
wasting space by overlooking one value so now if we were to rename things on the screen instead of calling these three rectangles score one score two score three they're all called scores but if you wanna refer specifically to the first one you use this fancy bracket notation and the second one this bracket notation and the third one this bracket notation but notice the dichotomy when declaring the array when creating the array saying give me three ints you use bracket three where bracket three is the total number of values when you index into the array that is when you go to a specific location in that chunk of memory you similarly use numbers but now those are referring to their relative positions position 0 position 1 position 2 this is the total number of spaces this is the specific space first second and third all right so pictorially nothing has changed just our nomenclature really has so let me go ahead and start to improve this program taking in the advice that was offered too on how we can improve the design and get rid of the smelliness of it let me take the easiest of these approaches first by just getting rid of these three separate variables and instead giving me one variable called scores of size three an array of size three and then i don't need to declare score one score two again that's all going away that's all going away that's all going away now if i want to initialize that array with these three values i say scores bracket zero and down here i say scores bracket one and down here i say scores bracket two so i've added one line of code but notice the dynamism now if i want to have a fourth one i can just allocate here and then put in the value with another line of code or five or six or seven or eight i don't have to start copying and pasting all these different variable names by convention but i think if we take some of the advice that was offered a moment ago we can also clean this up by
way of a loop or such as well so let's do that let me go ahead and give myself actually first the cs50 library so that i can use get int and let's take this first piece of advice which is let's start asking for a score using get int and i'm going to do this three times and yeah i'm getting a little lazy i'm getting a little bored already so i'm going to copy paste and again that does not bode well in general when copying and pasting we can probably do better still but now i think i need to change just one more thing here when doing the math i want scores zero plus scores one plus scores two but before i solve this problem here the logic is still the same but i'm now taking in dynamically three integers there's still a smell to it as well it's still not as well designed and so just to make clear what could i be doing better now how could i clean up this code and make it not just correct not just well styled but better designed what remains here nina what do you think the code is like specific for only three scores so you could like ask for an input for like how many scores it wants at the very beginning and then instead of like having score bracket zero score bracket one you could use a for loop that goes through from zero to um n minus one or like less than n that will like ask and it should be like one line of code instead yeah really good it's the fact that we have get int get int get int that's like the first sign that you're probably doing something suboptimally it might be correct but it's probably not well designed because i did literally resort to copy paste there's sort of a pattern here that i could certainly integrate into something like a loop so let me do that let me actually get rid of two of these lines of code let me go up here and do something like for int i gets zero i less than three for now i plus plus let me open up this for loop let me indent that remaining line of code and instead of scores bracket zero this is where arrays get really powerful you
can use a variable to index into an array that is to go to a specific location what do i want to use for my variable well i would think i here so now i've whittled my lines of code down from three nearly identical lines into just one really inside of a loop that's going to do the same thing for me again and again and as nina proposed too i don't have to hard code these threes all over the place maybe i could do something like this i could say something like int total gets get int and i might ask total number of scores and i could literally ask the human from the get-go how many total scores are there then i can even more powerfully use this variable total in multiple places so that now i'm doing my math much more dynamically this though i'm afraid nina broke things a bit i need to exert a little more effort here on line 14 because now i can't hard code scores 0 1 and 2 because if the total number of scores is more than that i need to do more addition if it's fewer than that i need to do less addition so i think we've introduced a bug but we can fix that but let me propose for just a moment let's not make it dynamic because i worry that's just made my life harder let's at least introduce one other feature here first i'm going to go ahead up here and define a new feature of c today which is known as a constant if i know in advance that i want to declare a number that i want to use again and again and again without copying and pasting literally that number 3 i can give myself a constant int via const int say total equals three this declares what's called a constant in programming which is a feature of many languages whereby you declare a variable of sorts whose value can never change once you set it you cannot change it and that's a good thing because one it shouldn't change in the context of this program and two just in case you the human are fallible you don't want to accidentally change it when you don't
intend so this is a feature of a programming language that sort of protects you from yourself so now i can sort of take an amalgam of my instincts and nina's and use this variable total and actually another convention when declaring constants is to capitalize them just to make visually clear that there's something different or special about this variable so i'm going to change this to TOTAL and i'm going to use that value here and here and also down here but i'm afraid both nina and i have a little bit of cleanup here to do in that i still have hard-coded scores zero scores one and scores two and i want to add a changing number of values together so you know what i've got an idea let me go ahead and create a function that's going to compute an average for me so if i want to create my own function that computes an average i want it to return a floating point value just so that we don't truncate any math i'm going to call this average and the input to this function is going to be the length of an array and the actual array and this is the last piece of funky syntax for now it turns out that when you want to pass an array as input to a custom function you literally use those square brackets again but you don't specify the size and the upside of this is that your function then can support an array that's got one space in it two spaces three a hundred it's more dynamic this way so how do i compute an average here i can do this a few different ways but i think what was suggested earlier makes sense where i can do some kind of summation so let me do int sum equals 0 because how do you compute the average of a bunch of numbers well you add them all together and you divide by the total well let's see how i might do that let me do for int i gets 0 i less than what should this be well if i'm being passed in this custom function the length of the array and the actual array i think i can iterate from i up to length and then i plus plus on each iteration and then on each
iteration i think i want to do sum plus equals whatever is in the array's ith location so to speak so again this is shorthand notation per last week for this sum equals whatever sum is plus whatever is in location i of the array and once i've done all of that i think what i can do is return the total sum divided by the length of the array and what i like about this whole approach assuming my code's correct and i don't think it is just yet notice what i can do back up in main now i can abstract away the notion of calculating an average and just do something like this with this line of code here so what did i just do a lot's going on but let's focus for a moment on line 14 here on line 14 i'm still just printing the average with some floating point placeholder but what i'm passing as input now is this function average whose inputs are going to be total which again is just this constant at the very top oh sorry i goofed i should have capitalized it which is just that constant at the very top and i'm passing in scores which again is just this array of all of those scores meanwhile in the function in the context of the function notice that the names of the inputs to a function do not need to match the names of the variables being passed into that function so even though in main they're called TOTAL and scores in the context of my function average i can call them x and y a and b or more generically length and array i don't know what the array is but it's an array of ints and i don't know how long it is but that value that answer is going to be in length but there's still a bug here there's still a bug and if we ignore main for a moment this is a subtle one does anyone see a mistake that i've made probably for the third time now over the past two weeks what subtle mistake have i made here with my code only in this average function this one's a little more subtle but the goal is to compute the average of a whole bunch of integers and return the answer nicholas you've declared
the variable within the function i've declared the variable within the function that's okay because i've declared my variable sum here i think you mean but that's inside of the average function and i'm using sum inside of the outermost curly braces in which it was defined so that's okay that's okay let's take another thought here olivia where might the bug still be the return type is float but you're returning an int divided by an int perfect so i again made that same stupid mistake that's just going to get more obvious as time goes on that if i want to do floating point arithmetic just like the ariane rocket discussion the patriot missile these kinds of details matter in a program now it's correct because i'm actually going to ensure that even though the context here is much less important than those real world contexts just computing some average of scores i'm not going to accidentally truncate any of my values so again in the context here of this function average is just applying some of last week's principles i've got a variable i've got a loop and i'm doing some floating point arithmetic ultimately and i'm now creating a function that takes two inputs one is the length and one is the array itself and the return type as olivia notes is a float so that my output is also well defined but what's nice about this again you can think of these functions as abstractions now i don't need to worry about how i calculate an average because i now have this helper function a custom function i wrote that can help me answer that question and here notice that the output of this average function will become an input into printf and the only other feature i've added to the mix here now are not only arrays which allow us to create multiple variables a variable number of variables if you will but also this notion of a constant if i find myself using the same number again and again and again this constant can help me keep my code clean and notice
this if next year maybe another semester there's four scores or four tests i change it in one place i recompile boom i'm done a well-designed program does not require that you go reading through the entirety of it fixing numbers here and numbers there changing it in one place can allow me to improve this program make it support four tests next year instead of just the three but better still would be to take i think nina's advice before which was to maybe just use get int and ask the human for how many tests they actually have that too would work well let me pause here to see if there's any questions then about arrays or about constants or passing them around as inputs and outputs in this way yeah over to sophia i had a question about the use of a float and why like the use of one float causes like the whole output to be a float why does that occur yeah really good question that's just how c behaves so long as there is one or more floating point values involved in a mathematical formula it is going to use that data type which is the more powerful one if you will rather than risk truncating anything so you just need one float to be participating in the formula in question good question other questions on arrays or constants or this passing around of them yeah over to alexandra i have a question about the declaring of the array scores when you declared it in main you said int scores and in the brackets you have total can you declare it without the total really good question short answer no so the way i did it is the way you do have to do it and in fact if i highlight what i did here now it currently says total if i get rid of that and i go back to our first version where i said something like 3 and 3 and 3 over here you cannot do this which i think alexandra is what you were proposing the computer needs to know how big the array is when you are creating it the exception to that is that when you're passing an array from one function
to another you do not need to tell that custom function how big the array is because again you don't know in advance you're writing a fairly generic dynamic function whose purpose in life is to take any array as input of integers and any length and respond accordingly with an average that matches the size of that thing and those of you as an aside who have programmed before especially in java unlike in java and certain other languages the length of an array is not built into the array itself if you do not pass in the length of an array to another function there is no way to determine how big the array is this is different from java and other languages where you can ask the array in some sense what is its length in c you have to pass both the array itself and its length around separately cena i'm still a little bit confused about how when we write that second command when is it void in the parentheses and when do we define the ints because like as i remember when we did the get a negative number or get a positive number it was void but we still kind of gave it an input i'm just not completely sold on that sure good question let me go ahead and open up that previous example which was a little buggy but it has the right syntax here so here was the get negative int function from before and as cena noted it took void as input so there was one comment you made where it still took input that was not so get negative int did not take any input and case in point if we scroll up to main notice that when i called it on line 10 i said get negative int open parenthesis close parenthesis with no inputs inside of those parentheses so this keyword void which we've seen a few times now last week and this week is just an explicit keyword in c that says do not put anything here which is to say it would be incorrect for me up here to do something like this like to pass in a number or to pass in a prompt or anything inside of those
parentheses the fact that this function get negative int takes void as its input means it does not take any inputs whatsoever that's fine for get negative int the name of the function says it all like there's no need to parameterize or customize the behavior of get negative int itself you just want to get a negative int by contrast though with the function we just wrote average this function does make conceptual sense to take inputs because you can't just say give me the average like average of what like it needs to take input so as to answer that question for you and the input in this case is the array itself of numbers and the length of that array so you can do the arithmetic and so cena hopefully that helps make the distinction you use void when you don't want to take input and you actually specify a comma separated list of arguments when you do want to take input all right so we focused up until now on integers really but let's simplify a little bit because it turns out that arrays and memory actually intersect to create some very familiar features of most any computer program namely text or strings more generally so suppose we simplify further no more integers no more arrays of integers let's just start for a moment with a single character and write a program that just creates like a single brick from like that mario game let me go ahead and create a program here called brick dot c and in brick dot c i'm just going to include stdio.h int main void and more on this void a little later today char c gets quote unquote hash and then down here let me just go ahead and print very simply a placeholder percent c backslash n and then output c so this is a pretty stupid program its sole purpose in life is to print a single hash as you might have in a mario pyramid of height one so very simple let me go ahead and make brick it seems to compile okay let me run it with dot slash brick and voila we get a single brick but let's consider for just a moment
exactly what just happened here and what actually was going on underneath the hood well you know what i'm kind of curious i remember from last week we could cast values from one thing to another what if i got a little curious and i didn't print out c which is this hash character as percent c which is a placeholder for a character what if i got a little crazy and said percent i i think i could probably coerce this char by casting it to an int so i can see its decimal equivalent i could see its actual ascii code so let me rebuild this with make brick now let me do dot slash brick and what number might we see last week we saw 72 a lot 73 and 33 for hi this week you can see 35 it turns out is the code for an ascii hash and you can see this for instance if i go to a website like let's go to asciichart.com and sure enough if i go to the same chart from last week and i look for the hash symbol here its ascii code is 35 and it turns out in c that if it's pretty straightforward to the computer that yes if this is a character i know i can convert it to an int you don't have to explicitly cast it you can instead implicitly cast one data type to another just from context here so printf and c are smart enough here to know okay you're giving me a character in the form of variable c but you want to display it as a percent i an integer that's going to be okay and indeed i still see the number 35 so that's just simple casting but let's now put this into the context of today's picture how is that character laid out well quite simply if this is my memory again and we've gotten rid of all of the numbers c which is storing this hash is just being stored in one of these bytes it only requires one square because again a char is a single byte but equivalently 35 is the number that's actually being stored there but i wonder i wonder last week we spent quite a bit of time storing not just single characters but actual words like hi and other expressions and
so what if i were to do something like this let me go back to my code and let me not quite yet practice what i just preached and let me give myself three variables this time c1 c2 and c3 and let me deliberately store in those three variables h i in all caps followed by an exclamation point and per last week when you're dealing with individual characters you must in c use single quotes when you're dealing with multiple characters otherwise known last week as strings use double quotes but that's why i'm using single quotes because we're only playing at the moment with single characters now let me go ahead and print these values out let me print out percent c percent c percent c and output c1 c2 c3 so this is perhaps the stupidest way you could print out a full word like hi exclamation point in c by storing every single character in its own variable but so be it i'm just using these first principles here i'm using percent c as my placeholder i'm printing out these characters so let me do make brick now compiles okay and if i do dot slash you know i really should have renamed this file but we'll rename it in a moment dot slash brick hi and let me go ahead and do this let me go ahead now and actually close the file and recall from last week if i want to rename my file from brick.c let's say to hi.c i can use the move command mv and now if i open up this file sure enough there's hi.c and i've fixed my renaming mistake all right so again if i now do make hi and i do dot slash hi voila i see the hi but again this is kind of a stupid way of implementing a string but let's still look underneath the hood let me go ahead and get curious let me print out percent i percent i and percent i and let me include spaces this time just so i can see separation between the numbers let me make hi again dot slash hi okay there's that 72 there's that 73 and there's that 33 from last week so that's interesting too so what's going on underneath the hood
in the computer's memory well when i'm storing these three characters now i'm just storing them in three different boxes so c1 c2 c3 and when you look at it collectively it kind of looks like a whole word even though it's of course just these individual characters so what's underneath the hood of course though is 72 73 33 or equivalently in binary just this so the story is the same even though we're now talking about chars instead of integers but what happens when i do this what happens when i do string s gets quote unquote hi using double quotes well let's change this program accordingly let me go ahead and do what we would have done last week string i'll call it s just s for string hi in all caps i can simplify this next line let me use percent s as a placeholder for a string s but let's for now reveal what a string really is because string is a term of art every programming language has strings even if it doesn't technically have a data type called string c does not technically have a data type called string we have added this type to c by way of cs50's library but now if i do make hi notice that my code compiles okay and if i do dot slash hi enter voila i still see hi which is what i would have seen last week as well and if we depict this in the computer's memory because hi is three letters it's kind of like saying well give me three boxes and let me call this string s so this feels like a reasonable artist's rendition of what s is if it's storing a three-letter word like hi but any time we have sequences of characters like this i feel like we're now seeing the capability of a proper programming language we introduced a little bit ago the notion of a string so maybe could someone redefine string as we've been using it in terms of some of today's nomenclature like what is a string there's an example of one hi taking up three boxes but how did we cs50 maybe implement string underneath the hood would you say what is it tucker uh well it's an array of characters
and integers well if integers are used in the string but it's an array of basically single characters perfect very nicely done tucker if we now have the ability to represent sequences of things integers for instance like scores well it stands to reason that we can take another primitive a very basic data type like a char and if we want to spell things with those chars like english words well let's just think of a string really as an array of characters an array of chars and indeed that's exactly what a string actually is so this thing here hi exclamation point technically speaking is an array called s and this is s bracket zero this is s bracket one this is s bracket two it's just an array called s now we didn't use the word array last week because it's not as familiar as the notion of like a string of text for instance but a string is apparently just an array and if it's an array that means we can access if we want to the individual characters of that array by way of the square bracket notation from today but it turns out there's something a little special about strings as they're implemented recall in our example involving scores the only way we knew how long that array was was because i had a second variable called length or total that stored the total number of integers in that array that is to say in our scores example not only did we allocate the array itself we also kept track of how many things were in that array with two variables however up until now every time you and i have used the printf function and we have passed to that printf function a string like s we have only provided printf with the string itself or logically we have only provided printf with the array of characters itself and yet somehow printf is magically figuring out how long the string is after all when printf prints the value of s it is printing h i exclamation point and that's it it's not going and printing four characters or five or
twenty right it stands to reason that there's other stuff in your computer's memory if you've got other variables or other programs running yet printf seems to be smart enough to know given an array how long the array is because quite simply it only prints out that single word so how then does a computer know where a string ends in memory if all a string is is a sequence of characters well it turns out that if your string is length three as is this one h i exclamation point technically a string implemented underneath the hood uses four bytes it uses a fourth byte initialized to what we would describe as backslash zero which is a weird way of describing it but this just represents a special character otherwise known as the null character which is just a special value that represents the end of a string so that is to say when you still create a string quote unquote with double quotes h i exclamation point yes the string is length three but you're wasting or spending four total bytes on it why because this is a clue to the computer as to where hi ends and where the next string maybe begins it is not sufficient to just start printing characters inside of printf one at a time left to right there needs to be the sort of equivalent of a stop sign at the end of the string saying that's it for this string well what are these values well let's convert them back to decimal 72 73 33 that fancy backslash zero was just a way of saying in character form it's zero more specifically it is eight zero bits inside of that square so to store a string the computer unbeknownst to you has been using one extra byte all zero bits otherwise written as backslash zero but otherwise known as literally the value zero so this thing otherwise colloquially known as null is just a special character and we can actually see it again if i go back to asciichart.com from before notice number zero is known as null n-u-l in all caps all right so with that said what is
powerful then about strings once we have this capability well let me go ahead and do this let me go back into my code from a moment ago and let me go ahead and enhance this program a little bit just to get a little curious as to what's going on you know what i can do i bet what i can do here in this version here is this you know what if i want to print out all of these characters of s i can get a little curious again and print out percent c percent c percent c and if s is an array per today's syntax i can technically do s bracket 0 s bracket 1 s bracket 2 and then if i save this recompile my code with make hi okay dot slash hi i still see hi but you know what let me get a little more curious let me use percent i so i can actually see those ascii codes let me go ahead and recompile with make hi dot slash hi there's the 72 73 33 now let me get even more curious let me print a fourth value like this here s bracket three which is the fourth location mind you so if i now do make hi and dot slash hi voila now you see zero and what this hints at is actually a very dangerous feature of c you know suppose i'm curious at seeing what's beyond that i could technically do s bracket four the fifth location even though according to my picture there really shouldn't be anything at the fifth location at least not that i know about just yet but i can do it in c nothing's stopping me so let me do make hi dot slash hi and that's interesting apparently there's the number 37 what is the number 37 well let me go back to my ascii chart and let me conclude that number 37 is a percent sign so that's kind of weird because i didn't print out an explicit percent now i'm kind of poking around the computer's memory in places i shouldn't be looking in some sense in fact if i get really curious let's look not at location 4 how about location 40 like way off into that picture make hi dot slash hi 24 whatever that is i can look at location 400 recompile my code make hi dot slash hi
and now it's zero again so this is what's both powerful and also dangerous about c you can touch look at change any memory you want you're essentially just on the honor system not to touch memory that doesn't belong to you and invariably especially next week you are going to start accidentally touching memory that doesn't belong to you and you'll see that it actually can cause computer programs to crash including programs on your own mac and pc yet another source of common bugs but now that we have this ability to store different strings or to think about strings as arrays well let's go ahead and consider how you might have multiple strings in a program so for instance if you were to store two strings in a program let's call them s and t respectively another programmer convention if you need two strings call the first one s then the second one t maybe i'm storing hi then bye well what's the computer's memory going to look like well let's do some digging hi as before is going to be stored here so this whole thing refers to s and it's taking four bytes because the last one is that special null character that just is the stop sign that demarcates the end of the string bye meanwhile is going to take up b y e exclamation point five bytes because i need a fifth byte to represent another null character and this one deliberately wraps around though again this is just an artist's rendition there's not necessarily a grid in reality bye exclamation point backslash zero now represents t so this is to say if i had a program like this where i had hi and then bye and i started poking around the computer's memory just using the square bracket notation i bet i could start accessing the value of b or y or e just by looking a little past the string s so again as complicated as our programs get all that's going on underneath the hood is to just plop things down in memory in locations like these and so now that we have this ability or maybe this mental model for
what's going on inside of a computer, we can consider some of the features that you might want to now use in programs that you write. So let me whip up a quick program that, let's say, prints out the total length of a string. I'm going to create a new program here in CS50 IDE, and I'm going to call this one string.c. At the top I'm going to include, as usual, cs50.h, and I'm going to include stdio.h, and I'm going to give myself int main(void). Then in here I'm going to get myself a string, so string s = get_string, and let me just ask the human for some input, whatever it is. Then let me print out literally the word Output, just so that I can actually see the result, and then down here let me print out that string: for int i = 0, i is less than... huh, I don't know what the length of the string is yet, so let me just put a question mark there, which is not valid code, but we'll come back to this; then i++. Inside of the loop I want to print out every character one at a time by using my new array notation, and then at the very end of this program I'm going to print a new line, just to make sure the cursor is on its own line. So this is a complete program that is now, as of this week, going to treat a string as an array, ergo my syntax in line 10 that's using my new fancy square bracket notation. But the only question I haven't answered yet is this: how do I know when to stop printing the string? Well, thus far when we've used for loops, we've typically done something like just count from zero on up to some number. This condition, though, can be any Boolean expression; I just need to have a yes/no or a true/false answer. So you know what I could do? Keep looping so long as the character at location i in s does not equal backslash zero. So this is now definitely some new syntax. Let me zoom in
here, but s[i] just means the ith character in s, or, more specifically, the character at position i in s. Then bang equals: bang is how a programmer pronounces exclamation point, because it's a little faster, and != means does not equal. This is how you would do an equal sign with a slash through it in math; in code, it's exclamation point, equal sign. And then notice this funkiness: '\0' is again the null character, but it's in single quotes because it is, by definition, a character. For reasons we'll get into another time, backslash zero is how you express it; just like backslash n is kind of a weird escape character for the new line, backslash zero is the character that is all zeros. So this is kind of a different for loop. I'm still starting at zero for i, and I'm still incrementing i as always, but I'm now not checking for some preordained length, because, just like a computer, I do not know a priori where these strings end. I only know that they end once I see backslash zero. So when I now go down here and do make string, it compiles okay. ./string, let me type in something like HELLO in all caps, and voila, the output is HELLO again. Let me do it again: BYE in all caps, and the output is BYE. So it's kind of a useless program in that it's just printing the same thing that I typed in, but I'm using this Boolean expression to decide whether or not to keep printing characters.

Now, thankfully, C comes with a function that can answer this for me. It turns out there is a function called strlen, so I can literally just say, well, figure out what the length of the string is. The function is called strlen, for string length, and it exists in a file called, not surprisingly perhaps, string.h. So now let me go down here and do make string, compiles okay, ./string, type in hello, and it still works. So this function strlen, which exists in a library by way of the header file string.h, already exists; someone else wrote it. But how did they write
it? Odds are they wrote the first version that I did, by checking for that backslash zero. But let me ask a subtle question here. This program is correct: it iterates over the whole length of the string and it prints out every character therein. Can anyone observe a poor design decision in this program? This one's subtle, but there's something I don't like about my for loop in particular, and I'll isolate it to line nine. I've not done something optimally on line nine; there's an opportunity for better design. Any thoughts here on what I might do better? Yeah, Jonathan? Yeah, to create basically another variable for the string length and to sort of remember it. Yeah, and why are you suggesting that? If you want to use a different value for the string length, or if it might fluctuate or change, you want to just have a different variable as a sort of placeholder value for it. Okay, potentially, but I will claim in this case that once the human has typed in the word, it's not going to change. But I think you're going in the right direction, because in this Boolean expression here, i less than the string length of s, recall that this expression gets evaluated again and again and again every time through a for loop. Recall that you're constantly checking the condition, and the condition in this case is i less than the length of s. The problem is that strlen in this case is a function, which means there's some piece of code someone wrote, probably similar to what I wrote a few minutes ago, that you're constantly asking: what's the length of the string? What's the length of the string? And recall from our picture, the way you figure out the length of a string is you start at the beginning of the string and you keep checking: am I at backslash zero? Okay. Am I at backslash zero? Okay. So to figure out the length of HI!, it's going to take me like one, two, three, four steps, right? Because I have to start at the beginning and I iterate from location
zero on to the end. To find out the length of BYE!, it's going to take me like five steps, because that's how long it's going to take me, from left to right, to find that backslash zero. So what I don't like about this line of code is: why are you asking for the string length of s again and again and again? It's not going to change in this context. So Jonathan's point is taken if we keep asking the user for more input, but in this case we've only asked the human once. So you know what? Let's take Jonathan's advice and do int n = strlen(s), and then put n in this condition instead. So now I'm asking the same question, but I'm not foolishly, inefficiently asking it again and again, whereby the same question requires a good amount of work to find the backslash zero each time. Now, there's some cleaning up we can do here too. It turns out there's this other subtle feature of for loops: if you want to initialize another variable to a value, you can actually do this all at once, and you can do so before the semicolon. You can do comma, n = strlen(s), and then you can use n just as I have here. So it's not all that much better, but it's a little cleaner in that now I've taken two lines of code and collapsed them into one. They both have to be of the same data type, but that's okay here, because both i and n are ints. So again, the inefficiency here is that it was foolish before that I kept asking the same question again and again and again, but now I'm asking the question once, remembering the answer in a variable called n, and only comparing i against that integer, which does not actually change.

All right, I know that too was a lot. Let's take a three-minute break here, just to stretch legs and whatnot. In three minutes we'll come back and start to see applications of all of these features, ultimately in some problems that are going to lie ahead this week on the readability of language and also on cryptography.
We'll see you in three minutes.

All right, so we are back, and this has been a whole bunch of low-level details, admittedly. Where we're going with this, ultimately, this week and beyond, is applications of some of these building blocks, and one of those applications this coming week, in the next problem set, is going to be that of cryptography: the art of scrambling or encrypting information. If you're trying to encrypt information, like messages, well, those messages might very well be written in English, or in ASCII if you will, and you might want to convert some of those ASCII characters from one thing to another, so that if your message is intercepted by some third party, they can't actually decipher or figure out what it is that you've sent. So I feel like we're almost at the point where, in code, we can start to convert one word to another or to scramble our text, but we do need a couple more building blocks. So recall that we left off with this picture here, where we had two words in the computer's memory, HI! and BYE!, both with exclamation points, but also both with these backslash zeros that you and I do not put there explicitly; they just happen for you any time you use the double quotes and any time you use the get_string function. Once we have those in memory, you can think of them as s and t respectively, but a string s or t is just an array, so you can also refer to all of these individual characters, or chars, via the new square bracket notation of today: s[0], s[1], s[2], s[3], and then t[0], t[1], t[2], t[3], and t[4], and then whatever else is in the computer's memory. But you know what you can even do? Suppose that instead we wanted to have an array of words. Before, we had an array of scores, an array of integers, but now suppose we wanted, in the context of some other program, to have an array of words. You can totally do that; there's nothing stopping you from having an array of words, and the syntax is going to be identical. Notice, if I want
an array called words that has room for two strings, I literally just say string words[2]. This means: hey, computer, give me an array of size two, each of whose members is going to be a string. How do I populate that array? Same as before with the scores: words[0] gets "HI!", and words[1] gets "BYE!". So that is to say, with this code we could create a picture similar to the one previously, but I'm not calling these strings s and t; now I'm calling them both words, at two different locations, zero and one respectively. So we could redraw that same picture like this: now this word is technically named words[0], and this one is referred to by words[1]. But again, what is a string? A string is an array, and yet here we have an array of strings, so we kind of, sort of have an array of arrays. We've got an array of words, but a word is just a string, and a string is an array of characters, so what I really have on the board is an array of arrays. And so here, and this will be the last weird syntax for today, you can actually have multiple square brackets back to back. If your variable is called words, and that variable is an array, then to get the first word in the array you do words[0].
Once you're at that word, HI!, and you want to get the first character in that word, you can similarly do [0]. So the first bracket refers to which word you want in the array, and the second bracket refers to which character you want in that word. So now the I is at words[0][1], the exclamation point is at words[0][2], and the null character is at words[0][3]. Meanwhile, the B is at words[1][0], followed by words[1][1], words[1][2], words[1][3], and words[1][4]. So it's almost like a coordinate system, if you will: it's a two-dimensional array, or an array of arrays. This is only to say that if we wanted to think of arrays of strings as individual characters, we can; we have that expressiveness now too in code.

So what more can I do now that I can manipulate things at this level? Let me do a program that'll be pretty applicable, I think, to some of our upcoming programs as well. Let me call this one uppercase, and let me quickly write a program whose purpose in life is just to convert an input word to uppercase. So let me include cs50.h, let me include stdio.h, and let me also include this time string.h, which is going to give us functions like strlen. Then let me do int main(void), and let me get a string from the user like before: I'm just going to ask the user for a string, whatever it should be before I uppercase everything. Then I'm going to print out literally After, just so I can see what happens after I capitalize everything in the string. And now let me do this: for (int i = 0; i < strlen(s); i++)... wait a minute, I made that mistake before. Let's not repeat this question; let's give myself a second variable, n = strlen(s), and then i < n, i++. So again, this is now becoming boilerplate: any time you want to iterate over all of the
characters in the string, this is probably a reasonable place to start. And then let me ask the question. I want to iterate over every character in the string that the human has typed in, and I want to ask myself a question, just as we've done with any algorithm. Specifically: if the current letter is lowercase, let me somehow convert it to uppercase; else, let me just print it out unchanged. So how can I express that using last week's and this week's building blocks? Well, let me say something like this: if the character at location i in s, that is, the ith character in s, is greater than or equal to a lowercase 'a', and the ith character in s is less than or equal to a lowercase 'z', what do I want to do? Let me print out a character, but that character should be what? s[i], but I'm not sure what to do here yet, so let me come back to that. Else, let me just print out that character unchanged: s[i]. So, minus the placeholder, the question marks I've put, I'm kind of all the way there. Line 10 initializes i to zero; it's going to count all the way up to n, where n is the length of the string, and it's going to keep incrementing i. We've seen that before, and again, that's going to become muscle memory before long. Line 12 is a little new, but it uses building blocks from last week and this week. We have the new square bracket notation to get the ith character in the string s. We saw >= or <=, or at least one of those, last week; they just mean greater than or equal to and less than or equal to. I mentioned ampersand ampersand last week, which is the logical and operator, which means you can check one condition and another, and the whole thing is true if both of those are true. This is a bit weird today, but if you want to express, is the current character between lowercase a and lowercase z, it's totally fine to implicitly treat 'a' and 'z' as numbers, which they really are, because again, if we come back to our favorite ASCII
chart, you'll see again that lowercase a has a number associated with it, 97, and lowercase z has a number associated with it, 122. So if I really wanted to be pedantic, I could go back into my code and do something like: if this is greater than or equal to 97 and it's less than or equal to 122. But that's bad design. I'm never going to remember that lowercase z is 122; no one's going to know that, and it makes the code less obvious. Go ahead and write it in a way that's a little more friendly to humans, like this. But notice this question mark: how do I fill in this blank? Well, let me go back to the ASCII chart. This is subtle, but it's kind of cool, and humans were definitely thinking ahead. Notice that lowercase a is 97 and capital A is 65; lowercase b is 98 and capital B is 66. And notice these pairs of numbers: 65 to 97, 66 to 98, 67 to 99. It would seem that no matter which letters we compare, lowercase and uppercase, they're always 32 apart, and that's consistent; we could do it for all 26 English letters. So if they're always 32 apart, you know what I could do? If I want to take a lowercase letter, which is what I'm thinking about in line 14, I could just subtract off 32. It's not the cleanest, because I'm probably going to forget that math at some point, but at least mathematically I think that'll do the trick, because 97 will become 65 and 98 will become 66, which is forcing those characters to uppercase. But they're not being printed as numbers; I'm still using %c to coerce each one to be a char. So if I didn't mess any syntax up here, let me make uppercase, okay, ./uppercase, and let me type in, for instance, my name in all lowercase, and voila: uppercase. Now, it's a little ugly; I forgot my backslash n, so let me add one of those real quick, just to fix the cursor. Let me recompile the code with make uppercase, rerun the program with ./uppercase, and now type in my name, david. Let me do it again with brian, and notice
that it's capitalizing everything, character by character, using only today's building blocks. This is correct, and it's pretty well styled, because everything's nicely indented; it's very readable, even though it might look a little cryptic at first glance. But I think I can do better, and I can do better by using yet another library. Here's where C, and really programming in general, gets powerful: the whole point of using popular languages is that so many other people before you have solved problems that you don't need to solve again. And I'm sure that over the past 50 years someone has probably written a function that capitalizes letters for me; I don't have to do this myself. And indeed, there is another library that I'm going to include by way of its header file, ctype.h, which is the language C and a bunch of type-related things. In ctype.h, it turns out, there are a couple of functions, specifically. Let me get rid of all of this code and call a function called islower, and pass islower s[i]. As you might guess, islower's purpose in life is to return essentially a Boolean value, true or false, depending on whether that character is lowercase. If so, let me print out a placeholder followed by the capitalization of that letter. Before, I had to do that annoying math with minus 32 and figure it out; now I can just say toupper(s[i]). Otherwise, I can just print out that character unchanged, just as before: s[i]. But now notice my program. Honestly, it's definitely a little shorter, and it's a little simpler in that there's just less code, and hopefully, if the person who wrote islower and toupper did a good job, I know it's correct; I'm just standing on their shoulders. And frankly, my code's more readable, because I understand what islower means, whereas that crazy ampersand ampersand syntax and all of the additional code was just a lot harder to wrap your mind around, arguably. So now if I go ahead and
compile this, make uppercase, okay, that seemed to work well. And now I'm going to do ./uppercase and type in my name in all lowercase again: david seems to work, brian seems to work, and I could do this all day long; it seems to still work. But you know what? I don't think I have to be even this explicit. I bet, if the human who wrote toupper was smart, I can just blindly pass in any character to toupper, and it's only going to uppercase it if it can be converted to uppercase; otherwise it will pass it through unchanged. So let me get rid of all of this stuff and really tighten this program up: print out a placeholder, %c, and then toupper(s[i]). And sure enough, if you read the documentation for this function, it will handle the case where the character is either lowercase or not lowercase, and it will do the right thing. So now if I recompile my code, make uppercase, so far so good, ./uppercase, david again, voila, it still works. And notice just how much tighter, how much cleaner, how much shorter my code is, and it's more readable in the sense that this function is pretty well named; toupper is indeed what it's called. But there is an important detail here: toupper expects as input a character. You cannot pass a whole word to it, so it is still necessary at this point for me to be using this loop and doing it character by character. Now, how would you know this? Well, you'll see multiple examples of this over the weeks to come, but if I go to what are called the manual pages for the language C (we have our own web-based version of them, and we'll link this for you in the course's labs and problem sets as needed), you can see a list of all of the available functions in C, at least those that are frequently used in CS50. And if we uncheck a box at the top, we can see even more functions. There are dozens, maybe hundreds, of functions, most of which we will not need or use in CS50, but this is going to be true in any language: you sort of pick up the
building blocks that you need over time. So we'll refer you to these kinds of resources so that you don't rely only on what we show in section and lecture, but have at your disposal these other functions and toolkits as well. We'll do the same with Python and SQL and other languages as well. And so those are what we call, again, manual pages.

All right, a final feature before we even think about cryptography and scrambling information, as for problem set two. A command-line argument, which I mentioned by name before, is a word you can type after a program's name in order to provide an input at the command line. So in make hello, hello is a command-line argument to the program make. In rm a.out, a.out was a command-line argument to the program rm, when I wanted to remove it. So we've already seen command-line arguments in action, but we haven't actually written any programs that allow you to accept words or other inputs from the so-called command line. Up until now, all of the input you and I have gotten in our programs has come from get_string, get_int, and so forth. We have never been able to look at words that the human might very well have typed at the prompt when running your program. But that's all about to change. Let me create a program called argv.c, and it'll become clear why in just a moment. I'm going to include, shall we say, stdio.h, and then I'm going to give myself int main(void), and then I'm just going to very simply go back and change the void. Just as our own custom functions can take inputs, and we saw that with get_negative_int and with average today, so does main potentially take inputs. Up till now, though, we've been saying void; we told you to say void last week, and we told you to say void in problem set one. But it turns out that C does allow you to put other inputs into main. You can either say, nope, main does not take any command-line arguments, or, if it does, you can say literally int argc
and string argv with square brackets. So it's a little cryptic, and technically you don't have to type it precisely this way, but human convention would have you do it, at least for now, in this way. This says that main, your function main, takes an integer as one input, and not a string but an array of strings as another input. argc is shorthand notation for argument count: it's an integer that's going to represent the number of words that your user types at the prompt. argv is short for argument vector, and vector is a fancy way of saying list: it is a variable that's going to store, in an array, all of the strings that a human types at the prompt, including your own program's name. So we can use this, for instance, as follows. Suppose that I want to let the user type their own name at the command prompt. I don't want to use get_string, and I don't want to have to prompt the human later for their name; I want them to be able to run my program and give me their name all at once, just like make, just like rm and clang and other programs we've seen. So I'm going to do this: if argc == 2, so if the number of arguments to my program is two, go ahead and print out hello, %s, and plug in whatever is at argv[1] (more on this in just a moment). Else, if argc is not equal to two, let's just go with last week's default, hello, world. So what is this program's purpose in life? If the human types two words at the prompt, I want to say hello, David, or hello, Brian, or hello, so-and-so. Otherwise, if they don't type two words at the prompt, I'm just going to say the default, hello, world. So let me compile this: make argv. And I didn't get it right here: unknown type string, unknown type string. All right, I goofed. If I'm using string, recall that I need to use the CS50 library, and again, we'll see all the more why in the coming weeks as we take those training wheels off. But now I'm going to do this again: make argv. There we go, now it works. ./argv, Enter: hello, world. That's pretty much equivalent to
what we did last week. But notice, if I type in, for instance, ./argv David, Enter, it says hello, David. If I type in ./argv Brian, it says hello, Brian. If I type in ./argv Brian Yu, it says hello, world. So what's going on? Well, the way you write programs in C that accept zero or more command-line arguments, that is, words at the prompt after your program's name, is you change what we have been doing all this time from void to be this: int argc, string argv with square brackets. And what the computer is going to do for you automatically is store in argc the total number of words that the human typed in; not just the arguments, technically, but all of the words, including your own program's name. It's then going to fill this array of strings, a.k.a. argv, with all of the words the human typed at the prompt; so not just the arguments, like Brian or David, but also the name of your program. So if the human typed in two total words, which they did with ./argv Brian or ./argv David, then I want to print out hello followed by a placeholder, and then whatever value is at argv[1]. And I'm deliberately not doing zero. If I did zero, based on the verbal definition I just gave, and recompiled this program, I'd see this: hello, ./argv, which I don't want. So the program's own name is automatically always stored for you at the first location in that array, but if you want the first useful piece of information, you would actually, after recompiling the code here, access it at bracket one. And so in this way do we see in argv that we can actually access individual words. But notice this too. Suppose I want to print out all of the individual characters in someone's input. You know what? I bet I could even do this. Instead of just printing out hello, let me do for int i = 0, n = strlen(argv), whoops, argv[1], and then over here I'm going to do i < n, i++. All right, so I'm going to iterate over all of the characters in the first real word in argv. And what am I going to do? Well, let me go ahead
and print out a character that's at argv[1] but at location i. So I said a moment ago, with our picture, that we could think of an array of strings as really just being an array of arrays, and so I can employ that syntax here by going into argv[1] to get me the word, like David or Brian or so forth, and then further indexing into it with more square brackets that get me the D, the a, the v, the i, the d, and so forth. And just to be super clear, let me put a new line character there, just so we can see explicitly what's going on. And let me now just delete this hello, world, because I don't want to see any hellos; I just want to see the word the human typed in. make argv, whoops, what did I do wrong? Oh, I used strlen when I shouldn't have, because I haven't included string.h at the top. Okay, now if I recompile this code, make argv, there we go, ./argv David, you'll see one character per line. And if I do the same with Brian's name, or anyone's name, and change it to Brian, I'm printing one character at a time. So again, I'm not sure why you would want to do that, but in this case my goal simply was to not only iterate over the characters in that first word, but print them out. So again, just by applying this principle twice over, we can actually see that a program has access to the individual characters in each of these strings.

All right, and one last explanation before we introduce the crypto application thereof. This thing here: does anyone have any idea as to why main, last week and this week, seems to return an int, even though it's not an average function, it's not a get_positive_int function, it's not get_negative_int? Somehow, for some reason, main keeps returning an int, even though we have never seen this int in action. What might this mean? This is the one last piece that we promised last week we would eventually explain. And this one's a tough one. Brian, who do we have? How about, uh, Greg? Is it usually that the functions in the end, uh,
have returned zero, and that means that the function stops, and that zero is the integer that pops out of the main function? Yeah, and this one's subtle, in that if you had programmed before, and I'm guessing you have, Greg, odds are you've seen this in use before. We humans, though, in the real world of using Macs and PCs, have actually seen numbers, integers, in weird places. Frankly, almost any time your computer freezes or you see an error message, odds are you see English, or some spoken language, in the error message, but you very often see a numeric code too. For instance, if you're having Zoom trouble, you'll often see the number five in the error window in Zoom's program, and five just means you're having network issues. So programmers often associate integers with things that can go wrong in a program, and, as Greg notes, they use zero to connote that nothing has gone wrong, that all is well. So let me write one final program here that puts this to the test. Let me write a program in a file called exit.c that's going to introduce what we're going to call an exit status. This is a subtlety that will be useful as our programs get a little more complicated. I'm going to go in here and include cs50.h, and I'm going to include stdio.h, and I'm going to give myself the longer version of main, so int argc, string argv with the square brackets. And in here I'm going to say: if argc does not equal 2, the human is not doing what I want them to, and I'm going to yell at them in some way. I'm going to say, missing command-line argument. So any kind of error message that I want the human to see on the screen, I'm just going to tell them with that message. But I'm going to very subtly return the number one; I'm going to return an error code. The human is not necessarily going to see this code, but if we were to have a graphical user interface or some other feature in this program, that would be the number they see in the
error window that pops up, just like Zoom might show you the number five if something has gone wrong. Similarly, if you've ever visited a web page and the web page doesn't exist, you see the integer 404. That's not technically the exact same incarnation of this, but it is representative of programmers using numbers to represent errors, so that's one you probably have seen. Here, though, I'm going to by default say hello, %s, just like before, passing in whatever is in argv[1]. So it's the same program as before, but I'm not going to do any of this lame, oh, hello, world, if the human doesn't type in their name as I expect. Instead, I am going to check: did the human give me two words at the command line? If not, I'm going to print missing command-line argument and then return this exit code. Otherwise, if all is well, I'm going to return, explicitly, zero. This is another number that you and I are never going to see, but we could have access to it, and frankly, for course purposes, check50 can have access to it, and graphical user interfaces, when we get to those, can have access to these values. So zero, as Greg notes, just means all is well, but one would mean that something went wrong. So let me make exit, which is kind of appropriate as we're wrapping up here, and let me do ./exit: missing command-line argument is what's displayed. If I say ./exit David, now I see hello, David, or with ./exit Brian, I'll see hello, Brian. Now, this is not a technique you'll need to use often, but you can actually see these return values if you want. If I run ./exit and I see this error message, I can very weirdly say echo $?, which is an admittedly very cryptic way of saying, what was my exit status? And if you hit Enter, you'll see 1. By contrast, if I run ./exit David and I actually see hello, David, and I do echo $?, now I will see 0. So again, this is not a technique you and I will use very frequently, but
it's a capability of a program and it's a capability of c that you do now have access to and so in writing programs moving forward what we will often do in labs and in problem sets and the like is ask you to return from main either zero or one or maybe two or three or four based on the problems that might have gone wrong in your program that you have detected and responded to appropriately so it's a very effective way of handling errors in a standard way so that you know that you are being proactive about detecting mistakes so what kinds of mistakes might we handle this week and what kinds of problems might we solve well today was entirely about deconstructing what a string is last week it was just a sequence of text a chunk of text today it's now an array of characters and we have new syntax in c for accessing those characters we also today have access to more libraries more header files and the documentation for them so that we can actually solve problems without writing as much code ourselves we can use other people's code in the form of these libraries so one problem we will solve this coming week by way of problem set two is that of readability like when you're reading a book or an essay or a paper or anything what is it that makes it like a third grade reading level or a 12th grade reading level or university reading level well all of us probably have an intuitive sense right like if it's big font and short words it's probably for younger kids and if it's really complicated words with big vocabulary and things we don't know maybe it's meant for university audiences but we can quantify this a little more formulaically not necessarily the only way but we'll give you a few definitions so for instance here's a famous sentence mr and mrs dursley of number four privet drive were proud to say that they were perfectly normal thank you very much and so forth well what is it about this text that puts harry potter at a grade 7 reading level well it probably has to do
with the vocabulary words but it probably also has to do with the lengths of the sentences the amount of punctuation perhaps the total number of characters that you might count up you can imagine quantifying it just based generically on the look and the aesthetics of the text what about this in computational linguistics authorship attribution is the task of predicting the author of a document of unknown authorship this task is generally performed by the analysis of stylometric features particular this is brian's senior thesis so this is not a seventh grade reading level this was actually rated at grade 16 so brian's pretty sophisticated when it comes to writing theses but there too you could perhaps glean from the sophistication of the sentences the length thereof and the words therein there's something we could perhaps quantify so as to apply numbers and indeed that's one way you could assess the readability of a text even if you don't have access to a dictionary with which to figure out which are the actual big or small words and what about cryptography so it's incredibly common these days and so important these days for you and i to use cryptography not necessarily using algorithms we ourselves come up with but rather using software like whatsapp and signal and telegram and messenger and others that support encryption between you and the third party your friend or family or at least minimally the website with which you're interacting so cryptography is the art of scrambling information or hiding information and if that information is text well frankly as of this third week of cs50 we already have the requisite building blocks for not only representing text but as we saw today manipulating it even just uppercasing characters allows us to start mutating text well what does it mean to encrypt information well it's like our black box from last week you have some input you want some output the input we're going to start calling plain text the message you want to send
from yourself to someone else ciphertext is the output that you want and so in between there there's going to be what we're going to call a cipher a cipher is an algorithm that encrypts or scrambles its input so as to produce output that a third party can't understand and hopefully that cipher that algorithm is a reversible process so that when you receive the scrambled ciphertext you can figure out what it was that the person sent to you but the key to using cryptography pun intended is to also have a secret key so if you think back to grade school maybe you were flirting with someone in class and you sent them a note on a piece of paper well hopefully you didn't just say like i love you on the piece of paper and then pass it through all of your friends let alone the teacher to the ultimate recipient maybe you did something like an a becomes a b a b becomes a c a c becomes a d like you kind of apply an algorithm to like add one to all of the letters so that if the teacher does intercept it and look at it they probably don't have enough care in the world to figure out what this is it's just going to look like nonsense but if your friend knows that you changed a to b b to c by adding one to every letter they could reverse that process and decrypt it so the key for instance might be literally the number one the message literally might be i love you but what would the ciphertext be or the output well let's consider i love you is a string which as of today is an array of characters so what use is that well let's consider exactly that phrase as though it's an array it's an array of characters we know from last week characters are just integers decimal integers thanks to ascii and in turn unicode so it turns out i we already know is 73 and if we looked up all the others on a chart l is 76 and then 79 86 69 89 79 85 so we could relatively easily in c you might have to check your notes and check my sample code and so forth but relatively easily in c convert
i love you to the corresponding integers by just casting so to speak chars to integers i could very easily mathematically using the plus operator in c start to add one to every one of these characters thereby encrypting my message now i could send my friend these numbers but i might as well make it a little more user friendly and cast it back from integers to chars so now it would seem that the ciphertext for i love you if using a key of one and one just means change a to b not a to c just move it by one place is j mpwf zpv this is the ciphertext for an encrypted message of i love you and so the whole process becomes one is the input as the key i love you is the input as the plain text and the output ultimately is this unpronounceable phrase that again if the teacher or some friend intercepts they probably don't know what's going on and indeed this is the essence of cryptography the algorithms that protect our emails and texts and financial information and health information are hopefully way more sophisticated than that particular algorithm and they are but it reduces to the same process an input key and an input text followed by some output the so-called ciphertext and this has been with us for decades now in some form sometimes even mechanical form back in the day you could actually get these little circular devices that have letters of the alphabet on one side other letters of the alphabet on the other side and if you rotate one or the other a might line up with b b might line up with c so you can have even a physical incarnation of cryptography just as was popular in a movie that seems to play endlessly on tv at least here in the u.s around christmas time and you might recognize if you've seen a christmas story one such look so we'll use just a couple of minutes of our final moments together to take a look at this real world incarnation of cryptography that undoubtedly you can probably see on tv this fall be it known to all and sundry that ralph parker is hereby appointed a
member of the little orphan annie secret circle and is entitled to all the honors and benefits occurring thereto signed little orphan annie counter-signed pierre andre in ink honors and benefits already at the age of nine [Music] come on let's get on with it i don't need all that jazz about smugglers and pirates listen tomorrow night for the concluding adventure of the black pirate ship now it's time for annie's secret message for you members of the secret circle remember kids only members of annie's secret circle can decode annie's secret message remember annie is depending on you set your pins to b 2 here is the message 12 11 i am in my first secret meeting 5 14 11 18 16 oh pierre was in great voice tonight i could tell that tonight's message was really important 3 25 that's a message from annie herself remember don't tell anyone [Music] 90 seconds later i'm in the only room in the house where a boy of nine could sit in privacy and decode aha b i went to the next e the first word is be s it was coming easier now u [Music] oh be sure to be sure to what what was little orphan annie trying to say be sure to what randy has got to go will you please come out all right ma i'll be right out i was getting closer now the tension was terrible what was it the fate of the planet may hang in the balance [Music] almost there my fingers flew my mind was a steel trap every pore vibrated it was almost clear yes yes yes yes be sure to drink your ovaltine ovaltine a crummy commercial [Music] son of a all right that's it for cs50 we will see you next time [Music] [Applause]
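The add-one cipher walked through in the lecture (treat each char as its ASCII integer, add the key, cast back to a char) could be sketched roughly as follows. The function names `shift_char` and `encrypt` are hypothetical and do not come from the lecture, and the wrap-around from z back to a is an assumption, since the lecture only describes adding one to each letter.

```c
#include <ctype.h>
#include <stddef.h>

// Hypothetical helper: shift one letter by key places in the alphabet.
// Wrapping from z back to a is an assumption; the lecture only says "add one".
// Non-letters such as spaces are returned unchanged.
char shift_char(char c, int key)
{
    // Normalize the key into 0..25 so that negative keys (for decrypting) work
    int k = ((key % 26) + 26) % 26;

    if (isupper((unsigned char) c))
    {
        // chars are just integers, so subtracting 'A' gives a position 0..25
        return (char) ('A' + (c - 'A' + k) % 26);
    }
    if (islower((unsigned char) c))
    {
        return (char) ('a' + (c - 'a' + k) % 26);
    }
    return c;
}

// Hypothetical helper: write the encrypted form of plaintext into ciphertext.
// The caller must provide room for every char of plaintext plus the final '\0'.
void encrypt(const char *plaintext, int key, char *ciphertext)
{
    size_t i = 0;
    while (plaintext[i] != '\0')
    {
        ciphertext[i] = shift_char(plaintext[i], key);
        i++;
    }
    ciphertext[i] = '\0';
}
```

With a key of 1, calling `encrypt` on "I LOVE YOU" fills the output buffer with "J MPWF ZPV", and calling it again on that result with a key of -1 recovers the original message, which is the reversible process the lecture describes.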
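The exit-status convention from earlier in this lecture (return 0 when all is well, 1 when the command-line argument is missing) might look something like the sketch below. The function name `greet` is hypothetical, and plain `char *` is used instead of the CS50 library's `string` type so that the snippet needs only the standard library; the lecture's own program does this work directly in main.

```c
#include <stdio.h>

// Hypothetical helper: returns the status that main would hand back to the
// shell. 0 means success; 1 means the user did not supply exactly one word
// after the program's name.
int greet(int argc, char *argv[])
{
    // argc counts the program's own name, so one argument means argc == 2
    if (argc != 2)
    {
        printf("missing command-line argument\n");
        return 1;
    }

    printf("hello, %s\n", argv[1]);
    return 0;
}
```

In a complete program, main would simply return the value of `greet(argc, argv)`, and running `echo $?` in the shell right afterward would show that number.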
Info
Channel: CS50
Views: 245,453
Rating: 4.9751983 out of 5
Keywords: cs50, harvard, computer, science, david, j., malan
Id: tI_tIZFyKBw
Length: 144min 58sec (8698 seconds)
Published: Thu Dec 31 2020